Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-12 06:30:05 +08:00)
The previous technical text about the charsets themselves was not relevant for identifying a language. In particular, the special characters that differ between ISO-8859-1 and ISO-8859-15 appeared on their own, outside any character-sequence context. Without language understanding, they could therefore just as well have represented the ISO-8859-15 letters as the ISO-8859-1 symbols at the corresponding codepoints.

Replacing it with text from this Wikipedia page: https://fr.wikipedia.org/wiki/Œuf_(cuisine)

This text uses some of the same characters (in particular 'œ'), but within contextual character sequences, which makes it relevant for our algorithm.

| Name | ||
|---|---|---|
| bg | ||
| el | ||
| en | ||
| fr | ||
| he | ||
| hu | ||
| ja | ||
| ko | ||
| ru | ||
| th | ||
| zh | ||
| CMakeLists.txt | ||
| uchardet-tests.c | ||
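
The commit message above rests on the fact that ISO-8859-1 and ISO-8859-15 assign different characters to the same few byte values, so an isolated byte says nothing about which charset was meant. The following is a minimal sketch of that ambiguity, not code from uchardet: the names `latin_diff`, `LATIN_DIFFS`, `decode_latin1`, `decode_latin9` and `count_ambiguous` are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* One byte value that decodes differently in the two charsets. */
struct latin_diff {
    unsigned char byte;
    uint16_t latin1; /* Unicode code point under ISO-8859-1  */
    uint16_t latin9; /* Unicode code point under ISO-8859-15 */
};

/* The eight positions where ISO-8859-15 departs from ISO-8859-1. */
static const struct latin_diff LATIN_DIFFS[] = {
    { 0xA4, 0x00A4, 0x20AC }, /* currency sign   vs euro sign        */
    { 0xA6, 0x00A6, 0x0160 }, /* broken bar      vs S with caron     */
    { 0xA8, 0x00A8, 0x0161 }, /* diaeresis       vs s with caron     */
    { 0xB4, 0x00B4, 0x017D }, /* acute accent    vs Z with caron     */
    { 0xB8, 0x00B8, 0x017E }, /* cedilla         vs z with caron     */
    { 0xBC, 0x00BC, 0x0152 }, /* one quarter     vs OE ligature      */
    { 0xBD, 0x00BD, 0x0153 }, /* one half        vs oe ligature      */
    { 0xBE, 0x00BE, 0x0178 }, /* three quarters  vs Y with diaeresis */
};

#define LATIN_DIFF_COUNT (sizeof LATIN_DIFFS / sizeof LATIN_DIFFS[0])

/* ISO-8859-1 maps every byte to the code point of the same value. */
uint16_t decode_latin1(unsigned char b)
{
    return b;
}

/* ISO-8859-15 is identical except for the eight bytes listed above. */
uint16_t decode_latin9(unsigned char b)
{
    for (size_t i = 0; i < LATIN_DIFF_COUNT; i++) {
        if (LATIN_DIFFS[i].byte == b)
            return LATIN_DIFFS[i].latin9;
    }
    return b;
}

/* Count the bytes in buf whose meaning depends on the charset chosen. */
size_t count_ambiguous(const unsigned char *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (decode_latin1(buf[i]) != decode_latin9(buf[i]))
            n++;
    }
    return n;
}
```

With these definitions, `decode_latin1(0xBD)` gives U+00BD (½) while `decode_latin9(0xBD)` gives U+0153 (œ). The byte sequence `0xBD 'u' 'f'` therefore reads as "½uf" or "œuf" depending on the charset, and only the surrounding letters make the second reading the plausible one.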