mirror of
https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
ISO-8859-2 and Windows-1250 are absolutely similar for all letters in the Hungarian alphabet. So for most texts, it is not an error to return one charset or the other. What could make the difference is for instance that Windows-1250 has some symbols where ISO-8859-2 has control characters, like quotes, dashes, the euro symbol… Since control characters have a negative impact on confidence now, texts with such symbols would tend towards Windows-1250 decision. The new test file has such quote symbols. |
||
|---|---|---|
| .. | ||
| iso-8859-2.txt | ||
| windows-1250.txt | ||