3 Commits

Author SHA1 Message Date
Jehan
2bade77bf9 tests: update Windows-1250 test file for Hungarian.
ISO-8859-2 and Windows-1250 encode every letter of the Hungarian alphabet
identically. So for most texts, returning either charset is not an error.
What can tell them apart is that Windows-1250 assigns printable symbols
(typographic quotes, dashes, the euro sign…) to byte values where
ISO-8859-2 has control characters.
Since control characters now lower the confidence score, texts
containing such symbols tend to be detected as Windows-1250.
The new test file contains such quote symbols.
2015-12-12 18:12:08 +01:00
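The reasoning in this commit can be checked with Python's standard codecs. The sketch below assumes only the built-in `iso8859_2` and `cp1250` codecs; `toy_guess` is a hypothetical tie-breaker written for illustration, not uchardet's actual confidence computation. It shows that Hungarian letters produce the same bytes in both charsets, while quotes and dashes encoded as Windows-1250 decode to C1 control characters under ISO-8859-2.

```python
# Hungarian letters, including the double-acute ő/ű that Latin-1 lacks.
HUNGARIAN_LETTERS = "aábcdeéfghiíjklmnoóöőpqrstuúüűvwxyz"
HUNGARIAN_LETTERS += HUNGARIAN_LETTERS.upper()


def same_bytes(text: str) -> bool:
    """True if `text` encodes to identical bytes in ISO-8859-2 and Windows-1250."""
    return text.encode("iso8859_2") == text.encode("cp1250")


def c1_controls(data: bytes, codec: str) -> int:
    """Count characters in U+0080..U+009F (C1 controls) after decoding `data`."""
    return sum(1 for ch in data.decode(codec) if 0x80 <= ord(ch) <= 0x9F)


def toy_guess(data: bytes) -> str:
    """Hypothetical tie-breaker: prefer the charset that yields fewer C1 controls.

    This is an illustrative assumption, not uchardet's scoring algorithm.
    """
    if c1_controls(data, "iso8859_2") > c1_controls(data, "cp1250"):
        return "WINDOWS-1250"
    return "ISO-8859-2"
```

For example, `"„Magyar nyelv” – az uráli nyelvcsalád tagja".encode("cp1250")` contains three bytes (0x84, 0x94, 0x96) that ISO-8859-2 treats as control characters, so `toy_guess` leans towards Windows-1250; text made only of letters gives no such signal and either answer is acceptable.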
Jehan
15afc5c593 test: add a Hungarian Windows-1250 test but skip it for now.
Text from: https://hu.wikipedia.org/wiki/Magyar_nyelv
2015-12-03 21:18:55 +01:00
Jehan
0efcdfa546 Reorganize test files in language subdirectories.
I realized that the language a text is written in matters a lot, since
it completely changes the character distribution. Our test files should
take this into account, and for encodings used by several languages we
should create test files in each of those languages.
2015-11-17 21:12:39 +01:00
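One way a test harness could consume the reorganized files is sketched below. It assumes a hypothetical layout of `<root>/<language>/<charset>.txt` (for example `hu/windows-1250.txt`); the exact directory and file names in the repository may differ. Each file then yields both the expected charset and the language it was written in.

```python
from pathlib import Path


def collect_cases(root: Path) -> list[tuple[str, str, Path]]:
    """Return (language, charset, path) for every <root>/<language>/<charset>.txt.

    The layout is an assumption made for this sketch.
    """
    cases = []
    for path in sorted(root.glob("*/*.txt")):
        language = path.parent.name  # e.g. "hu"
        charset = path.stem          # e.g. "windows-1250"
        cases.append((language, charset, path))
    return cases
```

Keeping the language in the path means the same charset (say ISO-8859-2) can be tested against several languages, each with its own character distribution.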