3 Commits

Author SHA1 Message Date
Jehan
a8e9de307b Add UTF-16 test files without BOM...
... and disable the tests for these for now, since uchardet is not yet
able to detect UTF-16 without a BOM.
2015-11-28 19:50:18 +01:00
Jehan
a76c0786b3 Adding test files for main Japanese encoding...
... taken from the following Japanese Wikipedia pages:
https://ja.wikipedia.org/wiki/Extended_Unix_Code
https://ja.wikipedia.org/wiki/ISO/IEC_2022
https://ja.wikipedia.org/wiki/UTF-8
2015-11-17 21:24:47 +01:00
Jehan
0efcdfa546 Reorganize test files in language subdirectories.
I realize that the language a text is written in is very important,
since it completely changes the character distribution. Our test files
should take this into account, and we should create several test files
in different languages for each encoding that is used by several
languages.
2015-11-17 21:12:39 +01:00
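
The rationale in commit 0efcdfa546 (the same encoding yields a different character distribution depending on the language) can be illustrated with a small sketch. This is not uchardet code. The helper name `high_byte_histogram` and the two sample sentences are hypothetical and chosen only for the demonstration.

```python
from collections import Counter


def high_byte_histogram(text: str, encoding: str) -> Counter:
    """Count the non-ASCII bytes (>= 0x80) in the encoded form of `text`.

    Hypothetical helper for illustration only; it is not part of uchardet.
    """
    data = text.encode(encoding)
    return Counter(b for b in data if b >= 0x80)


# Two made-up sample sentences, both encodable in ISO-8859-1.
french = "Les élèves ont été très étonnés."
german = "Die Schüler müssen früh üben."

fr_hist = high_byte_histogram(french, "iso-8859-1")
de_hist = high_byte_histogram(german, "iso-8859-1")

# Same encoding, but the high bytes that appear differ by language.
print("French:", {hex(b): n for b, n in sorted(fr_hist.items())})
print("German:", {hex(b): n for b, n in sorted(de_hist.items())})
```

In this toy example the French and German samples share no high byte at all, which is why a single test file per encoding would say little about how detection behaves across languages.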