uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-02-09 11:16:51 +08:00

Author	SHA1	Message	Date
Jehan	41d309e8a2	script, src: regenerate Russian models and add UTF-8/Russian support. This fixes the broken Russian test for Windows-1251, which once again gets a much better score with Russian. It also adds UTF-8 support. As with Bulgarian, I wonder why I had not regenerated this earlier. The new UTF-8 test comes from the 'Сурки' page of Wikipedia in Russian. Note that this now breaks the zh:gb18030 test (the score for KOI8-R / ru (0.766388) beats GB18030 / zh (0.700000)). I think I'll have to look a bit closer at our dedicated GB18030 prober.	2022-12-17 21:41:11 +01:00
Jehan	942ac05ff5	Add some Russian test files. Texts from: IBM855 (https://ru.wikipedia.org/wiki/CP855), IBM866 (https://ru.wikipedia.org/wiki/Альтернативная_кодировка), MAC-CYRILLIC (https://ru.wikipedia.org/wiki/MacCyrillic)	2015-11-27 18:17:20 +01:00
Jehan	0d70a36910	Add some more test files for Russian and Chinese. Taken from: https://zh.wikipedia.org/wiki/EUC and https://ru.wikipedia.org/wiki/КОИ-8 (also rename a file, s/utf8.txt/utf-8.txt/, to fix a build test)	2015-11-18 19:27:38 +01:00
Jehan	0efcdfa546	Reorganize test files in language subdirectories. I realize that the language a text is written in is very important, since it completely changes the character distribution. Our test files should take this into account, and we should create test files in several languages for each encoding that is used by more than one language.	2015-11-17 21:12:39 +01:00
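
The point made in commit 0efcdfa546, that the language of a text changes its character distribution even when the encoding stays the same, can be illustrated with a small sketch. This is not uchardet code: the helper `byte_histogram` and both sample sentences are hypothetical and chosen only for illustration. It encodes a Russian and a Bulgarian sentence as Windows-1251 and compares which non-ASCII bytes occur most often.

```python
from collections import Counter


def byte_histogram(text: str, encoding: str) -> Counter:
    """Count the non-ASCII bytes (>= 0x80) of `text` once it is encoded."""
    return Counter(b for b in text.encode(encoding) if b >= 0x80)


# Hypothetical sample sentences, NOT taken from the repository's test files.
russian = "Это простой пример текста на русском языке."
bulgarian = "Това е прост пример за текст на български език."

# Same encoding for both texts; only the language differs.
ru_hist = byte_histogram(russian, "windows-1251")
bg_hist = byte_histogram(bulgarian, "windows-1251")

print("ru:", [hex(b) for b, _ in ru_hist.most_common(3)])
print("bg:", [hex(b) for b, _ in bg_hist.most_common(3)])
```

With real-sized inputs the two histograms diverge much more clearly than with single sentences, which is presumably why per-language test files are useful for a statistical detector.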