uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-01-01 03:12:24 +08:00

Pedro López-Cabanillas 0d86c111a7 fix for gb18030 encoding test	2025-06-07 23:33:41 +00:00

The gb18030 test fails, reporting the sample text as Macedonian language encoded with windows-1251. This is because (1) the Macedonian language model is very optimistic and reports high confidence with the given sample, and (2) the original sample text is extremely short and lacks language variety.

By simply adding a good amount of real Chinese literature to the sample file, the test no longer fails. This text has been extracted from Wikipedia: https://zh.wikipedia.org/wiki/%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD
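
The ambiguity described in the commit message can be illustrated at the byte level. The following is a minimal Python sketch, not uchardet's algorithm: it does not model the statistical language models uchardet uses and only shows that a very short GB18030 sample is also a valid windows-1251 byte sequence, so a detector has little evidence to prefer one over the other. The helper name `decodes_cleanly` and the two-character sample are assumptions made for illustration.

```python
# Hypothetical illustration: a short GB18030 sample is also valid windows-1251.
# This is NOT how uchardet scores candidates; it only shows the byte-level overlap.

def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes in `encoding` without errors."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True

# A very short sample, similar in spirit to the original test file.
sample = "中文".encode("gb18030")

# Candidate encodings to try (cp1251 is Python's name for windows-1251).
candidates = ["gb18030", "cp1251"]
plausible = [enc for enc in candidates if decodes_cleanly(sample, enc)]

print(plausible)                 # both candidates accept the bytes
print(sample.decode("cp1251"))   # the same bytes read as Cyrillic letters
```

Because both decodings succeed, only statistical evidence from a longer and more varied sample can separate the candidates, which is consistent with the fix of enlarging the sample file.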
big5.txt	Reorganize test files in language subdirectories.	2015-11-17 21:12:39 +01:00
euc-tw.txt	Adding some more test files for Russian and Chinese.	2015-11-18 19:27:38 +01:00
gb18030.txt	fix for gb18030 encoding test	2025-06-07 23:33:41 +00:00
utf-8.txt	Adding some more test files for Russian and Chinese.	2015-11-18 19:27:38 +01:00