Pedro López-Cabanillas 0d86c111a7 fix for gb18030 encoding test
The gb18030 test fails because the sample text is detected as Macedonian
encoded in windows-1251. There are two causes: (1) the Macedonian language
model is overly optimistic and reports high confidence for the given
sample, and (2) the original sample text is extremely short and lacks
linguistic variety.
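
To illustrate why a short sample is ambiguous, here is a minimal Python sketch. It does not reproduce the detector's scoring; it only shows that GB18030 bytes for a short Chinese string also decode as mostly Cyrillic letters under windows-1251. The sample string is a hypothetical stand-in, not the contents of the test file.

```python
# Illustration only: byte-level ambiguity, not any detector's actual scoring.
sample = "中华人民共和国"  # hypothetical short sample, not the real test file

# Encode the Chinese text as GB18030 (two bytes per character here).
raw = sample.encode("gb18030")

# Decode the same bytes as windows-1251. errors="replace" guards against
# any byte value that windows-1251 leaves undefined.
misread = raw.decode("cp1251", errors="replace")

# Count how many of the resulting characters fall in the Cyrillic block.
cyrillic_count = sum(1 for ch in misread if "\u0400" <= ch <= "\u04ff")

print(len(raw), "bytes")
print(misread)
print(cyrillic_count, "of", len(misread), "characters are Cyrillic")
```

With so few bytes, a single-byte Cyrillic model has little evidence against it, which is consistent with the misdetection described above.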

Adding a substantial amount of real Chinese text to the sample file is
enough to make the test pass.

The added text was taken from Chinese Wikipedia:
https://zh.wikipedia.org/wiki/%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD
2025-06-07 23:33:41 +00:00
big5.txt     Reorganize test files in language subdirectories.      2015-11-17 21:12:39 +01:00
euc-tw.txt   Adding some more test files for Russian and Chinese.   2015-11-18 19:27:38 +01:00
gb18030.txt  fix for gb18030 encoding test                          2025-06-07 23:33:41 +00:00
utf-8.txt    Adding some more test files for Russian and Chinese.   2015-11-18 19:27:38 +01:00