Jehan 41d309e8a2 script, src: regenerate Russian models and add UTF-8/Russian support.
This fixes the broken Russian test in Windows-1251, which once again gets
a much better score for Russian. This also adds UTF-8 support.

As with Bulgarian, I wonder why I had not regenerated these models earlier.

The new UTF-8 test comes from the 'Сурки' page of Wikipedia in Russian.

Note that this now breaks the zh:gb18030 test (the score for KOI8-R / ru
(0.766388) beats GB18030 / zh (0.700000)). I think I'll have to take a
closer look at our dedicated GB18030 prober.
2022-12-17 21:41:11 +01:00
codepoints.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
db.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
ibm852.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
ibm855.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
ibm862.py Issue #22: Hebrew CP862 support. 2022-12-16 23:27:52 +01:00
ibm865.py script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
ibm866.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
iso-8859-1.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
iso-8859-2.py BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
iso-8859-3.py LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
iso-8859-4.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
iso-8859-5.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
iso-8859-6.py LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
iso-8859-7.py LangModels: retraining Greek models with my training script. 2015-12-13 18:02:11 +01:00
iso-8859-8.py script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
iso-8859-9.py script: forgot to commit ISO-8859-9 and Turkish files. 2015-12-04 02:40:54 +01:00
iso-8859-10.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
iso-8859-11.py LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
iso-8859-13.py LangModels: add support for Lithuanian / ISO-8859-13. 2016-09-20 23:09:24 +02:00
iso-8859-15.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
iso-8859-16.py LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
koi8-r.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
mac-centraleurope.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
mac-cyrillic.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
tis-620.py LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
viscii.py LangModels: add Windows-1258 support for Vietnamese. 2016-02-13 02:32:57 +01:00
windows-1250.py BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
windows-1251.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
windows-1252.py Adding French Windows-1252 support. 2015-12-03 21:22:30 +01:00
windows-1253.py LangModels: retraining Greek models with my training script. 2015-12-13 18:02:11 +01:00
windows-1255.py script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
windows-1256.py LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
windows-1257.py LangModels: Estonian models created. 2016-09-27 00:14:29 +02:00
windows-1258.py LangModels: add Windows-1258 support for Vietnamese. 2016-02-13 02:32:57 +01:00
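The Cyrillic charset files in this listing (ibm855.py, ibm866.py, iso-8859-5.py, koi8-r.py, mac-cyrillic.py, windows-1251.py) are needed because the same Russian text produces different bytes in each encoding. The following is a rough illustration only. It uses Python's standard codecs, not these model files. The file-to-codec mapping in CHARSETS and the helper encodings_of are assumptions made up for this example. The word 'Сурки' is the title of the Wikipedia page that the new UTF-8 test comes from.

```python
# Illustrative sketch only: encode one word in the Cyrillic charsets that have
# files in this directory, using Python's standard codecs (not the model files).
WORD = "Сурки"

# Assumed mapping (hypothetical): charset file name -> Python codec name.
CHARSETS = {
    "ibm855": "cp855",
    "ibm866": "cp866",
    "iso-8859-5": "iso8859_5",
    "koi8-r": "koi8_r",
    "mac-cyrillic": "mac_cyrillic",
    "windows-1251": "cp1251",
    "utf-8": "utf-8",
}


def encodings_of(word: str) -> dict[str, bytes]:
    """Hypothetical helper: return the bytes of `word` in each charset above."""
    return {name: word.encode(codec) for name, codec in CHARSETS.items()}


if __name__ == "__main__":
    for name, data in encodings_of(WORD).items():
        # Single-byte charsets give 5 bytes here; UTF-8 gives 2 bytes per letter.
        print(f"{name:>13}: {data.hex(' ')}")
    # Decoding KOI8-R bytes as Windows-1251 still yields Cyrillic letters, just
    # the wrong ones, so a detector must score the text against a language model.
    print(WORD.encode("koi8_r").decode("cp1251"))
```

Each single-byte charset places the same letters at different byte values, so a byte-level detector cannot rely on the byte values alone. This is also why a plausible-looking wrong guess, such as KOI8-R text read as another charset, can still receive a competitive score.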