Jehan ffb94e4a9d script, src, test: Bulgarian language models added.
Not sure why we had Bulgarian support but haven't updated it recently
(i.e. never with the model generation script, or so it seems),
especially with the generic language models, which make
UTF-8/Bulgarian support possible. Maybe I tested it some time ago and
it was getting bad results? Anyway, now with all the recent updates to
the confidence computation, I get very good detection scores.

So adding support for UTF-8/Bulgarian and rebuilding other models too.

Also adding a test for ISO-8859-5/Bulgarian (we already had support, but
no test files).

The 2 new test files contain text from the page 'Мармоти' on the
Bulgarian-language Wikipedia.
2022-12-17 18:41:00 +01:00
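
As an illustration of why each charset/language pair gets its own model and test file, here is a minimal sketch that encodes the word 'Мармоти' in UTF-8, ISO-8859-5 and Windows-1251 (the last one being an assumption based on `windows-1251.py` being touched by the same commit). It uses only Python's standard codecs and is not part of the project's scripts; the variable names are hypothetical.

```python
# Illustration only: the same Bulgarian word yields different byte
# sequences depending on the charset, so a detector has to learn the
# language's letter frequencies separately for each encoding.
word = "Мармоти"

# Python codec names for UTF-8, ISO-8859-5 and Windows-1251.
encodings = ["utf-8", "iso8859_5", "cp1251"]

# Map each codec name to the encoded bytes of the word.
encoded = {name: word.encode(name) for name in encodings}

for name, data in encoded.items():
    # Print the codec, the byte length and the bytes in hexadecimal.
    print(f"{name:10} {len(data):2d} bytes  {data.hex()}")
```

The word takes 14 bytes in UTF-8 and 7 bytes in each of the single-byte charsets, and the two single-byte encodings map the same letters to different byte values.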
codepoints.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
db.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
ibm852.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
ibm862.py Issue #22: Hebrew CP862 support. 2022-12-16 23:27:52 +01:00
ibm865.py script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
iso-8859-1.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
iso-8859-2.py BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
iso-8859-3.py LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
iso-8859-4.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
iso-8859-5.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
iso-8859-6.py LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
iso-8859-7.py LangModels: retraining Greek models with my training script. 2015-12-13 18:02:11 +01:00
iso-8859-8.py script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
iso-8859-9.py script: forgot to commit ISO-8859-9 and Turkish files. 2015-12-04 02:40:54 +01:00
iso-8859-10.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
iso-8859-11.py LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
iso-8859-13.py LangModels: add support for Lithuanian / ISO-8859-13. 2016-09-20 23:09:24 +02:00
iso-8859-15.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
iso-8859-16.py LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
mac-centraleurope.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
tis-620.py LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
viscii.py LangModels: add Windows-1258 support for Vietnamese. 2016-02-13 02:32:57 +01:00
windows-1250.py BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
windows-1251.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
windows-1252.py Adding French Windows-1252 support. 2015-12-03 21:22:30 +01:00
windows-1253.py LangModels: retraining Greek models with my training script. 2015-12-13 18:02:11 +01:00
windows-1255.py script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
windows-1256.py LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
windows-1257.py LangModels: Estonian models created. 2016-09-27 00:14:29 +02:00
windows-1258.py LangModels: add Windows-1258 support for Vietnamese. 2016-02-13 02:32:57 +01:00