Jehan ffb94e4a9d script, src, test: Bulgarian language models added.
Not sure why we had Bulgarian support but never updated it recently
(i.e. never with the model generation script, or so it seems),
especially with generic language models, which would have allowed
UTF-8/Bulgarian support. Maybe I tested it some time ago and it was
getting bad results? Anyway, now with all the recent updates to the
confidence computation, I get very good detection scores.

So this adds support for UTF-8/Bulgarian and rebuilds the other models too.

Also adding a test for ISO-8859-5/Bulgarian (we already had support, but
no test files).

The 2 new test files contain text from the 'Мармоти' ("Marmots") page
on the Bulgarian Wikipedia.
2022-12-17 18:41:00 +01:00
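The two new test files hold Bulgarian text, but a charset detector sees very different bytes depending on the encoding. As a minimal sketch using Python's standard codecs (not uchardet code; `SAMPLE` and `encodings_of` are hypothetical names for illustration), the page title 'Мармоти' takes one byte per letter in ISO-8859-5 and two bytes per letter in UTF-8, which is why each charset is exercised by its own test file:

```python
from __future__ import annotations

# Illustrative only: shows how the same Bulgarian word is encoded in the
# two charsets covered by the new test files, using Python's built-in codecs.
SAMPLE = "Мармоти"  # title of the Wikipedia page the test texts come from


def encodings_of(text: str) -> dict[str, bytes]:
    """Return the byte representation of `text` in each charset of interest."""
    return {
        "ISO-8859-5": text.encode("iso8859_5"),  # single-byte Cyrillic charset
        "UTF-8": text.encode("utf-8"),  # Cyrillic letters take 2 bytes each
    }


if __name__ == "__main__":
    for charset, data in encodings_of(SAMPLE).items():
        print(f"{charset:10} {len(data):2} bytes: {data.hex(' ')}")
```

Run as a script, this prints `ISO-8859-5  7 bytes: bc d0 e0 dc de e2 d8` and `UTF-8      14 bytes: d0 9c d0 b0 d1 80 d0 bc d0 be d1 82 d0 b8`.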
ar.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
bg.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
cs.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
da.py script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
de.py scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
el.py LangModels: update the Greek language models. 2016-05-25 17:39:10 +02:00
en.py script, src: add English language model. 2022-12-14 00:24:53 +01:00
eo.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
es.py scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
et.py script: forgot to commit the Estonian description. 2016-09-27 00:51:19 +02:00
fi.py LangModels: add Finnish support. 2016-09-21 18:27:39 +02:00
fr.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
ga.py LangModels: added support for Irish Gaelic. 2016-09-27 00:49:05 +02:00
he.py Issue #22: Hebrew CP862 support. 2022-12-16 23:27:52 +01:00
hi.py src: add Hindi/UTF-8 support. 2022-12-14 00:23:13 +01:00
hr.py LangModels: new Croatian models. 2016-09-26 01:32:49 +02:00
hu.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
it.py LangModels: add Italian support. 2016-09-21 18:52:09 +02:00
ko.py script, src: remove generated statistics data for Korean. 2022-12-14 00:24:53 +01:00
lt.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
lv.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
mt.py LangModels: support for Maltese / ISO-8859-3. 2016-09-21 02:11:31 +02:00
no.py script, src: update Norwegian model with the new language features. 2022-12-14 00:24:53 +01:00
pl.py LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
pt.py LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
ro.py LangModels: Romanian support added. 2016-09-28 19:57:50 +02:00
sk.py script: regenerate Slovak and Slovene with better alphabet support. 2022-12-14 00:24:53 +01:00
sl.py src, script: add concept of alphabet_mapping in language models. 2022-12-14 00:24:53 +01:00
sv.py LangModels: add Swedish support. 2016-09-28 22:42:13 +02:00
th.py script, src: regenerate the Thai model. 2022-12-14 00:24:53 +01:00
tr.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
vi.py script, src: regenerate the Vietnamese model. 2022-12-14 00:24:53 +01:00