4 Commits

Author SHA1 Message Date
Jehan
e6e51d9fe8 src: all language models now rebuilt after the fix. 2022-12-15 14:31:55 +01:00
Jehan
6bb1b3e101 scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
Jehan
eb8308d50a src, script: regenerate all existing language models.
Now making sure that we have a generic language model working with UTF-8
for all 26 supported models which had single-byte encoding support until
now.
2022-12-14 00:23:13 +01:00
Jehan
26e1cebad1 LangModels: add support for Czech.
Encodings: Windows-1250, ISO-8859-2, IBM852 and Mac-CentralEurope.
Other encodings are known to have been used for Czech: Kamenicky,
KOI-8 CS2 and Cork. But these are uncommon enough that I decided not
to support them (especially since I can't find them supported in iconv
either, or at least not under an alias which I could recognize).
This web page, which contents was made under the Public Domain, is a
good reference for encodings which were used historically for Czech and
Slovak: http://luki.sdf-eu.org/txt/cs-encodings-faq.html
2016-09-21 03:33:50 +02:00