4 Commits

Author SHA1 Message Date
Jehan
e6e51d9fe8 src: all language models now rebuilt after the fix. 2022-12-15 14:31:55 +01:00
Jehan
6bb1b3e101 scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
Jehan
eb8308d50a src, script: regenerate all existing language models.
Now making sure that we have a generic language model working with UTF-8
for all 26 supported models which had single-byte encoding support until
now.
2022-12-14 00:23:13 +01:00
Jehan
a3a271dfd5 LangModels: Estonian models created.
Encodings: ISO-8859-4, ISO-8859-13, ISO-8859-13, Windows-1252 and
Windows-1257.
Test text from https://et.wikipedia.org/wiki/Anton_Tšehhov
Windows-1257 and ISO-8859-13 are very close so I added quotation marks
(Jutumärgid) which are on codepoints only present in ISO-8859-13,
making both encoding apart.
2016-09-27 00:14:29 +02:00