3 Commits

Author SHA1 Message Date
Jehan
314f062c70 script, src: regenerate the Thai model.
With all the changes we made, regenerate the Thai model, which was of poor
quality. The new one is much better.
2022-12-14 00:24:53 +01:00
Jehan
198190461e script: move the Wikipedia title syntax cleaning to BuildLangModel.py.
2016-02-21 16:20:22 +01:00
Jehan
fb3c47a073 LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models.
ISO-8859-11 is identical to TIS-620 except for the added
non-breaking space character.
Our detection will therefore return TIS-620, except in the
rare cases where a text contains a non-breaking space.
2015-12-04 03:14:52 +01:00
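
The distinction described in the last commit message can be illustrated with a minimal sketch. It is not the project's detector, which relies on the regenerated language models; `guess_thai_charset` is a hypothetical helper that only checks for the one byte that differs between the two charsets (0xA0, the non-breaking space in ISO-8859-11).

```python
NBSP = 0xA0  # non-breaking space in ISO-8859-11; this byte is what sets it apart from TIS-620


def guess_thai_charset(data: bytes) -> str:
    """Hypothetical helper: choose between the two Thai single-byte charsets.

    Assumes the input is already known to be Thai text in one of the two.
    """
    return "ISO-8859-11" if NBSP in data else "TIS-620"


# Thai letters map to the same byte values in both charsets.
thai = "สวัสดี".encode("iso8859_11")

print(guess_thai_charset(thai))                   # no 0xA0 byte present
print(guess_thai_charset(thai + b"\xa0" + thai))  # contains a non-breaking space
```

Under these assumptions, ordinary Thai text is reported as TIS-620, and only input containing the 0xA0 byte is reported as ISO-8859-11.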