uchardet/script/BuildLangModelLogs
Jehan b70b1ebf88 Rebuild a bunch of language models.
Adding a generic language model (see upcoming commit), which uses the same
data as the specific single-byte encoding statistics models, except that it
applies this data to Unicode code points.
For this to work, where the CharToOrderMap maps directly from an encoded
byte (always 256 values) to an order, we now add an array of frequent
characters, sorted by Unicode code point, which maps each code point to its
order of frequency (and can be used with the same sequence mapping
array).
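
The lookup described in this paragraph can be sketched as follows. This is a minimal Python illustration, not the generated uchardet code: the names `UNICODE_CHAR_ORDER` and `order_of` and the sample orders are hypothetical, and binary search is only one plausible way to query an array sorted by code point.

```python
from bisect import bisect_left

# Hypothetical sample data: (code point, frequency order) pairs, sorted by
# code point. Order 0 stands for the most frequent character. The values
# are illustrative and not taken from any real language model.
UNICODE_CHAR_ORDER = [
    (ord("a"), 1),
    (ord("e"), 0),
    (ord("t"), 2),
    (ord("é"), 3),
]

# Separate list of the code points alone, so that bisect can search it.
CODE_POINTS = [cp for cp, _ in UNICODE_CHAR_ORDER]


def order_of(code_point):
    """Return the frequency order of a code point, or None if it is not listed."""
    i = bisect_left(CODE_POINTS, code_point)
    if i < len(CODE_POINTS) and CODE_POINTS[i] == code_point:
        return UNICODE_CHAR_ORDER[i][1]
    return None
```

Because the result is an order in the same numbering as the byte-based map, it could in principle index the same sequence mapping array; the array itself is not shown here.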

This of course means that each prober in which we want to use these
generic models will have to implement its own byte-to-code-point
decoder, as this is per-encoding logic anyway. This will come in a
subsequent commit.
2022-12-14 00:23:13 +01:00
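
The per-encoding byte-to-code-point decoder mentioned in the commit message above could, for a single-byte encoding, be as simple as a 256-entry table. The sketch below is an assumption for illustration only: it builds the table with Python's standard codecs, and the helper names `build_decode_table` and `decode_bytes` are hypothetical. It does not reflect how the uchardet probers implement decoding.

```python
def build_decode_table(encoding):
    """Build a 256-entry byte -> code point table for a single-byte encoding.

    Bytes that the encoding leaves undefined are stored as None.
    """
    table = []
    for b in range(256):
        try:
            table.append(ord(bytes([b]).decode(encoding)))
        except UnicodeDecodeError:
            table.append(None)
    return table


def decode_bytes(data, table):
    """Map each byte of `data` to its code point (or None) using `table`."""
    return [table[b] for b in data]
```

For example, `decode_bytes(b"caf\xe9", build_decode_table("iso8859_1"))` yields the code points of "café", which could then be passed to a code-point-based frequency lookup.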
LangArabicModel.log Rebuild a bunch of language models. 2022-12-14 00:23:13 +01:00
LangCroatianModel.log LangModels: new Croatian models. 2016-09-26 01:32:49 +02:00
LangCzechModel.log LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
LangDanishModel.log Rebuild a bunch of language models. 2022-12-14 00:23:13 +01:00
LangEsperantoModel.log LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
LangEstonianModel.log LangModels: Estonian models created. 2016-09-27 00:14:29 +02:00
LangFinnishModel.log LangModels: add Finnish support. 2016-09-21 18:27:39 +02:00
LangFrenchModel.log Rebuild a bunch of language models. 2022-12-14 00:23:13 +01:00
LangGermanModel.log Rebuild a bunch of language models. 2022-12-14 00:23:13 +01:00
LangGreekModel.log LangModels: update the Greek language models. 2016-05-25 17:39:10 +02:00
LangHungarianModel.log BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
LangIrishModel.log LangModels: added support for Irish Gaelic. 2016-09-27 00:49:05 +02:00
LangItalianModel.log Rebuild a bunch of language models. 2022-12-14 00:23:13 +01:00
LangLatvianModel.log LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
LangLithuanianModel.log LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
LangMalteseModel.log LangModels: support for Maltese / ISO-8859-3. 2016-09-21 02:11:31 +02:00
LangPolishModel.log LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
LangPortugueseModel.log LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
LangRomanianModel.log LangModels: Romanian support added. 2016-09-28 19:57:50 +02:00
LangSlovakModel.log LangModels: add support for Slovak. 2016-09-21 13:42:20 +02:00
LangSloveneModel.log LangModels: add Slovene support. 2016-09-28 22:13:17 +02:00
LangSpanishModel.log Rebuild a bunch of language models. 2022-12-14 00:23:13 +01:00
LangSwedishModel.log LangModels: add Swedish support. 2016-09-28 22:42:13 +02:00
LangThaiModel.log BuildLangModel: forgot to add logs for Thai models generation. 2015-12-04 03:26:52 +01:00
LangTurkishModel.log LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9. 2015-12-04 02:35:09 +01:00
LangVietnameseModel.log LangModels: add VISCII encoding support and retrain Vietnamese model. 2016-02-13 03:51:18 +01:00