uchardet/script/BuildLangModelLogs
Jehan 5f9ec3aef0 LangModels: add support for Slovak.
Encodings are the same as for Czech (Windows-1250, ISO-8859-2 and
Mac-CentralEurope), since the resources I found indicate that the same
encodings were used historically.
Note also that the encodings of the test examples were already
properly detected through the Czech models, so the two languages are
clearly very close, even statistically. Nevertheless, dedicated Slovak
models work better and get better scores. This will become fully
meaningful once uchardet is also used as a language detector (in the
not-too-distant future, hopefully!).
Test text taken from: https://sk.wikipedia.org/wiki/Jupiter
2016-09-21 13:42:20 +02:00
LangArabicModel.log LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
LangCzechModel.log LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
LangDanishModel.log LangModels: add Danish support (Windows-1252, ISO-8859-1 and ISO-8859-15). 2016-02-19 19:10:41 +01:00
LangEsperantoModel.log LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
LangFrenchModel.log Adding French Windows-1252 support. 2015-12-03 21:22:30 +01:00
LangGermanModel.log LangModels: adding German models for ISO-8859-1 and Windows-1252. 2015-12-03 23:58:41 +01:00
LangGreekModel.log LangModels: update the Greek language models. 2016-05-25 17:39:10 +02:00
LangHungarianModel.log BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
LangLatvianModel.log LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
LangLithuanianModel.log LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
LangMalteseModel.log LangModels: support for Maltese / ISO-8859-3. 2016-09-21 02:11:31 +02:00
LangPortugueseModel.log LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
LangSlovakModel.log LangModels: add support for Slovak. 2016-09-21 13:42:20 +02:00
LangSpanishModel.log LangModels: adding Spanish support. 2015-12-12 18:54:35 +01:00
LangThaiModel.log BuildLangModel: forgot to add logs for Thai models generation. 2015-12-04 03:26:52 +01:00
LangTurkishModel.log LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9. 2015-12-04 02:35:09 +01:00
LangVietnameseModel.log LangModels: add VISCII encoding support and retrain Vietnamese model. 2016-02-13 03:51:18 +01:00
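
The Slovak commit message above names three legacy encodings shared with Czech. As a rough illustration of why a detector needs a model per encoding, the sketch below encodes one Slovak sentence in each of them using Python's standard codecs and shows that the byte sequences differ. The sample sentence and the helper name `encode_all` are assumptions made for this example; they are not part of uchardet or of the test file referenced in the commit.

```python
# Python's standard-library codec names for the three encodings
# mentioned in the commit message.
CODECS = {
    "Windows-1250": "cp1250",
    "ISO-8859-2": "iso8859_2",
    "Mac-CentralEurope": "mac_latin2",
}

# Hypothetical sample sentence, not the test text used in the commit.
SAMPLE = "Jupiter je najväčšia planéta slnečnej sústavy."


def encode_all(text):
    """Return a mapping of encoding label to the encoded bytes of text."""
    return {label: text.encode(codec) for label, codec in CODECS.items()}


if __name__ == "__main__":
    # Print each encoding's bytes in hex so the differences are visible.
    for label, raw in encode_all(SAMPLE).items():
        print(f"{label:18} {raw.hex(' ')}")
```

The ASCII letters encode identically everywhere; only the accented letters (such as "š" and "ä") map to different byte values, which is the signal a per-encoding model can rely on.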