uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-02-07 10:19:59 +08:00

Jehan 5f9ec3aef0 LangModels: add support for Slovak. The encodings are the same as for Czech (Windows-1250, ISO-8859-2 and Mac-CentralEurope), since the resources I found indicate that both languages historically used the same encodings. Note also that the encodings of the test examples were already detected correctly by the Czech models, so the two languages are clearly very close, even statistically. Nevertheless, adding dedicated models works better and yields higher scores. This will become fully meaningful once uchardet is also used as a language detector (in the not-too-distant future, hopefully!). Test text taken from: https://sk.wikipedia.org/wiki/Jupiter		2016-09-21 13:42:20 +02:00
ibm852.txt	LangModels: add support for Slovak.	2016-09-21 13:42:20 +02:00
iso-8859-2.txt	LangModels: add support for Slovak.	2016-09-21 13:42:20 +02:00
mac-centraleurope.txt	LangModels: add support for Slovak.	2016-09-21 13:42:20 +02:00
utf-8.txt	LangModels: add support for Slovak.	2016-09-21 13:42:20 +02:00
windows-1250.txt	LangModels: add support for Slovak.	2016-09-21 13:42:20 +02:00
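
The five files above appear to hold the same Slovak text in different encodings. As a rough illustration of how such fixtures relate to each other, the sketch below encodes one sample string with Python's standard codecs. The mapping from file names to codec names, the helper `encode_fixtures`, and the sample sentence are assumptions made for this example; they are not taken from the uchardet repository, and this is not necessarily how the actual test files were produced.

```python
import unicodedata

# Assumed mapping from fixture file name to a Python standard codec name.
CODECS = {
    "ibm852.txt": "cp852",
    "iso-8859-2.txt": "iso8859_2",
    "mac-centraleurope.txt": "mac_latin2",
    "utf-8.txt": "utf_8",
    "windows-1250.txt": "cp1250",
}


def encode_fixtures(text):
    """Return {file name: encoded bytes} for every codec in CODECS (hypothetical helper)."""
    return {name: text.encode(codec) for name, codec in CODECS.items()}


# Invented sample sentence, NOT the real fixture text; normalized to NFC so
# that each accented letter is a single code point.
sample = unicodedata.normalize(
    "NFC", "Jupiter je najväčšia planéta slnečnej sústavy."
)

fixtures = encode_fixtures(sample)

for name in sorted(fixtures):
    data = fixtures[name]
    # Each encoding must round-trip back to the original text.
    assert data.decode(CODECS[name]) == sample
    print(name, len(data), "bytes")
```

The single-byte encodings all produce one byte per character but assign different byte values to the accented letters, which is the kind of difference a detector has to rely on when telling them apart.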