uchardet

coffee/uchardet


mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-07 17:26:41 +08:00

Commit Graph

Author	SHA1	Message	Date
Jehan	eb8308d50a	src, script: regenerate all existing language models. Now making sure that we have a generic language model working with UTF-8 for all 26 supported models which had single-byte encoding support until now.	2022-12-14 00:23:13 +01:00
Jehan	5f9ec3aef0	LangModels: add support for Slovak. Encodings are the same as Czech (Windows-1250, ISO-8859-2 and Mac-CentralEurope), since the resources I found indicate that the same encodings were used historically. Note also that the test examples' encodings were already correctly detected through the Czech models, so the two languages are very close, even statistically. Nevertheless, dedicated Slovak models work better and produce higher scores. This will become fully meaningful once uchardet is also used as a language detector (in the not-too-distant future, hopefully!). Test text taken from: https://sk.wikipedia.org/wiki/Jupiter	2016-09-21 13:42:20 +02:00

2 Commits