Jehan 5f9ec3aef0 LangModels: add support for Slovak.
Encodings are the same as for Czech (Windows-1250, ISO-8859-2 and
Mac-CentralEurope), since the resources I found indicate that both
languages historically used the same encodings.
Note also that the encodings of the test files were already properly
detected by the Czech models, so the two languages are clearly very
close, even statistically. Nevertheless, the dedicated Slovak models
work better and get higher scores. This will become fully meaningful
once uchardet is also used as a language detector (in the
not-too-distant future, hopefully!).
Test text taken from: https://sk.wikipedia.org/wiki/Jupiter
2016-09-21 13:42:20 +02:00
ar tests: add test files for Arabic. 2015-12-13 18:42:59 +01:00
bg Reorganize test files in language subdirectories. 2015-11-17 21:12:39 +01:00
cs test: adding test files for Czech. 2016-09-21 03:44:22 +02:00
da LangModels: add Danish support (Windows-1252, ISO-8859-1 and ISO-8859-15). 2016-02-19 19:10:41 +01:00
de LangModels: adding German models for ISO-8859-1 and Windows-1252. 2015-12-03 23:58:41 +01:00
el Add Greek test files. 2015-11-18 02:57:09 +01:00
en Add an ASCII test file for English... 2015-11-28 17:49:13 +01:00
eo LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
es tests: test files for Spanish. 2015-12-12 18:55:43 +01:00
fr test: update UTF-16 and UTF-32 tests after label changing. 2015-12-04 19:46:51 +01:00
he Add Hebrew test files. 2015-11-18 03:16:18 +01:00
hu tests: update Windows-1250 test file for Hungarian. 2015-12-12 18:12:08 +01:00
ja Add UTF-16 test files without BOM... 2015-11-28 19:50:18 +01:00
ko README, test: update README and rename EUC-KR test to UHC. 2016-09-19 01:44:32 +02:00
lt LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
lv LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
mt LangModels: support for Maltese / ISO-8859-3. 2016-09-21 02:11:31 +02:00
pt LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
ru Add some Russian test files. 2015-11-27 18:17:20 +01:00
sk LangModels: add support for Slovak. 2016-09-21 13:42:20 +02:00
th LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
tr LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9. 2015-12-04 02:35:09 +01:00
vi LangModels: add VISCII encoding support and retrain Vietnamese model. 2016-02-13 03:51:18 +01:00
zh Adding some more test files for Russian and Chinese. 2015-11-18 19:27:38 +01:00
CMakeLists.txt cmake: hardcode less 2016-03-22 01:23:04 +03:00
uchardet-tests.c Add automatic testing against every test file. 2015-11-18 18:18:27 +01:00