Jehan 0fe51d3851 Issue #21: Greek CP737 support.
It actually breaks "zh:big5", so I'm going to hold off a bit. Adding more
language and charset support is slowly starting to show the limitations
of our legacy multi-byte charset support, since I haven't really
touched that code since the original Mozilla implementation.

It might be time to start reviewing these parts of the code.

The test file content comes from the 'Μαρμότα' article on the Greek
Wikipedia. Since 2 letters are missing from this encoding, despite its
popularity for Greek, I had to carefully choose pieces of text that
don't contain those letters.
2022-12-18 22:33:12 +01:00
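Choosing the CP737 test text comes down to avoiding characters the encoding cannot represent. The check below is a minimal sketch under a few assumptions. It assumes Python's standard `cp737` codec matches the table uchardet targets. The helper names `unencodable_chars` and `missing_greek_letters` are hypothetical and are not part of uchardet or its test suite. The script also works out which Greek letters are missing instead of hard-coding them.

```python
import unicodedata


def unencodable_chars(text: str, encoding: str = "cp737") -> list[str]:
    """Return the distinct characters of `text` that `encoding` cannot represent,
    in order of first appearance (hypothetical helper, not a uchardet API)."""
    missing: list[str] = []
    for ch in text:
        if ch in missing:
            continue
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing


def missing_greek_letters(encoding: str = "cp737") -> list[str]:
    """List the letters of the modern Greek range U+0386..U+03CE that `encoding` lacks."""
    letters = []
    for cp in range(0x0386, 0x03CF):
        ch = chr(cp)
        # Skip unassigned code points (category "Cn") and punctuation such as
        # U+0387 GREEK ANO TELEIA; keep only letter categories (Lu, Ll, ...).
        if unicodedata.category(ch).startswith("L"):
            letters.append(ch)
    return unencodable_chars("".join(letters), encoding)


if __name__ == "__main__":
    for ch in missing_greek_letters():
        print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
    snippet = "Μαρμότα"
    verdict = "usable" if not unencodable_chars(snippet) else "needs rewording"
    print(snippet, verdict)
```

Running the script first lists every Greek letter that CP737 cannot encode. It then reports whether a candidate snippet can be used as test content as-is. A snippet that fails would need to be replaced or reworded before it is saved as a CP737 test file.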
ar tests: add test files for Arabic. 2015-12-13 18:42:59 +01:00
be script, src, test: adding Belarusian support. 2022-12-17 19:13:03 +01:00
bg script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
cs test: adding test files for Czech. 2016-09-21 03:44:22 +02:00
da script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
de test: 4 new tests for UTF-8. 2022-12-14 00:23:13 +01:00
el Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
en test: finally add English/UTF-8 test file. 2022-12-14 21:45:29 +01:00
eo test: 4 new tests for UTF-8. 2022-12-14 00:23:13 +01:00
es tests: test files for Spanish. 2015-12-12 18:55:43 +01:00
et LangModels: Estonian models created. 2016-09-27 00:14:29 +02:00
fi LangModels: add Finnish support. 2016-09-21 18:27:39 +02:00
fr test: update UTF-16 and UTF-32 tests after label changing. 2015-12-04 19:46:51 +01:00
ga LangModels: added support for Irish Gaelic. 2016-09-27 00:49:05 +02:00
he test: adding 2 tests for Hebrew/IBM862 recognition. 2022-12-16 23:35:17 +01:00
hi src: add Hindi/UTF-8 support. 2022-12-14 00:23:13 +01:00
hr LangModels: new Croatian models. 2016-09-26 01:32:49 +02:00
hu test: 4 new tests for UTF-8. 2022-12-14 00:23:13 +01:00
it LangModels: add Italian support. 2016-09-21 18:52:09 +02:00
ja Add UTF-16 test files without BOM... 2015-11-28 19:50:18 +01:00
ko src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
lt LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
lv LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
mk src, script: add Macedonian support. 2022-12-17 22:47:54 +01:00
mt test: update the Maltese / ISO-8859-3 test file. 2022-11-29 14:59:17 +01:00
no Add tests for norwegian 2022-11-30 19:09:21 +01:00
pl LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
pt LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
ro LangModels: Romanian support added. 2016-09-28 19:57:50 +02:00
ru script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
sk LangModels: add support for Slovak. 2016-09-21 13:42:20 +02:00
sl LangModels: add Slovene support. 2016-09-28 22:13:17 +02:00
sr script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
sv LangModels: add Swedish support. 2016-09-28 22:42:13 +02:00
th LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
tr test: 4 new tests for UTF-8. 2022-12-14 00:23:13 +01:00
uk script, src, test: add Ukrainian support. 2022-12-17 21:40:56 +01:00
vi LangModels: add VISCII encoding support and retrain Vietnamese model. 2016-02-13 03:51:18 +01:00
zh Adding some more test files for Russian and Chinese. 2015-11-18 19:27:38 +01:00
CMakeLists.txt test: add ability to have several tests per charsets. 2022-12-16 23:10:34 +01:00
uchardet-tests.c src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00