uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-07-30 16:26:27 +08:00

Jehan ffb94e4a9d script, src, test: Bulgarian language models added. Not sure why we had the Bulgarian support but haven't recently updated it (i.e. never with the model generation script, or so it seems), especially with generic language models, allowing to have UTF-8/Bulgarian support. Maybe I tested it some time ago and it was getting bad results? Anyway now with all the recent updates on the confidence computation, I get very good detection scores. So adding support for UTF-8/Bulgarian and rebuilding other models too. Also adding a test for ISO-8859-5/Bulgarian (we already had support, but no test files). The 2 new test files are text from page 'Мармоти' on Wikipedia in Bulgarian language.	2022-12-17 18:41:00 +01:00
ar	tests: add test files for Arabic.	2015-12-13 18:42:59 +01:00
bg	script, src, test: Bulgarian language models added.	2022-12-17 18:41:00 +01:00
cs	test: adding test files for Czech.	2016-09-21 03:44:22 +02:00
da	script, src, test: add IBM865 support for Danish.	2022-11-30 19:57:52 +01:00
de	test: 4 new tests for UTF-8.	2022-12-14 00:23:13 +01:00
el	Add Greek test files.	2015-11-18 02:57:09 +01:00
en	test: finally add English/UTF-8 test file.	2022-12-14 21:45:29 +01:00
eo	test: 4 new tests for UTF-8.	2022-12-14 00:23:13 +01:00
es	tests: test files for Spanish.	2015-12-12 18:55:43 +01:00
et	LangModels: Estonian models created.	2016-09-27 00:14:29 +02:00
fi	LangModels: add Finnish support.	2016-09-21 18:27:39 +02:00
fr	test: update UTF-16 and UTF-32 tests after label changing.	2015-12-04 19:46:51 +01:00
ga	LangModels: added support for Irish Gaelic.	2016-09-27 00:49:05 +02:00
he	test: adding 2 tests for Hebrew/IBM862 recognition.	2022-12-16 23:35:17 +01:00
hi	src: add Hindi/UTF-8 support.	2022-12-14 00:23:13 +01:00
hr	LangModels: new Croatian models.	2016-09-26 01:32:49 +02:00
hu	test: 4 new tests for UTF-8.	2022-12-14 00:23:13 +01:00
it	LangModels: add Italian support.	2016-09-21 18:52:09 +02:00
ja	Add UTF-16 test files without BOM...	2015-11-28 19:50:18 +01:00
ko	src, test: fix the new Johab prober and add a test.	2022-12-14 00:23:13 +01:00
lt	LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10.	2016-09-21 00:27:16 +02:00
lv	LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10.	2016-09-21 00:27:16 +02:00
mt	test: update the Maltese / ISO-8859-3 test file.	2022-11-29 14:59:17 +01:00
no	Add tests for norwegian	2022-11-30 19:09:21 +01:00
pl	LangModels: add Polish support.	2016-09-21 17:30:15 +02:00
pt	LangModels: add support for Portuguese / ISO-8859-1.	2016-09-21 00:01:07 +02:00
ro	LangModels: Romanian support added.	2016-09-28 19:57:50 +02:00
ru	Add some Russian test files.	2015-11-27 18:17:20 +01:00
sk	LangModels: add support for Slovak.	2016-09-21 13:42:20 +02:00
sl	LangModels: add Slovene support.	2016-09-28 22:13:17 +02:00
sv	LangModels: add Swedish support.	2016-09-28 22:42:13 +02:00
th	LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models.	2015-12-04 03:14:52 +01:00
tr	test: 4 new tests for UTF-8.	2022-12-14 00:23:13 +01:00
vi	LangModels: add VISCII encoding support and retrain Vietnamese model.	2016-02-13 03:51:18 +01:00
zh	Adding some more test files for Russian and Chinese.	2015-11-18 19:27:38 +01:00
CMakeLists.txt	test: add ability to have several tests per charsets.	2022-12-16 23:10:34 +01:00
uchardet-tests.c	src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/.	2022-12-14 00:24:53 +01:00