uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-09 10:26:40 +08:00

Author	SHA1	Message	Date
Jehan	b7aebfdfda	LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. Just realized that these 2 languages can also be encoded with these charsets (even though ISO-8859-13 would appear to be more common… maybe?). Anyway, the models are now updated and can recognize texts using these encodings for these languages. Added some test files as well, which work great.	2016-09-21 00:27:16 +02:00
Jehan	7cb3dd9ddd	LangModels: add support for Lithuanian / ISO-8859-13. Test text extracted from https://lt.wikipedia.org/wiki/Vincent_van_Gogh.	2016-09-20 23:09:24 +02:00
Jehan	178c6119b8	LangModels: add Windows-1258 support for Vietnamese. I was planning on adding VISCII support as well, but Python's encode() method apparently has no support for it, so I cannot generate the proper statistics data with the current version of the script.	2016-02-13 02:32:57 +01:00
Jehan	9c3c37517c	LangModels: add Arabic support. Models constructed for ISO-8859-6 and Windows-1256.	2015-12-13 18:42:16 +01:00
Jehan	ad2f7212e2	LangModels: retraining Greek models with my training script. This fixes our Greek/Windows-1253 test.	2015-12-13 18:02:11 +01:00
Jehan	6b2722885a	BuildLangModel: forgot to add charset/language files.	2015-12-12 18:18:08 +01:00
Jehan	fb3c47a073	LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. ISO-8859-11 is essentially identical to TIS-620, with the addition of the non-breaking space character. Our detection will therefore always return TIS-620, except in the rare cases where a text contains a non-breaking space.	2015-12-04 03:14:52 +01:00
Jehan	ffcd85f709	script: forgot to commit ISO-8859-9 and Turkish files.	2015-12-04 02:40:54 +01:00
Jehan	f0e122b506	LangModels: add Esperanto ISO-8859-3 language model.	2015-12-04 01:35:56 +01:00
Jehan	0270b1e856	Adding French Windows-1252 support.	2015-12-03 21:22:30 +01:00
Jehan	0314f98ece	BuildLangModel.py: some in-progress script to build language models.	2015-11-29 01:30:04 +01:00
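
The Windows-1258 commit (178c6119b8) notes that VISCII could not be added because Python's encode() has no support for it. A minimal sketch of how one might check this before trying to build a model is shown below. The helper name `codec_available` is hypothetical and is not part of BuildLangModel.py; it only relies on the standard `codecs.lookup` call.

```python
import codecs


def codec_available(name: str) -> bool:
    """Return True if this Python installation can encode/decode `name`.

    Hypothetical helper for illustration; not taken from the uchardet scripts.
    """
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Charsets mentioned in the commit message. The result for each name
# depends on the codecs shipped with the running Python version.
for charset in ("cp1258", "viscii"):
    print(charset, codec_available(charset))
```

A charset for which this returns False cannot be used with `str.encode()`, which is the limitation the commit message describes.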

11 Commits
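
The ISO-8859-11 commit (fb3c47a073) explains that the two Thai charsets differ only by the non-breaking space at byte 0xA0. The sketch below illustrates that rule under stated assumptions: `guess_thai_charset` is a hypothetical function written for this example and is not how uchardet itself decides, since uchardet relies on its language models.

```python
NBSP_BYTE = 0xA0  # non-breaking space in ISO-8859-11


def guess_thai_charset(data: bytes) -> str:
    """Illustrative rule only: prefer TIS-620 unless byte 0xA0 is present."""
    return "ISO-8859-11" if NBSP_BYTE in data else "TIS-620"


thai = "สวัสดี"

# Without a non-breaking space, both codecs produce the same bytes.
plain = thai.encode("iso8859_11")
assert plain == thai.encode("tis_620")

# With a non-breaking space, the ISO-8859-11 bytes contain 0xA0.
with_nbsp = (thai + "\u00a0" + thai).encode("iso8859_11")

print(guess_thai_charset(plain))      # TIS-620
print(guess_thai_charset(with_nbsp))  # ISO-8859-11
```

Since the byte sequences are identical whenever no non-breaking space occurs, reporting TIS-620 by default loses no information for such texts.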