Jehan 0fe51d3851 Issue #21: Greek CP737 support.
It actually breaks "zh:big5", so I'm going to hold off a bit. Adding more
language and charset support is slowly starting to show the limitations
of our legacy multi-byte charset support, since I haven't really
touched that code since the original implementation from Mozilla.

It might be time to start reviewing these parts of the code.

The test file content comes from the 'Μαρμότα' page on the Greek Wikipedia
(though since 2 letters are missing from this encoding, despite its
popularity for Greek, I had to be careful to choose pieces of text
without those letters).
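Picking test text that avoids characters CP737 cannot represent can be checked mechanically. The sketch below uses Python's built-in `cp737` codec to list the distinct characters of a candidate text that fail to encode. The helper names `unencodable_chars` and `describe` are hypothetical, written for illustration only; they are not part of the build scripts in this directory.

```python
import unicodedata
from typing import List


def unencodable_chars(text: str, encoding: str = "cp737") -> List[str]:
    """Return the distinct characters of `text` that `encoding` cannot
    represent, in the order they first appear."""
    missing: List[str] = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            if ch not in missing:
                missing.append(ch)
    return missing


def describe(chars: List[str]) -> List[str]:
    """Format characters as 'U+XXXX NAME' for easier review."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '?')}" for c in chars]


if __name__ == "__main__":
    # Hypothetical candidate snippet; replace it with the text you plan to use.
    candidate = "Μαρμότα"
    bad = unencodable_chars(candidate)
    if bad:
        print("Not encodable in cp737:", ", ".join(describe(bad)))
    else:
        print("OK:", candidate.encode("cp737"))
```

Running it over each candidate paragraph before saving the test file shows exactly which letters have to be avoided, instead of relying on a manual scan.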
2022-12-18 22:33:12 +01:00
codepoints.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
cp737.py Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
db.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
ibm852.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
ibm855.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
ibm862.py Issue #22: Hebrew CP862 support. 2022-12-16 23:27:52 +01:00
ibm865.py script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
ibm866.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
iso-8859-1.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
iso-8859-2.py BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
iso-8859-3.py LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
iso-8859-4.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
iso-8859-5.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
iso-8859-6.py LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
iso-8859-7.py LangModels: retraining Greek models with my training script. 2015-12-13 18:02:11 +01:00
iso-8859-8.py script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
iso-8859-9.py script: forgot to commit ISO-8859-9 and Turkish files. 2015-12-04 02:40:54 +01:00
iso-8859-10.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
iso-8859-11.py LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
iso-8859-13.py LangModels: add support for Lithuanian / ISO-8859-13. 2016-09-20 23:09:24 +02:00
iso-8859-15.py BuildLangModel.py: some in-progress script to build language models. 2015-11-29 01:30:04 +01:00
iso-8859-16.py LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
koi8-r.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
mac-centraleurope.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
mac-cyrillic.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
tis-620.py LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models. 2015-12-04 03:14:52 +01:00
viscii.py LangModels: add Windows-1258 support for Vietnamese. 2016-02-13 02:32:57 +01:00
windows-1250.py BuildLangModel: forgot to add charset/language files. 2015-12-12 18:18:08 +01:00
windows-1251.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
windows-1252.py Adding French Windows-1252 support. 2015-12-03 21:22:30 +01:00
windows-1253.py LangModels: retraining Greek models with my training script. 2015-12-13 18:02:11 +01:00
windows-1255.py script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
windows-1256.py LangModels: add Arabic support. 2015-12-13 18:42:16 +01:00
windows-1257.py LangModels: Estonian models created. 2016-09-27 00:14:29 +02:00
windows-1258.py LangModels: add Windows-1258 support for Vietnamese. 2016-02-13 02:32:57 +01:00