Jehan 0fe51d3851 Issue #21: Greek CP737 support.
It actually breaks "zh:big5", so I'm going to hold off a bit. Adding more
language and charset support is slowly starting to show the limitations
of our legacy multi-byte charset support, since I haven't really
touched it since the original Mozilla implementation.

It might be time to start reviewing these parts of the code.

The test file contents come from the 'Μαρμότα' page on the Greek
Wikipedia (though since 2 letters are missing in this encoding, despite
its popularity for Greek, I had to be careful to choose pieces of text
without such letters).
2022-12-18 22:33:12 +01:00
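
The text-selection constraint described in the commit message can be checked mechanically. The following is only a minimal sketch, assuming Python's built-in "cp737" codec is a faithful stand-in for the target charset; the helper name unencodable_chars is hypothetical and is not part of this repository's scripts.

```python
def unencodable_chars(text, encoding="cp737"):
    """Return the sorted distinct characters of `text` that `encoding` cannot represent.

    Hypothetical helper for illustration only: it relies on Python's
    built-in codec tables, not on the language models built by these scripts.
    """
    missing = set()
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # This character has no byte assigned in the target charset.
            missing.add(ch)
    return sorted(missing)


if __name__ == "__main__":
    # Candidate test text: an empty list means every character fits the charset.
    print(unencodable_chars("Μαρμότα"))
```

A candidate excerpt would be acceptable for a CP737 test file when the returned list is empty; any character that is reported would need to be avoided by picking a different piece of text.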
..
ar.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
be.py script, src, test: adding Belarusian support. 2022-12-17 19:13:03 +01:00
bg.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
cs.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
da.py script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
de.py scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
el.py Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
en.py script, src: add English language model. 2022-12-14 00:24:53 +01:00
eo.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
es.py scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
et.py script: forgot to commit the Estonian description. 2016-09-27 00:51:19 +02:00
fi.py LangModels: add Finnish support. 2016-09-21 18:27:39 +02:00
fr.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
ga.py LangModels: added support for Irish Gaelic. 2016-09-27 00:49:05 +02:00
he.py Issue #22: Hebrew CP862 support. 2022-12-16 23:27:52 +01:00
hi.py src: add Hindi/UTF-8 support. 2022-12-14 00:23:13 +01:00
hr.py LangModels: new Croatian models. 2016-09-26 01:32:49 +02:00
hu.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
it.py LangModels: add Italian support. 2016-09-21 18:52:09 +02:00
ko.py script, src: remove generated statistics data for Korean. 2022-12-14 00:24:53 +01:00
lt.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
lv.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
mk.py src, script: add Macedonian support. 2022-12-17 22:47:54 +01:00
mt.py LangModels: support for Maltese / ISO-8859-3. 2016-09-21 02:11:31 +02:00
no.py script, src: update Norwegian model with the new language features. 2022-12-14 00:24:53 +01:00
pl.py LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
pt.py LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
ro.py LangModels: Romanian support added. 2016-09-28 19:57:50 +02:00
ru.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
sk.py script: regenerate Slovak and Slovene with better alphabet support. 2022-12-14 00:24:53 +01:00
sl.py src, script: add concept of alphabet_mapping in language models. 2022-12-14 00:24:53 +01:00
sr.py script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
sv.py LangModels: add Swedish support. 2016-09-28 22:42:13 +02:00
th.py script, src: regenerate the Thai model. 2022-12-14 00:24:53 +01:00
tr.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
uk.py script, src, test: add Ukrainian support. 2022-12-17 21:40:56 +01:00
vi.py script, src: regenerate the Vietnamese model. 2022-12-14 00:24:53 +01:00