Jehan d40e5868d5 script, src, test: adding Catalan support.
For UTF-8, ISO-8859-1 and WINDOWS-1252 support.

The test for UTF-8 and ISO-8859-1 is taken from 'Marmota' page on
Wikipedia in Catalan. The test for WINDOWS-1252 is taken from the
'Unió_Europea' page. Since ISO-8859-1 and WINDOWS-1252 agree on most
letters (in particular the ones used in Catalan), I differentiated the
WINDOWS-1252 test with a text containing the '€' symbol, which
WINDOWS-1252 encodes as byte 0x80, a position that holds no printable
character in ISO-8859-1.
2022-12-20 01:46:15 +01:00
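
The reasoning in the commit message above can be checked with a short Python sketch. It is only an illustration using Python's built-in codecs, not code from this repository; the helper name `encodes_identically` and the sample strings are assumptions made for the example.

```python
def encodes_identically(text: str) -> bool:
    """Return True if `text` yields the same bytes in ISO-8859-1 and Windows-1252.

    Hypothetical helper for illustration; it is not part of the scripts listed here.
    """
    try:
        return text.encode("iso-8859-1") == text.encode("cp1252")
    except UnicodeEncodeError:
        # At least one of the two charsets cannot represent the text at all.
        return False


# Accented Catalan letters sit at the same byte positions in both charsets,
# so such text cannot tell the two encodings apart.
plain_catalan = "Unió Europea"

# The euro sign exists only in Windows-1252 (byte 0x80); ISO-8859-1 has no
# printable character at that position.
with_euro = "10 €"

if __name__ == "__main__":
    print(encodes_identically(plain_catalan))
    print(encodes_identically(with_euro))
    print("€".encode("cp1252"))
```

A test text containing '€' is therefore valid only as WINDOWS-1252, which is what makes it usable to separate the two charsets.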
..
ar.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
be.py script, src, test: adding Belarusian support. 2022-12-17 19:13:03 +01:00
bg.py script, src, test: Bulgarian language models added. 2022-12-17 18:41:00 +01:00
ca.py script, src, test: adding Catalan support. 2022-12-20 01:46:15 +01:00
cs.py LangModels: add support for Czech. 2016-09-21 03:33:50 +02:00
da.py script, src, test: add IBM865 support for Danish. 2022-11-30 19:57:52 +01:00
de.py scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
el.py Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
en.py script, src: add English language model. 2022-12-14 00:24:53 +01:00
eo.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
es.py scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
et.py script: forgot to commit the Estonian description. 2016-09-27 00:51:19 +02:00
fi.py LangModels: add Finnish support. 2016-09-21 18:27:39 +02:00
fr.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
ga.py LangModels: added support for Irish Gaelic. 2016-09-27 00:49:05 +02:00
he.py Issue #22: Hebrew CP862 support. 2022-12-16 23:27:52 +01:00
hi.py src: add Hindi/UTF-8 support. 2022-12-14 00:23:13 +01:00
hr.py LangModels: new Croatian models. 2016-09-26 01:32:49 +02:00
hu.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
it.py LangModels: add Italian support. 2016-09-21 18:52:09 +02:00
ko.py script, src: remove generated statistics data for Korean. 2022-12-14 00:24:53 +01:00
lt.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
lv.py LangModels: add support for Latvian | Lithuanian / ISO-8859-4 | ISO-8859-10. 2016-09-21 00:27:16 +02:00
mk.py src, script: add Macedonian support. 2022-12-17 22:47:54 +01:00
mt.py LangModels: support for Maltese / ISO-8859-3. 2016-09-21 02:11:31 +02:00
no.py script, src: update Norwegian model with the new language features. 2022-12-14 00:24:53 +01:00
pl.py LangModels: add Polish support. 2016-09-21 17:30:15 +02:00
pt.py LangModels: add support for Portuguese / ISO-8859-1. 2016-09-21 00:01:07 +02:00
ro.py LangModels: Romanian support added. 2016-09-28 19:57:50 +02:00
ru.py script, src: regenerate Russian models and add UTF-8/Russian support. 2022-12-17 21:41:11 +01:00
sk.py script: regenerate Slovak and Slovene with better alphabet support. 2022-12-14 00:24:53 +01:00
sl.py src, script: add concept of alphabet_mapping in language models. 2022-12-14 00:24:53 +01:00
sr.py script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
sv.py LangModels: add Swedish support. 2016-09-28 22:42:13 +02:00
th.py script, src: regenerate the Thai model. 2022-12-14 00:24:53 +01:00
tr.py script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
uk.py script, src, test: add Ukrainian support. 2022-12-17 21:40:56 +01:00
vi.py script, src: regenerate the Vietnamese model. 2022-12-14 00:24:53 +01:00