2 Commits

Author SHA1 Message Date
Jehan
198190461e script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
Jehan
923d264470 LangModels: add Danish support (Windows-1252, ISO-8859-1 and ISO-8859-15).
Test for ISO-8859-1 is disabled for now since the difference is not big
enough, as for characters used in Danish, between ISO-8859-1 and
ISO-8859-15. Therefore the first to be declared "wins".
Let's see to improve this later.
Test contents from:
https://da.wikipedia.org/wiki/Eurosymbol
https://da.wikipedia.org/wiki/Dansk_%28sprog%29
2016-02-19 19:10:41 +01:00