3 Commits

Author SHA1 Message Date
Jehan
388777be51 script, src, test: add IBM865 support for Danish.
Newly added IBM865 charset (for Norwegian) can also be used for Danish

By the way, I fixed `script/charsets/ibm865.py` as Danish uses the 'da'
ISO 639-1 code by the way, not 'dk' (which is sometimes used for other
codes for Denmark, such as ISO 3166 country code and internet TLD) but
not for the language itself.

For the test, adding some text from the top article of the day on the
Danish Wikipedia, which was about Jimi Hendrix. And that's cool! 🎸 ;-)
2022-11-30 19:57:52 +01:00
Jehan
198190461e script: move the Wikipedia title syntax cleaning to BuildLangModel.py. 2016-02-21 16:20:22 +01:00
Jehan
923d264470 LangModels: add Danish support (Windows-1252, ISO-8859-1 and ISO-8859-15).
Test for ISO-8859-1 is disabled for now since the difference is not big
enough, as for characters used in Danish, between ISO-8859-1 and
ISO-8859-15. Therefore the first to be declared "wins".
Let's see to improve this later.
Test contents from:
https://da.wikipedia.org/wiki/Eurosymbol
https://da.wikipedia.org/wiki/Dansk_%28sprog%29
2016-02-19 19:10:41 +01:00