uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-07-30 16:26:27 +08:00


iso-8859-3.txt	test: update the Maltese / ISO-8859-3 test file.	2022-11-29 14:59:17 +01:00
utf-8.txt	LangModels: support for Maltese / ISO-8859-3.	2016-09-21 02:11:31 +02:00

Jehan 2a04e57c8f test: update the Maltese / ISO-8859-3 test file.

Taken from the page: https://mt.wikipedia.org/wiki/Lingwa_Maltija
The old test was fine but had some French words in it, which lowered the
confidence for Maltese.
Technically it should not be a huge issue in the end, i.e. that if there
are enough actual Maltese words, the stats should still weigh in favor
of Maltese likeness (which they mostly did anyway), but since I am
making some other changes, this was just not enough. In particular I was
changing some of the UTF-8 confidence logic and the file ended up
detected as UTF-8 (even though it has illegal sequences and cannot be!
Cf. #9).

So the real long-term solution is to actually fix our UTF-8 detector,
which I'll do at some point, but for the time being, let's have definite
non-questionable Maltese in there to simplify testing at this early
stage of uchardet rewriting.

2022-11-29 14:59:17 +01:00
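
To illustrate the remark about illegal sequences: Maltese letters such as "ħ" and "ġ" are single bytes above 0x7F in ISO-8859-3, and such a byte standing alone after ASCII text is not a well-formed UTF-8 sequence, so a strict decoder can rule UTF-8 out. The following is a minimal Python sketch of that check, not uchardet's actual detector; the helper name `is_valid_utf8` and the sample words are assumptions made for illustration.

```python
def is_valid_utf8(data: bytes) -> bool:
    """Return True only if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Illustrative sample words, not taken from the test file.
sample = "bħal ġdid"

# The same text stored in the two encodings discussed in the commit.
as_latin3 = sample.encode("iso8859_3")  # "ħ" becomes 0xB1, "ġ" becomes 0xF5
as_utf8 = sample.encode("utf-8")

print(is_valid_utf8(as_latin3))  # the lone 0xB1 byte is rejected
print(is_valid_utf8(as_utf8))
```

A statistical detector can still score such a file highly for UTF-8 when it weighs confidence without rejecting on the first malformed sequence, which appears to be the situation the commit message describes.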
