test: update the Maltese / ISO-8859-3 test file.

Taken from the page: https://mt.wikipedia.org/wiki/Lingwa_Maltija
The old test was fine but contained some French words, which lowered the
confidence for Maltese.
Technically this should not be a huge issue in the end: if there are
enough actual Maltese words, the stats should still weigh in favor of
Maltese (which they mostly did anyway). But since I am making some other
changes, this was no longer enough. In particular, I was changing some
of the UTF-8 confidence logic and the file ended up detected as UTF-8
(even though it contains illegal sequences and so cannot be UTF-8!
Cf. #9).
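
To illustrate why such a file cannot be UTF-8, here is a minimal Python
sketch (not uchardet code). The sample sentence and the helper name
is_valid_utf8 are assumptions made for this example only. In ISO-8859-3
the letter "ħ" is the single byte 0xB1, which in UTF-8 can only appear as
a continuation byte, so a strict UTF-8 decode fails.

```python
# Illustrative sample only; this is not the actual test file content.
sample = "Il-Malti huwa l-ilsien waħdieni ta' għajn semitika."


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes as strict UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Encode the text the way the test file is stored.
encoded = sample.encode("iso8859_3")

# 0xB1 is "ħ" in ISO-8859-3; on its own it is an illegal UTF-8 sequence.
print(b"\xb1" in encoded)
print(is_valid_utf8(encoded))
print(is_valid_utf8(sample.encode("utf-8")))
```

A detector that validates the whole byte stream in this way could rule
UTF-8 out entirely, regardless of any language statistics.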

So the real long-term solution is to actually fix our UTF-8 detector,
which I'll do at some point, but for the time being, let's have
unquestionably Maltese text in there to simplify testing at this early
stage of the uchardet rewrite.
This commit is contained in:
Jehan 2021-05-23 16:41:27 +02:00
parent 45bd32d102
commit 2a04e57c8f

@@ -1,4 +1 @@
-Franza (Franċiż:France), uffiċjalment ir-Repubblika Franċiża (Franċiż:
-République française), hi pajjiż fl-Ewropa tal-Punent. Il-belt belt kapitali
-tagħha hi Pariġi. Hi membru tal-Unjoni Ewropea. Franza hi maqsuma f'22 régions
-li huma suddiviżi f' départements.
+Il-Malti huwa l-ilsien nazzjonali tar-Repubblika ta' Malta. Huwa l-ilsien uffiċjali flimkien mal-Ingliż; kif ukoll wieħed mill-ilsna uffiċjali tal-Unjoni Ewropea. Dan l-ilsien għandu sisien u għerq semitiku, ta' djalett Għarbi li ġej mit-Tramuntana tal-Afrika, għalħekk qatt ma kellu rabta mill-qrib mal-Għarbi Klassiku. Iżda tul iż-żminijiet, minħabba proċess tal-Latinizzazzjoni ta' Malta, bdew deħlin bosta elementi lingwistiċi mill-Isqalli, djalett ta' art li wkoll għaddiet minn żmien ta' ħakma Għarbija. Wara l-Isqalli beda dieħel ukoll it-Taljan, fuq kollox fiż-żmien tad-daħla tal-Kavallieri tal-Ordni ta' San Ġwann sa meta l-Ingliż ħa post it-Taljan bħala l-ilsien uffiċjali fil-Kostituzzjoni Kolonjali tal-1934. Il-Malti huwa l-ilsien waħdieni ta' għajn semitika li jinkiteb b'ittri Latini. L-alfabett Malti magħmul minn 30 ittra (24 konsonanti u 6 vokali) li jidhru f'din l-ordni: