From 2a04e57c8f19904a002f09021640a4b5537cb8e5 Mon Sep 17 00:00:00 2001
From: Jehan
Date: Sun, 23 May 2021 16:41:27 +0200
Subject: [PATCH] test: update the Maltese / ISO-8859-3 test file.

Taken from the page: https://mt.wikipedia.org/wiki/Lingwa_Maltija
The old test was fine but had some French words in it, which lowered the
confidence for Maltese. Technically it should not be a huge issue in the
end, i.e. if there are enough actual Maltese words, the stats should
still weigh in favor of Maltese likeness (which they mostly did anyway),
but since I am making some other changes, this was just not enough. In
particular I was changing some of the UTF-8 confidence logic and the
file ended up detected as UTF-8 (even though it has illegal sequences
and cannot be! Cf. #9).
So the real long-term solution is to actually fix our UTF-8 detector,
which I'll do at some point, but for the time being, let's have definite
non-questionable Maltese in there to simplify testing at this early
stage of uchardet rewriting.
---
 test/mt/iso-8859-3.txt | 5 +----
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/test/mt/iso-8859-3.txt b/test/mt/iso-8859-3.txt
index 255269b..d98884a 100644
--- a/test/mt/iso-8859-3.txt
+++ b/test/mt/iso-8859-3.txt
@@ -1,4 +1 @@
-Franza (Franċiż:France), uffiċjalment ir-Repubblika Franċiża (Franċiż:
-République française), hi pajjiż fl-Ewropa tal-Punent. Il-belt belt kapitali
-tagħha hi Pariġi. Hi membru tal-Unjoni Ewropea. Franza hi maqsuma f'22 régions
-li huma suddiviżi f' départements.
+Il-Malti huwa l-ilsien nazzjonali tar-Repubblika ta' Malta. Huwa l-ilsien uffiċjali flimkien mal-Ingliż; kif ukoll wieħed mill-ilsna uffiċjali tal-Unjoni Ewropea. Dan l-ilsien għandu sisien u għerq semitiku, ta' djalett Għarbi li ġej mit-Tramuntana tal-Afrika, għalħekk qatt ma kellu rabta mill-qrib mal-Għarbi Klassiku. 
Iżda tul iż-żminijiet, minħabba proċess tal-Latinizzazzjoni ta' Malta, bdew deħlin bosta elementi lingwistiċi mill-Isqalli, djalett ta' art li wkoll għaddiet minn żmien ta' ħakma Għarbija. Wara l-Isqalli beda dieħel ukoll it-Taljan, fuq kollox fiż-żmien tad-daħla tal-Kavallieri tal-Ordni ta' San Ġwann sa meta l-Ingliż ħa post it-Taljan bħala l-ilsien uffiċjali fil-Kostituzzjoni Kolonjali tal-1934. Il-Malti huwa l-ilsien waħdieni ta' għajn semitika li jinkiteb b'ittri Latini. L-alfabett Malti magħmul minn 30 ittra (24 konsonanti u 6 vokali) li jidhru f'din l-ordni: