uchardet/script/BuildLangModelLogs/LangIrishModel.log
Jehan 5c3a2e8037 src, script: regenerate all existing language models.
Now making sure that we have a generic language model working with UTF-8
for all 26 supported models which had single-byte encoding support until
now.
2021-03-17 02:07:17 +01:00

= Logs of language model for Irish (ga) =
- Generated by BuildLangModel.py
- Started: 2021-03-16 19:06:31.364099
- Maximum depth: 4
- Max number of pages: 100
== Parsed pages ==
Tracy Caldwell Dyson (revision 972597)
14 Lúnasa (revision 945830)
1969 (revision 950246)
Arcadia (revision 940778)
California (revision 977165)
Ceimic (revision 996644)
Ceimic fhisiciúil (revision 927461)
Ceimiceoir (revision 927503)
Fisiceoir (revision 880864)
IMDb (revision 941231)
Max Q (revision 910451)
Medal "For Merit in Space Exploration" (revision 972605)
NASA (revision 982342)
Ollscoil California, Davis (revision 972597)
Rúisis (revision 990076)
SAM (revision 976971)
Spáinnis (revision 976986)
Spásaire (revision 948727)
Stáisiún Idirnáisiúnta Spáis (revision 810459)
Stáit Aontaithe Mheiriceá (revision 976971)
Tointeálaí spáis (revision 884452)
10 Lúnasa (revision 649045)
11 Lúnasa (revision 855483)
12 Lúnasa (revision 970783)
13 Lúnasa (revision 843084)
1598 (revision 703178)
15 Lúnasa (revision 776986)
16 Lúnasa (revision 956751)
1740 (revision 868712)
1771 (revision 776762)
17 Lúnasa (revision 777131)
1823 (revision 884394)
1832 (revision 870502)
1898 (revision 881354)
18 Lúnasa (revision 777242)
1911 (revision 884923)
1956 (revision 922906)
1962 (revision 948322)
1966 (revision 983105)
1983 (revision 950195)
19 Lúnasa (revision 648524)
1 Lúnasa (revision 970005)
2001 (revision 953347)
2004 (revision 915512)
20 Lúnasa (revision 863369)
21 Lúnasa (revision 987631)
22 Lúnasa (revision 949242)
23 Lúnasa (revision 778453)
24 Lúnasa (revision 855482)
25 Lúnasa (revision 922966)
26 Lúnasa (revision 649051)
27 Lúnasa (revision 855881)
28 Lúnasa (revision 855201)
29 Lúnasa (revision 937884)
2 Lúnasa (revision 949578)
30 Lúnasa (revision 648308)
31 Lúnasa (revision 874664)
3 Lúnasa (revision 954861)
4 Lúnasa (revision 936315)
5 Lúnasa (revision 946408)
6 Lúnasa (revision 936316)
7 Lúnasa (revision 936317)
8 Lúnasa (revision 648745)
9 Lúnasa (revision 868992)
AK Parti (revision 980611)
An Phacastáin (revision 975474)
An Tuirc (revision 975987)
Aoidh Uí Néill (revision 945830)
Aoine (revision 871416)
Bertolt Brecht (revision 996168)
Czesław Miłosz (revision 968559)
Céadaoin (revision 841385)
Dan Boyle (revision 981683)
Domhnach (revision 717663)
Déardaoin (revision 841384)
Féilire (revision 648837)
Halle Berry (revision 916135)
Henry Bagenal (revision 936900)
Iúil (revision 931127)
Luan (revision 717791)
Lúnasa (revision 970011)
Meán Fómhair (revision 931128)
Mila Kunis (revision 916248)
Pápa Pius VII (revision 972523)
Satharn (revision 717929)
Walter Scott (revision 973708)
Áth Buí (revision 923034)
10 Bealtaine (revision 974318)
11 Feabhra (revision 885848)
11 Meitheamh (revision 937886)
11 Márta (revision 956107)
11 Nollaig (revision 949777)
13 Eanáir (revision 952269)
14 Eanáir (revision 952327)
15 Meitheamh (revision 770401)
16 Nollaig (revision 922996)
17 Meán Fómhair (revision 974321)
17 Márta (revision 959908)
1882 (revision 894229)
1886 (revision 876620)
== End of Parsed pages ==
- Wikipedia parsing ended at: 2021-03-16 19:09:36.532359
42 characters appeared 213560 times.
First 31 characters:
[ 0] Char a: 15.363832178310547 %
[ 1] Char i: 10.505712680277206 %
[ 2] Char n: 8.10825997377786 %
[ 3] Char h: 7.447087469563589 %
[ 4] Char r: 6.299868889305113 %
[ 5] Char e: 6.046076044203034 %
[ 6] Char s: 5.528657051882375 %
[ 7] Char t: 4.9690953362052825 %
[ 8] Char c: 4.70593744146844 %
[ 9] Char l: 4.132328151339202 %
[10] Char o: 3.9469001685708935 %
[11] Char d: 3.2154897920958985 %
[12] Char g: 2.7795467315976774 %
[13] Char m: 2.6760629331335455 %
[14] Char á: 2.228413560591871 %
[15] Char u: 2.17550103015546 %
[16] Char b: 2.0130174189923205 %
[17] Char í: 1.7522007866641691 %
[18] Char é: 1.2207342198913653 %
[19] Char f: 1.1186551788724481 %
[20] Char ú: 1.0039333208466004 %
[21] Char ó: 0.8967035025285635 %
[22] Char p: 0.8475369919460574 %
[23] Char y: 0.2289754635699569 %
[24] Char v: 0.22101517138040833 %
[25] Char k: 0.17606293313354562 %
[26] Char w: 0.16295186364487732 %
[27] Char j: 0.09271399138415433 %
[28] Char z: 0.06836486233377037 %
[29] Char x: 0.03511893613036149 %
[30] Char q: 0.01311106948866829 %
The first 31 characters have an accumulated ratio of 0.9997986514328528.
707 sequences found.
First 512 (typical positive ratio): 0.9976732191628278
Next 512 (512-1024): 0.010039333208466004
Rest: -3.5561831257524545e-17
- Processing end: 2021-03-16 19:09:36.580170
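
The per-character percentages and the accumulated ratio reported above can be reproduced in spirit with a short sketch. This is not the code of BuildLangModel.py: the function name `char_stats`, the case folding, and the `isalpha()` filter are assumptions made for illustration only.

```python
from collections import Counter


def char_stats(text, top_n):
    """Return (total, ranked percentages, accumulated ratio of the top_n chars).

    Assumption of this sketch: only alphabetic characters are counted,
    and they are lower-cased first.
    """
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return 0, [], 0.0
    # Rank by descending count; ties are broken by character for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    # Percentage of all counted characters, as in the "Char x: ... %" lines.
    percents = [(ch, 100.0 * n / total) for ch, n in ranked]
    # Share of all occurrences covered by the top_n most frequent characters.
    accumulated = sum(n for _, n in ranked[:top_n]) / total
    return total, percents, accumulated


if __name__ == "__main__":
    total, percents, accumulated = char_stats("Stáisiún Idirnáisiúnta Spáis", 5)
    print(f"{len(percents)} characters appeared {total} times.")
    for rank, (ch, pct) in enumerate(percents[:5]):
        print(f"[{rank:2}] Char {ch}: {pct} %")
    print(f"The first 5 characters have an accumulated ratio of {accumulated}.")
```

Read against the log, a single occurrence out of the 213560 counted would show as roughly 0.00047 %, so the smallest listed value (Char q, about 0.0131 %) corresponds to about 28 occurrences.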