Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-07 01:06:40 +08:00)
Now making sure that we have a generic language model working with UTF-8 for all 26 supported models which had single-byte encoding support until now.
= Logs of language model for Irish (ga) =

- Generated by BuildLangModel.py
- Started: 2021-03-16 19:06:31.364099
- Maximum depth: 4
- Max number of pages: 100

== Parsed pages ==

Tracy Caldwell Dyson (revision 972597)
14 Lúnasa (revision 945830)
1969 (revision 950246)
Arcadia (revision 940778)
California (revision 977165)
Ceimic (revision 996644)
Ceimic fhisiciúil (revision 927461)
Ceimiceoir (revision 927503)
Fisiceoir (revision 880864)
IMDb (revision 941231)
Max Q (revision 910451)
Medal "For Merit in Space Exploration" (revision 972605)
NASA (revision 982342)
Ollscoil California, Davis (revision 972597)
Rúisis (revision 990076)
SAM (revision 976971)
Spáinnis (revision 976986)
Spásaire (revision 948727)
Stáisiún Idirnáisiúnta Spáis (revision 810459)
Stáit Aontaithe Mheiriceá (revision 976971)
Tointeálaí spáis (revision 884452)
10 Lúnasa (revision 649045)
11 Lúnasa (revision 855483)
12 Lúnasa (revision 970783)
13 Lúnasa (revision 843084)
1598 (revision 703178)
15 Lúnasa (revision 776986)
16 Lúnasa (revision 956751)
1740 (revision 868712)
1771 (revision 776762)
17 Lúnasa (revision 777131)
1823 (revision 884394)
1832 (revision 870502)
1898 (revision 881354)
18 Lúnasa (revision 777242)
1911 (revision 884923)
1956 (revision 922906)
1962 (revision 948322)
1966 (revision 983105)
1983 (revision 950195)
19 Lúnasa (revision 648524)
1 Lúnasa (revision 970005)
2001 (revision 953347)
2004 (revision 915512)
20 Lúnasa (revision 863369)
21 Lúnasa (revision 987631)
22 Lúnasa (revision 949242)
23 Lúnasa (revision 778453)
24 Lúnasa (revision 855482)
25 Lúnasa (revision 922966)
26 Lúnasa (revision 649051)
27 Lúnasa (revision 855881)
28 Lúnasa (revision 855201)
29 Lúnasa (revision 937884)
2 Lúnasa (revision 949578)
30 Lúnasa (revision 648308)
31 Lúnasa (revision 874664)
3 Lúnasa (revision 954861)
4 Lúnasa (revision 936315)
5 Lúnasa (revision 946408)
6 Lúnasa (revision 936316)
7 Lúnasa (revision 936317)
8 Lúnasa (revision 648745)
9 Lúnasa (revision 868992)
AK Parti (revision 980611)
An Phacastáin (revision 975474)
An Tuirc (revision 975987)
Aoidh Uí Néill (revision 945830)
Aoine (revision 871416)
Bertolt Brecht (revision 996168)
Czesław Miłosz (revision 968559)
Céadaoin (revision 841385)
Dan Boyle (revision 981683)
Domhnach (revision 717663)
Déardaoin (revision 841384)
Féilire (revision 648837)
Halle Berry (revision 916135)
Henry Bagenal (revision 936900)
Iúil (revision 931127)
Luan (revision 717791)
Lúnasa (revision 970011)
Meán Fómhair (revision 931128)
Mila Kunis (revision 916248)
Pápa Pius VII (revision 972523)
Satharn (revision 717929)
Walter Scott (revision 973708)
Áth Buí (revision 923034)
10 Bealtaine (revision 974318)
11 Feabhra (revision 885848)
11 Meitheamh (revision 937886)
11 Márta (revision 956107)
11 Nollaig (revision 949777)
13 Eanáir (revision 952269)
14 Eanáir (revision 952327)
15 Meitheamh (revision 770401)
16 Nollaig (revision 922996)
17 Meán Fómhair (revision 974321)
17 Márta (revision 959908)
1882 (revision 894229)
1886 (revision 876620)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2021-03-16 19:09:36.532359

42 characters appeared 213560 times.

First 31 characters:
[ 0] Char a: 15.363832178310547 %
[ 1] Char i: 10.505712680277206 %
[ 2] Char n: 8.10825997377786 %
[ 3] Char h: 7.447087469563589 %
[ 4] Char r: 6.299868889305113 %
[ 5] Char e: 6.046076044203034 %
[ 6] Char s: 5.528657051882375 %
[ 7] Char t: 4.9690953362052825 %
[ 8] Char c: 4.70593744146844 %
[ 9] Char l: 4.132328151339202 %
[10] Char o: 3.9469001685708935 %
[11] Char d: 3.2154897920958985 %
[12] Char g: 2.7795467315976774 %
[13] Char m: 2.6760629331335455 %
[14] Char á: 2.228413560591871 %
[15] Char u: 2.17550103015546 %
[16] Char b: 2.0130174189923205 %
[17] Char í: 1.7522007866641691 %
[18] Char é: 1.2207342198913653 %
[19] Char f: 1.1186551788724481 %
[20] Char ú: 1.0039333208466004 %
[21] Char ó: 0.8967035025285635 %
[22] Char p: 0.8475369919460574 %
[23] Char y: 0.2289754635699569 %
[24] Char v: 0.22101517138040833 %
[25] Char k: 0.17606293313354562 %
[26] Char w: 0.16295186364487732 %
[27] Char j: 0.09271399138415433 %
[28] Char z: 0.06836486233377037 %
[29] Char x: 0.03511893613036149 %
[30] Char q: 0.01311106948866829 %

The first 31 characters have an accumulated ratio of 0.9997986514328528.

707 sequences found.

First 512 (typical positive ratio): 0.9976732191628278
Next 512 (512-1024): 0.010039333208466004
Rest: -3.5561831257524545e-17

- Processing end: 2021-03-16 19:09:36.580170
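
The accumulated ratio reported above can be cross-checked against the per-character lines of this log. The following is a minimal sketch only, not code from BuildLangModel.py: the names `CHAR_LINE`, `parse_char_lines` and `accumulated_ratio` are hypothetical, and the regular expression assumes the `[ N] Char c: P %` layout seen in this log, with each character stored as a single code point.

```python
import re

# Assumed line layout, taken from this log: "[ 0] Char a: 15.363832178310547 %"
CHAR_LINE = re.compile(r"^\[\s*(\d+)\]\s+Char\s+(\S):\s+([0-9.eE+-]+)\s*%$")


def parse_char_lines(log_text):
    """Return (rank, character, percentage) tuples for every 'Char' line."""
    rows = []
    for line in log_text.splitlines():
        match = CHAR_LINE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return rows


def accumulated_ratio(rows):
    """Sum the percentages and express the total as a ratio between 0 and 1."""
    return sum(percent for _, _, percent in rows) / 100.0


# First three entries of the table above, used as a small sample.
sample = """\
[ 0] Char a: 15.363832178310547 %
[ 1] Char i: 10.505712680277206 %
[ 2] Char n: 8.10825997377786 %
"""

rows = parse_char_lines(sample)
print(len(rows), accumulated_ratio(rows))
```

Run over all 31 character lines, the same sum should come out close to the 0.9997986514328528 reported in the log, up to floating-point rounding.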