Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git, synced 2025-12-06 16:56:40 +08:00
Now making sure that we have a generic language model working with UTF-8 for all 26 supported models which had single-byte encoding support until now.
158 lines
4.7 KiB
Plaintext
= Logs of language model for Esperanto (eo) =

- Generated by BuildLangModel.py
- Started: 2021-03-16 18:50:26.592918
- Maximum depth: 4
- Max number of pages: 100

== Parsed pages ==

Vikipedio:Ĉefpaĝo (revision 7070684)
1-a de marto (revision 7133709)
10-a de marto (revision 7140053)
1812 (revision 6759865)
1836 (revision 6759900)
1870 (revision 6759944)
2-a de marto (revision 7134407)
2013 (revision 7120546)
2021 (revision 7133381)
20a jarcento (revision 6911173)
4-a de aprilo (revision 7095124)
7-a de februaro (revision 7126938)
7-a de marto (revision 7140031)
9-a de junio (revision 7096958)
Advokato (revision 7015897)
Alĝerio (revision 7136438)
Amazona arbaro (revision 7057380)
Anglio (revision 6910536)
Antikva Egiptio (revision 6715674)
Batao (revision 6348833)
Biero en Germanio (revision 5158902)
Bjalistoko (revision 7095427)
Charles Dickens (revision 7139853)
David Copperfield (romano) (revision 6728487)
Decembro de 2020 (revision 7115650)
Demotika lingvo (revision 6581652)
Duolingo (revision 6996800)
Eduko (revision 7064206)
Ekvatora Gvineo (revision 7111153)
El Greco (revision 7130251)
Emmanuel Macron (revision 7076767)
Esperantisto (revision 6583368)
Esperanto (revision 7125932)
Esperanto kaj Libera Scio (revision 7106401)
Eŭropa Kosma Agentejo (revision 6998003)
Fabriko (revision 6775703)
Februaro de 2021 (revision 7139991)
Fluganta Spagetmonstro (revision 7072467)
Fondaĵo Vikimedio (revision 7097854)
Francaj Armitaj Fortoj (revision 6521662)
Francio (revision 7035760)
Grandduklando Flandrensis (revision 7064691)
Hieroglifoj (revision 6475302)
Honkongo (revision 7022513)
Infanlaboro (revision 7043683)
Internacia Fonetika Alfabeto (revision 6826202)
Irlanda lingvo (revision 7108415)
Januaro de 2021 (revision 7119168)
Kreismo (revision 7029678)
Landport (revision 6722661)
Libera scio (revision 6432924)
Listen to Wikipedia (revision 6980163)
Listo de originalaj romanoj en Esperanto (revision 7134297)
Marto de 2021 (revision 7140759)
Metroo de Parizo (revision 7129616)
Monda Komerca Organizaĵo (revision 7135765)
Mutzig (revision 7085274)
Namacu (revision 6342288)
Ngozi Okonjo-Iweala (revision 7138302)
Niĝerio (revision 7135950)
Novelo (revision 7099911)
Oktobrofesto (revision 6860497)
Oseta Vikipedio (revision 7061966)
Portsmouth (revision 6756801)
Rolulo (revision 7078410)
Romano (revision 7102617)
San-Marino (revision 7075794)
Sismo (revision 6757493)
Slovaka Vikipedio (revision 6973132)
Strasburgo (revision 7139993)
Svahila Vikipedio (revision 6655220)
Telegram (aplikaĵo) (revision 6982939)
Teodoro Obiang Nguema Mbasogo (revision 6521358)
Verkisto (revision 6694998)
Vikio (revision 6761946)
Vikipedio (revision 7075981)
Vikipedio en Esperanto (revision 7075983)
Ĉeĥa Vikipedio (revision 5571847)
Ĉinio (revision 7133172)
Ĵurnalisto (revision 7129724)
-771 (revision 6917193)
-86 (revision 7120146)
1058 (revision 6758857)
11-a de marto (revision 7140194)
1101 (revision 6758901)
1105 (revision 6758905)
1131 (revision 6758935)
1157 (revision 6758962)
12-a de marto (revision 7141381)
1290 (revision 6759097)
13-a de marto (revision 7142227)
1389 (revision 6759315)
14-a de marto (revision 7142231)
1420 (revision 6759383)
1445 (revision 6759438)
1456 (revision 6759463)
1457 (revision 6759465)
1459 (revision 6759469)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2021-03-16 18:54:42.162702

55 characters appeared 738091 times.

First 32 characters:
[ 0] Char a: 12.443858548607151 %
[ 1] Char o: 9.828462886012701 %
[ 2] Char e: 9.238969178597218 %
[ 3] Char i: 8.570894374812863 %
[ 4] Char n: 7.557604685601098 %
[ 5] Char r: 6.426172382538196 %
[ 6] Char t: 5.784923539238386 %
[ 7] Char l: 5.684935868341437 %
[ 8] Char s: 5.134326255163659 %
[ 9] Char k: 4.062778166919797 %
[10] Char d: 3.544278415534128 %
[11] Char j: 3.39619369427347 %
[12] Char u: 2.807783864049284 %
[13] Char m: 2.731370522062998 %
[14] Char p: 2.685847680028614 %
[15] Char g: 1.6155189536249595 %
[16] Char v: 1.417033942969092 %
[17] Char c: 1.328968921176386 %
[18] Char b: 1.1882003709569686 %
[19] Char f: 1.1564969631115947 %
[20] Char h: 0.6592683016050866 %
[21] Char z: 0.6408423893530744 %
[22] Char ĝ: 0.5576548149211953 %
[23] Char ŭ: 0.44980903438735875 %
[24] Char ĉ: 0.3391180762263732 %
[25] Char w: 0.15404604581277917 %
[26] Char y: 0.13819434189009214 %
[27] Char ŝ: 0.12938783971082157 %
[28] Char ĵ: 0.1166522827131072 %
[29] Char á: 0.04579381133220701 %
[30] Char é: 0.039155063535526106 %
[31] Char ĥ: 0.031025984600814804 %

The first 32 characters have an accumulated ratio of 0.9990556719970846.
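
The per-character percentages and the accumulated ratio reported above can be reproduced in spirit with a few lines of Python. This is only a minimal sketch and not the code of BuildLangModel.py: the function name `char_stats`, the lowercasing step, and the use of `str.isalpha()` to decide which characters count are all assumptions made for illustration.

```python
from collections import Counter


def char_stats(text, top_n=32):
    """Rank alphabetic characters by frequency (hypothetical helper).

    Returns a list of (char, percentage) for the top_n most frequent
    characters, plus the share of all counted characters they cover.
    """
    # Assumption: case is folded and only alphabetic characters are counted.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    top = counts.most_common(top_n)
    ranked = [(ch, 100.0 * n / total) for ch, n in top]
    accumulated = sum(n for _, n in top) / total
    return ranked, accumulated


if __name__ == "__main__":
    sample = "La ĉefpaĝo de Vikipedio en Esperanto"
    ranked, accumulated = char_stats(sample, top_n=5)
    for index, (ch, pct) in enumerate(ranked):
        print(f"[{index:2d}] Char {ch}: {pct} %")
    print(f"The first {len(ranked)} characters have an accumulated ratio of {accumulated}.")
```

An accumulated ratio close to 1, as in this log, means that the characters ranked after the first 32 contribute very little of the counted text.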

1066 sequences found.

First 512 (typical positive ratio): 0.995442680189542
Next 512 (512-1024): 0.004498090343873587
Rest: 6.983124116715766e-05
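
The three sequence figures above split the observed sequences into frequency bands. The sketch below shows one way such bands could be computed. It assumes that a "sequence" is a pair of adjacent alphabetic characters and that the bands are the 512 most frequent pairs, the next 512, and the remainder. The function name `sequence_ratios` and its parameters are hypothetical, and this is not taken from BuildLangModel.py.

```python
from collections import Counter


def sequence_ratios(text, first=512, second=1024):
    """Share of adjacent-letter pairs in each frequency band (hypothetical).

    Returns (number_of_distinct_pairs, ratio_first, ratio_next, ratio_rest).
    """
    folded = text.lower()
    pairs = Counter()
    for a, b in zip(folded, folded[1:]):
        # Assumption: only pairs made of two alphabetic characters count.
        if a.isalpha() and b.isalpha():
            pairs[(a, b)] += 1
    total = sum(pairs.values())
    if total == 0:
        return 0, 0.0, 0.0, 0.0
    # Pair counts ordered from most to least frequent.
    ranked = [n for _, n in pairs.most_common()]
    return (
        len(ranked),
        sum(ranked[:first]) / total,
        sum(ranked[first:second]) / total,
        sum(ranked[second:]) / total,
    )


if __name__ == "__main__":
    found, top, middle, rest = sequence_ratios("Vikipedio en Esperanto")
    print(f"{found} sequences found.")
    print(f"First 512 (typical positive ratio): {top}")
    print(f"Next 512 (512-1024): {middle}")
    print(f"Rest: {rest}")
```

Under these assumptions the three ratios add up to 1 by construction. The values in this log are reproduced as generated and may follow a different definition.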

- Processing end: 2021-03-16 18:54:42.252378