Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-07 01:06:40 +08:00
Now making sure that we have a generic language model working with UTF-8 for all 26 supported models which had single-byte encoding support until now.
161 lines, 4.7 KiB, Plaintext
= Logs of language model for Estonian (et) =

- Generated by BuildLangModel.py
- Started: 2021-03-16 18:58:31.291439
- Maximum depth: 4
- Max number of pages: 100

== Parsed pages ==

Harilik pohl (revision 5703478)
A-vitamiin (revision 5556956)
Aasta keskmine sademete hulk (revision 5284375)
Aasta keskmine õhutemperatuur (revision 5542687)
Ahm (revision 5513665)
Ain Raal (revision 5662146)
Alalehed (revision 4983554)
Alamliik (revision 5278935)
Alaska (revision 5844590)
Aleksander Heintalu (revision 5754094)
Aleuudid (revision 4704649)
Ameerika jänes (revision 5843342)
Ameerika valgejänes (revision 5411720)
Anneli Sihvart (revision 3546469)
Arbutiin (revision 4451788)
Baribal (revision 5793838)
Bensoehape (revision 5172889)
Binaarne nomenklatuur (revision 5719069)
C-vitamiin (revision 5487089)
Droog (revision 5051359)
E-vitamiin (revision 5553995)
Eesti (revision 5807277)
Eesti Entsüklopeediakirjastus (revision 5697753)
Eesti köök (revision 5622964)
Ellips (revision 5425749)
Emakakael (botaanika) (revision 3521516)
Euraasia (revision 5843444)
Fenoloogia (revision 3512905)
Folaadid (revision 5695132)
Fosfor (revision 5817280)
Fotosüntees (revision 5849350)
Fruktoos (revision 5580398)
Glükoos (revision 5398752)
Gneiss (revision 4333338)
Graniit (revision 5788916)
Gröönimaa (revision 5704662)
Halljänes (revision 5844682)
Haned (revision 5655933)
Happeline keskkond (revision 2966453)
Heilongjiang (revision 5573413)
Hendrik Relve (revision 5776793)
Hiina (revision 5842572)
Holland (revision 5563481)
Hunt (revision 5833431)
Hõimkond (revision 5594301)
Hüdrofiilsus (revision 4309797)
Ida-Euroopa (revision 5852084)
Ida-sinilind (revision 3944751)
Ida-vöötorav (revision 5772003)
Igihaljus (revision 5718075)
Ilves (revision 5810469)
Imetaja (revision 5817468)
Immuunsus (revision 5465129)
Indiaanlased (revision 5715264)
Indrek Rohtmets (revision 5460729)
Itaalia (revision 5821960)
Jaapan (revision 5848576)
Jilin (revision 5551781)
Jood (revision 5506157)
Juurestik (revision 3341159)
Jääkaru (revision 5798648)
Jõhvikas (revision 5765158)
Kaalium (revision 5506158)
Kaheidulehelised (revision 4551109)
Kaheli õiekate (revision 3063362)
Kahesuguline õis (revision 3383221)
Kaitsestaatus (revision 5622492)
Kajakas (revision 5799897)
Kalorsus (revision 5843070)
Kaltsium (revision 5506160)
Kanada (revision 5846973)
Kanalised (revision 4824603)
Kanarbikulaadsed (revision 4318215)
Kanarbikulised (revision 5479568)
Karboksüülhapped (revision 5328337)
Karoteen (revision 5479578)
Kasvuperiood (revision 5279042)
Katteseemnetaimed (revision 5315975)
Kaukasus (revision 4476003)
Kesk-Euroopa (revision 5381871)
Kimalane (revision 5643935)
Kiudained (toit) (revision 5762236)
Klass (bioloogia) (revision 3489567)
Kliima (revision 5719219)
Korea (revision 5555270)
Kroom (revision 5506123)
Kroonlehed (revision 3543291)
Kuusepüü (revision 5715613)
Kvertsetiin (revision 5610539)
Laanemets (revision 5751227)
Laanepüü (revision 5747330)
Laiuskraad (revision 4993978)
Leesikas (revision 5842030)
Lehed (revision 5725384)
Leheroots (revision 5532086)
Liik (bioloogia) (revision 5791564)
Liiv (revision 5675176)
Liivakivi (revision 5548801)
Linnaeus (revision 5635181)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2021-03-16 19:01:52.570995

55 characters appeared 482798 times.

First 34 characters:
[ 0] Char a: 12.61500669016856 %
[ 1] Char i: 10.380117564695794 %
[ 2] Char e: 10.063007717513328 %
[ 3] Char s: 8.719795856652263 %
[ 4] Char t: 6.619538606207979 %
[ 5] Char l: 6.04559256666349 %
[ 6] Char u: 5.504372429049002 %
[ 7] Char n: 5.077278696266347 %
[ 8] Char k: 4.702380705802427 %
[ 9] Char o: 4.470606754791859 %
[10] Char d: 4.163438953765343 %
[11] Char r: 3.6719290469306007 %
[12] Char m: 3.5747869709485123 %
[13] Char v: 2.4621063053285224 %
[14] Char p: 1.8848462503987176 %
[15] Char g: 1.8341003898110597 %
[16] Char h: 1.7551853984482124 %
[17] Char j: 1.7216309926718836 %
[18] Char ä: 1.033972800218725 %
[19] Char õ: 0.9384877319292955 %
[20] Char b: 0.8972696655744226 %
[21] Char ü: 0.6507897712915132 %
[22] Char f: 0.34610748180398426 %
[23] Char c: 0.30426803756436444 %
[24] Char ö: 0.24275162697442823 %
[25] Char y: 0.1056342404069611 %
[26] Char x: 0.05550975770405014 %
[27] Char w: 0.035211413468987034 %
[28] Char z: 0.025476493274620024 %
[29] Char q: 0.019884092311898558 %
[30] Char š: 0.017605706734493517 %
[31] Char é: 0.009527794232784725 %
[32] Char ō: 0.009113542309620172 %
[33] Char ž: 0.00869929038645562 %

The first 34 characters have an accumulated ratio of 0.9996603134230051.
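The accumulated ratio appears to be the sum of the 34 listed per-character frequencies, divided by 100. The minimal Python sketch below checks that reading against the numbers in this log. It assumes each percentage is a character's count divided by the 482798 total occurrences, times 100; that assumption is not confirmed by BuildLangModel.py itself. The names `FREQUENCIES_PERCENT`, `TOTAL_OCCURRENCES`, `accumulated_ratio` and `approximate_count` are hypothetical and only used for illustration.

```python
"""Re-derive summary figures of the Estonian log from its own character table.

Assumption (hypothetical, not taken from BuildLangModel.py): each logged
percentage equals count / TOTAL_OCCURRENCES * 100.
"""

# From the log line "55 characters appeared 482798 times."
TOTAL_OCCURRENCES = 482798

# The "First 34 characters" table, in rank order [0]..[33], as percentages.
FREQUENCIES_PERCENT = {
    "a": 12.61500669016856,
    "i": 10.380117564695794,
    "e": 10.063007717513328,
    "s": 8.719795856652263,
    "t": 6.619538606207979,
    "l": 6.04559256666349,
    "u": 5.504372429049002,
    "n": 5.077278696266347,
    "k": 4.702380705802427,
    "o": 4.470606754791859,
    "d": 4.163438953765343,
    "r": 3.6719290469306007,
    "m": 3.5747869709485123,
    "v": 2.4621063053285224,
    "p": 1.8848462503987176,
    "g": 1.8341003898110597,
    "h": 1.7551853984482124,
    "j": 1.7216309926718836,
    "ä": 1.033972800218725,
    "õ": 0.9384877319292955,
    "b": 0.8972696655744226,
    "ü": 0.6507897712915132,
    "f": 0.34610748180398426,
    "c": 0.30426803756436444,
    "ö": 0.24275162697442823,
    "y": 0.1056342404069611,
    "x": 0.05550975770405014,
    "w": 0.035211413468987034,
    "z": 0.025476493274620024,
    "q": 0.019884092311898558,
    "š": 0.017605706734493517,
    "é": 0.009527794232784725,
    "ō": 0.009113542309620172,
    "ž": 0.00869929038645562,
}


def accumulated_ratio(freqs_percent):
    """Sum the listed percentages and express the total as a 0..1 ratio."""
    return sum(freqs_percent.values()) / 100.0


def approximate_count(percent, total=TOTAL_OCCURRENCES):
    """Turn a logged percentage back into the nearest whole occurrence count."""
    return round(percent / 100.0 * total)


if __name__ == "__main__":
    ratio = accumulated_ratio(FREQUENCIES_PERCENT)
    print(ratio)  # matches 0.9996603134230051 up to float rounding
    print(approximate_count(FREQUENCIES_PERCENT["a"]))  # 60905
    # Occurrences left over for the 21 characters outside the top 34.
    print(round((1 - ratio) * TOTAL_OCCURRENCES))  # 164
```

Under the same assumption, the 21 characters that are not listed (55 minus 34) account for only about 164 of the 482798 occurrences. This is why truncating the table at 34 entries loses almost nothing.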

869 sequences found.

First 512 (typical positive ratio): 0.9973685549586747
Next 512 (512-1024): 8.69929038645562e-05
Rest: -3.122502256758253e-17

- Processing end: 2021-03-16 19:01:52.649852