mirror of
https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
Encodings: ISO-8859-4, ISO-8859-13, ISO-8859-13, Windows-1252 and Windows-1257. Test text is from https://et.wikipedia.org/wiki/Anton_Tšehhov. Windows-1257 and ISO-8859-13 are very close, so I added quotation marks (Jutumärgid), which are on codepoints only present in ISO-8859-13, to tell the two encodings apart.
160 lines
4.6 KiB
Plaintext
= Logs of language model for Estonian (et) =

- Generated by BuildLangModel.py
- Started: 2016-09-26 23:45:22.351942
- Maximum depth: 5
- Max number of pages: 100

== Parsed pages ==

Harilik pohl (revision 4248853)
A-vitamiin (revision 4330862)
Aasta keskmine sademete hulk (revision 4266801)
Aasta keskmine õhutemperatuur (revision 3902142)
Ahm (revision 4343671)
Ain Raal (revision 4464651)
Alalehed (revision 2892741)
Alamliik (revision 3522810)
Alaska (revision 4216575)
Aleksander Heintalu (revision 4445156)
Aleuudid (revision 4335893)
Ameerika jänes (revision 4325220)
Ameerika valgejänes (revision 4355263)
Anneli Sihvart (revision 4211078)
Arbutiin (revision 4451788)
Baribal (revision 4268462)
Bensoehape (revision 3810308)
Binaarne nomenklatuur (revision 3970950)
C-vitamiin (revision 4444353)
Droog (revision 4352968)
E-vitamiin (revision 4336726)
Eesti (revision 4474984)
Eesti Entsüklopeediakirjastus (revision 4012421)
Eesti köök (revision 4314947)
Ellips (revision 4272113)
Emakakael (botaanika) (revision 3521516)
Euraasia (revision 3710768)
Fenoloogia (revision 3512905)
Folaadid (revision 4266628)
Fosfor (revision 4270122)
Fotosüntees (revision 4380600)
Fruktoos (revision 4285660)
Glükoos (revision 4047315)
Gneiss (revision 4333338)
Graniit (revision 4435351)
Gröönimaa (revision 4331557)
Halljänes (revision 4051603)
Haned (revision 4127680)
Happeline keskkond (revision 2966453)
Heilongjiang (revision 4342364)
Hendrik Relve (revision 4342591)
Hiina (revision 4448121)
Holland (revision 4307885)
Hunt (revision 4427752)
Hõimkond (revision 3489569)
Hüdrofiilsus (revision 4309797)
Ida-Euroopa (revision 4337624)
Ida-sinilind (revision 4248853)
Ida-vöötorav (revision 3520679)
Igihaljus (revision 3536500)
Ilves (revision 4404632)
Imetaja (revision 4289188)
Indiaanlased (revision 4479868)
Indrek Rohtmets (revision 4218674)
Itaalia (revision 4404119)
Jaapan (revision 4465542)
Jilin (revision 3894473)
Jood (revision 4025060)
Juurestik (revision 3341159)
Jääkaru (revision 4372399)
Jõhvikas (revision 4391549)
Kaalium (revision 4486067)
Kaheidulehelised (revision 4031352)
Kaheli õiekate (revision 3063362)
Kahesuguline õis (revision 3383221)
Kaitsestaatus (revision 3527096)
Kajakas (revision 4456839)
Kalorsus (revision 3843290)
Kaltsium (revision 4339861)
Kanada (revision 4434682)
Kanalised (revision 3616579)
Kanarbikulaadsed (revision 4318215)
Kanarbikulised (revision 3534760)
Karboksüülhapped (revision 3659011)
Karoteen (revision 4347634)
Kasvuperiood (revision 4231717)
Katteseemnetaimed (revision 4176294)
Kaukasus (revision 4476003)
Kesk-Euroopa (revision 3580746)
Kimalane (revision 4261145)
Kiudained (toit) (revision 3538655)
Klass (bioloogia) (revision 3489567)
Kliima (revision 4160781)
Korea (revision 4329396)
Kroom (revision 4030460)
Kroonlehed (revision 3543291)
Kuusepüü (revision 4028988)
Kvertsetiin (revision 4448461)
Laanemets (revision 4001157)
Laanepüü (revision 4475093)
Laiuskraad (revision 3990366)
Leesikas (revision 4420533)
Lehed (revision 4471821)
Leheroots (revision 3595351)
Liik (bioloogia) (revision 4320981)
Liiv (revision 4399494)
Liivakivi (revision 4330598)
Linnaeus (revision 4276836)
Linnud (revision 4479668)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2016-09-26 23:47:54.476445

55 characters appeared 433559 times.

First 33 characters:
[ 0] Char a: 12.486881831538499 %
[ 1] Char i: 10.26503889897338 %
[ 2] Char e: 10.177622884082673 %
[ 3] Char s: 8.710233209320991 %
[ 4] Char t: 6.56634967789851 %
[ 5] Char l: 6.051540851418146 %
[ 6] Char u: 5.423944607308348 %
[ 7] Char n: 5.131020230233947 %
[ 8] Char k: 4.663033174262327 %
[ 9] Char o: 4.526950195936424 %
[10] Char d: 4.167368224393911 %
[11] Char r: 3.6740097656835635 %
[12] Char m: 3.552688330769284 %
[13] Char v: 2.4700213811730354 %
[14] Char p: 1.9229216784797456 %
[15] Char g: 1.865259399528092 %
[16] Char h: 1.8043680329551455 %
[17] Char j: 1.6860450365463524 %
[18] Char ä: 1.0247740215287884 %
[19] Char b: 0.9255949017319443 %
[20] Char õ: 0.9246723052687178 %
[21] Char ü: 0.6536595941959457 %
[22] Char f: 0.37342091849090897 %
[23] Char c: 0.34851081398379463 %
[24] Char ö: 0.24333481717597835 %
[25] Char y: 0.1287022066200909 %
[26] Char x: 0.06781084004714467 %
[27] Char w: 0.04082489349777078 %
[28] Char q: 0.020989069538401926 %
[29] Char š: 0.018913227496142396 %
[30] Char z: 0.017529332801302706 %
[31] Char ō: 0.010379210211297655 %
[32] Char ž: 0.009687262863877812 %

The first 33 characters have an accumulated ratio of 0.9995410082595447.
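
The character table and accumulated ratio above can be reproduced in spirit with a short sketch. This is an illustration only, not the code of BuildLangModel.py: the function names `char_frequencies` and `accumulated_ratio` are hypothetical, and it assumes that only letters are counted, after lowercasing.

```python
from collections import Counter


def char_frequencies(text):
    """Return ([(char, percent), ...] sorted by frequency, total letter count).

    Assumption: only alphabetic characters are counted, case-folded.
    """
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return [], 0
    ranked = [(ch, 100.0 * n / total) for ch, n in counts.most_common()]
    return ranked, total


def accumulated_ratio(ranked, first_n):
    """Share (0..1) of all counted characters covered by the first_n entries."""
    return sum(pct for _, pct in ranked[:first_n]) / 100.0


if __name__ == "__main__":
    sample = "Eesti köök ja jõhvikas"  # tiny stand-in for the parsed pages
    ranked, total = char_frequencies(sample)
    print("{} characters appeared {} times.".format(len(ranked), total))
    for order, (ch, pct) in enumerate(ranked[:5]):
        print("[{:2}] Char {}: {} %".format(order, ch, pct))
    print("Accumulated ratio of the first 5:", accumulated_ratio(ranked, 5))
```

Read this way, an accumulated ratio of about 0.9995 for 33 characters would mean the remaining 22 characters together account for roughly 0.05 % of the counted text.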

853 sequences found.

First 512 (typical positive ratio): 0.9972721312183132
Next 512 (512-1024): 9.687262863877811e-05
Rest: -5.204170427930421e-18
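
A rough sketch of how such sequence ratios might be derived follows. It is an assumption-laden illustration, not the script's actual logic: `sequence_ratios` is a hypothetical name, a "sequence" is taken to be a pair of adjacent letters, and the rest is computed by subtraction. Under that last assumption, a tiny negative value such as the one logged above would be plain floating-point rounding around zero.

```python
from collections import Counter


def sequence_ratios(text, block=512):
    """Return (distinct_sequences, first_block_ratio, next_block_ratio, rest).

    Assumption: a sequence is two adjacent alphabetic characters, case-folded.
    """
    letters = text.lower()
    pairs = Counter(
        a + b for a, b in zip(letters, letters[1:])
        if a.isalpha() and b.isalpha()
    )
    total = sum(pairs.values())
    if total == 0:
        return 0, 0.0, 0.0, 0.0
    ranked = [n for _, n in pairs.most_common()]
    first = sum(ranked[:block]) / total
    nxt = sum(ranked[block:2 * block]) / total
    # Computed by subtraction, so rounding can leave a value slightly off zero.
    rest = 1.0 - first - nxt
    return len(ranked), first, nxt, rest


if __name__ == "__main__":
    found, first, nxt, rest = sequence_ratios("Eesti köök ja jõhvikas", block=4)
    print("{} sequences found.".format(found))
    print("First 4:", first)
    print("Next 4:", nxt)
    print("Rest:", rest)
```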

- Processing end: 2016-09-26 23:47:54.561846