Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-07 01:06:40 +08:00)
I built models for ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-13, ISO-8859-15 and WINDOWS-1252, which all contain the Finnish letters. Nevertheless, most texts in these encodings end up identical (the Finnish glyphs share the same code points), so I keep tests only for ISO-8859-1 and UTF-8. Models for the other encodings may still be useful when processing texts that contain some symbols, etc.
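The point about shared code points can be checked with a short sketch. It relies only on Python's standard codec aliases for the six charsets; the sample of letters (ä, ö, å and their capitals) is an assumption chosen for illustration and is not taken from the uchardet test suite.

```python
# Sketch: check that Finnish-specific letters encode to the same byte
# in every single-byte charset named in the message above.
ENCODINGS = [
    "iso-8859-1", "iso-8859-4", "iso-8859-9",
    "iso-8859-13", "iso-8859-15", "windows-1252",
]
FINNISH_LETTERS = "äöåÄÖÅ"  # assumed sample, chosen for illustration


def byte_values(letter):
    """Return the set of byte strings `letter` maps to across ENCODINGS."""
    return {letter.encode(enc) for enc in ENCODINGS}


def shared_codepoints(letters=FINNISH_LETTERS):
    """True if every letter has one identical encoding in all charsets."""
    return all(len(byte_values(ch)) == 1 for ch in letters)


if __name__ == "__main__":
    for ch in FINNISH_LETTERS:
        print(ch, sorted(b.hex() for b in byte_values(ch)))
    print("identical in all encodings:", shared_codepoints())
```

Rarer letters such as š and ž are not placed identically in all of these charsets (ISO-8859-1 lacks them entirely), which may be one case where the additional models still matter.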
157 lines
4.9 KiB
Plaintext
= Logs of language model for Finnish (fi) =

- Generated by BuildLangModel.py
- Started: 2016-09-21 18:12:24.181917
- Maximum depth: 5
- Max number of pages: 100

== Parsed pages ==

Yhdistynyt kuningaskunta (revision 15843357)
1. toukokuuta (revision 15910178)
1700-luku (revision 15493702)
1707 (revision 15106709)
1800-luku (revision 15708929)
2014 (revision 15891601)
409 (revision 12809782)
5. marraskuuta (revision 15421719)
927 (revision 12785964)
Aasia (revision 15948161)
Abhasia (revision 15730328)
Adolf Hitler (revision 15951829)
Afrikka (revision 15934209)
Agatha Christie (revision 15760740)
Aikavyöhyke (revision 15800313)
Ajoneuvon kansallisuustunnus (revision 15897445)
Akrotiri ja Dhekelia (revision 14625383)
Alamaat (revision 15913741)
Alan Turing (revision 15904871)
Alankomaat (revision 15936643)
Albania (revision 15767604)
Alec Guinness (revision 15363805)
Alexander Fleming (revision 15023225)
Alfred Hitchcock (revision 15892843)
Alfred Tennyson (revision 15856114)
Allen Jones (revision 12871703)
Andorra (revision 15913862)
Andrew Lloyd Webber (revision 14978349)
Anglit (revision 15902350)
Anguilla (revision 15854041)
Anne Brontë (revision 14287992)
Anthony Eden (revision 14391831)
Antigua ja Barbuda (revision 15196967)
Arabian Lawrence (revision 15736417)
Argentiina (revision 15676474)
Armenia (revision 15634470)
Arthur Conan Doyle (revision 15402837)
Arts and Crafts (revision 15806930)
Aurinko (revision 15934252)
Australia (revision 15934255)
Avara luonto (revision 15815943)
Azerbaidžan (revision 15946891)
BBC (revision 15866026)
BKT (revision 15656549)
Bahama (revision 15516869)
Bangladesh (revision 15883994)
Bank of England (revision 14481173)
Barbados (revision 15839821)
Barbara Hepworth (revision 15106880)
Bath (revision 15869900)
Beatrix Potter (revision 15057380)
Belfast (revision 15715934)
Belgia (revision 15932391)
Belize (revision 15665086)
Ben Nevis (revision 15610196)
Bengalin kieli (revision 15551820)
Benjamin Britten (revision 15081615)
Bermuda (revision 15632621)
Bertrand Russell (revision 14631969)
Bhutan (revision 15377394)
Big Ben (revision 14897401)
Big Brother (revision 14641391)
Birmingham (revision 15855259)
Black Sabbath (revision 15839917)
Bosnia ja Hertsegovina (revision 15934266)
Botswana (revision 15524955)
Bristol (revision 15891889)
Bristolin kanaali (revision 15849713)
Bristolin kansainvälinen lentoasema (revision 14452870)
Britannia (provinssi) (revision 14557442)
Britannian avoin golfturnaus (revision 14293265)
Britannian kuninkaallinen perhe (revision 15522149)
Britannian talous (revision 15470242)
Britannian väestö (revision 15661241)
Brittein saaret (revision 15805422)
Brittiläinen Antarktiksen alue (revision 15836227)
Brittiläinen Intia (revision 15593126)
Brittiläinen Intian valtameren alue (revision 14272903)
Brittiläinen imperiumi (revision 15906600)
Brittiläinen kansainyhteisö (revision 15894379)
Brittiläinen keittiö (revision 13393533)
Brittiläinen kulttuuri (revision 15951407)
Brittiläiset Neitsytsaaret (revision 15910520)
Brittiläiset merentakaiset alueet (revision 15836213)
Brunei (revision 15580824)
Bruttokansantuote (revision 15656549)
Bulgaria (revision 15944101)
Burma (revision 15627218)
Cambridge (revision 14641664)
Cambridgen yliopisto (revision 15493340)
Canterburyn tarinoita (revision 15232140)
Cardiff (revision 15840398)
Caymansaaret (revision 15914575)
Channel 4 (revision 15882475)
Charles Babbage (revision 15203616)
Charles Chaplin (revision 15674652)
Charles Darwin (revision 15894085)
Charles Dickens (revision 15699592)
Charles Dickensin joulutarina (revision 15116247)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2016-09-21 18:15:05.189221

61 characters appeared 940364 times.

First 30 characters:
[ 0] Char a: 12.508773198463574 %
[ 1] Char i: 10.969475649854738 %
[ 2] Char n: 8.815841525196626 %
[ 3] Char t: 8.80169806585535 %
[ 4] Char e: 7.8206949649284745 %
[ 5] Char s: 7.595782058862313 %
[ 6] Char l: 5.963541777439374 %
[ 7] Char o: 5.439808414613916 %
[ 8] Char u: 5.0102938861972595 %
[ 9] Char k: 4.589712068943515 %
[10] Char r: 3.1231523112326713 %
[11] Char ä: 3.041800834570443 %
[12] Char m: 3.0392486313810396 %
[13] Char v: 2.156292669647073 %
[14] Char h: 1.996141919512019 %
[15] Char j: 1.9248929138078446 %
[16] Char p: 1.6324529650220552 %
[17] Char y: 1.6323466232224966 %
[18] Char d: 1.1981530556252684 %
[19] Char b: 0.6835650875618378 %
[20] Char g: 0.5793501239945382 %
[21] Char c: 0.5056552569005194 %
[22] Char ö: 0.38931732818355447 %
[23] Char f: 0.215023118707224 %
[24] Char w: 0.2106631049253268 %
[25] Char z: 0.06593191572625068 %
[26] Char x: 0.024458613898447838 %
[27] Char š: 0.010421496356729947 %
[28] Char ž: 0.007869293167326695 %
[29] Char q: 0.007762951367768225 %

The first 30 characters have an accumulated ratio of 0.9996012182516557.
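The per-character percentages and the accumulated ratio reported above can be approximated with a short sketch. Lowercasing the text and keeping only alphabetic characters are assumptions made here; BuildLangModel.py may filter its input differently, and the sample sentence is invented for illustration rather than taken from the parsed pages.

```python
from collections import Counter


def char_frequencies(text):
    """Return (char, ratio) pairs for alphabetic characters, most frequent first."""
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    return [(ch, n / total) for ch, n in counts.most_common()]


def accumulated_ratio(freqs, top_n):
    """Sum the ratios of the `top_n` most frequent characters."""
    return sum(ratio for _, ratio in freqs[:top_n])


if __name__ == "__main__":
    # Invented sample text; a real run would use the parsed Wikipedia pages.
    sample = "Yhdistynyt kuningaskunta on saarivaltio Länsi-Euroopassa."
    freqs = char_frequencies(sample)
    for rank, (ch, ratio) in enumerate(freqs[:5]):
        print(f"[{rank:2}] Char {ch}: {ratio * 100} %")
    print("accumulated ratio of the first 5:", accumulated_ratio(freqs, 5))
```

An accumulated ratio close to 1 for the first 30 characters, as in this log, suggests that a small alphabet covers nearly all of the Finnish text.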

919 sequences found.

First 512 (typical positive ratio): 0.9985378147555799
Next 512 (512-1024): 1.0634179955846884e-06
Rest: 3.881443777498106e-17
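As a rough illustration of what a sequence statistic of this kind measures, the sketch below counts adjacent character pairs and reports the share covered by the most frequent ones. It is not a reimplementation of BuildLangModel.py: the function name `sequence_ratios`, the two-bucket split, and the pair-filtering rule are all assumptions, and the log's own "Next 512" and "Rest" figures are computed by the script in a way not shown here.

```python
from collections import Counter


def sequence_ratios(text, alphabet, first=512):
    """Count adjacent character pairs drawn from `alphabet`.

    Returns (number of distinct pairs, share of all pair occurrences
    covered by the `first` most common pairs, share of the remainder).
    """
    text = text.lower()
    pairs = Counter(
        a + b
        for a, b in zip(text, text[1:])
        if a in alphabet and b in alphabet
    )
    total = sum(pairs.values())
    if total == 0:
        return 0, 0.0, 0.0
    ranked = [n for _, n in pairs.most_common()]
    top = sum(ranked[:first]) / total
    return len(ranked), top, 1.0 - top


if __name__ == "__main__":
    # Hypothetical alphabet and sample text, for illustration only.
    alphabet = set("abcdefghijklmnopqrstuvwxyzäö")
    found, top, rest = sequence_ratios("Brittiläinen kulttuuri", alphabet)
    print(found, "sequences found.")
    print("First 512:", top)
    print("Rest:", rest)
```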

- Processing end: 2016-09-21 18:15:05.307164