uchardet/script/BuildLangModelLogs/LangRomanianModel.log
Jehan eb8308d50a src, script: regenerate all existing language models.
Now making sure that we have a generic language model working with UTF-8
for all 26 supported models which had single-byte encoding support until
now.
2022-12-14 00:23:13 +01:00


= Logs of language model for Romanian (ro) =
- Generated by BuildLangModel.py
- Started: 2021-03-16 19:59:20.080997
- Maximum depth: 4
- Max number of pages: 100
== Parsed pages ==
The Loving Kind (revision 12020391)
12 ianuarie (revision 13977250)
13 decembrie (revision 13958824)
2007 (revision 13956975)
2008 (revision 13894929)
2009 (revision 13949957)
21 noiembrie (revision 13705857)
25 ianuarie (revision 13882659)
31 ianuarie (revision 13887860)
4 Music (revision 13955370)
Billboard (revision 13092896)
Biology (revision 10112430)
Bulgaria (revision 13779617)
CD (revision 13258410)
Call The Shots (revision 13085752)
Call the Shots (revision 13085752)
Can't Speak French (revision 12018260)
Casă de discuri (revision 10611348)
Channel 4 (revision 13980413)
Chemistry (revision 13003795)
Cheryl Cole (revision 13707613)
Chitară (revision 13704508)
Croația (revision 13662573)
Dance (revision 12713318)
Descărcare digitală (revision 10785925)
Digital Spy (revision 12038314)
Discografia formației Girls Aloud (revision 13332557)
Estonia (revision 13885094)
Europa (revision 13985083)
Fascination Records (revision 9653126)
Gen muzical (revision 13743085)
Girls A Live (revision 10112444)
Girls Aloud (revision 12017377)
Good Morning Television (revision 13079309)
Heat World (revision 12994549)
I'll Stand By You (cântec de Girls Aloud) (revision 10112432)
ITunes (revision 13985408)
I Think We're Alone Now (revision 10112427)
Irlanda (revision 13830248)
Jewels & Stone (revision 8842892)
Jump (cântec de Girls Aloud) (revision 10112438)
Lady GaGa (revision 13982113)
Life Got Cold (revision 10112437)
Limba engleză (revision 13983069)
Long Hot Summer (revision 10112429)
Love Machine (revision 10112433)
MSN Search (revision 13651565)
MTV (revision 12996766)
Mixed Up (revision 10112443)
Muzică electronică (revision 13450013)
Muzică pop (revision 13648051)
Nadine Coyle (revision 10316187)
Neil Tennant (revision 13355922)
No Good Advice (revision 10112436)
Out Of Control (revision 10112484)
Out of Control (revision 10112484)
Pet Shop Boys (revision 13165657)
Poker Face (revision 13083515)
PopJustice (revision 12061987)
Regatul Unit (revision 13957992)
Regatul Unit al Marii Britanii și Irlandei de Nord (revision 13957992)
Regatul Unit al Marii Britanii și al Irlandei de Nord (revision 13957992)
Republica Irlanda (revision 13830248)
Romanian Top 100 (revision 13882522)
România (revision 13906545)
Sarah Harding (revision 10139259)
Sarah Hearding (revision 12017812)
See the Day (revision 10112431)
Sexy! No No No... (revision 12017812)
Slant Magazine (revision 12008416)
Slovenia (revision 13726273)
Something Kinda Ooooh (revision 10112426)
Sound of the Underground (album) (revision 10112476)
Sound of the Underground (cântec) (revision 10112434)
Tangled Up (revision 13010794)
The Guardian (revision 12369330)
The Paul O'Grady Show (revision 12720320)
The Promise (revision 12178852)
The Show (revision 10112441)
The Sound of Girls Aloud (revision 10112480)
Times Online (revision 12014967)
Tonalitate (revision 12509051)
Turneul Out of Control (revision 10112484)
UK Mix (revision 13757304)
UK Singles Chart (revision 10226705)
Ungaria (revision 13960307)
Uniunea Europeană (revision 13689726)
Untouchable (revision 12020867)
Utah Saints (revision 12270967)
Wake Me Up (revision 10112439)
What Will The Neighbours Say? (revision 10112478)
Whole Lotta History (revision 12369785)
Wideboys (revision 12030035)
Wikimedia Commons (revision 13278756)
Xenomania (revision 12020867)
== End of Parsed pages ==
- Wikipedia parsing ended at: 2021-03-16 20:04:01.198792
63 characters appeared 1198090 times.
First 33 characters:
[ 0] Char e: 11.456985702242736 %
[ 1] Char i: 11.0956605931107 %
[ 2] Char a: 10.273852548639919 %
[ 3] Char r: 7.454949127361049 %
[ 4] Char n: 7.243779682661569 %
[ 5] Char t: 6.464122060947007 %
[ 6] Char l: 5.642480948843576 %
[ 7] Char u: 5.4753816491248575 %
[ 8] Char o: 4.928594679865453 %
[ 9] Char c: 4.4603493894448665 %
[10] Char s: 3.768080862038745 %
[11] Char d: 3.7479655117729047 %
[12] Char m: 2.9085461025465533 %
[13] Char p: 2.8108906676460035 %
[14] Char ă: 2.1405737465465866 %
[15] Char g: 1.262509494278393 %
[16] Char f: 1.0879817042125384 %
[17] Char b: 1.0721231293141584 %
[18] Char ț: 1.016534650986153 %
[19] Char ș: 1.0140306654758826 %
[20] Char v: 0.9768882137402032 %
[21] Char î: 0.9654533465766345 %
[22] Char z: 0.7075428390187716 %
[23] Char h: 0.5414451335041608 %
[24] Char â: 0.45664349088966605 %
[25] Char x: 0.22627682394477877 %
[26] Char j: 0.22452403408758942 %
[27] Char k: 0.20132043502574934 %
[28] Char y: 0.16918595431061106 %
[29] Char w: 0.12970644943201262 %
[30] Char á: 0.012937258469730987 %
[31] Char é: 0.012019130449298466 %
[32] Char q: 0.007428490347135858 %
The first 33 characters have an accumulated ratio of 0.9995676451685602.
1066 sequences found.
First 512 (typical positive ratio): 0.9975318123681904
Next 512 (512-1024): 0.01016534650986153
Rest: 4.3355868061878584e-05
- Processing end: 2021-03-16 20:04:01.293047
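
The character statistics above (total count, per-character percentage, accumulated ratio of the most frequent characters) can be approximated with a short sketch. This is not the code of BuildLangModel.py: the function name `char_stats`, the choice to lowercase the text and the choice to count only alphabetic characters are all assumptions made for illustration.

```python
from collections import Counter


def char_stats(text, top_n):
    """Return (ranked counts, total count, accumulated ratio of the top_n characters).

    Assumption: only alphabetic characters are counted, after lowercasing.
    The real script may filter and normalize characters differently.
    """
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    ranked = counts.most_common()  # most frequent first
    top_total = sum(n for _, n in ranked[:top_n])
    accumulated = top_total / total if total else 0.0
    return ranked, total, accumulated


if __name__ == "__main__":
    # Hypothetical sample text; the log above was built from Wikipedia pages.
    sample = "Regatul Unit al Marii Britanii și al Irlandei de Nord"
    top_n = 5
    ranked, total, accumulated = char_stats(sample, top_n)
    print(f"{len(ranked)} characters appeared {total} times.")
    print(f"First {top_n} characters:")
    for order, (ch, n) in enumerate(ranked[:top_n]):
        print(f"[{order:2}] Char {ch}: {100 * n / total} %")
    print(f"The first {top_n} characters have an accumulated ratio of {accumulated}.")
```

The printed lines imitate the layout of the log only to make the comparison easy; the numbers depend entirely on the input text and on the filtering assumptions stated above.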