mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-06 16:56:40 +08:00)
Having just rebased my branch for the new language detection API, I needed to regenerate the Norwegian language models. Unfortunately, the result still does not detect UTF-8 Norwegian text, though it is not far off: Norwegian comes out as the second candidate with a high 91% confidence, beaten by Danish UTF-8 at 94%. Note that I also updated the alphabet list for Norwegian, as it contained too many letters (according to Wikipedia at least), so even when training a model, some characters were missing from the training set.
235 lines
7.3 KiB
Plaintext
= Logs of language model for Norwegian (no) =

- Generated by BuildLangModel.py
- Started: 2022-11-30 20:26:27.916571
- Maximum depth: 2
- Max number of pages: 200

== Parsed pages ==

Norsk (revision 22974717)
Saft (revision 22967608)
Hund (revision 23005187)
Valg i Norge (revision 22782362)
Asia (revision 23117912)
Saarloos wolfhond (revision 22789727)
Østfold (revision 23055508)
Fårehunder (revision 22264555)
Stripesjakal (revision 18745363)
12. mai (revision 23118103)
Gullsjakal (revision 23104601)
Urhund (revision 23050226)
E (revision 22904440)
Luxembourgsk (revision 22813155)
Obstruent (revision 15267134)
Gudbrandsdalen (revision 23014277)
Norges berggrunn (revision 21768509)
Riksforsamlingen (revision 22999081)
Sosiolekt (revision 21458982)
Habitat (revision 23123646)
Norsk språkhistorie (20. århundre) (revision 22891154)
Søsterart (revision 20748512)
Halvdan Koht (revision 22303367)
Plosiver (revision 21816753)
Svorsk (revision 20789512)
Skandinavia (revision 22814296)
Partisipp (revision 22785842)
H (revision 23086416)
Kreft (revision 23050449)
Kreft hos hunder (revision 21811805)
Q (revision 23024714)
Fédération Cynologique Internationale (revision 22172054)
Rosin (revision 22818749)
Tribus (biologi) (revision 21339936)
Siste istids maksimum (revision 23141296)
Laurents Hallager (revision 22655416)
Canider (revision 22229857)
Individ (revision 20992252)
Stortingsvalg 1945– (revision 22861299)
Svalbards geologi (revision 22935346)
Riksmålsvernet (revision 22966421)
Magedreining (hund) (revision 21661370)
Stortinget (revision 23071662)
Bokmål (revision 22928969)
Recessiv (revision 21780786)
Synkopetida (revision 22906353)
Artskompleks (revision 20848344)
Homogenitet (revision 22857280)
Pyometra (hund) (revision 22374115)
Den norske språkstriden (revision 22428585)
Gruppe (biologi) (revision 21969525)
Stående fuglehunder (revision 22264516)
Samnorsk (revision 22785915)
Fastlands-Norge (revision 23141642)
Drivende hunder (revision 22264618)
Sibir (revision 22369404)
Norges demografi (revision 23034159)
FCI (revision 22172054)
Vannhunder (revision 22264145)
Prednisolon (revision 21804718)
Midtvesten (revision 22423559)
Buskerud (revision 22915767)
Sogn og Fjordane (revision 22811825)
Transport i Norge (revision 23131810)
Ustemt palatal frikativ (revision 19011330)
Anatolsk gjeterhund (revision 22303224)
Norges fylker (revision 23129287)
Tonelag (revision 22751959)
Statsforvalter (revision 23133685)
Sjokolade (revision 22988920)
Nasaler (revision 16002502)
Hundens pels (revision 22900550)
Approksimanter (revision 16000119)
Tapper (revision 18322970)
Vakt- og vokterhunder (revision 23091054)
Saluki (revision 22267261)
Canis (revision 23079627)
Island (revision 23097723)
Flyball (revision 20457011)
Staffordshire bull terrier (revision 23135078)
Stockholm (revision 22770528)
Sahel (revision 19821400)
ISO 639-3 (revision 18859824)
Ny-guinea villhund (revision 22567866)
Rabies (revision 19440055)
Ordbog over det norske Folkesprog (revision 23096800)
Norge (revision 23141642)
Flåttbårne sykdommer (hund) (revision 21355504)
Bombehund (revision 22942055)
Læreboknormalen av 1959 (revision 18841941)
Tromøy (revision 22053767)
Vorstehhund korthåret (revision 22264532)
Tåkeskog (revision 20461967)
Vest-Telemark (revision 22923647)
Oslo (revision 23118371)
Tyrkia (revision 23034073)
Liste over Norges største tettsteder (revision 23138252)
Energi (revision 22979461)
Jakt med hund (revision 22890790)
Sogn fogderi (revision 22425444)
Integrated Taxonomic Information System (revision 20457376)
Tadsjikistan (revision 22864814)
Befolkningstetthet (revision 22253839)
Tøddel (revision 21641445)
Den lille istid (revision 22782643)
Norsk språkhistorie (1400–1800) (revision 21342667)
Unionen mellom Sverige og Norge (revision 22922743)
Fylkeskommune (revision 22011606)
ĸ (revision 17096887)
Degas (revision 22751270)
Gløgg (revision 22902469)
Antistoff (revision 20746889)
Norges statsminister (revision 22948566)
Lørdag (revision 23031303)
Ş (revision 12094187)
Hallingdal (revision 22811584)
1969 (revision 22958238)
Juli (revision 22359558)
Shar pei (revision 22891357)
Dyr (revision 23101991)
Ƙ (revision 15223100)
PhyloCode (revision 22857413)
Y-kromosom (revision 22783781)
Høst (revision 23087627)
Geit (revision 21989005)
Guatemala (revision 22780680)
USA (revision 22781448)
Tamhund (revision 23005187)
Populasjonsdynamikk (revision 20640003)
Christoffer Oftedahl (revision 19783269)
Mellomnorsk (revision 22546096)
1000 (revision 20456192)
Servicehund (revision 22337757)
Himalayaulv (revision 21791662)
Ø (bokstav) (revision 22617366)
Ǩ (revision 15223173)
Bordeaux dogge (revision 22266230)
Frøplanter (revision 21763501)
Ustemt bilabial plosiv (revision 22354758)
Digraf (revision 19954081)
12. århundre (revision 23123540)
Sametingsvalget 1993 (revision 21890290)
Førerhund (revision 20465384)
Grenada (revision 22948831)
Aserbajdsjans administrative inndeling (revision 22782483)
Verneområder i Norge (revision 22076171)
Pelsdyroppdrett (revision 22827568)
Kretahund (revision 22201230)
Etne (revision 22659600)
Koreansk chejudo (revision 22199018)
Riesenschnauzer (revision 23103775)
Italias regioner (revision 22182270)
Dingo (revision 23050226)
Firfisle (revision 21650282)
Dominans (revision 21160764)
CITES (revision 22637082)
Helligdager i Norge (revision 22095322)
Bunad (revision 23086915)
Barnekreftforeningen (revision 19888945)
Guttorm Hansen (revision 22098933)
Albania (revision 22939774)
Medier i Norge (revision 21776331)
Finsk (revision 22908244)
Anders Lysgaard (revision 22858529)
Bakverk (revision 15226081)
Ć (revision 15785421)
Vatikanstaten (revision 22782366)
Steinalderen i Norge (revision 23106147)
Johnny Depp (revision 22764203)
Sverre Steen (revision 22112509)
Fjellrev (revision 22812483)
Bayersk viltsporhund (revision 22805751)
Ń (revision 15222385)
Utdannelse i Norge (revision 22814897)
Espen Berntsen (revision 21025561)
Nederland (revision 23024484)
Liste over hundegrupper (revision 18570830)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2022-11-30 20:29:27.551046

62 characters appeared 1228749 times.

Most Frequent characters:
[ 0] Char e: 15.049208585317261 %
[ 1] Char r: 8.84924423132796 %
[ 2] Char n: 8.422550089562636 %
[ 3] Char t: 7.726394894319344 %
[ 4] Char s: 6.64798099530498 %
[ 5] Char a: 6.28020856985438 %
[ 6] Char i: 5.99455218274847 %
[ 7] Char l: 5.422262805503809 %
[ 8] Char o: 5.386942329149403 %
[ 9] Char d: 4.534774799409806 %
[10] Char g: 3.86091870674971 %
[11] Char k: 3.6487516978650643 %
[12] Char m: 3.216197937902696 %
[13] Char v: 2.4669806445417253 %
[14] Char f: 2.0122091655822305 %
[15] Char u: 1.8136332155712844 %
[16] Char p: 1.6869189720602011 %
[17] Char b: 1.4243755233981878 %
[18] Char h: 1.3665117937023752 %
[19] Char å: 1.1134902246105591 %
[20] Char y: 0.8473658981614633 %
[21] Char ø: 0.792431977564173 %
[22] Char j: 0.7630525029928814 %
[23] Char c: 0.2926553755079353 %
[24] Char æ: 0.20012223814627725 %
[25] Char w: 0.05932863424507365 %
[26] Char z: 0.028565638710591017 %
[27] Char x: 0.023194322029967063 %
[28] Char é: 0.017171936660782636 %
[29] Char q: 0.009521879570197005 %

The first 30 characters have an accumulated ratio of 0.9995751776807141.

967 sequences found.

First 442 (typical positive ratio): 0.9950425176429516
Next 157 (599-442): 0.0039580060347621515
Rest: 0.0009994763222862524

- Processing end: 2022-11-30 20:29:27.623923
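
The character statistics in this log (per-character share and the accumulated ratio of the first 30 characters) can be illustrated with a minimal sketch. This is not the code of BuildLangModel.py: the function name `char_stats`, the letters-only filter and the case folding are assumptions made for illustration, and the real script may select and normalize characters differently.

```python
from collections import Counter


def char_stats(text, top_n=30):
    """Return (ranked, accumulated) for the letters found in `text`.

    ranked      -- list of (character, ratio), most frequent first
    accumulated -- sum of the ratios of the first `top_n` characters
    """
    # Assumption: keep letters only and fold case before counting.
    letters = [c for c in text.lower() if c.isalpha()]
    counts = Counter(letters)
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    ranked = [(ch, n / total) for ch, n in counts.most_common()]
    accumulated = sum(ratio for _, ratio in ranked[:top_n])
    return ranked, accumulated


if __name__ == "__main__":
    ranked, accumulated = char_stats("Norsk språkhistorie og rømmegrøt", top_n=5)
    for rank, (ch, ratio) in enumerate(ranked[:5]):
        # Same layout as the log lines, with the ratio shown as a percentage.
        print(f"[{rank:2}] Char {ch}: {ratio * 100} %")
    print(f"The first 5 characters have an accumulated ratio of {accumulated}.")
```

The three sequence ratios reported in the log (first 442, next 157, rest) cover every sequence found, so they should add up to 1 within floating-point error, which is a quick sanity check on a regenerated model.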