Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git, synced 2025-12-06 16:56:40 +08:00
The previous model was obviously wrong: all letters had the same probability, even non-ASCII ones! Anyway, this new model does make the unit tests a tiny bit better, though English detection is still weak (I have more concepts I want to experiment with to improve it).
253 lines
9.4 KiB
Plaintext
= Logs of language model for English (en) =

- Generated by BuildLangModel.py
- Started: 2022-12-03 20:28:44.618364
- Maximum depth: 2
- Max number of pages: 200

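The "Maximum depth" and "Max number of pages" settings limit how far the builder follows Wikipedia links from its starting article. Judging from the list below, the starting article was presumably Marmot, the first parsed page. The sketch below shows one plausible reading: a breadth-first walk where the start page has depth 0 and the page cap counts parsed pages. The function name `crawl` and the dict-based link graph are hypothetical stand-ins. They are not BuildLangModel.py's actual code, and nothing here contacts Wikipedia.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str,
          max_depth: int = 2, max_pages: int = 200) -> list[str]:
    """Hypothetical breadth-first crawl bounded by link depth and page count.

    `links` maps a page title to the titles it links to; it is an in-memory
    stand-in for fetching real pages (no network access happens here).
    """
    visited = {start}
    parsed: list[str] = []
    queue = deque([(start, 0)])  # (title, depth); the start page has depth 0
    while queue and len(parsed) < max_pages:
        title, depth = queue.popleft()
        parsed.append(title)  # a real builder would parse the page text here
        if depth < max_depth:  # only follow links while under the depth limit
            for target in links.get(title, []):
                if target not in visited:
                    visited.add(target)
                    queue.append((target, depth + 1))
    return parsed


if __name__ == "__main__":
    toy_links = {
        "Marmot": ["Hibernate", "Alps"],
        "Hibernate": ["Bat"],
        "Alps": ["France"],
        "Bat": ["Mammal"],
    }
    print(crawl(toy_links, "Marmot", max_depth=2, max_pages=200))
```

On this toy graph, Mammal is never reached with `max_depth=2`, because Bat already sits at depth 2 and its links are not followed.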
== Parsed pages ==

Marmot (revision 1116705550)
Hibernate (revision 1115607389)
JSTOR (identifier) (revision 1122926070)
Thirteen-lined ground squirrel (revision 1124658433)
French Alps (revision 1117472036)
INaturalist (revision 1122751314)
Texas antelope squirrel (revision 1121470154)
Himalayas (revision 1124238550)
Vancouver Island marmot (revision 1121598871)
Mount Rainier National Park (revision 1120235066)
Olympic marmot (revision 1121472039)
Root (revision 1117256593)
Durango chipmunk (revision 1121473683)
France (revision 1125268533)
Sciuromorpha (revision 1107286064)
Alps (revision 1124362400)
Yellow-cheeked chipmunk (revision 1121299976)
Washington ground squirrel (revision 1121468941)
Hopi chipmunk (revision 1121297258)
Mexican prairie dog (revision 1121472442)
Antelope squirrel (revision 1089053714)
Deosai National Park (revision 1125376855)
Eutamias (revision 1010406609)
Eastern chipmunk (revision 1120765340)
Golden-mantled ground squirrel (revision 1121777526)
Tuolumne Meadows (revision 1094508214)
Cascade Range (revision 1114533492)
Mammal Species of the World (revision 1093112025)
Franklin's ground squirrel (revision 1121361872)
Ladakh (revision 1124124745)
Groundhog (revision 1117813429)
Natural reservoir (revision 1110806364)
Neotamias (revision 1117512650)
Yosemite National Park (revision 1125019703)
Ontario (revision 1125244433)
Russet ground squirrel (revision 1121469545)
Bat (revision 1125180714)
Wayback Machine (revision 1125067302)
Long-eared chipmunk (revision 1121298477)
Southern Idaho ground squirrel (revision 1121468339)
Moss (revision 1122019251)
Altai Mountains (revision 1124752508)
Townsend's ground squirrel (revision 1121468829)
Richardson's ground squirrel (revision 1122297225)
Utah prairie dog (revision 1125084849)
Yersinia pestis (revision 1121719480)
European ground squirrel (revision 1121469378)
Spermophilus relictus (revision 1121469745)
Least chipmunk (revision 1120765536)
Panamint chipmunk (revision 1121299808)
Catalogue of Life (revision 1118132647)
Gray marmot (revision 1122462225)
Columbian ground squirrel (revision 1124139650)
Alberni-Clayoquot Regional District (revision 1109499216)
La Tania (revision 1115267378)
Populus tremuloides (revision 1120966005)
Paradise River Waterfalls (revision 1054159583)
Long-tongued nectar bat (revision 1123039710)
Happy Isles (revision 1113517959)
Tourism in France (revision 1120671901)
Otospermophilus (revision 1093268410)
History of Canada (revision 1123782373)
California chipmunk (revision 1121299691)
Mexican ground squirrel (revision 1121470340)
White-tailed antelope squirrel (revision 1121470211)
Sedentism (revision 1110063134)
Terabyte (revision 1123174616)
Tamias (revision 1121473202)
RECAP US Federal Court Documents (collection) (revision 1122929164)
Belding's ground squirrel (revision 1121468288)
Cannibalism (revision 1125092745)
Yellow-pine chipmunk (revision 1121473478)
Monoclonal antibody therapy (revision 1114372687)
Menzbier's marmot (revision 1121471953)
Black-footed ferret (revision 1123500226)
Floods in Bihar (revision 1119748410)
Mammal (revision 1124779293)
Alaska marmot (revision 1124026979)
Sierra Madre ground squirrel (revision 1121471267)
Computer security (revision 1125370428)
Kedarnath Temple (revision 1122647471)
Frog Creek Cabin (revision 1048164755)
Outline of botany (revision 1100540741)
Agriculture in Nepal (revision 1088978356)
Plant evolution (revision 1116709561)
Little ground squirrel (revision 1121469707)
Dicranales (revision 1110407415)
Ultrasound (revision 1117397225)
White-tailed prairie dog (revision 1121472368)
Espíritu Santo antelope squirrel (revision 1121470113)
Brown County, Wisconsin (revision 1122831345)
Timeline of audio formats (revision 1120236679)
List of mountain peaks of Uttarakhand (revision 1121014571)
Antiviral drug (revision 1118217791)
California ground squirrel (revision 1121359049)
Red-tailed chipmunk (revision 1121297616)
Bobak marmot (revision 1121471769)
National Register of Historic Places listings in the Northern Mariana Islands (revision 1115478435)
Spermophilus pallidicauda (revision 1121469669)
Yellow-bellied marmot (revision 1121472145)
Sexually transmitted infection (revision 1122774900)
List of Yosemite destinations (revision 1119350249)
Baitarani River (revision 1118320499)
Baja California rock squirrel (revision 1121471079)
Years of Lead (Italy) (revision 1123769084)
Snow leopard (revision 1122462489)
Coyote (revision 1125069820)
Villard-Reculas (revision 1077275360)
Vancouver Island (revision 1121908258)
Sciurotamias (revision 1120570732)
Canada 2021 Census (revision 1114664828)
Time in Canada (revision 1120998431)
Forrest's rock squirrel (revision 1121471379)
Via Lattea (revision 1110201667)
Phylogenetic tree (revision 1117394267)
Hibernation (revision 1115607389)
Altai wapiti (revision 1111750851)
Alpine chipmunk (revision 1121473423)
Schist (revision 1116202480)
Rodent (revision 1123634696)
Nepalese literature (revision 1117603265)
Unification of Nepal (revision 1125350055)
CBC News (revision 1124984918)
Harris's antelope squirrel (revision 1121470079)
Alpine meadow (revision 1114658726)
Himalayan marmot (revision 1113552191)
Merriam's ground squirrel (revision 1121468396)
Heliscomyidae (revision 1010405407)
Siberian chipmunk (revision 1121472776)
1980 eruption of Mount St. Helens (revision 1123425632)
Tarbagan marmot (revision 1121488248)
Uinta chipmunk (revision 1121367930)
Asia Minor ground squirrel (revision 1121357197)
San Bernardino National Forest (revision 1113614977)
British Columbia (revision 1124903693)
List of Web archiving initiatives (revision 1120507741)
2011 Kashgar attacks (revision 1124413350)
Genus (revision 1125331312)
IUCN Red List (revision 1123293379)
Attack rate (revision 1118026995)
Atlas of Living Australia (revision 1069034125)
Riparian zone (revision 1100819694)
Natural History Museum of Los Angeles County (revision 1118638991)
Flying squirrel typhus (revision 1108887986)
New Scientist (revision 1121186695)
Sonoma chipmunk (revision 1121298317)
Basic reproduction number (revision 1122698892)
Homeothermic (revision 1082125124)
Library Genesis (revision 1123879366)
Ecological succession (revision 1116584234)
Taurus ground squirrel (revision 1121469893)
Edmund Jaeger (revision 1042985886)
Wolverine (revision 1123904337)
Puget Sound (revision 1124438931)
List of highest points of European countries (revision 1125124917)
Amburiq Mosque (revision 1101963105)
Mohave ground squirrel (revision 1121470764)
Kali Gandaki Gorge (revision 1091465924)
Palmer's chipmunk (revision 1121473732)
Citizen Science Association (revision 1076637865)
Alpha male (revision 1123599649)
Thermotogota (revision 1108216914)
Gray-footed chipmunk (revision 1121473564)
ISSN (identifier) (revision 1117323780)
The Daily Excelsior (revision 1073376573)
National Center for Biotechnology Information (revision 1117911694)
Haridwar (revision 1124587996)
Ground squirrel (revision 1106618817)
ISBN (identifier) (revision 1124259962)
Breton language (revision 1123193740)
Notocitellus (revision 1092528025)
Wayback Machine (Peabody's Improbable History) (revision 1125111405)
Social animal (revision 1118899517)
Conservation status (revision 1124721586)
Doi (identifier) (revision 1121872952)
Drop (liquid) (revision 1115117361)
Monogamy in animals (revision 1115061008)
Grand Slam (tennis) (revision 1125138113)
Synonym (taxonomy) (revision 1115465643)
Encyclopedia of Life (revision 1123215390)
Algonquian languages (revision 1118973728)
Circulatory system (revision 1123361226)
Kenneth Oppel (revision 1115838353)
Red-cheeked ground squirrel (revision 1121469468)
Prairie dog (revision 1125350300)
Zygomasseteric system (revision 1093682242)
Black-tailed prairie dog (revision 1120101763)
Scenic Beach State Park (revision 1085870429)
Fashion capital (revision 1122240170)
Herbivory (revision 1124405692)
Artemisia tridentata (revision 1097902309)
ARKive (revision 1028182358)
Emblem of Uttarakhand (revision 1085229611)
Northern Italy (revision 1122409316)
Bibcode (identifier) (revision 1119780351)
Squirrel (revision 1121741651)
Birch Bay State Park (revision 1068937174)
Whistling (revision 1124843854)
Gobiomyidae (revision 1090208761)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2022-12-03 20:32:27.933336

58 characters appeared 2027474 times.

Most Frequent characters:
[ 0] Char e: 11.847648847778073 %
[ 1] Char a: 8.861519309248848 %
[ 2] Char t: 8.523956410785045 %
[ 3] Char i: 7.880199696765532 %
[ 4] Char n: 7.477629799445023 %
[ 5] Char o: 7.206405606187798 %
[ 6] Char s: 6.8668698094278895 %
[ 7] Char r: 6.763489938711914 %
[ 8] Char l: 4.301066252884131 %
[ 9] Char h: 4.232754649381447 %
[10] Char d: 3.7247333381340524 %
[11] Char c: 3.556839693135399 %
[12] Char u: 2.763981190387645 %
[13] Char m: 2.7244739020081146 %
[14] Char p: 2.17398595493703 %
[15] Char f: 2.1424195821993277 %
[16] Char g: 2.0356364619225698 %
[17] Char b: 1.575457934355755 %
[18] Char y: 1.572005362337569 %
[19] Char w: 1.3260835897279077 %
[20] Char v: 1.1594230061643207 %
[21] Char k: 0.6102667654431081 %
[22] Char x: 0.2356133790125052 %
[23] Char z: 0.13746168878121248 %
[24] Char j: 0.1346503087092609 %
[25] Char q: 0.1320855409243226 %

The first 26 characters have an accumulated ratio of 0.9996665801879581.

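Each percentage in the table is presumably that character's count divided by the 2027474 total occurrences. The accumulated ratio is the sum of the listed percentages divided by 100; adding up the 26 values above does give about 99.9667 %. The sketch below reproduces that bookkeeping and the log's line format. It rests on two assumptions: only case-folded alphabetic characters are counted (the real script's filtering is not shown in this log), and the names `char_frequencies`, `accumulated_ratio`, and `format_report` are hypothetical, not BuildLangModel.py's own functions.

```python
from collections import Counter


def char_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (character, percentage) pairs, most frequent first.

    Assumption: only alphabetic characters are counted, after lower-casing.
    """
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    # most_common() orders by descending count; ties keep first-seen order.
    return [(ch, 100.0 * n / total) for ch, n in counts.most_common()]


def accumulated_ratio(freqs: list[tuple[str, float]], n: int) -> float:
    """Fraction (0..1) of all occurrences covered by the n most frequent characters."""
    return sum(pct for _, pct in freqs[:n]) / 100.0


def format_report(freqs: list[tuple[str, float]], n: int) -> list[str]:
    """Render the top-n table in the same shape as the log above."""
    lines = ["Most Frequent characters:"]
    lines += [f"[{i:2d}] Char {ch}: {pct} %" for i, (ch, pct) in enumerate(freqs[:n])]
    lines.append(f"The first {n} characters have an accumulated ratio of "
                 f"{accumulated_ratio(freqs, n)}.")
    return lines


if __name__ == "__main__":
    sample = "Marmots hibernate through the alpine winter."
    for line in format_report(char_frequencies(sample), 5):
        print(line)
```

Working with percentages keeps the output directly comparable to the log. A real builder would more likely keep raw integer counts and derive the ratios only when writing the report.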
863 sequences found.

First 369 (typical positive ratio): 0.9950424985513596
Next 125 (494-369): 0.003963798368833871
Rest: 0.0009937030798065072

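This sketch assumes a "sequence" is a pair of consecutive characters, as in the character-pair models uchardet's detectors use. Under that assumption, the three ratios partition all sequence occurrences: 0.9950424985513596 + 0.003963798368833871 + 0.0009937030798065072 sums to 1, and the first two sum to about 0.99901. That pattern fits a procedure that ranks sequences by frequency and cuts at cumulative shares of 0.995 and 0.999. Those cutoffs are inferred from the numbers, not read from BuildLangModel.py. The names `count_sequences`, `split_sequences`, and `format_report` are hypothetical.

```python
from collections import Counter


def count_sequences(text: str) -> Counter:
    """Count pairs of consecutive letters (hypothetical definition of a 'sequence')."""
    lowered = text.lower()
    counts: Counter = Counter()
    for a, b in zip(lowered, lowered[1:]):
        if a.isalpha() and b.isalpha():  # skip pairs that straddle spaces or punctuation
            counts[a + b] += 1
    return counts


def split_sequences(counts: Counter, first_cut: float = 0.995,
                    second_cut: float = 0.999) -> tuple[int, float, int, float, float]:
    """Split sequences, most frequent first, at two cumulative-share cutoffs.

    Returns (n_first, first_ratio, n_next, next_ratio, rest_ratio). The default
    cutoffs are an assumption inferred from the log, not taken from the script,
    and first_cut is expected to be <= second_cut.
    """
    ordered = sorted(counts.values(), reverse=True)
    total = sum(ordered)
    if total == 0:
        raise ValueError("no sequences counted")
    cumulative = 0
    n_first = first_sum = None
    n_both, both_sum = len(ordered), total
    for i, count in enumerate(ordered, start=1):
        cumulative += count  # integer running total avoids float drift
        if n_first is None and cumulative / total >= first_cut:
            n_first, first_sum = i, cumulative
        if cumulative / total >= second_cut:
            n_both, both_sum = i, cumulative
            break
    return (n_first, first_sum / total, n_both - n_first,
            (both_sum - first_sum) / total, (total - both_sum) / total)


def format_report(counts: Counter, first_cut: float = 0.995,
                  second_cut: float = 0.999) -> list[str]:
    """Render the split in the same shape as the log above."""
    n_first, r_first, n_next, r_next, r_rest = split_sequences(counts, first_cut, second_cut)
    return [
        f"{len(counts)} sequences found.",
        f"First {n_first} (typical positive ratio): {r_first}",
        f"Next {n_next} ({n_first + n_next}-{n_first}): {r_next}",
        f"Rest: {r_rest}",
    ]


if __name__ == "__main__":
    text = "Marmots hibernate through the alpine winter in burrows."
    for line in format_report(count_sequences(text), first_cut=0.8, second_cut=0.95):
        print(line)
```

The demo lowers the cutoffs only because a single sentence has too few distinct pairs for the 0.995/0.999 split to be interesting.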
- Processing end: 2022-12-03 20:32:28.010953