uchardet/script/BuildLangModelLogs/LangEnglishModel.log

= Logs of language model for English (en) =
- Generated by BuildLangModel.py
- Started: 2022-12-14 20:20:53.218193
- Maximum depth: 4
- Max number of pages: 200
== Parsed pages ==
Marmot (revision 1116705550)
Barcode of Life Data System (revision 1090221883)
Palmer's chipmunk (revision 1121473732)
Jacopo Ligozzi (revision 1104222073)
Olympic Peninsula (revision 1123430023)
INaturalist (revision 1122751314)
Mammal Species of the World (revision 1127351948)
Berry (revision 1112801626)
Rock squirrel (revision 1121470993)
Natural reservoir (revision 1110806364)
Onomatopoeia (revision 1120663626)
Mohave ground squirrel (revision 1121470764)
Townsend's chipmunk (revision 1121473824)
Madrid (revision 1126851882)
Otospermophilus (revision 1093268410)
Plant hormone (revision 1116921032)
Cuckoo (revision 1126465747)
Daurian ground squirrel (revision 1121469422)
Elwha River (revision 1121691243)
All rights reserved (revision 1125321157)
Long-tailed ground squirrel (revision 1121468895)
CDFG (revision 1122725741)
Don Martin (cartoonist) (revision 1116900902)
Palindromic (revision 1121604941)
EMBnet (revision 1018817077)
Ferdinando II de' Medici, Grand Duke of Tuscany (revision 1125579637)
Cloister (revision 1120569425)
Asymptomatic (revision 1111685734)
Grand Duke (revision 1126227666)
Eucalyptus oil (revision 1123039166)
Seattle (revision 1127044692)
Xerospermophilus (revision 1095542738)
Red-cheeked ground squirrel (revision 1121469468)
Roy Crane (revision 1073477180)
Round-tailed ground squirrel (revision 1121470819)
Asia Minor ground squirrel (revision 1121357197)
Ictidomys parvidens (revision 1121470382)
Hopi chipmunk (revision 1121297258)
Anecdata.org (revision 1099498174)
Himalayan marmot (revision 1113552191)
Storage organ (revision 1087238870)
Phage therapy (revision 1115823876)
Pacific County, Washington (revision 1115141058)
Agostino Carracci (revision 1118965396)
Share-alike (revision 1124025423)
Fragaria chiloensis (revision 1117621684)
Pacific Northwest (revision 1125120564)
Eastern chipmunk (revision 1120765340)
Yakima County, Washington (revision 1117226237)
United States congressional delegations from Washington (revision 1113282930)
Hygiene (revision 1121837793)
Synonym (taxonomy) (revision 1115465643)
Washington's congressional districts (revision 1126665844)
Culling (revision 1124588069)
Citizen scientists (revision 1126971493)
Accademia delle Arti del Disegno (revision 1117591379)
Lacey, Washington (revision 1118158829)
Berberis thunbergii (revision 1098470800)
James Joyce (revision 1127091935)
Interim Register of Marine and Nonmarine Genera (revision 1093112130)
Marker-assisted selection (revision 1101841526)
Blood-borne disease (revision 1104089084)
Research in Computational Molecular Biology (revision 1098389228)
Eggplant (revision 1127383368)
Purr (revision 1125642484)
Blastomycosis (revision 1125999120)
NatureServe (revision 1122446327)
Xerinae (revision 1093432948)
Baja California rock squirrel (revision 1121471079)
Lodgepole chipmunk (revision 1121296771)
Honey (revision 1127398567)
Bouba/kiki effect (revision 1127022127)
Ferdinando I de' Medici, Grand Duke of Tuscany (revision 1125114864)
Medici (revision 1123423946)
Bristlecone pine (revision 1108725770)
Morphology (biology) (revision 1126240066)
Albanian language (revision 1127442244)
Taurus ground squirrel (revision 1121469893)
World Environment Day (revision 1119598477)
The New York Times (revision 1127291077)
Rat Genome Database (revision 1121949622)
Geobotanical prospecting (revision 992549326)
Pre-exposure prophylaxis (revision 1121706582)
Least chipmunk (revision 1120765536)
EcoHealth Alliance (revision 1124297887)
InterPro (revision 1123732177)
Gunnison's prairie dog (revision 1121472300)
EMBOSS (revision 1108898594)
Black-capped marmot (revision 1121471697)
Speckled ground squirrel (revision 1121469813)
National Gallery of Art (revision 1124058120)
Ground squirrel (revision 1106618817)
Texas antelope squirrel (revision 1121470154)
Skamania County, Washington (revision 1115141102)
Zebrafish Information Network (revision 1084187264)
Merriam's chipmunk (revision 1121301344)
Stamen (revision 1107327988)
Plant stem (revision 1125685714)
Uinta chipmunk (revision 1121367930)
Public Lab (revision 1123308321)
Sierra Madre ground squirrel (revision 1121471267)
Scripps Research (revision 1120793534)
Morbillivirus (revision 1123109002)
Conservation status (revision 1126423906)
Korean language (revision 1127097954)
Flatiron Institute (revision 1114126605)
Espíritu Santo antelope squirrel (revision 1121470113)
Pietre dure (revision 1124553077)
List of biological databases (revision 1116920095)
Needle sharing (revision 1066293994)
ISCB Africa ASBCB Conference on Bioinformatics (revision 1003545343)
Northern Idaho ground squirrel (revision 1123076448)
Animal track (revision 1112366053)
HMMER (revision 1090926305)
RERO (identifier) (revision 1068185782)
Catalogue of Life (revision 1118132647)
Francesco I de' Medici, Grand Duke of Tuscany (revision 1123286810)
Whip-poor-will (revision 1120975767)
Doi (identifier) (revision 1127429235)
Wildlife Conservation Society (revision 1125787985)
Panamint chipmunk (revision 1121299808)
Bioblitz (revision 1113263878)
Habitat loss (revision 1117935852)
Sciuromorpha (revision 1107286064)
Yellow-bellied marmot (revision 1121472145)
Allen's chipmunk (revision 1121299548)
Hood Canal (revision 1124856006)
Computer vision (revision 1126383414)
Vibrio cholerae (revision 1123125512)
Phulwara oil (revision 1039287034)
Neah Bay, Washington (revision 1117347476)
Chelan County, Washington (revision 1115437018)
Columbia River (revision 1121152264)
Philippine Genome Center (revision 1086509191)
Thirteen-lined ground squirrel (revision 1127159966)
Cat massage (revision 1120597363)
Swiss French (revision 1126844735)
Probabilistic risk analysis (revision 1118087495)
Kingdom (biology) (revision 1126766133)
Norfloxacin (revision 1126442196)
Tropical ground squirrel (revision 1121471157)
Cannabis culture (revision 1123260879)
Fontarrón (revision 962928722)
Heuristic algorithm (revision 1124780994)
Spotted ground squirrel (revision 1122239672)
Hand washing (revision 1126772691)
Human skin (revision 1125889832)
Slovenia (revision 1127365628)
Australia Bioinformatics Resource (revision 1023592097)
Utah prairie dog (revision 1125084849)
Research center (revision 1122565049)
Australian Wildlife Conservancy (revision 1126004200)
Catholicism (revision 1126878543)
White-tailed prairie dog (revision 1121472368)
Rabbit (revision 1125928365)
Cathedral (revision 1117971650)
Columbia Plateau (revision 1111592488)
Pablo de Olavide University (revision 1100528254)
Plant habit (revision 1101707375)
Anti-fascism (revision 1126769811)
Coral-billed ground-cuckoo (revision 1119603104)
Alpine marmot (revision 1121471662)
Homozygous (revision 1125746174)
COVID-19 vaccination in the Republic of Ireland (revision 1125658338)
Music of North Korea (revision 1109275365)
Eastern Washington (revision 1111432324)
Tarbagan marmot (revision 1121488248)
VIAF (identifier) (revision 1122669300)
Duke of Florence (revision 1010655117)
Accademia della Crusca (revision 1118884925)
Mobile robot (revision 1125548051)
Hyperlocal (revision 1116240164)
Oregon Trail (revision 1124389602)
Cane rat (revision 1089272788)
Federal Way, Washington (revision 1122923555)
Rubens (revision 1121190866)
Pala d'Oro (revision 1072202795)
Archduke Rainer of Austria (1895–1930) (revision 1081133439)
Bioinformatics (revision 1125974897)
Renal tubular acidosis (revision 1105330876)
Brain morphometry (revision 1053832132)
Ethnologue (revision 1127241433)
OregonLive.com (revision 1114379550)
Yangban (revision 1121415587)
Belize Inlet (revision 982557553)
Canebrake Ecological Reserve (revision 1121247294)
Glycogen (revision 1110998630)
Richardson's ground squirrel (revision 1122297225)
Cluster analysis (revision 1116924542)
Genomics (revision 1126520756)
Spermophilus brevicauda (revision 1010428942)
Endosperm (revision 1112721337)
Relational database (revision 1116718100)
Snow (revision 1126528822)
Roadless area conservation (revision 1103267389)
Minneapolis–Saint Paul (revision 1124710168)
== End of Parsed pages ==
- Wikipedia parsing ended at: 2022-12-14 20:24:17.046830
59 characters appeared 2235074 times.
Most Frequent characters:
[ 0] Char e: 11.901753588471792 %
[ 1] Char a: 8.660205433914046 %
[ 2] Char t: 8.534616750944265 %
[ 3] Char i: 7.941079355761822 %
[ 4] Char n: 7.5567520359504865 %
[ 5] Char o: 7.4230651871034254 %
[ 6] Char s: 6.903216627279455 %
[ 7] Char r: 6.589625220462499 %
[ 8] Char l: 4.254847937920624 %
[ 9] Char h: 4.180219536355396 %
[10] Char c: 3.813967680712137 %
[11] Char d: 3.744797711395685 %
[12] Char u: 2.734361367901018 %
[13] Char m: 2.5771853638850435 %
[14] Char p: 2.266099466952772 %
[15] Char f: 2.170576902599198 %
[16] Char g: 1.9969361193410151 %
[17] Char b: 1.540888579080603 %
[18] Char y: 1.515833480233764 %
[19] Char w: 1.324385680295149 %
[20] Char v: 1.0713739231900152 %
[21] Char k: 0.5591761167639192 %
[22] Char x: 0.22384046344774267 %
[23] Char j: 0.18035197044930057 %
[24] Char z: 0.16464779242208535 %
[25] Char q: 0.12464911676302441 %
The first 26 characters have an accumulated ratio of 0.9995445340959629.
The first 5 characters have an accumulated ratio of 0.4459440716504241.
All characters whose order is over 18 have an accumulated ratio of 0.036484250633312364.
972 sequences found.
First 373 (typical positive ratio): 0.9950190506759622
Next 160 (533-373): 0.003986976910237083
Rest: 0.0009939724138007255
- Processing end: 2022-12-14 20:24:17.102402
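
The accumulated ratios reported in this log can be re-derived from the per-character percentages. The following is a minimal sketch of that check, not code taken from BuildLangModel.py: the regular expression and the helper names `parse_char_percentages` and `accumulated_ratio` are hypothetical, and the excerpt only reproduces the first five entries of the "Most Frequent characters" list.

```python
import re

# Excerpt of the "Most Frequent characters" lines from this log.
LOG_EXCERPT = """\
[ 0] Char e: 11.901753588471792 %
[ 1] Char a: 8.660205433914046 %
[ 2] Char t: 8.534616750944265 %
[ 3] Char i: 7.941079355761822 %
[ 4] Char n: 7.5567520359504865 %
"""

# Hypothetical pattern matching "[order] Char c: percent %" lines.
LINE_RE = re.compile(r"\[\s*(\d+)\] Char (.): ([0-9.]+) %")


def parse_char_percentages(text):
    """Return a list of (order, char, percent) tuples parsed from log lines."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            rows.append((int(match.group(1)), match.group(2), float(match.group(3))))
    return rows


def accumulated_ratio(rows, first_n):
    """Sum the percentages of the first_n most frequent characters, as a ratio."""
    return sum(percent for order, _, percent in rows if order < first_n) / 100.0


rows = parse_char_percentages(LOG_EXCERPT)
print(accumulated_ratio(rows, 5))
```

The printed value should agree, up to floating-point rounding, with the accumulated ratio the log reports for the first 5 characters. Feeding in all 26 lines and summing the entries whose order is over 18 gives the third reported figure in the same way.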