mirror of
https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
English detection is still quite crappy, so I am not adding a unit test yet. I believe the detection is bad mostly because of the excessive shortcutting we do to go "fast". I should probably review this whole part of the logic as well.
182 lines
6.2 KiB
Plaintext
= Logs of language model for English (en) =

- Generated by BuildLangModel.py
- Started: 2021-03-19 23:26:14.143096
- Maximum depth: 4
- Max number of pages: 100

== Parsed pages ==

Marmot (revision 1000529225)
Alashan ground squirrel (revision 1010437381)
Alaska (revision 1012870556)
Alaska marmot (revision 1010409368)
Allen's chipmunk (revision 1010890232)
Alpine chipmunk (revision 1010409470)
Alpine marmot (revision 1012720679)
Alps (revision 1007908369)
Altai Mountains (revision 1006577543)
Ancient Greece (revision 1012778875)
Animal (revision 1013060732)
Animal Diversity Web (revision 996899740)
Antelope squirrel (revision 1010441265)
Apennine Mountains (revision 1009656710)
Arctic ground squirrel (revision 1010409925)
Asia Minor ground squirrel (revision 1010437585)
BNF (identifier) (revision 1010501260)
Baja California rock squirrel (revision 1010410301)
Barcode of Life Data System (revision 997241036)
Bat (revision 1012442106)
Bear (revision 1012937821)
Belding's ground squirrel (revision 1010410588)
Bibcode (identifier) (revision 1009103296)
Black-capped marmot (revision 992988317)
Black-tailed prairie dog (revision 1010411000)
Black Hills (revision 1011995885)
Bobak marmot (revision 1010411082)
Brokpa (revision 1001820104)
Brooks Range (revision 1009930357)
Buller's chipmunk (revision 1010411572)
California chipmunk (revision 1010411807)
California ground squirrel (revision 1010411812)
Callospermophilus (revision 1010416079)
Carpathian Mountains (revision 1011395807)
Cascade Range (revision 1011474213)
Cascade golden-mantled ground squirrel (revision 1010416079)
Chordate (revision 1008964469)
Cliff chipmunk (revision 1010412814)
Colorado chipmunk (revision 1010412919)
Daurian ground squirrel (revision 1010413422)
Deosai National Park (revision 1006913741)
Doi (identifier) (revision 1010427488)
Durango chipmunk (revision 1010413819)
EPPO Code (revision 998151320)
Eastern chipmunk (revision 999177830)
Encyclopedia of Life (revision 994178741)
Espíritu Santo antelope squirrel (revision 1010414324)
Ethnology (revision 1011057083)
Eulipotyphla (revision 1012652578)
Eurasian Steppe (revision 1013064344)
European ground squirrel (revision 1010414381)
Eutamias (revision 1010406609)
Extinction (revision 1011028396)
Fauna Europaea (revision 963073975)
Flower (revision 1010385350)
Forest-steppe marmot (revision 1010436539)
Forrest's rock squirrel (revision 1010437668)
France (revision 1012524494)
Franklin's ground squirrel (revision 1010415067)
French Alps (revision 1006041101)
GND (identifier) (revision 1010440981)
Gallo-Romance languages (revision 1012668074)
Genus (revision 1007184632)
Global Biodiversity Information Facility (revision 1010489511)
Gold (revision 1012856700)
Gold-digging ant (revision 1007959560)
Golden-mantled ground squirrel (revision 1010416079)
Gray-collared chipmunk (revision 1010416642)
Gray-footed chipmunk (revision 1010416658)
Gray marmot (revision 1010416479)
Ground squirrel (revision 1010442953)
Groundhog Day (revision 1012802985)
Gunnison's prairie dog (revision 1010416998)
Harris's antelope squirrel (revision 1010417210)
Herbivore (revision 1006902225)
Herodotus (revision 1012927818)
Hibernate (revision 1009048926)
Hibernation (revision 1009048926)
Himalayan marmot (revision 1010417424)
Hoary marmot (revision 1010417525)
Hopi chipmunk (revision 1010417623)
INaturalist (revision 1009815294)
ISBN (identifier) (revision 1009586768)
Ictidomys (revision 1010406819)
Ictidomys parvidens (revision 1010426310)
Integrated Taxonomic Information System (revision 999235988)
Interim Register of Marine and Nonmarine Genera (revision 995182351)
JSTOR (identifier) (revision 1011078319)
Jacopo Ligozzi (revision 1006687935)
Johann Friedrich Blumenbach (revision 1006564504)
Kazakhstan (revision 1012748504)
LCCN (identifier) (revision 1006934344)
Ladakh (revision 1010799326)
Latin (revision 1012971392)
Least chipmunk (revision 1010419221)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2021-03-19 23:29:33.380471

59 characters appeared 59 times.

Most Frequent characters:
[ 0] Char m: 1.694915254237288 %
[ 1] Char a: 1.694915254237288 %
[ 2] Char r: 1.694915254237288 %
[ 3] Char o: 1.694915254237288 %
[ 4] Char t: 1.694915254237288 %
[ 5] Char s: 1.694915254237288 %
[ 6] Char e: 1.694915254237288 %
[ 7] Char l: 1.694915254237288 %
[ 8] Char i: 1.694915254237288 %
[ 9] Char v: 1.694915254237288 %
[10] Char y: 1.694915254237288 %
[11] Char g: 1.694915254237288 %
[12] Char u: 1.694915254237288 %
[13] Char n: 1.694915254237288 %
[14] Char d: 1.694915254237288 %
[15] Char q: 1.694915254237288 %
[16] Char h: 1.694915254237288 %
[17] Char w: 1.694915254237288 %
[18] Char p: 1.694915254237288 %
[19] Char c: 1.694915254237288 %
[20] Char b: 1.694915254237288 %
[21] Char f: 1.694915254237288 %
[22] Char k: 1.694915254237288 %
[23] Char x: 1.694915254237288 %
[24] Char z: 1.694915254237288 %
[25] Char j: 1.694915254237288 %
[26] Char á: 1.694915254237288 %
[27] Char ö: 1.694915254237288 %
[28] Char ä: 1.694915254237288 %
[29] Char í: 1.694915254237288 %
[30] Char ç: 1.694915254237288 %
[31] Char ô: 1.694915254237288 %
[32] Char à: 1.694915254237288 %
[33] Char ü: 1.694915254237288 %
[34] Char æ: 1.694915254237288 %
[35] Char é: 1.694915254237288 %
[36] Char ï: 1.694915254237288 %
[37] Char û: 1.694915254237288 %
[38] Char ó: 1.694915254237288 %
[39] Char µ: 1.694915254237288 %
[40] Char è: 1.694915254237288 %
[41] Char ì: 1.694915254237288 %
[42] Char î: 1.694915254237288 %
[43] Char ë: 1.694915254237288 %
[44] Char ð: 1.694915254237288 %
[45] Char ý: 1.694915254237288 %
[46] Char š: 1.694915254237288 %
[47] Char ñ: 1.694915254237288 %
[48] Char œ: 1.694915254237288 %
[49] Char ê: 1.694915254237288 %
[50] Char â: 1.694915254237288 %
[51] Char ø: 1.694915254237288 %
[52] Char þ: 1.694915254237288 %
[53] Char å: 1.694915254237288 %
[54] Char ß: 1.694915254237288 %
[55] Char ã: 1.694915254237288 %
[56] Char ž: 1.694915254237288 %
[57] Char õ: 1.694915254237288 %
[58] Char ú: 1.694915254237288 %

The first 59 characters have an accumulated ratio of 0.9999999999999989.
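
Every character in this log, including rare ones such as þ and ž, is reported at the same 1.694915254237288 %. That is the figure one would get if each of the 59 distinct characters were counted exactly once, since 100 / 59 ≈ 1.6949. The sketch below only checks that arithmetic. The function name is hypothetical and is not taken from BuildLangModel.py, and whether the script really counted each character once is an assumption the log itself does not confirm.

```python
import math

def uniform_percentages(n_chars):
    """Percentage per character, ASSUMING each of n_chars was counted exactly once."""
    return [100.0 / n_chars] * n_chars

# 59 is the character count reported in the log.
percentages = uniform_percentages(59)

# Accumulated ratio over all characters (ratios, not percentages).
accumulated = sum(p / 100.0 for p in percentages)

# Compare against the values printed in the log, allowing for rounding error.
print(math.isclose(percentages[0], 1.694915254237288, rel_tol=1e-12))
print(math.isclose(accumulated, 1.0, rel_tol=1e-9))
```

A real frequency table for English text would be expected to be strongly non-uniform, so a flat distribution like this one is worth checking before the model is trusted.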

920 sequences found.

First 378 (typical positive ratio): 0.9950109024233114
Next 182 (560-378): 0.003993012537786833
Rest: 0.000996085038901806
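
The three ratios above partition the observed sequences into a top bucket, a middle bucket and the remainder, and they add up to 1. A minimal sketch of how such a split could be computed follows, assuming the sequences are ranked by descending count. The function `bucket_ratios` and the toy counts are hypothetical illustrations, not the code or data used by BuildLangModel.py.

```python
import math

def bucket_ratios(counts, first, second):
    """Share of total occurrences held by the top `first` sequences,
    the next `second` sequences, and everything after them."""
    ordered = sorted(counts, reverse=True)
    total = sum(ordered)
    top = sum(ordered[:first]) / total
    middle = sum(ordered[first:first + second]) / total
    rest = sum(ordered[first + second:]) / total
    return top, middle, rest

# Toy data (made up for illustration): 7 sequences, 100 occurrences in total.
toy_counts = [50, 30, 10, 5, 3, 1, 1]
top, middle, rest = bucket_ratios(toy_counts, first=2, second=3)
print(top, middle, rest)

# The ratios reported in the log also sum to 1 within floating-point error.
logged = (0.9950109024233114, 0.003993012537786833, 0.000996085038901806)
print(math.isclose(sum(logged), 1.0, rel_tol=1e-9))
```

With the log's own bucket sizes, the call would be `bucket_ratios(counts, first=378, second=182)` over the counts of all 920 sequences.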

- Processing end: 2021-03-19 23:29:33.474226