Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-08 01:36:41 +08:00)
I was planning on adding VISCII support as well, but Python's encode() method apparently does not support it, so I cannot generate the proper statistics data with the current version of the script.
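The limitation mentioned above can be checked directly. The following is a minimal sketch, assuming only the codecs shipped with the Python standard library; the helper name `is_supported` is hypothetical and not part of BuildLangModel.py.

```python
import codecs


def is_supported(encoding_name):
    """Return True if Python can find a codec for the given encoding name."""
    try:
        codecs.lookup(encoding_name)
        return True
    except LookupError:
        # Raised when no registered codec matches the name.
        return False


# Windows-1258 is a Vietnamese charset with a standard-library codec;
# VISCII is expected to be missing unless a third-party codec is registered.
for name in ("utf-8", "windows-1258", "viscii"):
    print(name, is_supported(name))
```

Without a codec, `str.encode()` cannot produce the byte values needed to build per-charset statistics, which is presumably why VISCII was left out here.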
100 lines
2.9 KiB
Plaintext
= Logs of language model for Vietnamese (vi) =

- Generated by BuildLangModel.py
- Started: 2016-02-13 02:13:44.503931
- Maximum depth: 3
- Max number of pages: 40

== Parsed pages ==

Chữ_Quốc_ngữ (revision 22887853)
1651 (revision 21455247)
1773 (revision 21354755)
1815 (revision 21361292)
1838 (revision 21361314)
1865 (revision 21361338)
1869 (revision 21361342)
1888 (revision 21389506)
1902 (revision 21354811)
1918 (revision 21354828)
1919 (revision 21354829)
1938 (revision 21354849)
1945 (revision 21354857)
22 tháng 2 (revision 21376086)
26 tháng 11 (revision 22579845)
28 tháng 12 (revision 22475308)
A (revision 22549334)
ASCII (revision 22528409)
Alexandre de Rhodes (revision 22859954)
Antonio Barbosa (revision 22145269)
B (revision 22836557)
BBC (revision 22863903)
Biên khảo (revision 22531516)
Bán nguyên âm (revision 22655600)
Bình luận (revision 22117664)
Bảng chữ cái Bồ Đào Nha (revision 22887853)
Bảng chữ cái Hy Lạp (revision 21362081)
Bảng chữ cái Latinh (revision 22442448)
Bắc Kỳ (revision 22393289)
Bồ Đào Nha (revision 22620858)
C (revision 21341881)
Cao Xuân Dục (revision 22620201)
Chính tả (revision 22187359)
Chính tả tiếng Việt (revision 20897580)
Chữ Hán (revision 22889609)
Chữ Nôm (revision 22781506)
Chữ cái (revision 22169220)
Công giáo (revision 22173119)
D (revision 21447691)

== End of Parsed pages ==

- Wikipedia parsing ended at: 2016-02-13 02:16:03.731928

49 characters appeared 190798 times.

First 33 characters:
[ 0] Char n: 13.15212947724819 %
[ 1] Char h: 10.371702009455026 %
[ 2] Char t: 8.20134382959989 %
[ 3] Char c: 7.433516074591977 %
[ 4] Char i: 7.238545477415906 %
[ 5] Char g: 6.529418547364228 %
[ 6] Char a: 4.203922472981897 %
[ 7] Char u: 3.328127129215191 %
[ 8] Char m: 3.0540152412499086 %
[ 9] Char o: 3.037767691485236 %
[10] Char đ: 2.5948909317707733 %
[11] Char r: 2.4643864191448546 %
[12] Char à: 2.3878657008983324 %
[13] Char v: 2.269939936477322 %
[14] Char l: 2.2327278063711358 %
[15] Char á: 2.0482394993658217 %
[16] Char p: 1.9214037882996675 %
[17] Char b: 1.7998092223188922 %
[18] Char ư: 1.6813593433893437 %
[19] Char s: 1.6069350831769726 %
[20] Char y: 1.4952986928584158 %
[21] Char e: 1.4544177611924654 %
[22] Char d: 1.3139550729043281 %
[23] Char k: 1.2489648738456378 %
[24] Char â: 1.1278944223734 %
[25] Char ê: 0.977997672931582 %
[26] Char ô: 0.8260044654556128 %
[27] Char ó: 0.7091269300516777 %
[28] Char q: 0.60011111227581 %
[29] Char ơ: 0.4192916068302603 %
[30] Char í: 0.4166710342875712 %
[31] Char ă: 0.37998301868992335 %
[32] Char x: 0.34329500309227556 %

The first 33 characters have an accumulated ratio of 0.9887105734860954.
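The per-character percentages and the accumulated ratio above can be illustrated with a short sketch. This is not the code from BuildLangModel.py: the function name `char_stats`, the lowercasing step, and the letters-only filter are assumptions made for illustration.

```python
from collections import Counter


def char_stats(text, top_n):
    """Rank letters by frequency and compute the share held by the top_n."""
    # Assumption: only alphabetic characters count, case-insensitively.
    counts = Counter(ch for ch in text.lower() if ch.isalpha())
    total = sum(counts.values())
    if total == 0:
        return [], 0.0
    ranked = counts.most_common()  # most frequent first
    percents = [(ch, 100.0 * n / total) for ch, n in ranked]
    # Accumulated ratio: fraction of all counted characters covered by top_n.
    accumulated = sum(n for _, n in ranked[:top_n]) / total
    return percents, accumulated


percents, accumulated = char_stats("nhanh nhanh", top_n=2)
for rank, (ch, pct) in enumerate(percents):
    print("[{:2d}] Char {}: {} %".format(rank, ch, pct))
print("Accumulated ratio of the first 2 characters:", accumulated)
```

In the log, the same idea applies with 33 out of 49 distinct characters: those 33 cover about 98.9 % of the 190798 counted occurrences.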

852 sequences found.

First 512 (typical positive ratio): 0.990048941203513
Next 512 (512-1024): 1.0482290170756506e-05
Rest: -1.5612511283791264e-17

- Processing end: 2016-02-13 02:16:03.877897