Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-07 01:06:40 +08:00)
Adding a generic language model (see the upcoming commit), which uses the same data as the encoding-specific single-byte statistics models but applies it to Unicode code points. For this to work, instead of the CharToOrderMap, which maps each encoded byte (always 256 values) directly to a frequency order, we now add an array of frequent characters, sorted by Unicode code point and mapping each one to its frequency order (which can then index the same sequence-mapping array). This means that each prober that wants to use these generic models will have to implement its own byte-to-code-point decoder, since that logic is per-encoding anyway. This will come in a subsequent commit.
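The lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not uchardet's actual code: the names `CodePointOrder`, `OrderFromCodePoint`, `DecodeLatin1`, `kSequenceModel` and `CountLikelyPairs` are hypothetical, and the toy data is made up. The frequent-character array is kept sorted by code point so a binary search can find a character's frequency order; a per-encoding decoder (here ISO-8859-1, where every byte equals its code point) turns bytes into code points first, and consecutive orders then index a sequence model, simplified here to 0/1 entries.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical entry of a generic language model: a Unicode code point
// paired with its frequency order (0 = most frequent).
struct CodePointOrder {
    uint32_t codePoint;
    int order;
};

// Hypothetical sentinel for characters not in the frequent list.
constexpr int kUnknownOrder = -1;

// Toy data, sorted by code point (not by order) so it can be binary-searched.
const std::vector<CodePointOrder> kFrequentChars = {
    {0x0061, 0},  // 'a'
    {0x0065, 1},  // 'e'
    {0x006E, 3},  // 'n'
    {0x0074, 2},  // 't'
    {0x00E9, 4},  // U+00E9 'e' with acute accent
};

// Binary-search the sorted array for a code point's frequency order.
int OrderFromCodePoint(uint32_t cp) {
    auto it = std::lower_bound(
        kFrequentChars.begin(), kFrequentChars.end(), cp,
        [](const CodePointOrder& entry, uint32_t value) {
            return entry.codePoint < value;
        });
    if (it != kFrequentChars.end() && it->codePoint == cp) {
        return it->order;
    }
    return kUnknownOrder;
}

// Hypothetical per-encoding decoder: in ISO-8859-1 each byte is its code point.
uint32_t DecodeLatin1(unsigned char byte) { return byte; }

// Toy sequence model indexed by [previous order][current order]:
// 1 = plausible pair, 0 = implausible (real models use graded categories).
constexpr int kFreqCount = 5;
const int kSequenceModel[kFreqCount][kFreqCount] = {
    {1, 1, 1, 1, 0},  // after 'a'
    {1, 0, 1, 1, 0},  // after 'e'
    {1, 1, 0, 0, 1},  // after 't'
    {1, 1, 1, 0, 0},  // after 'n'
    {0, 1, 1, 0, 0},  // after U+00E9
};

// Decode bytes, map code points to orders, and count plausible consecutive
// pairs of frequent characters; an unknown character breaks the sequence.
int CountLikelyPairs(const unsigned char* data, std::size_t len) {
    int prev = kUnknownOrder;
    int likely = 0;
    for (std::size_t i = 0; i < len; ++i) {
        int order = OrderFromCodePoint(DecodeLatin1(data[i]));
        if (prev != kUnknownOrder && order != kUnknownOrder &&
            kSequenceModel[prev][order] == 1) {
            ++likely;
        }
        prev = order;
    }
    return likely;
}
```

Keeping the array sorted by code point rather than by frequency is what makes the binary search possible: unlike the 256-entry byte table, a Unicode range is too large to index directly, so only the frequent characters are stored.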
| File |
|---|
| LangArabicModel.log |
| LangCroatianModel.log |
| LangCzechModel.log |
| LangDanishModel.log |
| LangEsperantoModel.log |
| LangEstonianModel.log |
| LangFinnishModel.log |
| LangFrenchModel.log |
| LangGermanModel.log |
| LangGreekModel.log |
| LangHungarianModel.log |
| LangIrishModel.log |
| LangItalianModel.log |
| LangLatvianModel.log |
| LangLithuanianModel.log |
| LangMalteseModel.log |
| LangPolishModel.log |
| LangPortugueseModel.log |
| LangRomanianModel.log |
| LangSlovakModel.log |
| LangSloveneModel.log |
| LangSpanishModel.log |
| LangSwedishModel.log |
| LangThaiModel.log |
| LangTurkishModel.log |
| LangVietnameseModel.log |