Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-07 01:06:40 +08:00)
The Hebrew model had never been regenerated by my scripts, so I have now added the base generation files. Note that I added 2 charsets, ISO-8859-8 and WINDOWS-1255, but they are nearly identical. One of the differences is that the generic currency sign is replaced by the sheqel sign (the Israeli currency) in Windows-1255. And though Windows-1255 lost the "double low line", some Yiddish characters were apparently added. Basically, it looks like most Hebrew text would be detected with the same confidence in both charsets, so detecting both is likely pointless. So I keep the charset file for ISO-8859-8 but won't actually use it. The good part is that Hebrew is now also recognized in UTF-8 text, thanks to the new code and the newly generated language model. |
||
|---|---|---|
| LangArabicModel.log | ||
| LangCroatianModel.log | ||
| LangCzechModel.log | ||
| LangDanishModel.log | ||
| LangEsperantoModel.log | ||
| LangEstonianModel.log | ||
| LangFinnishModel.log | ||
| LangFrenchModel.log | ||
| LangGermanModel.log | ||
| LangGreekModel.log | ||
| LangHebrewModel.log | ||
| LangHungarianModel.log | ||
| LangIrishModel.log | ||
| LangItalianModel.log | ||
| LangLatvianModel.log | ||
| LangLithuanianModel.log | ||
| LangMalteseModel.log | ||
| LangPolishModel.log | ||
| LangPortugueseModel.log | ||
| LangRomanianModel.log | ||
| LangSlovakModel.log | ||
| LangSloveneModel.log | ||
| LangSpanishModel.log | ||
| LangSwedishModel.log | ||
| LangThaiModel.log | ||
| LangTurkishModel.log | ||
| LangVietnameseModel.log | ||
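
The charset differences described in the commit message above can be checked with a small sketch. It is not part of uchardet's generation scripts: it only relies on Python's built-in `iso8859_8` and `cp1255` codecs, and the helper names `decode_byte` and `charset_differences` are hypothetical, chosen for illustration.

```python
def decode_byte(value, codec):
    """Return the character a single byte maps to, or None if unassigned."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def charset_differences(codec_a="iso8859_8", codec_b="cp1255"):
    """Map each byte value that decodes differently to its (a, b) pair."""
    diffs = {}
    for value in range(0x100):
        a = decode_byte(value, codec_a)
        b = decode_byte(value, codec_b)
        if a != b:
            diffs[value] = (a, b)
    return diffs


diffs = charset_differences()

# Hebrew letters alef..tav occupy 0xE0..0xFA in both charsets.
hebrew_letters_identical = all(v not in diffs for v in range(0xE0, 0xFB))

# Show a few of the differing positions mentioned in the commit message:
# 0xA4 (currency sign vs. sheqel sign), 0xDF (double low line),
# 0xD4..0xD6 (Yiddish digraphs, assigned only in Windows-1255).
for value in (0xA4, 0xDF, 0xD4, 0xD5, 0xD6):
    print(hex(value), diffs.get(value))
```

Since the letter range is shared, plain Hebrew text produces the same byte sequences in both encodings, which is consistent with the decision to keep only one of the two charsets in active use.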