mirror of
https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
Not sure why we had the Bulgarian support but haven't recently updated it (i.e. never with the model generation script, or so it seems), especially with generic language models, allowing to have UTF-8/Bulgarian support. Maybe I tested it some time ago and it was getting bad results? Anyway now with all the recents updates on the confidence computation, I get very good detection scores. So adding support for UTF-8/Bulgarian and rebuilding other models too. Also adding a test for ISO-8859-5/Bulgarian (we already had support, but no test files). The 2 new test files are text from page 'Мармоти' on Wikipedia in Bulgarian language.
4 lines
247 B
Plaintext
4 lines
247 B
Plaintext
Мармотите (Marmota) са бозайници - род гризачи от семейство катерицови (Sciuridae), включващ 14 вида, включващи групата на лалугерите (Spermophilus citellus).
|
||
|
||
За разлика от родствената катерица, мармотът и лалугерът водят наземен начин на живот.
|