uchardet/test/bg/iso-8859-5.txt
Jehan ffb94e4a9d script, src, test: Bulgarian language models added.
Not sure why we had the Bulgarian support but haven't recently updated
it (i.e. never with the model generation script, or so it seems),
especially with generic language models, allowing to have
UTF-8/Bulgarian support. Maybe I tested it some time ago and it was
getting bad results? Anyway now with all the recents updates on the
confidence computation, I get very good detection scores.

So adding support for UTF-8/Bulgarian and rebuilding other models too.

Also adding a test for ISO-8859-5/Bulgarian (we already had support, but
no test files).

The 2 new test files are text from page 'Мармоти' on Wikipedia in
Bulgarian language.
2022-12-17 18:41:00 +01:00

4 lines
247 B
Plaintext
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

Мармотите (Marmota) са бозайници - род гризачи от семейство катерицови (Sciuridae), включващ 14 вида, включващи групата на лалугерите (Spermophilus citellus).
За разлика от родствената катерица, мармотът и лалугерът водят наземен начин на живот.