uchardet/ibm852.txt at d40e5868d5ec1f08f1e6e0d25e04dae68c586ba1 - uchardet - CoffeeCat

coffee/uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-13 15:10:06 +08:00

Jehan 5f9ec3aef0 LangModels: add support for Slovak.

Encodings are the same as for Czech (Windows-1250, ISO-8859-2 and
Mac-CentralEurope), since the resources I found indicate that both
languages historically used the same encodings.
Note also that the encodings of the test examples were already
properly detected through the Czech models, so the languages are
very close, even statistically. Nevertheless, adding the right models
works better and these get better scores. This will become fully
meaningful when uchardet is also used as a language detector (in the
not-too-distant future, hopefully!).
Test text taken from: https://sk.wikipedia.org/wiki/Jupiter

2016-09-21 13:42:20 +02:00

4 lines

219 B

Plaintext


 Jupiter je piata planéta v poradí od Slnka, najväčšia a najhmotnejšia planéta
 našej slnečnej sústavy. Je pomenovaný po rímskom bohovi Jupiterovi. Symbolom
 planéty je štylizované znázornenie Jupiterovho božského blesku.
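
A viewer that decodes this IBM852 file with a different Central European code page shows mojibake, for example `plan‚ta` instead of `planéta`. The sketch below reverses such a mis-decoding by re-encoding the displayed text into the bytes it came from and decoding those bytes as IBM852. It assumes the viewer used Windows-1250 (`cp1250`), which is an inference from the substituted characters and not something the page states; the helper name `redecode` is hypothetical.

```python
def redecode(text: str, shown_as: str = "cp1250", actual: str = "cp852") -> str:
    """Undo a wrong decoding: recover the raw bytes, then decode them properly.

    shown_as: the code page the viewer is ASSUMED to have used (not confirmed).
    actual:   the file's real encoding (IBM852, per the file name).
    """
    return text.encode(shown_as).decode(actual)


# "plan\u201ata" is the mis-decoded form: U+201A is byte 0x82 in cp1250,
# and byte 0x82 is "é" in cp852.
print(redecode("plan\u201ata"))           # planéta
# U+00A7 (section sign) is byte 0xA7, which is "ž" in cp852.
print(redecode("bo\u00a7sk\u201aho"))     # božského
```

This only works when every displayed character exists in the assumed code page; otherwise `encode` raises `UnicodeEncodeError`, which is a useful hint that the assumption about the viewer's code page is wrong.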