mirror of
https://gitlab.freedesktop.org/uchardet/uchardet.git
synced 2025-12-06 16:56:40 +08:00
It actually breaks "zh:big5", so I'm going to hold off a bit. Adding more language and charset support is slowly starting to show the limitations of our legacy multi-byte charset support, since I haven't really touched that code since the original Mozilla implementation. It might be time to start reviewing these parts of the code. The test file contents come from the 'Μαρμότα' page on the Greek Wikipedia (since 2 letters are missing from this encoding, despite its popularity for Greek, I had to be careful to choose pieces of text without such letters).
2 lines
206 B
Plaintext
Η μαρμότα είναι γένος τρωκτικών αποτελούμενο από δεκατέσσερα είδη του γένους Marmota, που συναντώνται στην Ευρασία και τη Βόρεια Αμερική. Τα διάφορα είδη των είναι ως επί το πλείστον κάτοικοι κρύων στεπών.
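
The constraint described in the commit message (text chosen so that it avoids letters the encoding cannot represent) can be checked mechanically. The following is a minimal sketch, not part of uchardet. It assumes the encoding in question is CP737, which the page itself does not name, and that the two missing letters are ΐ and ΰ. The helper names `fits_encoding` and `unencodable` are hypothetical.

```python
# Assumption: the test file is encoded in CP737 (not stated in the source).
# SAMPLE is the file content as decoded under that assumption.
SAMPLE = (
    "Η μαρμότα είναι γένος τρωκτικών αποτελούμενο από δεκατέσσερα είδη "
    "του γένους Marmota, που συναντώνται στην Ευρασία και τη Βόρεια Αμερική. "
    "Τα διάφορα είδη των είναι ως επί το πλείστον κάτοικοι κρύων στεπών.\n"
)


def fits_encoding(text, encoding="cp737"):
    """Return True if every character of text can be encoded."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


def unencodable(chars, encoding="cp737"):
    """Return the characters that the encoding cannot represent."""
    return [c for c in chars if not fits_encoding(c, encoding)]


# One byte per character, so the byte count equals the character count.
encoded = SAMPLE.encode("cp737")

if __name__ == "__main__":
    print(len(encoded), "bytes")
    print("sample fits:", fits_encoding(SAMPLE))
    print("letters assumed missing:", unencodable("ΐΰ"))
```

Under these assumptions the encoded sample, including its trailing newline, has the same size as the 206 B reported for the file.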