Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-12 06:30:05 +08:00)
Contains text taken from the Korean Wikipedia page on EUC-KR: https://ko.wikipedia.org/wiki/EUC-KR. I added it as a pseudo-subtitle file because, as the original Mozilla paper says: "The input text may contain extraneous noises which have no relation to its encoding, e.g. HTML tags, non-native words". I therefore think it is important to make test files a little noisy where possible, in order to test how well our algorithm resists noise.

| Name |
|---|
| big5.txt |
| euc-kr.smi |
| gb18030.txt |
| shift_jis.txt |
| utf8.txt |
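
To make the point about noise in the commit message concrete, here is a minimal Python sketch of how subtitle-style markup dilutes the encoding-specific bytes in a file. It is only an illustration under stated assumptions: the sample text, the SAMI-like tag layout, and the helper names `make_sami` and `noise_ratio` are all hypothetical, and the ASCII-byte ratio is a crude proxy for "noise", not how uchardet measures anything.

```python
# Hypothetical sample lines; NOT the actual contents of euc-kr.smi.
KOREAN_LINES = ["한국어 위키백과", "한글 문자 집합"]


def make_sami(lines):
    """Wrap each line in SAMI-style subtitle markup (pure ASCII 'noise')."""
    body = "\n".join(
        f"<SYNC Start={i * 1000}><P Class=KRCC>{line}"
        for i, line in enumerate(lines)
    )
    return f"<SAMI>\n<BODY>\n{body}\n</BODY>\n</SAMI>\n"


def noise_ratio(data: bytes) -> float:
    """Fraction of bytes that are 7-bit ASCII.

    In EUC-KR, Hangul is encoded as pairs of bytes >= 0x80, so ASCII
    bytes carry no information that distinguishes EUC-KR from other
    ASCII-compatible encodings.
    """
    if not data:
        return 0.0
    ascii_bytes = sum(1 for b in data if b < 0x80)
    return ascii_bytes / len(data)


# Encode the same text with and without the markup wrapper.
clean = "\n".join(KOREAN_LINES).encode("euc_kr")
noisy = make_sami(KOREAN_LINES).encode("euc_kr")

print(f"clean: {noise_ratio(clean):.2f}  noisy: {noise_ratio(noisy):.2f}")
```

The wrapped sample still decodes as EUC-KR, but a much larger share of its bytes is markup, which is the kind of input a detector has to tolerate.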