Jehan
5ee1c3ee39
LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9.
2015-12-04 02:35:09 +01:00
Jehan
f0e122b506
LangModels: add Esperanto ISO-8859-3 language model.
2015-12-04 01:35:56 +01:00
Jehan
b56a3c7b84
README: add German support.
2015-12-04 00:07:03 +01:00
Jehan
90728e4068
README: update with Windows-1252 support information.
2015-12-03 21:25:53 +01:00
Jehan
60f641bf37
Update README to mark independence from original Mozilla code.
2015-12-03 20:32:57 +01:00
Jehan
e4260f4a39
Release: version 0.0.4.
2015-12-03 19:48:58 +01:00
Jehan
ba56d91808
Update uchardet URL in various places.
2015-12-03 19:48:29 +01:00
Jehan
d1bc09e4d7
Update authors.
I think I deserved being listed in the authors by now. ;-)
2015-12-03 19:44:13 +01:00
Jehan
683255278d
Re-enable Hungarian language models.
Now that we have at least one model for ISO-8859-1, the risk of
detecting all ISO-8859-1 texts as ISO-8859-2 is lessened.
2015-12-02 22:24:36 +01:00
Jehan
92efc0b0b0
Update README: Unicode is "International".
2015-11-28 19:44:13 +01:00
Jehan
0289c2a232
Differentiate ASCII and detection failure.
The lib used to return "" for both properly detected ASCII and
detection failure. And the tool would return "ascii/unknown".
Make a proper distinction between the 2 cases.
2015-11-28 17:04:52 +01:00
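For callers, the distinction introduced in 0289c2a232 could be handled roughly as sketched below. This is only an illustration under assumptions: it assumes that after this change the library reports detection failure as an empty string and plain ASCII as the name "ASCII", which the commit message does not spell out. The `detect_status` enum and `classify_charset()` helper are hypothetical names and are not part of uchardet.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical result categories; not part of the uchardet API. */
enum detect_status { DETECT_FAILED, DETECT_ASCII, DETECT_OTHER };

/*
 * Classify a charset name as returned by a detector.
 * Assumption: "" means detection failed and "ASCII" means pure ASCII.
 */
static enum detect_status classify_charset(const char *name)
{
    assert(name != NULL);
    if (name[0] == '\0')
        return DETECT_FAILED;   /* nothing was detected */
    if (strcmp(name, "ASCII") == 0)
        return DETECT_ASCII;    /* properly detected ASCII */
    return DETECT_OTHER;        /* any other detected encoding */
}
```

A caller would pass the string obtained from `uchardet_get_charset()` to the helper and branch on the result instead of treating both cases as "ascii/unknown".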
Jehan
4dbc6e7ab3
Update README with French support.
2015-11-28 02:20:57 +01:00
Jehan
b67370230b
Update README and manual...
... to indicate several files can be specified on command line.
2015-11-27 18:27:11 +01:00
Jehan
c61e65aeb3
s/MACCYRILLIC/MAC-CYRILLIC/
Write encoding names in README same as what uchardet returns.
2015-11-27 18:19:02 +01:00
Jehan
d082704fec
Add Mageia command and specify Mint compatibility.
2015-11-23 17:46:01 +01:00
Jehan
ff5fd5eff9
Release: version 0.0.3.
2015-11-19 15:18:11 +01:00
Jehan
4db0d55692
URL of related project python-chardet has changed.
2015-11-17 21:40:44 +01:00
Jehan
9172b763d1
Add TIS-620 in README (Thai language) and a test file.
Test text based on Thai Wikipedia page about the TIS-620 encoding:
https://th.wikipedia.org/wiki/TIS-620
2015-11-17 17:39:45 +01:00
Jehan
399c4c4d9e
Add libchardet in related projects.
See https://github.com/BYVoid/uchardet/issues/11
for review of differences with uchardet.
2015-11-17 17:12:44 +01:00
Jehan
dc371f3ba9
uchardet_get_charset() must return iconv-compatible names.
It was not clear if our naming followed any kind of rules. In particular,
iconv is a widely used encoding conversion API. We will follow its
naming.
At least 1 returned name was found invalid: x-euc-tw instead of EUC-TW.
Other names have been uppercased to follow naming from `iconv --list`
though iconv is mostly case-insensitive so it should not have been a
problem. "Just in case".
Prober names can still have free naming (only used for output display
apparently).
Finally HZ-GB-2312 is absent from my iconv list, but I can still see
this encoding in libiconv master code with this name. So I will
consider it valid.
2015-11-17 16:15:21 +01:00
Jehan
d0ccdd5db9
Release: version 0.0.2.
2015-11-16 15:56:45 +01:00
Carbo Kuo
69b7133995
Add a link to rust-uchardet on README
2014-11-20 20:06:41 +01:00
Carbo Kuo
6caa8f6580
Add README
2013-11-08 07:02:50 +08:00