Jehan
e98d257ec4
README: add ISO-8859-13 for Latvian and Lithuanian support.
2016-09-20 23:35:12 +02:00
Jehan
2a559e7b52
README, test: update README and rename EUC-KR test to UHC.
2016-09-19 01:44:32 +02:00
Jehan
8a8d6b654c
Release: version 0.0.6.
2016-07-20 01:47:50 +02:00
Jehan
771d78b7df
Update the URL links: uchardet is now a freedesktop project.
2016-07-20 01:47:50 +02:00
Jehan
20eb319359
README: format the licenses as a list.
...
The previous layout broke when rendered as Markdown because no line breaks were produced.
2016-07-20 00:21:07 +02:00
Jehan
602c1ab0fc
README, COPYING: adding links and text of licenses GPL 2.0 and LGPL 2.1.
...
Thanks to Ilya Tumaykin for reporting the missing info.
2016-06-04 14:21:38 +02:00
Jehan
d5dba26e04
README: add Danish support for 3 charsets.
2016-02-19 19:11:56 +01:00
Jehan
1694999bce
README: update with VISCII support.
2016-02-13 03:52:06 +01:00
Jehan
178c6119b8
LangModels: add Windows-1258 support for Vietnamese.
...
I was planning to add VISCII support as well, but Python's encode()
method apparently has no support for it, so I cannot generate
the proper statistics data with the current version of the string.
2016-02-13 02:32:57 +01:00
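The limitation described in the commit message above can be checked with a short sketch. It assumes a standard CPython build with no third-party codecs registered; `has_codec` is a hypothetical helper name, not part of uchardet or its model-generation scripts.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if this Python build can find a codec under `name`.

    Hypothetical helper for illustration only.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # codecs.lookup() raises LookupError for unknown encoding names.
        return False


# Windows-1258 is exposed by Python as "cp1258"; VISCII is assumed to be
# missing, which is what the commit message reports.
for name in ("cp1258", "viscii"):
    print(name, has_codec(name))
```

If a codec for VISCII were registered (for example by a third-party package), the same check would report it as available and the statistics could presumably be generated the same way as for Windows-1258.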
Jehan
0446e24c8d
README: uchardet now available on Fedora.
...
Already in Fedora devel and soon to be added as an update on Fedora 23,
if I understand correctly. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1264713
https://admin.fedoraproject.org/pkgdb/package/rpms/uchardet/
2016-02-12 17:53:22 +01:00
Jehan
9c3c37517c
LangModels: add Arabic support.
...
Models constructed for ISO-8859-6 and Windows-1256.
2015-12-13 18:42:16 +01:00
Jehan
ffabb65712
LangModels: adding Spanish support.
...
With 3 charsets: ISO-8859-1, ISO-8859-15 and Windows-1252.
2015-12-12 18:54:35 +01:00
Jehan
886e03a523
Release: version 0.0.5.
2015-12-04 22:45:26 +01:00
Jehan
2856e68aac
README: reorganize support list by alphabetic order.
...
(Except for "International" and "Others")
2015-12-04 03:33:22 +01:00
Jehan
dc03ea002f
README: list support per language rather than per script system.
...
In particular separate "Cyrillic" into "Russian" and "Bulgarian"
(currently our only 2 supported languages using Cyrillic script).
2015-12-04 03:22:05 +01:00
Jehan
fb3c47a073
LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models.
...
ISO-8859-11 is identical to TIS-620, except that it adds the
non-breaking space character.
As a result, our detection will return TIS-620 except in the
rare cases where a text contains a non-breaking space.
2015-12-04 03:14:52 +01:00
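The relationship between the two Thai charsets described above can be sketched with Python's built-in codecs. This is only an illustration using the codec names `tis_620` and `iso8859_11` from the Python standard library; it says nothing about how the uchardet language models themselves are built, and `decode_or_none` is a hypothetical helper name.

```python
def decode_or_none(byte_value: int, encoding: str):
    """Decode a single byte, returning None when the byte is undefined.

    Hypothetical helper for illustration only.
    """
    try:
        return bytes([byte_value]).decode(encoding)
    except UnicodeDecodeError:
        return None


# Collect every byte value that the two codecs treat differently.
differing_bytes = [
    b
    for b in range(256)
    if decode_or_none(b, "tis_620") != decode_or_none(b, "iso8859_11")
]

print([hex(b) for b in differing_bytes])
```

Byte 0xA0 (the non-breaking space in ISO-8859-11) is expected to appear in the list, which is why a text without that byte gives a detector nothing to distinguish the two charsets by.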
Jehan
5ee1c3ee39
LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9.
2015-12-04 02:35:09 +01:00
Jehan
f0e122b506
LangModels: add Esperanto ISO-8859-3 language model.
2015-12-04 01:35:56 +01:00
Jehan
b56a3c7b84
README: add German support.
2015-12-04 00:07:03 +01:00
Jehan
90728e4068
README: update with Windows-1252 support information.
2015-12-03 21:25:53 +01:00
Jehan
60f641bf37
Update README to mark independence from the original Mozilla code.
2015-12-03 20:32:57 +01:00
Jehan
e4260f4a39
Release: version 0.0.4.
2015-12-03 19:48:58 +01:00
Jehan
ba56d91808
Update uchardet URL in various places.
2015-12-03 19:48:29 +01:00
Jehan
d1bc09e4d7
Update authors.
...
I think I deserve to be listed in the authors by now. ;-)
2015-12-03 19:44:13 +01:00
Jehan
683255278d
Re-enable Hungarian language models.
...
Now that we have at least one model for ISO-8859-1, the risk of
detecting all ISO-8859-1 texts as ISO-8859-2 is lessened.
2015-12-02 22:24:36 +01:00
Jehan
92efc0b0b0
Update README: Unicode is "International".
2015-11-28 19:44:13 +01:00
Jehan
0289c2a232
Differentiate ASCII and detection failure.
...
The lib used to return "" for both properly detected ASCII and
detection failure. And the tool would return "ascii/unknown".
Make a proper distinction between the 2 cases.
2015-11-28 17:04:52 +01:00
Jehan
4dbc6e7ab3
Update README with French support.
2015-11-28 02:20:57 +01:00
Jehan
b67370230b
Update README and manual...
...
... to indicate several files can be specified on command line.
2015-11-27 18:27:11 +01:00
Jehan
c61e65aeb3
s/MACCYRILLIC/MAC-CYRILLIC/
...
Write encoding names in the README exactly as uchardet returns them.
2015-11-27 18:19:02 +01:00
Jehan
d082704fec
Add Mageia command and specify Mint compatibility.
2015-11-23 17:46:01 +01:00
Jehan
ff5fd5eff9
Release: version 0.0.3.
2015-11-19 15:18:11 +01:00
Jehan
4db0d55692
URL of related project python-chardet has changed.
2015-11-17 21:40:44 +01:00
Jehan
9172b763d1
Add TIS-620 in README (Thai language) and a test file.
...
Test text based on Thai Wikipedia page about the TIS-620 encoding:
https://th.wikipedia.org/wiki/TIS-620
2015-11-17 17:39:45 +01:00
Jehan
399c4c4d9e
Add libchardet in related projects.
...
See https://github.com/BYVoid/uchardet/issues/11
for review of differences with uchardet.
2015-11-17 17:12:44 +01:00
Jehan
dc371f3ba9
uchardet_get_charset() must return iconv-compatible names.
...
It was not clear if our naming followed any kind of rules. In particular,
iconv is a widely used encoding conversion API. We will follow its
naming.
At least 1 returned name was found invalid: x-euc-tw instead of EUC-TW.
Other names have been uppercased to follow naming from `iconv --list`
though iconv is mostly case-insensitive so it should not have been a
problem. "Just in case".
Prober names can still have free naming (only used for output display
apparently).
Finally HZ-GB-2312 is absent from my iconv list, but I can still see
this encoding in libiconv master code with this name. So I will
consider it valid.
2015-11-17 16:15:21 +01:00
Jehan
d0ccdd5db9
Release: version 0.0.2.
2015-11-16 15:56:45 +01:00
Carbo Kuo
69b7133995
Add a link to rust-uchardet on README
2014-11-20 20:06:41 +01:00
Carbo Kuo
6caa8f6580
Add README
2013-11-08 07:02:50 +08:00