Jehan
c8a3572cca
Issue #17 : update README.
...
Replace the old link to the science paper with one on the archive-mozilla
website. Remove the original source link as I can't find any archived
version of it (even on archive.org, only the folder structure is saved,
not the actual files themselves, so it's useless).
Also add some history, which is probably a nice touch.
Add a link to crossroad to help people who'd want to cross-compile
uchardet.
Finally add the R binding by Artem Klevtsov and QtAV as reported.
2020-04-29 16:20:00 +02:00
Jehan
59f68dbe57
Release: version 0.0.7
2020-04-23 11:48:58 +02:00
Jehan
60bf53c81e
README: update to Gitlab links.
...
Freedesktop moved its infrastructure to Gitlab a while ago.
2020-04-22 00:33:48 +02:00
Jehan
0cfb75724a
README: some small updates.
2020-04-22 00:17:23 +02:00
Jehan
bdfd6116a9
Add a mention about fd.o code of conduct.
2018-09-26 15:12:25 +02:00
Jehan
95872ef41c
Adding some information about building for Windows.
2017-12-26 03:37:42 +01:00
Jehan
056a5a6e51
README: add some applications having uchardet as dependency.
...
There are likely more (and I know some are planning support) but these
are the ones I know of and with support already in.
2017-09-21 00:06:03 +02:00
Jehan
d9d014742a
README: Gentoo also has a uchardet package.
...
And it is up-to-date with upstream URL at Freedesktop! Good!
2017-05-28 21:13:59 +02:00
Jehan
d90d01bc9e
README: adding a flatpak-builder manifest example.
...
Thanks to Sébastien Wilmet for the example.
2017-03-24 23:22:40 +01:00
Jehan
119fed7e8d
LangModels: add Swedish support.
...
Encodings: ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-15 and
WINDOWS-1252.
Test text from https://sv.wikipedia.org/wiki/Mölle
2016-09-28 22:42:13 +02:00
Jehan
d62154bd6e
LangModels: add Slovene support.
...
Encodings: ISO-8859-2, ISO-8859-16, Windows-1250, IBM852 and
MAC-CENTRALEUROPE.
Test text from https://sl.wikipedia.org/wiki/Naseljivi_planet
2016-09-28 22:13:17 +02:00
Jehan
fbd2efdbe9
LangModels: Romanian support added.
...
Encodings: ISO-8859-2, ISO-8859-16, Windows-1250 and IBM852.
Test texts from https://ro.wikipedia.org/wiki/Danemarca
2016-09-28 19:57:50 +02:00
Jehan
a7525b404d
LangModels: added support for Irish Gaelic.
...
Encodings: ISO-8859-1, ISO-8859-9, ISO-8859-15 and WINDOWS-1252.
Test text from:
https://ga.wikipedia.org/wiki/Gluais_théarmaí_seoltóireachta
2016-09-27 00:49:05 +02:00
Jehan
a3a271dfd5
LangModels: Estonian models created.
...
Encodings: ISO-8859-4, ISO-8859-13, Windows-1252 and
Windows-1257.
Test text from https://et.wikipedia.org/wiki/Anton_Tšehhov
Windows-1257 and ISO-8859-13 are very close, so I added quotation marks
(Jutumärgid), which are on codepoints only present in ISO-8859-13,
to tell the two encodings apart.
2016-09-27 00:14:29 +02:00
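The Estonian commit above relies on quotation marks to separate two near-identical encodings. A minimal sketch of that idea, using Python's built-in codecs rather than uchardet itself; the sample strings are illustrative and are not the actual test text:

```python
# Sketch only: compares Python's ISO-8859-13 and Windows-1257 codecs.
# The sample strings below are assumptions chosen for illustration.
SHARED_TEXT = "Anton Tšehhov"   # letters that both encodings map identically
QUOTED_TEXT = "„Kajakas“"       # low/high double quotation marks

def encode_both(text):
    """Return the text encoded as (ISO-8859-13, Windows-1257)."""
    return text.encode("iso8859_13"), text.encode("cp1257")

iso_plain, win_plain = encode_both(SHARED_TEXT)
iso_quoted, win_quoted = encode_both(QUOTED_TEXT)

# Plain Estonian letters give the same bytes in both encodings,
# so they carry no information about which encoding was used.
print(iso_plain == win_plain)

# The quotation marks sit on different bytes, which is what lets
# a detector choose between the two encodings.
print(iso_quoted == win_quoted)
print(iso_quoted.hex(" "), "|", win_quoted.hex(" "))
```

In ISO-8859-13 the quotation marks land in the 0xA0-0xFF range, while Windows-1257 places them in 0x80-0x9F, so a test text containing them is unambiguous.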
Jehan
3c6d31f5c2
LangModels: new Croatian models.
...
Supports: ISO-8859-2, ISO-8859-13, ISO-8859-16, IBM852, Windows-1250
and MAC-CENTRALEUROPE.
Test text from https://hr.wikipedia.org/wiki/Brekinja
2016-09-26 01:32:49 +02:00
Jehan
f262b1d65b
LangModels: add Italian support.
...
Officially supported: ISO-8859-1, ISO-8859-3, ISO-8859-9, ISO-8859-15
and WINDOWS-1252. As with Finnish, only ISO-8859-1 and UTF-8 tests were added,
since the other encodings end up the same as ISO-8859-1 for most common texts
(i.e. glyphs used in Italian are on the same codepoints in these other
encodings).
Test text from https://it.wikipedia.org/wiki/Architettura_longobarda
2016-09-21 18:52:09 +02:00
Jehan
87d0c16e0e
README: add Finnish support.
2016-09-21 18:35:26 +02:00
Jehan
ac4aa94b73
README: add Polish support…
...
… and update "Mac-CentralEurope" into "MAC-CENTRALEUROPE" (as in iconv).
2016-09-21 17:38:22 +02:00
Jehan
f314b76c0a
README: add Slovak support.
2016-09-21 13:42:31 +02:00
Jehan
5680cba0b8
README: adding Czech and Maltese support information.
2016-09-21 03:45:40 +02:00
Jehan
d810f1175b
README: update Latvian and Lithuanian support.
...
uchardet now also recognizes these languages in ISO-8859-4 and
ISO-8859-10.
2016-09-21 00:35:23 +02:00
Jehan
9f7ed67166
README: add info on Portuguese support.
2016-09-21 00:05:12 +02:00
Jehan
e98d257ec4
README: add ISO-8859-13 for Latvian and Lithuanian support.
2016-09-20 23:35:12 +02:00
Jehan
2a559e7b52
README, test: update README and rename EUC-KR test to UHC.
2016-09-19 01:44:32 +02:00
Jehan
8a8d6b654c
Release: version 0.0.6.
2016-07-20 01:47:50 +02:00
Jehan
771d78b7df
Update the URL links: uchardet is now a freedesktop project.
2016-07-20 01:47:50 +02:00
Jehan
20eb319359
README: make the licenses as a list.
...
This rendered incorrectly as markdown because no line breaks were created.
2016-07-20 00:21:07 +02:00
Jehan
602c1ab0fc
README, COPYING: adding links and text of licenses GPL 2.0 and LGPL 2.1.
...
Thanks to Ilya Tumaykin for reporting the missing info.
2016-06-04 14:21:38 +02:00
Jehan
d5dba26e04
README: add Danish support for 3 charsets.
2016-02-19 19:11:56 +01:00
Jehan
1694999bce
README: update with VISCII support.
2016-02-13 03:52:06 +01:00
Jehan
178c6119b8
LangModels: add Windows-1258 support for Vietnamese.
...
I was planning on adding VISCII support as well, but Python's encode()
method apparently does not support it, so I cannot generate
the proper statistics data with the current version of the string.
2016-02-13 02:32:57 +01:00
Jehan
0446e24c8d
README: uchardet now available on Fedora.
...
Already in Fedora devel and soon to be added as an update on Fedora 23,
if I understand correctly. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1264713
https://admin.fedoraproject.org/pkgdb/package/rpms/uchardet/
2016-02-12 17:53:22 +01:00
Jehan
9c3c37517c
LangModels: add Arabic support.
...
Models constructed for ISO-8859-6 and Windows-1256.
2015-12-13 18:42:16 +01:00
Jehan
ffabb65712
LangModels: adding Spanish support.
...
With 3 charsets: ISO-8859-1, ISO-8859-15 and Windows-1252.
2015-12-12 18:54:35 +01:00
Jehan
886e03a523
Release: version 0.0.5.
2015-12-04 22:45:26 +01:00
Jehan
2856e68aac
README: reorganize support list by alphabetic order.
...
(Except for "International" and "Others")
2015-12-04 03:33:22 +01:00
Jehan
dc03ea002f
README: support is listed per language rather than per script system.
...
In particular separate "Cyrillic" into "Russian" and "Bulgarian"
(currently our only 2 supported languages using Cyrillic script).
2015-12-04 03:22:05 +01:00
Jehan
fb3c47a073
LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models.
...
ISO-8859-11 is identical to TIS-620, with the addition of the
non-breaking space character.
As a result, our detection will always return TIS-620, except in the
rare cases where a text contains a non-breaking space.
2015-12-04 03:14:52 +01:00
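The relationship between TIS-620 and ISO-8859-11 described in the commit above can be checked with Python's built-in codecs. This is a sketch under the assumption that Python's `tis_620` and `iso8859_11` codecs follow the same mappings uchardet targets; the Thai sample string is illustrative:

```python
# Sketch only: uses Python's codecs, not uchardet's language models.
THAI_TEXT = "ภาษาไทย"   # illustrative sample ("Thai language")
NBSP = "\u00a0"          # non-breaking space

def can_encode(text, codec):
    """Return True if every character of text exists in the codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False

# Ordinary Thai text yields the same bytes under both encodings,
# so a detector cannot tell them apart from such text alone.
same_bytes = THAI_TEXT.encode("tis_620") == THAI_TEXT.encode("iso8859_11")
print(same_bytes)

# The non-breaking space is the distinguishing character.
for codec in ("tis_620", "iso8859_11"):
    print(codec, can_encode(THAI_TEXT + NBSP, codec))
```

Byte 0xA0 is the only place where the two encodings are expected to differ, which is why a text without it is reported as TIS-620.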
Jehan
5ee1c3ee39
LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9.
2015-12-04 02:35:09 +01:00
Jehan
f0e122b506
LangModels: add Esperanto ISO-8859-3 language model.
2015-12-04 01:35:56 +01:00
Jehan
b56a3c7b84
README: add German support.
2015-12-04 00:07:03 +01:00
Jehan
90728e4068
README: update with Windows-1252 support information.
2015-12-03 21:25:53 +01:00
Jehan
60f641bf37
Update README to mark independence from original Mozilla code.
2015-12-03 20:32:57 +01:00
Jehan
e4260f4a39
Release: version 0.0.4.
2015-12-03 19:48:58 +01:00
Jehan
ba56d91808
Update uchardet URL in various places.
2015-12-03 19:48:29 +01:00
Jehan
d1bc09e4d7
Update authors.
...
I think I deserved being listed in the authors by now. ;-)
2015-12-03 19:44:13 +01:00
Jehan
683255278d
Re-enable Hungarian language models.
...
Now that we have at least one model for ISO-8859-1, the risk of
detecting all ISO-8859-1 texts as ISO-8859-2 is lessened.
2015-12-02 22:24:36 +01:00
Jehan
92efc0b0b0
Update README: Unicode is "International".
2015-11-28 19:44:13 +01:00
Jehan
0289c2a232
Differentiate ASCII and detection failure.
...
The lib used to return "" for both properly detected ASCII and
detection failure. And the tool would return "ascii/unknown".
Make a proper distinction between the 2 cases.
2015-11-28 17:04:52 +01:00
Jehan
4dbc6e7ab3
Update README with French support.
2015-11-28 02:20:57 +01:00