66 Commits

Author SHA1 Message Date
Jehan
ae6302a016 Release: version 0.0.8. 2022-12-08 21:52:25 +01:00
Jehan
c218a3ccd6 README: add a section about CMake exported targets.
Since it's a new feature, we may as well document it, even though I
would personally recommend the more standard and generic pkg-config
instead (it does not depend on which build system we are using
ourselves).
2022-11-30 23:48:16 +01:00
Jehan
6196f86c46 README: update with newly added (lang, charset) couples. 2022-11-30 20:06:52 +01:00
myd7349
143b3fe513 README: update libchardet repository link 2022-08-01 19:38:19 +08:00
Aaron Madlon-Kay
6f38ab95f5 Mention MacPorts in readme 2021-01-27 06:57:58 +00:00
Jehan
c8a3572cca Issue #17: update README.
Replace the old link to the research paper with one on the
archive-mozilla website. Remove the original source link, as I can't
find any archived version of it (even on archive.org, only the folder
structure is saved, not the actual files, so it's useless).

Also add some history, which is probably a nice touch.

Add a link to crossroad to help people who'd want to cross-compile
uchardet.

Finally add the R binding by Artem Klevtsov and QtAV as reported.
2020-04-29 16:20:00 +02:00
Jehan
59f68dbe57 Release: version 0.0.7 2020-04-23 11:48:58 +02:00
Jehan
60bf53c81e README: update to Gitlab links.
Freedesktop moved its infrastructure to Gitlab a while ago.
2020-04-22 00:33:48 +02:00
Jehan
0cfb75724a README: some small updates. 2020-04-22 00:17:23 +02:00
Jehan
bdfd6116a9 Add a mention about fd.o code of conduct. 2018-09-26 15:12:25 +02:00
Jehan
95872ef41c Adding some information about building for Windows. 2017-12-26 03:37:42 +01:00
Jehan
056a5a6e51 README: add some applications having uchardet as dependency.
There are likely more (and I know some are planning support) but these
are the ones I know of and with support already in.
2017-09-21 00:06:03 +02:00
Jehan
d9d014742a README: Gentoo also has a uchardet package.
And it is up-to-date with upstream URL at Freedesktop! Good!
2017-05-28 21:13:59 +02:00
Jehan
d90d01bc9e README: adding a flatpak-builder manifest example.
Thanks to Sébastien Wilmet for the example.
2017-03-24 23:22:40 +01:00
Jehan
119fed7e8d LangModels: add Swedish support.
Encodings: ISO-8859-1, ISO-8859-4, ISO-8859-9, ISO-8859-15 and
WINDOWS-1252.
Test text from https://sv.wikipedia.org/wiki/Mölle
2016-09-28 22:42:13 +02:00
Jehan
d62154bd6e LangModels: add Slovene support.
Encodings: ISO-8859-2, ISO-8859-16, Windows-1250, IBM852 and
MAC-CENTRALEUROPE.
Test text from https://sl.wikipedia.org/wiki/Naseljivi_planet
2016-09-28 22:13:17 +02:00
Jehan
fbd2efdbe9 LangModels: Romanian support added.
Encodings: ISO-8859-2, ISO-8859-16, Windows-1250 and IBM852.
Test texts from https://ro.wikipedia.org/wiki/Danemarca
2016-09-28 19:57:50 +02:00
Jehan
a7525b404d LangModels: added support for Irish Gaelic.
Encodings: ISO-8859-1, ISO-8859-9, ISO-8859-15 and WINDOWS-1252.
Test text from:
https://ga.wikipedia.org/wiki/Gluais_théarmaí_seoltóireachta
2016-09-27 00:49:05 +02:00
Jehan
a3a271dfd5 LangModels: Estonian models created.
Encodings: ISO-8859-4, ISO-8859-13, Windows-1252 and
Windows-1257.
Test text from https://et.wikipedia.org/wiki/Anton_Tšehhov
Windows-1257 and ISO-8859-13 are very close, so I added quotation marks
(Jutumärgid), which are on codepoints only present in ISO-8859-13,
to tell the two encodings apart.
2016-09-27 00:14:29 +02:00
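The byte-level difference described in the Estonian commit above can be illustrated with Python's built-in codecs. This is only a sketch: `iso8859_13` and `cp1257` are Python's codec names, the sample strings are made up for the illustration, and this is not the script used to build the models.

```python
# Illustration only: the quotation marks used in Estonian land on different
# bytes in ISO-8859-13 and Windows-1257, while ordinary letters encode to
# the same bytes in both.
SAMPLE_QUOTED = "„Tere“"        # hypothetical sample with quotation marks
SAMPLE_PLAIN = "Tšehhov õpib"   # hypothetical sample without them


def encode_both(text):
    """Return the text encoded as (ISO-8859-13, Windows-1257) bytes."""
    return text.encode("iso8859_13"), text.encode("cp1257")


iso_quoted, win_quoted = encode_both(SAMPLE_QUOTED)
iso_plain, win_plain = encode_both(SAMPLE_PLAIN)

# Quotation marks separate the two encodings; plain letters do not.
print("quoted text differs:", iso_quoted != win_quoted)
print("plain text identical:", iso_plain == win_plain)
```

A test text without such marks would give a detector nothing to separate the two charsets, which is presumably why they were added.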
Jehan
3c6d31f5c2 LangModels: new Croatian models.
Supports: ISO-8859-2, ISO-8859-13, ISO-8859-16, IBM852, Windows-1250
and MAC-CENTRALEUROPE.
Test text from https://hr.wikipedia.org/wiki/Brekinja
2016-09-26 01:32:49 +02:00
Jehan
f262b1d65b LangModels: add Italian support.
Officially supported: ISO-8859-1, ISO-8859-3, ISO-8859-9, ISO-8859-15
and WINDOWS-1252. As with Finnish, only ISO-8859-1 and UTF-8 tests were
added, since the other encodings end up identical to ISO-8859-1 for most
common texts (i.e. glyphs used in Italian are on the same codepoints in
these other encodings).
Test text from https://it.wikipedia.org/wiki/Architettura_longobarda
2016-09-21 18:52:09 +02:00
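The claim in the Italian commit above, that common Italian text encodes to the same bytes in all the listed charsets, can be checked with a short sketch. The codec names are Python's, and the sample sentence is a hypothetical one chosen for the illustration, not the project's test text.

```python
# Illustration only: common Italian accented letters share byte values
# across the Latin encodings listed in the commit message.
CODECS = ["iso8859_1", "iso8859_3", "iso8859_9", "iso8859_15", "cp1252"]
SAMPLE = "La città è più bella perché c'è il sole"  # hypothetical sample


def distinct_encodings(text, codecs):
    """Return the set of distinct byte strings produced by the codecs."""
    return {text.encode(name) for name in codecs}


# A single distinct result means the encodings cannot be told apart
# from this text alone.
print(len(distinct_encodings(SAMPLE, CODECS)))
```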
Jehan
87d0c16e0e README: add Finnish support. 2016-09-21 18:35:26 +02:00
Jehan
ac4aa94b73 README: add Polish support…
… and update "Mac-CentralEurope" into "MAC-CENTRALEUROPE" (as in iconv).
2016-09-21 17:38:22 +02:00
Jehan
f314b76c0a README: add Slovak support. 2016-09-21 13:42:31 +02:00
Jehan
5680cba0b8 README: adding Czech and Maltese support information. 2016-09-21 03:45:40 +02:00
Jehan
d810f1175b README: update Latvian and Lithuanian support.
Uchardet now recognizes these langs also with ISO-8859-4 and
ISO-8859-10.
2016-09-21 00:35:23 +02:00
Jehan
9f7ed67166 README: add info on Portuguese support. 2016-09-21 00:05:12 +02:00
Jehan
e98d257ec4 README: add ISO-8859-13 for Latvian and Lithuanian support. 2016-09-20 23:35:12 +02:00
Jehan
2a559e7b52 README, test: update README and rename EUC-KR test to UHC. 2016-09-19 01:44:32 +02:00
Jehan
8a8d6b654c Release: version 0.0.6. 2016-07-20 01:47:50 +02:00
Jehan
771d78b7df Update the URL links: uchardet is now a freedesktop project. 2016-07-20 01:47:50 +02:00
Jehan
20eb319359 README: make the licenses as a list.
The previous layout rendered incorrectly as Markdown because it produced no line breaks.
2016-07-20 00:21:07 +02:00
Jehan
602c1ab0fc README, COPYING: adding links and text of licenses GPL 2.0 and LGPL 2.1.
Thanks to Ilya Tumaykin for reporting the missing info.
2016-06-04 14:21:38 +02:00
Jehan
d5dba26e04 README: add Danish support for 3 charsets. 2016-02-19 19:11:56 +01:00
Jehan
1694999bce README: update with VISCII support. 2016-02-13 03:52:06 +01:00
Jehan
178c6119b8 LangModels: add Windows-1258 support for Vietnamese.
I was planning on adding VISCII support as well, but Python's encode()
method apparently does not support it, so I cannot generate
the proper statistics data with the current version of the string.
2016-02-13 02:32:57 +01:00
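The limitation mentioned in the Vietnamese commit above can be reproduced with a small check against Python's codec registry. This is a sketch under the assumption of a stock CPython install with no third-party codecs registered; `has_codec` is a helper name invented here.

```python
import codecs


def has_codec(name):
    """Return True if Python can look up a codec under this name."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True


# Windows-1258 is available as "cp1258"; VISCII has no built-in codec.
for name in ("cp1258", "viscii"):
    print(name, has_codec(name))
```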
Jehan
0446e24c8d README: uchardet now available on Fedora.
Already in Fedora devel and soon to be added as an update on Fedora 23,
if I understand correctly. See:
https://bugzilla.redhat.com/show_bug.cgi?id=1264713
https://admin.fedoraproject.org/pkgdb/package/rpms/uchardet/
2016-02-12 17:53:22 +01:00
Jehan
9c3c37517c LangModels: add Arabic support.
Models constructed for ISO-8859-6 and Windows-1256.
2015-12-13 18:42:16 +01:00
Jehan
ffabb65712 LangModels: adding Spanish support.
With 3 charsets: ISO-8859-1, ISO-8859-15 and Windows-1252.
2015-12-12 18:54:35 +01:00
Jehan
886e03a523 Release: version 0.0.5. 2015-12-04 22:45:26 +01:00
Jehan
2856e68aac README: reorganize support list by alphabetic order.
(Except for "International" and "Others")
2015-12-04 03:33:22 +01:00
Jehan
dc03ea002f README: supports are per-language rather than per script system.
In particular separate "Cyrillic" into "Russian" and "Bulgarian"
(currently our only 2 supported languages using Cyrillic script).
2015-12-04 03:22:05 +01:00
Jehan
fb3c47a073 LangModels: add ISO-8859-11 and regenerate TIS-620 Thai models.
ISO-8859-11 is almost identical to TIS-620, with the addition of the
non-breaking space character.
In practice, our detection will always return TIS-620, except in the
rare cases where a text has a non-breaking space.
2015-12-04 03:14:52 +01:00
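The relationship between the two Thai charsets described above can be explored with Python's own codec tables. This is only a sketch: it relies on Python's `tis_620` and `iso8859_11` codecs, which may not match every iconv implementation, and `decode_byte` is a helper name invented here.

```python
def decode_byte(value, codec):
    """Decode one byte, returning None when the codec leaves it undefined."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


# Compare the two codecs over the upper half starting at 0xA0, where the
# Thai letters (and the non-breaking space) live.
differing = [
    value
    for value in range(0xA0, 0x100)
    if decode_byte(value, "tis_620") != decode_byte(value, "iso8859_11")
]
print([hex(value) for value in differing])
```

Any byte listed is one that only one of the two codecs can decode, so a text containing it is the only kind that can be attributed to ISO-8859-11 rather than TIS-620.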
Jehan
5ee1c3ee39 LangModels: adding Turkish models for ISO-8859-3 and ISO-8859-9. 2015-12-04 02:35:09 +01:00
Jehan
f0e122b506 LangModels: add Esperanto ISO-8859-3 language model. 2015-12-04 01:35:56 +01:00
Jehan
b56a3c7b84 README: add German support. 2015-12-04 00:07:03 +01:00
Jehan
90728e4068 README: update with Windows-1252 support information. 2015-12-03 21:25:53 +01:00
Jehan
60f641bf37 Update README to mark independence from original Mozilla code. 2015-12-03 20:32:57 +01:00
Jehan
e4260f4a39 Release: version 0.0.4. 2015-12-03 19:48:58 +01:00
Jehan
ba56d91808 Update uchardet URL in various places. 2015-12-03 19:48:29 +01:00