uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-02-07 10:19:59 +08:00

Author	SHA1	Message	Date
Jehan	b7aebfdfda	LangModels: add support for Latvian|Lithuanian / ISO-8859-4|ISO-8859-10. Just realizing that these 2 languages can also be encoded with these charsets (even though ISO-8859-13 would appear to be more common… maybe?). Anyway now the models are updated and can recognize texts using these encodings for these languages. Added some test files as well, which work great.	2016-09-21 00:27:16 +02:00
Jehan	9f7ed67166	README: add info on Portuguese support.	2016-09-21 00:05:12 +02:00
Jehan	e138839f07	LangModels: add support for Portuguese / ISO-8859-1. I also added pairings with ISO-8859-9, ISO-8859-15 and Windows-1252. However, these charsets do not differ on the main characters used in Portuguese, so they can hardly be told apart and detection will usually return ISO-8859-1 only.	2016-09-21 00:01:07 +02:00
Jehan	e98d257ec4	README: add ISO-8859-13 for Latvian and Lithuanian support.	2016-09-20 23:35:12 +02:00
Jehan	ea2f4dd40f	LangModels: new support for Latvian / ISO-8859-13. Test text extracted from: https://lv.wikipedia.org/wiki/Vinsents_van_Gogs	2016-09-20 23:29:53 +02:00
Jehan	7cb3dd9ddd	LangModels: add support for Lithuanian / ISO-8859-13. Test text extracted from https://lt.wikipedia.org/wiki/Vincent_van_Gogh.	2016-09-20 23:09:24 +02:00
Jehan	2a559e7b52	README, test: update README and rename EUC-KR test to UHC.	2016-09-19 01:44:32 +02:00
Jehan	157de1dc65	src: the EUC-KR prober now returns "UHC" as encoding name. "UHC" is the "Unified Hangul Code" (aka Windows-949 or CP949). It is apparently "mostly" upward compatible with EUC-KR so returning UHC for a strict EUC-KR document is usually not to be considered wrong. Yet I can read that EUC-KR has its own way of representing hangul syllables not available in precomposed form, and this is not supported in UHC (since this latter has all possible precomposed syllables), hence the "mostly" upward-compatibility. My personal daily experience with Korean documents though is that I encounter a lot of UHC-encoded files, probably because of predominance of Microsoft operating systems, which spread this encoding. So until we get 2 separate detection machines, let's just return EUC-KR files as being "UHC".	2016-09-19 01:22:45 +02:00
Jehan	f14519a0fe	doc: add a README for releases.	2016-07-20 02:10:06 +02:00
Jehan	8a8d6b654c	Release: version 0.0.6. v0.0.6	2016-07-20 01:47:50 +02:00
Jehan	771d78b7df	Update the URL links: uchardet is now a freedesktop project.	2016-07-20 01:47:50 +02:00
Jehan	20eb319359	README: make the licenses as a list. This was breaking as markdown by not creating linefeeds.	2016-07-20 00:21:07 +02:00
Jehan	602c1ab0fc	README, COPYING: adding links and text of licenses GPL 2.0 and LGPL 2.1. Thanks to Ilya Tumaykin for reporting the missing info.	2016-06-04 14:21:38 +02:00
Jehan	210e52d99a	LangModels: update the Greek language models. I did this to improve the model after a user reported a badly detected Greek subtitle (see commit e0eec3b). It didn't help, but since I updated it with much more data from Wikipedia, let's just commit it!	2016-05-25 17:39:10 +02:00
Jehan	e0eec3bae8	src: give a little weight to "probable sequences". Up to now, we were only considering positive sequences, which are sequences of 2 characters which happen the most. Yet our data gathers 4 categories of sequences (the last one being called "negative", since they never happened in our data). I will call the category below positive: probable sequences. They may happen, yet not often. The last category could be called "neutral". This seems to fix the detection of a user's subtitle example without breaking any of our current unit tests. Probably I should still review this whole logic in more detail later.	2016-05-25 17:38:20 +02:00
Jehan	4287d3accc	src: trailing whitespace removed.	2016-05-25 16:07:17 +02:00
Jehan	6cd8c322ad	script: stupid bug on BuildLangModel.py.	2016-05-25 15:23:36 +02:00
Jehan	fb1d544007	pkg-config: use GNUInstallDirs CMAKE_ variables in pc.in template.	2016-03-27 20:31:58 +02:00
Jehan	74b4f6a62b	Merge pull request #30 from Coacher/use-gnuinstalldirs-cmake-module Use GNUInstallDirs cmake module, fix library filename bug, minor cleanups.	2016-03-27 20:31:17 +02:00
Ilya Tumaykin	2a3e41a6c3	cmake: drop useless PACKAGE_NAME redefinition	2016-03-22 01:23:06 +03:00
Ilya Tumaykin	6db8b6f8fe	cmake: minor comment cleanups	2016-03-22 01:23:06 +03:00
Ilya Tumaykin	d0e7ddd8ab	cmake: fix library filename and SONAME Make library filename respect the current uchardet version and make library SONAME respect the current major version.	2016-03-22 01:23:05 +03:00
Ilya Tumaykin	dbeee08335	cmake: use lowercase suffix for debug build	2016-03-22 01:23:05 +03:00
Ilya Tumaykin	ad647d2e0a	cmake: keep compiler definitions in one place	2016-03-22 01:23:05 +03:00
Ilya Tumaykin	29f18210b1	cmake: hardcode less	2016-03-22 01:23:04 +03:00
Ilya Tumaykin	7201835c98	cmake: export UCHARDET_LIBRARY to the topmost scope	2016-03-22 01:23:04 +03:00
Ilya Tumaykin	e7feb35627	cmake: rename UCHARDET_STATIC_{TARGET -> LIBRARY} for clarity	2016-03-22 01:23:04 +03:00
Ilya Tumaykin	1a1f4bfbd8	cmake: rename UCHARDET_{TARGET -> LIBRARY} for clarity	2016-03-22 01:23:03 +03:00
Ilya Tumaykin	31a53570d6	cmake: use GNUInstallDirs cmake module Available in cmake >= 2.8.5.	2016-03-22 01:23:03 +03:00
Ilya Tumaykin	d0e29dc934	cmake: bump the minimum version to 2.8.5 Required for the GNUInstallDirs cmake module. See the next commit.	2016-03-22 01:21:58 +03:00
Jehan	ad7db2769e	Merge pull request #26 from Coacher/uniform-indent cmake: uniform indent everywhere.	2016-03-21 00:22:19 +01:00
Ilya Tumaykin	b44be77be6	cmake: uniform indent everywhere Indent with tabs, remove leading/trailing blank lines and spaces.	2016-03-21 01:07:41 +03:00
Jehan	b88a66f3f1	Merge pull request #28 from Coacher/cmake-updates cmake: use PACKAGE_NAME variable instead of hardcoding it.	2016-03-19 14:24:52 +01:00
Carbo Kuo	e28dfe3776	Merge pull request #29 from wiiaboo/ab-suite CMake: Fix regression in f53cb8c building in paths with spaces	2016-03-18 16:31:31 +01:00
Ricardo Constantino (:RiCON)	78b55ec9fe	CMake: Fix regression in f53cb8c building in paths with spaces Tested with Ninja and Make in Windows and Archlinux with paths with and without spaces.	2016-03-18 03:37:12 +00:00
Ilya Tumaykin	6c1e310f9b	cmake: hardcode less	2016-03-18 02:56:21 +03:00
Jehan	fcc525a64f	Merge pull request #25 from Coacher/master cmake: purge remnants of opencc after b6d872bb	2016-03-17 19:10:39 +01:00
Jehan	d255184609	Merge pull request #24 from wiiaboo/ab-suite Improving build with more options. Building only static possible, uchardet command line tool build can be disabled, bindir can be customized…	2016-03-17 19:09:30 +01:00
Ricardo Constantino (:RiCON)	86755b1f57	CMake: Don't build static more than once	2016-03-16 19:31:00 +00:00
Ricardo Constantino (:RiCON)	b908b689a0	CMake: Add static lib destination to UCHARDET_TARGET	2016-03-16 19:30:54 +00:00
Ricardo Constantino (:RiCON)	81ed86a26b	CMake: Use only CMAKE_INSTALL_BINDIR instead of DIR_BIN This way it always shows up in ccmake, even if not defined. A string is used instead of path because I personally think it makes more sense in the following use-cases: STRING: -DCMAKE_INSTALL_PREFIX=/home/user -DCMAKE_INSTALL_BINDIR=bins installs everything to /home/user/{lib,etc,share,(...)} and executables to ${CMAKE_INSTALL_PREFIX}/bins -DCMAKE_INSTALL_PREFIX=/home/user -DCMAKE_INSTALL_BINDIR=/opt/bin everything to /home/user/{lib,etc,share,(...)} and executables to /opt/bin PATH: -DCMAKE_INSTALL_PREFIX=/home/user -DCMAKE_INSTALL_BINDIR=bins everything to /home/user/{lib,etc,share,(...)} and executables to $(pwd)/bins (!) -DCMAKE_INSTALL_PREFIX=/home/user -DCMAKE_INSTALL_BINDIR=/opt/bin same as STRING	2016-03-16 19:11:33 +00:00
Ilya Tumaykin	aa4c2aeada	cmake: purge remnants of opencc after b6d872bb	2016-03-16 19:43:58 +03:00
Ricardo Constantino (:RiCON)	50b2e0802f	CMake: Allow not building executable	2016-03-16 14:34:03 +00:00
Ricardo Constantino (:RiCON)	6500f09931	CMake: Allow building static-only builds Add stdc++ to static libs in pkg-config	2016-03-16 14:30:15 +00:00
Ricardo Constantino (:RiCON)	f53cb8cddd	CMake: fix linking with Ninja	2016-03-16 14:17:47 +00:00
Ricardo Constantino (:RiCON)	36665da832	CMake: allow installing binary to non-default dir	2016-03-16 14:17:25 +00:00
Jehan	198190461e	script: move the Wikipedia title syntax cleaning to BuildLangModel.py.	2016-02-21 16:20:22 +01:00
Jehan	d24bd7d578	script: Wikipedia API's python wrapper does not return garbage text anymore. I can't see new commits since 2014. So I am assuming the issue was on Wikipedia side and that it has been fixed.	2016-02-21 16:07:10 +01:00
Jehan	37024460fe	script: add a README file dedicated to adding new support.	2016-02-21 16:06:11 +01:00
Jehan	42c6b42f65	Add a DOAP file. All URLs are still referring to the github project, because we have no other homepage or bug tracker yet.	2016-02-21 15:19:50 +01:00
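
Commit e0eec3bae8 above describes four categories of 2-character sequences and says that "probable" sequences now get a little weight next to "positive" ones. The following is a minimal Python sketch of that idea only, not uchardet's actual code: the category constants, the `sequence_score` helper and the 0.25 weight are all hypothetical, chosen for illustration.

```python
from collections import Counter

# Hypothetical category labels, ordered from least to most frequent in the
# training data. The names follow the commit message; the numbering is assumed.
NEGATIVE, NEUTRAL, PROBABLE, POSITIVE = range(4)


def sequence_score(categories, probable_weight=0.25):
    """Score an iterable of per-pair categories in the range [0, 1].

    Positive pairs count fully, probable pairs count with a small weight
    (0.25 is an illustrative assumption), neutral and negative pairs count
    for nothing.
    """
    counts = Counter(categories)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    weighted = counts[POSITIVE] + probable_weight * counts[PROBABLE]
    return weighted / total


if __name__ == "__main__":
    pairs = [POSITIVE, POSITIVE, PROBABLE, NEGATIVE]
    print(sequence_score(pairs))                       # probable pairs weighted
    print(sequence_score(pairs, probable_weight=0.0))  # positive pairs only
```

Setting `probable_weight=0.0` reproduces the earlier behaviour described in the commit, where only positive sequences contributed to the score.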

198 Commits