Commit Graph

  • 669ede73a3 src: new weight concept in the C API. Jehan 2020-04-27 18:07:32 +02:00
  • f74d602449 src: fix the usage of uchardet tool. Jehan 2020-04-27 15:24:17 +02:00
  • d48ee7abc2 src: uchardet tool now shows the language code in verbose mode. Jehan 2020-04-23 18:39:17 +02:00
  • c550af99a7 script: update BuildLangModel.py to updated SequenceModel struct. Jehan 2020-04-23 18:36:24 +02:00
  • 5a949265d5 src: new API to get the detected language. Jehan 2020-04-23 18:24:12 +02:00
  • e7bf25ca08 test: fix test script to use the new API and get rid of build warning. Jehan 2020-04-23 16:43:56 +02:00
  • 7bc1bc4e0a src: new option --verbose|-V in the uchardet CLI tool. Jehan 2020-04-23 16:43:08 +02:00
  • 8118133e00 src: new API to get all candidates and their confidence. Jehan 2020-04-23 16:40:02 +02:00
  • 15fc8f0a0f src: now reporting encoding+confidence and keeping a list. Jehan 2020-04-23 16:15:54 +02:00
  • 2f5c24006e README, doc: some README and release procedure updates. Jehan 2022-12-08 22:32:56 +01:00
  • ae6302a016 Release: version 0.0.8. [v0.0.8] Jehan 2022-11-30 21:06:30 +01:00
  • c218a3ccd6 README: add a section about CMake exported targets. Jehan 2022-11-30 22:23:43 +01:00
  • 6196f86c46 README: update with newly added (lang, charset) couples. Jehan 2022-11-30 20:06:52 +01:00
  • 388777be51 script, src, test: add IBM865 support for Danish. Jehan 2022-11-30 19:45:17 +01:00
  • 5aa628272b script: fix small issues with commits e41e8a4 and 8d15d6b. Jehan 2022-11-30 19:22:40 +01:00
  • c11c362b89 Add tests for norwegian Martin T. H. Sandsmark 2022-01-28 22:12:48 +01:00
  • 099a9a4fd6 Add norwegian support Martin T. H. Sandsmark 2022-01-28 21:29:20 +01:00
  • e41e8a47e4 improve model building script a bit Martin T. H. Sandsmark 2022-01-28 21:59:31 +01:00
  • 8d15d6b557 make the logfile usable Martin T. H. Sandsmark 2022-01-28 21:39:18 +01:00
  • 2a04e57c8f test: update the Maltese / ISO-8859-3 test file. Jehan 2021-05-23 16:41:27 +02:00
  • 45bd32d102 src/tools/uchardet.cpp: make stuff static Lucinda May Phipps 2022-04-23 17:07:04 +00:00
  • ef19faa8c5 Update uchardet-tests.c Lucinda May Phipps 2022-04-23 03:07:03 +00:00
  • 383bf118c9 don't use feof Lucinda May Phipps 2022-04-23 02:48:12 +00:00
  • 143b3fe513 README: update libchardet repository link myd7349 2022-08-01 19:38:19 +08:00
  • 23a664560b Issue #27: fix cmake andiwand 2021-11-13 16:54:09 +01:00
  • b3b2bd2721 gitignore: I forgot the 2 executables (CLI tool and test binary). Jehan 2021-11-09 14:25:16 +01:00
  • 48db2b0800 gitignore: add files generated by the build system. Jehan 2021-11-09 12:17:27 +01:00
  • d7dad549bd cmake exported targets Pedro López-Cabanillas 2021-11-09 09:52:15 +00:00
  • c34d882bb7 gitlab-ci: make test on Windows too. [wip/Jehan/improved-API-make-test-win64] Jehan 2021-03-22 21:25:51 +01:00
  • adb1d57864 test: fix test binary build for Windows. Jehan 2021-03-22 21:06:20 +01:00
  • d5759d7e93 src: reset shortcut charset/language on Reset(). Jehan 2021-03-22 18:29:34 +01:00
  • f1e69d5bcf src: do not test with nsLatin1Prober anymore. Jehan 2021-03-22 18:15:34 +01:00
  • 06d9d1eac0 src: improve confidence computation (generic and single-byte charset). Jehan 2021-03-22 18:03:02 +01:00
  • 4a579fae02 script: generate more complete frequent characters when range is set. Jehan 2021-03-22 17:44:06 +01:00
  • d3dce1e98b script, src: regenerate the Thai model. Jehan 2021-03-22 17:06:27 +01:00
  • 4dee1a747d src, script: fix the order of characters for Vietnamese. Jehan 2021-03-21 16:02:03 +01:00
  • f8752f2b56 src, script: add concept of alphabet_mapping in language models. Jehan 2021-03-21 15:54:24 +01:00
  • 5fe9a7e1df script: regenerate Slovak and Slovene with better alphabet support. Jehan 2021-03-21 13:30:41 +01:00
  • 872294d2a9 script: fix a stupid bug making same ratio for all frequent characters. Jehan 2021-03-21 12:30:29 +01:00
  • 7439766ece script, src: regenerate the Vietnamese model. Jehan 2021-03-21 01:12:56 +01:00
  • e6b4811c9b src: fix negative confidence wrapping around because of unsigned int. Jehan 2021-03-20 23:02:10 +01:00
  • 4ef378ce2e script, src: remove generated statistics data for Korean. Jehan 2021-03-20 22:59:52 +01:00
  • 310e750abd src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. Jehan 2021-03-20 22:12:45 +01:00
  • 7493f8b6b7 README: fix a duplicate. Jehan 2021-03-19 23:45:30 +01:00
  • 406e1d0b29 Update README. Jehan 2021-03-19 23:24:34 +01:00
  • 7459a4d9b3 src: consider any combination with a non-frequent character as sequence. Jehan 2021-03-19 22:37:27 +01:00
  • 0729dfa974 src: add Hindi/UTF-8 support. Jehan 2021-03-19 22:34:55 +01:00
  • 2bc013b7b0 src: improve confidence computation. Jehan 2021-03-19 21:46:53 +01:00
  • 189169fe8f script: fix a bit BuildLangModel.py when use_ascii is True. Jehan 2021-03-19 18:38:30 +01:00
  • 36fd02fc2d script, src: add generic Korean model. Jehan 2021-03-18 17:51:22 +01:00
  • ccb5d40a6f src, test: fix the new Johab prober and add a test. Jehan 2021-03-18 00:23:13 +01:00
  • b1f6c88792 src: build new charset prober for Johab Korean. Jehan 2021-03-14 12:59:25 +01:00
  • 417013219c add charset prober for Johab Korean LSY 2019-03-14 06:34:42 +09:00
  • 71ca5a7cd5 script, src: generate the Hebrew models. Jehan 2021-03-17 23:22:50 +01:00
  • ec30b2ad54 test: 4 new tests for UTF-8. Jehan 2021-03-17 22:27:24 +01:00
  • d26bc965ad src: drop the SURE_YES confidence for character distribution probers. Jehan 2021-03-17 21:32:49 +01:00
  • 8b1755cac2 src: do not shortcut UTF-8 detection too early. Jehan 2021-03-17 21:26:31 +01:00
  • 5463f4e0c0 src: nsEscCharsetProber also returns the correct language. Jehan 2021-03-17 17:15:56 +01:00
  • ba6b46a68c src: make nsMBCSGroupProber report all valid candidates. Jehan 2021-03-17 16:34:26 +01:00
  • 49ed0e6f45 src: allow for nsCharSetProber to return several candidates. Jehan 2021-03-17 13:23:33 +01:00
  • 41fc0f235b src: nsMBCSGroupProber confidence weighed by language confidence. Jehan 2021-03-17 13:09:10 +01:00
  • 714ae9ca29 src: tweak again the language detection confidence. Jehan 2021-03-17 12:51:25 +01:00
  • 26ed628061 test: update unit test to check detected languages. Jehan 2021-03-17 12:39:54 +01:00
  • f30c1cd8c8 src: reset language detectors when resetting a nsMBCSGroupProber. Jehan 2021-03-17 11:03:30 +01:00
  • 5c3a2e8037 src, script: regenerate all existing language models. Jehan 2021-03-17 02:07:17 +01:00
  • 2a4d8d890e Using the generic language detector in UTF-8 detection. Jehan 2021-03-15 12:01:35 +01:00
  • 04c4fd419d New generic language detector class. Jehan 2021-03-16 12:05:56 +01:00
  • 9518f4d7a2 Rebuild a bunch of language models. Jehan 2021-03-15 10:20:14 +01:00
  • 82347030ba src: add a --weight option to the CLI tool. Jehan 2020-04-27 18:14:34 +02:00
  • 7f99b91388 src: new weight concept in the C API. Jehan 2020-04-27 18:07:32 +02:00
  • f15d097f29 src: fix the usage of uchardet tool. Jehan 2020-04-27 15:24:17 +02:00
  • 4a891ec4ac src: uchardet tool now shows the language code in verbose mode. Jehan 2020-04-23 18:39:17 +02:00
  • 1db089c7f8 script: update BuildLangModel.py to updated SequenceModel struct. Jehan 2020-04-23 18:36:24 +02:00
  • 911695f682 src: new API to get the detected language. Jehan 2020-04-23 18:24:12 +02:00
  • d1ed97b813 test: fix test script to use the new API and get rid of build warning. Jehan 2020-04-23 16:43:56 +02:00
  • ae4e3a7cbe src: new option --verbose|-V in the uchardet CLI tool. Jehan 2020-04-23 16:43:08 +02:00
  • 4da22cca97 src: new API to get all candidates and their confidence. Jehan 2020-04-23 16:40:02 +02:00
  • b43d938804 src: now reporting encoding+confidence and keeping a list. Jehan 2020-04-23 16:15:54 +02:00
  • 6f38ab95f5 Mention MacPorts in readme Aaron Madlon-Kay 2021-01-27 06:57:58 +00:00
  • c8a3572cca Issue #17: update README. Jehan 2020-04-29 16:12:54 +02:00
  • 472a906844 Issue #16: "i686" uname not properly detected as x86. Jehan 2020-04-28 20:41:01 +02:00
  • b376bcddea src: just a test. [wip/Jehan/Issue16] Jehan 2020-04-27 16:04:20 +02:00
  • 1dee093bbd src: fix the usage of uchardet tool. Jehan 2020-04-27 15:24:17 +02:00
  • 8681fc060e build: Add uchardet CLI tool building support for MSVC myd7349 2020-04-25 18:43:02 +08:00
  • 5bcbd23acf build: Fix build errors on Windows myd7349 2020-02-17 02:11:11 +00:00
  • ff6e4eee07 src: uchardet tool now shows the language code in verbose mode. Jehan 2020-04-23 18:39:17 +02:00
  • dde09c7d08 script: update BuildLangModel.py to updated SequenceModel struct. Jehan 2020-04-23 18:36:24 +02:00
  • 4e967c9e88 src: new API to get the detected language. Jehan 2020-04-23 18:24:12 +02:00
  • 94736d1565 test: fix test script to use the new API and get rid of build warning. Jehan 2020-04-23 16:43:56 +02:00
  • c333362dd1 src: new option --verbose|-V in the uchardet CLI tool. Jehan 2020-04-23 16:43:08 +02:00
  • cdbba9ff3a src: new API to get all candidates and their confidence. Jehan 2020-04-23 16:40:02 +02:00
  • 4b7b0476fb src: now reporting encoding+confidence and keeping a list. Jehan 2020-04-23 16:15:54 +02:00
  • a49f8ef6ea doc: update README.maintainer. Jehan 2020-04-23 12:32:49 +02:00
  • 59f68dbe57 Release: version 0.0.7 [v0.0.7] Jehan 2020-04-23 11:48:58 +02:00
  • 98bc2f31ef Issue #8: have BuildLangModel.py add ending newline to generated source. Jehan 2020-04-22 22:57:25 +02:00
  • 44a50c30ee Issue #8: no newline at end of file. Jehan 2020-04-22 22:53:25 +02:00
  • 6c7f32a751 Issue #10: Crashing sequence with nsSJISProber. Jehan 2020-04-22 22:11:51 +02:00
  • ef0313046b Also allow uchardet tool to detect encoding of a file named "--". Jehan 2020-04-22 21:11:23 +02:00
  • 4a37dfdf1c Issue #15: support "--" end-of-option. Jehan 2020-04-22 21:05:44 +02:00
  • ae7acbd0f2 Add dllexport to interface functions wangqr 2020-04-22 18:54:07 +00:00