Jehan 0fe51d3851 Issue #21: Greek CP737 support.
It actually breaks "zh:big5", so I'm going to hold off a bit. Adding more
language and charset support is slowly starting to show the limitations
of our legacy multi-byte charset support, since I haven't really
touched it since the original Mozilla implementation.

It might be time to start reviewing these parts of the code.

The test file contents come from the 'Μαρμότα' page on the Greek
Wikipedia (though since 2 letters are missing from this encoding, despite
its popularity for Greek, I had to be careful to choose pieces of text
without such letters).
2022-12-18 22:33:12 +01:00
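The text-selection constraint described in the commit message (only using passages that CP737 can represent) can be checked mechanically. The following is a minimal sketch, not the method actually used for this commit: it assumes Python's built-in `cp737` codec, and the helper names `unencodable_chars` and `usable_paragraphs` are hypothetical.

```python
def unencodable_chars(text, encoding="cp737"):
    """Return the distinct characters of `text` that `encoding` cannot represent.

    Hypothetical helper: relies only on Python's standard codec machinery.
    """
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            # Record each offending character once, in order of appearance.
            if ch not in missing:
                missing.append(ch)
    return missing


def usable_paragraphs(paragraphs, encoding="cp737"):
    """Keep only the paragraphs that encode without any loss."""
    return [p for p in paragraphs if not unencodable_chars(p, encoding)]


if __name__ == "__main__":
    sample = "Μαρμότα"
    print(unencodable_chars(sample))  # characters a CP737 test file could not hold
```

Running `usable_paragraphs` over candidate passages from a source page would leave only those safe to store in a CP737-encoded test file; the rest would need to be skipped or trimmed by hand.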
LangModels Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
tools src: add a --language|-l option to the uchardet CLI tool. 2022-12-14 00:24:53 +01:00
Big5Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
CharDistribution.cpp add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
CharDistribution.h src: build new charset prober for Johab Korean. 2022-12-14 00:23:13 +01:00
CMakeLists.txt script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
EUCKRFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
EUCTWFreq.tab Fix global-buffer-overflow due EUCTW_TABLE_SIZE 2020-04-22 17:06:40 +00:00
GB2312Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
JISFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
JohabFreq.tab src: build new charset prober for Johab Korean. 2022-12-14 00:23:13 +01:00
JpCntx.cpp Fixes boolean operation precedence warnings... 2015-11-18 19:38:12 +01:00
JpCntx.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsBig5Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsBig5Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsCharSetProber.cpp src: cast value to its proper type. 2017-08-27 13:01:30 +02:00
nsCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsCJKDetector.cpp src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. 2022-12-14 00:24:53 +01:00
nsCJKDetector.h src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. 2022-12-14 00:24:53 +01:00
nsCodingStateMachine.h add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
nscore.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsEscCharsetProber.cpp src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEscCharsetProber.h src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEscSM.cpp src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEUCJPProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCJPProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCKRProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCKRProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCTWProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCTWProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsGB2312Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsGB2312Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsHebrewProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsHebrewProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsJohabProber.cpp src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
nsJohabProber.h src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
nsLanguageDetector-generated.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsLanguageDetector.cpp src: improve algorithm for confidence computation. 2022-12-14 20:02:59 +01:00
nsLanguageDetector.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsLatin1Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsLatin1Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsMBCSGroupProber.cpp script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
nsMBCSGroupProber.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsMBCSSM.cpp add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
nsPkgInt.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsSBCharSetProber-generated.h Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
nsSBCharSetProber.cpp src: improve confidence computation (generic and single-byte charset). 2022-12-14 00:24:53 +01:00
nsSBCharSetProber.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsSBCSGroupProber.cpp Issue #21: Greek CP737 support. 2022-12-18 22:33:12 +01:00
nsSBCSGroupProber.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsSJISProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsSJISProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsUniversalDetector.cpp src: reset shortcut charset/language on Reset(). 2022-12-14 00:24:53 +01:00
nsUniversalDetector.h src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsUTF8Prober.cpp src: drop less of UTF-8 confidence even with few non-multibyte chars. 2022-12-14 00:24:53 +01:00
nsUTF8Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
prmem.h Initial release. 2011-07-10 15:04:42 +08:00
symbols.cmake src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00
uchardet.cpp src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00
uchardet.h src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00