Jehan 7459a4d9b3 src: consider any combination with a non-frequent character as sequence.
Basically, since we exclude non-letters (control characters, punctuation,
spaces, separators, emoticons and whatnot), we consider any remaining
character that is not a frequent letter as an off-script letter (we may
have forgotten some cases, but so far, it looks promising). Hence it is
normal to consider a combination with these (i.e. 2 off-script letters or
1 frequent letter + 1 off-script, in any order) as a sequence too. Doing
so lowers even further the confidence of any text containing too many of
these. As a consequence, it again widens the gap between the first and
second contender, which suggests that the approach works.
2021-03-19 22:43:35 +01:00
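
The counting rule described in the commit message above can be illustrated with a minimal sketch. It is not the code from nsLanguageDetector.cpp: the names `CharClass`, `SequenceCounts`, `countSequences` and `toyConfidence` are hypothetical, and the confidence formula is a deliberately simplified stand-in that only shows why counting off-script pairs as sequences lowers the score.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical character classes, assumed for illustration only.
enum class CharClass { NonLetter, Frequent, OffScript };

struct SequenceCounts {
    std::size_t total = 0;      // every pair of adjacent letters
    std::size_t offScript = 0;  // pairs with at least one off-script letter
};

// Count adjacent letter pairs. Pairs touching a non-letter are skipped;
// pairs containing an off-script letter still count as sequences.
SequenceCounts countSequences(const std::vector<CharClass>& text) {
    SequenceCounts counts;
    for (std::size_t i = 1; i < text.size(); ++i) {
        const CharClass prev = text[i - 1];
        const CharClass cur = text[i];
        if (prev == CharClass::NonLetter || cur == CharClass::NonLetter)
            continue;
        ++counts.total;
        if (prev == CharClass::OffScript || cur == CharClass::OffScript)
            ++counts.offScript;
    }
    return counts;
}

// Toy score (an assumption, not the real formula): the share of counted
// sequences made only of frequent letters.
double toyConfidence(const SequenceCounts& counts) {
    assert(counts.offScript <= counts.total);
    if (counts.total == 0)
        return 0.0;
    return static_cast<double>(counts.total - counts.offScript) /
           static_cast<double>(counts.total);
}
```

Because off-script pairs enlarge the denominator without adding to the numerator, a text with many of them scores lower under this toy measure than it would if those pairs were ignored, which is the effect the commit message describes.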
LangModels src: add Hindi/UTF-8 support. 2021-03-19 22:36:30 +01:00
tools src: add a --weight option to the CLI tool. 2021-03-14 00:12:30 +01:00
Big5Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
CharDistribution.cpp add charset prober for Johab Korean 2021-03-17 23:48:11 +01:00
CharDistribution.h src: build new charset prober for Johab Korean. 2021-03-17 23:48:20 +01:00
CMakeLists.txt src: add Hindi/UTF-8 support. 2021-03-19 22:36:30 +01:00
EUCKRFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
EUCTWFreq.tab Fix global-buffer-overflow due EUCTW_TABLE_SIZE 2020-04-22 17:06:40 +00:00
GB2312Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
JISFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
JohabFreq.tab src: build new charset prober for Johab Korean. 2021-03-17 23:48:20 +01:00
JpCntx.cpp Fixes boolean operation precedence warnings... 2015-11-18 19:38:12 +01:00
JpCntx.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsBig5Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsBig5Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsCharSetProber.cpp src: cast value to its proper type. 2017-08-27 13:01:30 +02:00
nsCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsCodingStateMachine.h add charset prober for Johab Korean 2021-03-17 23:48:11 +01:00
nscore.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsEscCharsetProber.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEscCharsetProber.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEscSM.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEUCJPProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCJPProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCKRProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCKRProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCTWProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCTWProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsGB2312Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsGB2312Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsHebrewProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsHebrewProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsJohabProber.cpp src, test: fix the new Johab prober and add a test. 2021-03-18 00:26:49 +01:00
nsJohabProber.h src, test: fix the new Johab prober and add a test. 2021-03-18 00:26:49 +01:00
nsLanguageDetector.cpp src: consider any combination with a non-frequent character as sequence. 2021-03-19 22:43:35 +01:00
nsLanguageDetector.h src: add Hindi/UTF-8 support. 2021-03-19 22:36:30 +01:00
nsLatin1Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsLatin1Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsMBCSGroupProber.cpp src: add Hindi/UTF-8 support. 2021-03-19 22:36:30 +01:00
nsMBCSGroupProber.h src: add Hindi/UTF-8 support. 2021-03-19 22:36:30 +01:00
nsMBCSSM.cpp add charset prober for Johab Korean 2021-03-17 23:48:11 +01:00
nsPkgInt.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsSBCharSetProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCharSetProber.h script, src: generate the Hebrew models. 2021-03-17 23:22:50 +01:00
nsSBCSGroupProber.cpp script, src: generate the Hebrew models. 2021-03-17 23:22:50 +01:00
nsSBCSGroupProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSJISProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSJISProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsUniversalDetector.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsUniversalDetector.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsUTF8Prober.cpp src: do not shortcut UTF-8 detection too early. 2021-03-17 21:26:31 +01:00
nsUTF8Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
prmem.h Initial release. 2011-07-10 15:04:42 +08:00
symbols.cmake src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00
uchardet.cpp src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00
uchardet.h src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00