Jehan bed459c6e7 src: drop less of UTF-8 confidence even with few non-multibyte chars.
Some languages are not meant to have multibyte characters. For instance,
English would typically have none. Yet you can still have UTF-8 English
text (with a few special characters, or foreign words…). So anyway let's
make it less of a deal breaker.

To be even fairer, the whole logic is biased of course, and I believe
that eventually we should get rid of these lines of code dropping
confidence based on a number of characters. This is a ridiculous rule (we
base our whole logic on language statistics and suddenly we add some
weird rule with a completely random number). But for now, I'll keep this
as-is until we make the whole library even more robust.
2022-12-14 00:24:53 +01:00
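
To make the commit message above concrete, here is a minimal sketch of the kind of rule it criticizes: a confidence value that is cut whenever too few multibyte characters were seen. It is an illustration only, not the code in nsUTF8Prober.cpp. The function name `utf8ConfidenceSketch`, the `threshold` of 6 and the `perCharDoubt` factor of 0.5 are all assumptions chosen for the example.

```cpp
#include <cassert>

// Hypothetical illustration, NOT the actual nsUTF8Prober implementation.
// Returns a confidence that input which decoded as valid UTF-8 really is
// UTF-8, based only on how many multibyte sequences were seen.
//   threshold    : counts below this reduce the confidence (assumed value)
//   perCharDoubt : factor applied to the remaining doubt for each multibyte
//                  character seen; must lie in (0, 1) (assumed value)
float utf8ConfidenceSketch(unsigned numMultibyte,
                           unsigned threshold = 6,
                           float perCharDoubt = 0.5f)
{
    assert(perCharDoubt > 0.0f && perCharDoubt < 1.0f);

    const float maxConfidence = 0.99f;

    // Enough multibyte characters: no penalty at all.
    if (numMultibyte >= threshold)
        return maxConfidence;

    // Too few: start from almost total doubt and shrink it once per
    // multibyte character actually seen.
    float doubt = maxConfidence;
    for (unsigned i = 0; i < numMultibyte; ++i)
        doubt *= perCharDoubt;

    return 1.0f - doubt;
}
```

With these assumed constants, text containing no multibyte characters at all (typical for plain English) ends up with a confidence close to zero even though it is perfectly valid UTF-8, which is the bias the message describes. Lowering `threshold` or `perCharDoubt` is one way such a rule could "drop less" confidence.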
LangModels script, src: regenerate the Thai model. 2022-12-14 00:24:53 +01:00
tools src: add a --weight option to the CLI tool. 2022-12-14 00:23:13 +01:00
Big5Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
CharDistribution.cpp add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
CharDistribution.h src: build new charset prober for Johab Korean. 2022-12-14 00:23:13 +01:00
CMakeLists.txt script, src: remove generated statistics data for Korean. 2022-12-14 00:24:53 +01:00
EUCKRFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
EUCTWFreq.tab Fix global-buffer-overflow due EUCTW_TABLE_SIZE 2020-04-22 17:06:40 +00:00
GB2312Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
JISFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
JohabFreq.tab src: build new charset prober for Johab Korean. 2022-12-14 00:23:13 +01:00
JpCntx.cpp Fixes boolean operation precedence warnings... 2015-11-18 19:38:12 +01:00
JpCntx.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsBig5Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsBig5Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsCharSetProber.cpp src: cast value to its proper type. 2017-08-27 13:01:30 +02:00
nsCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsCJKDetector.cpp src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. 2022-12-14 00:24:53 +01:00
nsCJKDetector.h src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. 2022-12-14 00:24:53 +01:00
nsCodingStateMachine.h add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
nscore.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsEscCharsetProber.cpp src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEscCharsetProber.h src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEscSM.cpp src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEUCJPProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCJPProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCKRProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCKRProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCTWProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCTWProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsGB2312Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsGB2312Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsHebrewProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsHebrewProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsJohabProber.cpp src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
nsJohabProber.h src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
nsLanguageDetector.cpp src: improve confidence computation (generic and single-byte charset). 2022-12-14 00:24:53 +01:00
nsLanguageDetector.h script, src: remove generated statistics data for Korean. 2022-12-14 00:24:53 +01:00
nsLatin1Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsLatin1Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsMBCSGroupProber.cpp script, src: remove generated statistics data for Korean. 2022-12-14 00:24:53 +01:00
nsMBCSGroupProber.h src: add Hindi/UTF-8 support. 2022-12-14 00:23:13 +01:00
nsMBCSSM.cpp add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
nsPkgInt.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsSBCharSetProber.cpp src: improve confidence computation (generic and single-byte charset). 2022-12-14 00:24:53 +01:00
nsSBCharSetProber.h src: improve confidence computation (generic and single-byte charset). 2022-12-14 00:24:53 +01:00
nsSBCSGroupProber.cpp script, src: generate the Hebrew models. 2022-12-14 00:23:13 +01:00
nsSBCSGroupProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsSJISProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsSJISProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsUniversalDetector.cpp src: reset shortcut charset/language on Reset(). 2022-12-14 00:24:53 +01:00
nsUniversalDetector.h src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsUTF8Prober.cpp src: drop less of UTF-8 confidence even with few non-multibyte chars. 2022-12-14 00:24:53 +01:00
nsUTF8Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
prmem.h Initial release. 2011-07-10 15:04:42 +08:00
symbols.cmake src: new weight concept in the C API. 2022-12-14 00:23:13 +01:00
uchardet.cpp src: new weight concept in the C API. 2022-12-14 00:23:13 +01:00
uchardet.h src: new weight concept in the C API. 2022-12-14 00:23:13 +01:00