Jehan d26bc965ad src: drop the SURE_YES confidence for character distribution probers.
Some probers are based on character distribution analysis. Though this
is still relevant detection logic, we also know that it is a lot less
subtle than sequence distribution analysis.

Therefore let's give good confidence to a text passing such analysis,
yet not near-perfect confidence, thus leaving some chance for other
probers. In particular, we can definitely consider that if some text gets
over 0.7 on sequence distribution analysis, it is a very likely candidate.

I hit this case with the Finnish UTF-8 test, which passed (UTF-8,
Finnish) detection with a staggering 0.86 confidence, yet was overridden
by UHC (EUC-KR). This was not a problem when nsMBCSGroupProber
checked the UTF-8 prober first and stopped there after just some basic
encoding detection. Now that we go further and return all relevant
candidates, a simpler detection algorithm which always returns
too-good confidence is not the best idea.
2021-03-17 21:32:49 +01:00
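
The idea described in the commit message above can be sketched roughly as follows. This is only an illustration, not the code from CharDistribution.cpp: the function name `DistributionConfidence`, the constants `kSureNo` and `kDistributionCap`, and the cap value itself are all assumptions made for the example. It shows a distribution-based score being clamped to a "good but not near-perfect" ceiling, so that a prober with a strong sequence-based score (such as the 0.86 mentioned above) can still win.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical constants, chosen for illustration only.
constexpr float kSureNo = 0.01f;          // "almost certainly not" floor
constexpr float kDistributionCap = 0.7f;  // assumed ceiling for this kind of prober

// Hypothetical helper: turns character distribution counts into a confidence.
//   totalChars   - multi-byte characters seen so far
//   freqChars    - how many of those fall in the "frequent" set
//   typicalRatio - expected frequent/non-frequent ratio for the language
float DistributionConfidence(int totalChars, int freqChars, float typicalRatio)
{
  assert(typicalRatio > 0.0f);

  // Not enough data to say anything useful.
  if (totalChars <= 0 || freqChars <= 0)
    return kSureNo;

  // Every character was a frequent one: best possible score for this
  // analysis, but still capped instead of returning a near-1.0 value.
  if (totalChars == freqChars)
    return kDistributionCap;

  float r = freqChars / ((totalChars - freqChars) * typicalRatio);

  // Clamp so that this coarse analysis never outranks a strong
  // sequence-based candidate on its own.
  return std::min(r, kDistributionCap);
}
```

With such a cap, a text that looks perfect to a distribution-only prober would report at most 0.7 in this sketch, which leaves room for a competing candidate reporting 0.86 to be selected.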
LangModels src, script: regenerate all existing language models. 2021-03-17 02:07:17 +01:00
tools src: add a --weight option to the CLI tool. 2021-03-14 00:12:30 +01:00
Big5Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
CharDistribution.cpp src: drop the SURE_YES confidence for character distribution probers. 2021-03-17 21:32:49 +01:00
CharDistribution.h uchardet_get_charset() must return iconv-compatible names. 2015-11-17 16:15:21 +01:00
CMakeLists.txt New generic language detector class. 2021-03-16 18:37:09 +01:00
EUCKRFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
EUCTWFreq.tab Fix global-buffer-overflow due EUCTW_TABLE_SIZE 2020-04-22 17:06:40 +00:00
GB2312Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
JISFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
JpCntx.cpp Fixes boolean operation precedence warnings... 2015-11-18 19:38:12 +01:00
JpCntx.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsBig5Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsBig5Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsCharSetProber.cpp src: cast value to its proper type. 2017-08-27 13:01:30 +02:00
nsCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsCodingStateMachine.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nscore.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsEscCharsetProber.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEscCharsetProber.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEscSM.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEUCJPProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCJPProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCKRProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCKRProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCTWProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCTWProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsGB2312Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsGB2312Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsHebrewProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsHebrewProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsLanguageDetector.cpp src: tweak again the language detection confidence. 2021-03-17 12:51:25 +01:00
nsLanguageDetector.h src, script: regenerate all existing language models. 2021-03-17 02:07:17 +01:00
nsLatin1Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsLatin1Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsMBCSGroupProber.cpp src: make nsMBCSGroupProber report all valid candidates. 2021-03-17 16:38:20 +01:00
nsMBCSGroupProber.h src: make nsMBCSGroupProber report all valid candidates. 2021-03-17 16:38:20 +01:00
nsMBCSSM.cpp uchardet_get_charset() must return iconv-compatible names. 2015-11-17 16:15:21 +01:00
nsPkgInt.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsSBCharSetProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCSGroupProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCSGroupProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSJISProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSJISProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsUniversalDetector.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsUniversalDetector.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsUTF8Prober.cpp src: do not shortcut UTF-8 detection too early. 2021-03-17 21:26:31 +01:00
nsUTF8Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
prmem.h Initial release. 2011-07-10 15:04:42 +08:00
symbols.cmake src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00
uchardet.cpp src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00
uchardet.h src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00