Jehan 8b1755cac2 src: do not shortcut UTF-8 detection too early.
I hit a case where the Czech test was detected as Irish because
detection was shortcut far too early, after only 16 characters. The
confidence value was just barely above 0.5 for Irish (and barely below
for Czech).

By adding a threshold (at least 256 characters), we give the engine
enough relevant data to make an informed decision. By then, the Czech
confidence was above 0.7, whereas the Irish one was at 0.6.
2021-03-17 21:26:31 +01:00
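
The gating idea described in the commit message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual nsUTF8Prober code: the names `Candidate`, `CanShortcut`, `kMinCharsForShortcut` and `kShortcutConfidence` are hypothetical, and the 0.5 cut-off is inferred from the values quoted in the message rather than taken from the source.

```cpp
#include <cstddef>

// Hypothetical sketch; none of these names come from the uchardet sources.

// Minimum number of characters to examine before a shortcut is allowed
// (the 256-character threshold mentioned in the commit message).
constexpr std::size_t kMinCharsForShortcut = 256;

// Assumed confidence cut-off for taking the shortcut.
constexpr float kShortcutConfidence = 0.5f;

// A language candidate with its current confidence score.
struct Candidate {
    const char* language;
    float       confidence;
};

// Returns true only when enough characters have been seen AND the best
// candidate's confidence exceeds the cut-off. With fewer characters the
// detector keeps reading, however confident it currently looks.
bool CanShortcut(std::size_t charsSeen, const Candidate& best) {
    return charsSeen >= kMinCharsForShortcut &&
           best.confidence > kShortcutConfidence;
}
```

With such a gate, a candidate scoring barely above 0.5 after 16 characters (the Irish case) would not end detection early; the decision is deferred until at least 256 characters have been read, by which point the scores have had time to separate.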
LangModels src, script: regenerate all existing language models. 2021-03-17 02:07:17 +01:00
tools src: add a --weight option to the CLI tool. 2021-03-14 00:12:30 +01:00
Big5Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
CharDistribution.cpp Update code from upstream. 2011-07-11 14:42:50 +08:00
CharDistribution.h uchardet_get_charset() must return iconv-compatible names. 2015-11-17 16:15:21 +01:00
CMakeLists.txt New generic language detector class. 2021-03-16 18:37:09 +01:00
EUCKRFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
EUCTWFreq.tab Fix global-buffer-overflow due EUCTW_TABLE_SIZE 2020-04-22 17:06:40 +00:00
GB2312Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
JISFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
JpCntx.cpp Fixes boolean operation precedence warnings... 2015-11-18 19:38:12 +01:00
JpCntx.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsBig5Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsBig5Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsCharSetProber.cpp src: cast value to its proper type. 2017-08-27 13:01:30 +02:00
nsCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsCodingStateMachine.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nscore.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsEscCharsetProber.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEscCharsetProber.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEscSM.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsEUCJPProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCJPProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCKRProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCKRProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCTWProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsEUCTWProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsGB2312Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsGB2312Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsHebrewProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsHebrewProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsLanguageDetector.cpp src: tweak again the language detection confidence. 2021-03-17 12:51:25 +01:00
nsLanguageDetector.h src, script: regenerate all existing language models. 2021-03-17 02:07:17 +01:00
nsLatin1Prober.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsLatin1Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsMBCSGroupProber.cpp src: make nsMBCSGroupProber report all valid candidates. 2021-03-17 16:38:20 +01:00
nsMBCSGroupProber.h src: make nsMBCSGroupProber report all valid candidates. 2021-03-17 16:38:20 +01:00
nsMBCSSM.cpp uchardet_get_charset() must return iconv-compatible names. 2015-11-17 16:15:21 +01:00
nsPkgInt.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsSBCharSetProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCSGroupProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSBCSGroupProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSJISProber.cpp src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsSJISProber.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
nsUniversalDetector.cpp src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsUniversalDetector.h src: nsEscCharsetProber also returns the correct language. 2021-03-17 17:15:56 +01:00
nsUTF8Prober.cpp src: do not shortcut UTF-8 detection too early. 2021-03-17 21:26:31 +01:00
nsUTF8Prober.h src: allow for nsCharSetProber to return several candidates. 2021-03-17 13:29:13 +01:00
prmem.h Initial release. 2011-07-10 15:04:42 +08:00
symbols.cmake src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00
uchardet.cpp src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00
uchardet.h src: new weight concept in the C API. 2021-03-14 00:12:30 +01:00