Jehan db836fad63 script, src: generate more code for language and sequence model listing.
Right now, each time we add support for a new language or charset, there
are too many pieces of code we must remember to edit. The script
script/BuildLangModel.py will now take care of the main parts: listing
the sequence models, listing the generic language models and computing
the numbers for each listing.

Furthermore, the script will now end by printing a TODO list of the parts
which still have to be done manually (2 functions to edit and a
CMakeLists).

Finally, the script now accepts a list of languages to process, rather
than having to be run once per language. It also accepts 2 special
codes: "none", which retrains none of the languages and only
re-generates the generated listings; and "all", which retrains all
models (useful in particular when we change the model formats or usage
and want to regenerate everything).
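
As an illustration only, the language selection described above could be
sketched as follows. This is not the code from script/BuildLangModel.py:
the function name resolve_languages, its parameters, and the choice to
reject "none" or "all" when mixed with other codes are all assumptions
made for this example.

```python
def resolve_languages(requested, available):
    """Return the language codes whose models should be retrained.

    Hypothetical helper, not taken from BuildLangModel.py.
    requested: codes given on the command line, e.g. ["fr", "sr"].
    available: set of every language code the script knows about.
    """
    codes = [code.strip() for code in requested if code.strip()]

    # "none" and "all" are special codes. Assumption: they must be used alone.
    if "none" in codes or "all" in codes:
        if len(codes) != 1:
            raise ValueError("'none' and 'all' cannot be combined with other codes")
        # "none": retrain nothing, the caller only re-generates the listings.
        # "all": retrain every known model.
        return [] if codes[0] == "none" else sorted(available)

    unknown = [code for code in codes if code not in available]
    if unknown:
        raise ValueError("unknown language code(s): " + ", ".join(unknown))
    return codes
```

In this sketch the generated listings would be rebuilt after the call in
every case, so an empty result (the "none" code) simply skips retraining.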
2022-12-18 17:23:34 +01:00
..
LangModels script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
tools src: add a --language|-l option to the uchardet CLI tool. 2022-12-14 00:24:53 +01:00
Big5Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
CharDistribution.cpp add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
CharDistribution.h src: build new charset prober for Johab Korean. 2022-12-14 00:23:13 +01:00
CMakeLists.txt script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
EUCKRFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
EUCTWFreq.tab Fix global-buffer-overflow due EUCTW_TABLE_SIZE 2020-04-22 17:06:40 +00:00
GB2312Freq.tab Initial release. 2011-07-10 15:04:42 +08:00
JISFreq.tab Initial release. 2011-07-10 15:04:42 +08:00
JohabFreq.tab src: build new charset prober for Johab Korean. 2022-12-14 00:23:13 +01:00
JpCntx.cpp Fixes boolean operation precedence warnings... 2015-11-18 19:38:12 +01:00
JpCntx.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsBig5Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsBig5Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsCharSetProber.cpp src: cast value to its proper type. 2017-08-27 13:01:30 +02:00
nsCharSetProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsCJKDetector.cpp src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. 2022-12-14 00:24:53 +01:00
nsCJKDetector.h src: new nsCJKDetector specifically Chinese/Japanese/Korean recognition. 2022-12-14 00:24:53 +01:00
nsCodingStateMachine.h add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
nscore.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsEscCharsetProber.cpp src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEscCharsetProber.h src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEscSM.cpp src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsEUCJPProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCJPProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCKRProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCKRProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCTWProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsEUCTWProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsGB2312Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsGB2312Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsHebrewProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsHebrewProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsJohabProber.cpp src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
nsJohabProber.h src, test: fix the new Johab prober and add a test. 2022-12-14 00:23:13 +01:00
nsLanguageDetector-generated.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsLanguageDetector.cpp src: improve algorithm for confidence computation. 2022-12-14 20:02:59 +01:00
nsLanguageDetector.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsLatin1Prober.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsLatin1Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsMBCSGroupProber.cpp script, src, test: add Serbian support. 2022-12-17 22:47:54 +01:00
nsMBCSGroupProber.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsMBCSSM.cpp add charset prober for Johab Korean 2022-12-14 00:23:13 +01:00
nsPkgInt.h Update code from upstream. 2011-07-11 14:42:50 +08:00
nsSBCharSetProber-generated.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsSBCharSetProber.cpp src: improve confidence computation (generic and single-byte charset). 2022-12-14 00:24:53 +01:00
nsSBCharSetProber.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsSBCSGroupProber.cpp script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsSBCSGroupProber.h script, src: generate more code for language and sequence model listing. 2022-12-18 17:23:34 +01:00
nsSJISProber.cpp src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsSJISProber.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
nsUniversalDetector.cpp src: reset shortcut charset/language on Reset(). 2022-12-14 00:24:53 +01:00
nsUniversalDetector.h src: nsEscCharsetProber also returns the correct language. 2022-12-14 00:23:13 +01:00
nsUTF8Prober.cpp src: drop less of UTF-8 confidence even with few non-multibyte chars. 2022-12-14 00:24:53 +01:00
nsUTF8Prober.h src: allow for nsCharSetProber to return several candidates. 2022-12-14 00:23:13 +01:00
prmem.h Initial release. 2011-07-10 15:04:42 +08:00
symbols.cmake src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00
uchardet.cpp src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00
uchardet.h src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/. 2022-12-14 00:24:53 +01:00