uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-13 23:20:08 +08:00

Author	SHA1	Message	Date
LSY	d72a5c88ce	add charset prober for Johab Korean	2022-12-14 00:23:13 +01:00
Jehan	a7c5a167a9	src: drop the SURE_YES confidence for character distribution probers. Some probers are based on character distribution analysis. Though this is still relevant detection logic, we also know that it is a lot less subtle than sequence distribution analysis. Therefore let's give a good confidence to a text passing such analysis, yet not a near-perfect one, thus leaving some chance for other probers. In particular, we can definitely consider that if some text gets over 0.7 on sequence distribution analysis, it is a very likely candidate. I had the case with the Finnish UTF-8 test, which was passing (UTF-8, Finnish) detection with a staggering 0.86 confidence, yet was overridden by UHC (EUC-KR). This used not to be a problem when nsMBCSGroupProber would check the UTF-8 prober first and stop there with just some basic encoding detection. Now that we go further and return all relevant candidates, having simpler detection algorithms which always return an overly high confidence is not the best idea.	2022-12-14 00:23:13 +01:00
BYVoid	84284eccf4	Update code from upstream.	2011-07-11 14:42:50 +08:00
BYVoid	3601900164	Initial release.	2011-07-10 15:04:42 +08:00
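
The reasoning in commit a7c5a167a9 can be illustrated with a minimal sketch. It is not uchardet's actual code: the names `Candidate`, `CapDistributionConfidence`, `PickBest`, and the 0.70 ceiling are assumptions chosen for illustration, and only the 0.86 Finnish UTF-8 score and the UHC override come from the commit message. The sketch shows why a distribution-only prober that always reports a near-certain score can override a better candidate once all candidates are compared.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical constants for illustration; not taken from uchardet's sources.
constexpr float kSureYes = 0.99f;          // near-certain confidence
constexpr float kDistributionCap = 0.70f;  // assumed ceiling for distribution-only probers

// A detection candidate: a charset name and the prober's confidence in it.
struct Candidate {
    std::string charset;
    float confidence;
};

// Clamp the confidence reported by a character distribution prober so that
// it stays "good" without ever being near-perfect.
float CapDistributionConfidence(float raw) {
    return std::min(raw, kDistributionCap);
}

// Return the candidate with the highest confidence, or an empty candidate
// with confidence 0 if the list is empty.
Candidate PickBest(const std::vector<Candidate>& candidates) {
    Candidate best{"", 0.0f};
    for (const auto& c : candidates) {
        if (c.confidence > best.confidence) {
            best = c;
        }
    }
    return best;
}
```

With an uncapped UHC score of `kSureYes`, `PickBest` prefers UHC over a UTF-8 candidate at 0.86. Once the UHC score passes through `CapDistributionConfidence`, the UTF-8 candidate wins, which is the outcome the commit message describes for the Finnish test.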