Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-08 01:36:41 +08:00)
src: fix negative confidence wrapping around because of unsigned int.
In the extreme case where mCtrlChar exceeds mTotalChar (the latter does not include control characters), the subtraction yields a negative value, which wraps around to a huge integer in unsigned int arithmetic. So a confidence bad enough to be negative ended up as a huge confidence instead. We hit this case with our Japanese UTF-8 test file, which was identified as French ISO-8859-1. I now cast the uint to float early on to avoid this pitfall. All our test cases succeed again, this time with full UTF-8+language support! Woohoo!
parent 4ef378ce2e
commit e6b4811c9b
@@ -130,7 +130,7 @@ float nsSingleByteCharSetProber::GetConfidence(int candidate)
   /* The more control characters (proportionnaly to the size of the text), the
    * less confident we become in the current charset.
    */
-  r = r * (mTotalChar - mCtrlChar) / mTotalChar;
+  r = r * ((float) mTotalChar - mCtrlChar) / mTotalChar;
   r = r*mFreqChar/mTotalChar;
   if (r >= (float)1.00)
     r = (float)0.99;
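The wraparound can be reproduced outside the prober with a minimal sketch. The two free functions below are hypothetical stand-ins, not part of uchardet; they only mirror the old and new forms of the changed line. The counters are assumed to be `unsigned int`, as the commit message suggests.

```cpp
// Hypothetical helpers for illustration only; they are not uchardet functions.
// The parameter names echo mTotalChar and mCtrlChar from the diff, and the
// counter type is assumed to be unsigned int.

// Old form: the subtraction is done in unsigned arithmetic, so it wraps
// around to a huge value whenever ctrlChar > totalChar.
static float ConfidenceBefore(float r, unsigned int totalChar, unsigned int ctrlChar)
{
  return r * (totalChar - ctrlChar) / totalChar;
}

// New form: casting totalChar to float first makes the subtraction happen in
// floating point, so the result can become negative instead of wrapping.
static float ConfidenceAfter(float r, unsigned int totalChar, unsigned int ctrlChar)
{
  return r * ((float) totalChar - ctrlChar) / totalChar;
}
```

With made-up inputs such as r = 0.5, totalChar = 10 and ctrlChar = 12, `ConfidenceBefore` returns a large positive number while `ConfidenceAfter` returns a small negative one. When ctrlChar does not exceed totalChar, both forms agree.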