From 9d29c3e26f4b8226b1d1394ecbe6712ba13d773c Mon Sep 17 00:00:00 2001
From: Jehan
Date: Sat, 20 Mar 2021 23:02:10 +0100
Subject: [PATCH] src: fix negative confidence wrapping around because of
 unsigned int.

In the extreme case where mCtrlChar is greater than mTotalChar (since the
latter does not include control characters), the subtraction yields a
negative value, which as an unsigned int wraps around to a huge integer.
So a confidence that was bad enough to be negative ended up as a huge
confidence instead.
We had this case with our Japanese UTF-8 test file, which ended up
identified as French ISO-8859-1. So I just cast the uint to float early
on in order to avoid this pitfall.

Now all our test cases succeed again, this time with full
UTF-8+language support! Wouhou!
---
 src/nsSBCharSetProber.cpp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/nsSBCharSetProber.cpp b/src/nsSBCharSetProber.cpp
index fe6fba1..2dfe830 100644
--- a/src/nsSBCharSetProber.cpp
+++ b/src/nsSBCharSetProber.cpp
@@ -130,7 +130,7 @@ float nsSingleByteCharSetProber::GetConfidence(int candidate)
   /* The more control characters (proportionnaly to the size of the text), the
    * less confident we become in the current charset.
    */
-  r = r * (mTotalChar - mCtrlChar) / mTotalChar;
+  r = r * ((float) mTotalChar - mCtrlChar) / mTotalChar;
   r = r*mFreqChar/mTotalChar;
   if (r >= (float)1.00)
     r = (float)0.99;
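
The wraparound described in the commit message can be reproduced in isolation. The following is a minimal sketch, not code from the repository: the helper names `ScaleUnsigned` and `ScaleFloat` are hypothetical, and the counters are assumed to be `unsigned int`, as the message implies. It only contrasts the old and new forms of the changed expression.

```cpp
// Hypothetical stand-alone versions of the expression changed by this patch.
// Assumption: the character counters are unsigned int.

// Old form: (total - ctrl) is evaluated in unsigned arithmetic, so when
// ctrl > total the difference wraps around to a value close to UINT_MAX
// before it is converted to float for the multiplication.
float ScaleUnsigned(float r, unsigned int total, unsigned int ctrl)
{
  return r * (total - ctrl) / total;
}

// New form: total is converted to float first, so the subtraction is done
// in floating point and can yield a negative result instead of wrapping.
float ScaleFloat(float r, unsigned int total, unsigned int ctrl)
{
  return r * ((float) total - ctrl) / total;
}
```

With `total = 10` and `ctrl = 12`, `ScaleUnsigned(0.5f, 10, 12)` returns a very large positive number, while `ScaleFloat(0.5f, 10, 12)` returns a small negative one. When `ctrl <= total`, both forms give the same scaling.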