5 Commits

Author SHA1 Message Date
Jehan
15fc8f0a0f src: now reporting encoding+confidence and keeping a list.
Preparing for an updated API which will also allow callers to look at the
confidence value, as well as get the list of possible candidates (i.e.
all detected encodings which had a confidence value high enough that
we would even consider them).
It is still only internal logic, though.
2022-12-14 00:23:13 +01:00
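The candidate list described in this commit can be pictured roughly as follows. This is a hypothetical Python sketch, not the project's code: the `Candidate` type, the `MIN_CONFIDENCE` value, and the `keep_candidates` helper are all names assumed for illustration.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """One detected encoding and the confidence assigned to it."""
    encoding: str
    confidence: float


# Assumed cut-off, for illustration only.
MIN_CONFIDENCE = 0.2


def keep_candidates(results: Iterable[Tuple[str, float]]) -> List[Candidate]:
    """Keep every result that is confident enough, best candidate first."""
    kept = [Candidate(enc, conf) for enc, conf in results if conf >= MIN_CONFIDENCE]
    kept.sort(key=lambda c: c.confidence, reverse=True)
    return kept
```

With such a list, the first element plays the role of the single detected encoding, while the remaining elements are the other candidates a caller could inspect.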
Jehan
4c8316f9cf Nearly-ASCII text with NBSP is still not ASCII.
There is no "exception" in encoding. The non-breaking space 0xA0 is not
ASCII, and therefore returning "ASCII" will later create issues (for
instance trying to re-encode with iconv produces an error).
This was obviously an explicit decision in the original code (according to
code comments), probably tied to the specificity of the original program from
Mozilla. Now we want strict detection.
I will return "ISO-8859-1" for "nearly-ASCII texts with NBSP as only
exception" (note that I could have returned any ISO-8859 charset since
they all have this character in common).
2015-12-05 21:11:29 +01:00
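The problem this commit describes can be reproduced with any strict decoder. The sketch below uses Python's built-in codecs rather than iconv or the detector itself; the sample bytes and the `decodes_as` helper are assumptions made for illustration.

```python
# Sample "nearly-ASCII" input: plain ASCII plus one NBSP byte (0xA0).
NEARLY_ASCII = b"price:\xa0100"


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if `data` is valid in `encoding` under strict decoding."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# 0xA0 is outside the 7-bit ASCII range, so a strict ASCII decode fails,
# while ISO-8859-1 maps the same byte to U+00A0 (NO-BREAK SPACE).
is_ascii = decodes_as(NEARLY_ASCII, "ascii")
is_latin1 = decodes_as(NEARLY_ASCII, "iso-8859-1")
```

Here `is_ascii` is false and `is_latin1` is true, which is why reporting "ASCII" for such input breaks a later strict conversion.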
Jehan
9a74d08b3c Fix minor space issues. 2015-11-24 00:15:44 +01:00
BYVoid
84284eccf4 Update code from upstream. 2011-07-11 14:42:50 +08:00
BYVoid
3601900164 Initial release. 2011-07-10 15:04:42 +08:00