uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-11 22:20:05 +08:00

Author	SHA1	Message	Date
Jehan	4e967c9e88	src: new API to get the detected language. This doesn't work for all probers yet, in particular not for the most generic probers (such as UTF-8) or WINDOWS-1252. These will return NULL. It's still a good first step. Right now, it returns the 2-character language code from ISO 639-1. A project using it could easily get the English language name from the XML/JSON files provided by the iso-codes project. That project also makes it easy to localize the language name into other languages through gettext (this is what we do in GIMP, for instance). I don't add any dependency though and leave it to downstream projects to implement this. I was also wondering if we want to support region information for cases where it would make sense. I especially wondered about it for Chinese encodings, as some of them seem quite specific to a region (according to Wikipedia at least). For the time being though, these just return "zh". We'll see later if it makes sense to be more accurate (maybe depending on reports?).	2020-04-23 18:39:49 +02:00
Jehan	4b7b0476fb	src: now reporting encoding+confidence and keeping a list. Preparing for an updated API which will also make it possible to look at the confidence value, as well as get the list of possible candidates (i.e. all detected encodings which had a confidence value high enough that we would even consider them). It is still only internal logic though.	2020-04-23 16:15:54 +02:00
Jehan	4c8316f9cf	Nearly-ASCII text with NBSP is still not ASCII. There is no "exception" in encoding. The non-breaking space 0xA0 is not ASCII, and therefore returning "ASCII" will later create issues (for instance, trying to re-encode with iconv produces an error). This was obviously an explicit decision in the original code (according to code comments), probably tied to specifics of the original program from Mozilla. Now we want strict detection. I will return "ISO-8859-1" for "nearly-ASCII texts with NBSP as only exception" (note that I could have returned any ISO-8859 charset since they all have this character in common).	2015-12-05 21:11:29 +01:00
Jehan	9a74d08b3c	Fix minor space issues.	2015-11-24 00:15:44 +01:00
BYVoid	84284eccf4	Update code from upstream.	2011-07-11 14:42:50 +08:00
BYVoid	3601900164	Initial release.	2011-07-10 15:04:42 +08:00

6 Commits
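
The reasoning in commit 4c8316f9cf (byte 0xA0 is not ASCII, so labelling such text "ASCII" breaks later re-encoding) can be illustrated with a small sketch. This is not uchardet's code: `classify_nearly_ascii` and the `"other"` label are hypothetical names used only for illustration, and Python's built-in codecs stand in for iconv.

```python
def classify_nearly_ascii(data: bytes) -> str:
    """Hypothetical helper (not part of uchardet) mirroring the rule
    described in the commit message."""
    # Pure 7-bit input really is ASCII.
    if all(b < 0x80 for b in data):
        return "ASCII"
    # 7-bit bytes plus NBSP (0xA0) as the only exception: report
    # ISO-8859-1 instead of pretending the text is ASCII.
    if all(b < 0x80 or b == 0xA0 for b in data):
        return "ISO-8859-1"
    # Anything else would be left to the real probers.
    return "other"


sample = b"price:\xa0100 EUR"
label = classify_nearly_ascii(sample)

# A strict ASCII decoder rejects the NBSP byte, which is the kind of
# downstream failure the commit message mentions.
try:
    sample.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# Decoding with the reported charset succeeds and yields U+00A0.
decoded = sample.decode("iso-8859-1")
```

With the stricter label, a caller that re-encodes the text based on the detected charset gets a valid conversion instead of an error.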