4 Commits

Author SHA1 Message Date
Jehan
157de1dc65 src: the EUC-KR prober now returns "UHC" as encoding name.
"UHC" is the "Unified Hangul Code" (aka Windows-949 or CP949). It is
apparently "mostly" upward compatible with EUC-KR, so returning UHC for
a strict EUC-KR document should usually not be considered wrong.
However, from what I have read, EUC-KR has its own way of representing
hangul syllables that are not available in precomposed form, and UHC does
not support it (since the latter has all possible precomposed syllables),
hence the "mostly" upward compatibility.
In my daily experience with Korean documents, though, I encounter a lot
of UHC-encoded files, probably because of the predominance of Microsoft
operating systems, which spread this encoding.
So until we get 2 separate detection machines, let's just return EUC-KR
files as being "UHC".
2016-09-19 01:22:45 +02:00
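
The "mostly" upward compatibility described in this commit message can be illustrated with a small sketch. It does not use this library at all; it assumes CPython's built-in `euc_kr` and `cp949` codecs as stand-ins for the two encodings, and the helper name `encode_both` is hypothetical. It also assumes that CPython's `euc_kr` codec writes syllables outside the KS X 1001 precomposed set as the 8-byte jamo sequence the message alludes to.

```python
def encode_both(text: str) -> tuple[bytes, bytes]:
    """Return (EUC-KR bytes, UHC/CP949 bytes) for the same text.

    Hypothetical helper for illustration only.
    """
    return text.encode("euc_kr"), text.encode("cp949")


# A common syllable pair: both encodings produce the same bytes, so a
# strict EUC-KR document decodes correctly when treated as UHC.
common_euckr, common_uhc = encode_both("한글")
print(common_euckr == common_uhc)
print(common_euckr.decode("cp949"))

# A syllable outside the KS X 1001 precomposed set. UHC has a 2-byte
# precomposed code for it; EUC-KR (as implemented by CPython, an
# assumption of this sketch) falls back to a longer jamo sequence.
rare_euckr, rare_uhc = encode_both("똠")
print(len(rare_uhc), len(rare_euckr))

# Decoding that jamo sequence as UHC does not give the syllable back,
# which is the gap behind the "mostly" in the commit message.
print(rare_euckr.decode("euc_kr") == "똠")
print(rare_euckr.decode("cp949") == "똠")
```

In this sketch, the first case is the common one that makes reporting "UHC" for EUC-KR input harmless, while the last line shows the rare case where the two encodings disagree.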
BYVoid
84284eccf4 Update code from upstream. 2011-07-11 14:42:50 +08:00
BYVoid
e948063c0e Refine uchardet.h 2011-07-10 15:41:24 +08:00
BYVoid
3601900164 Initial release. 2011-07-10 15:04:42 +08:00