From 157de1dc651a0032827dcd757232d5f24454f80f Mon Sep 17 00:00:00 2001
From: Jehan <jehan@girinstud.io>
Date: Mon, 19 Sep 2016 01:22:45 +0200
Subject: [PATCH] src: the EUC-KR prober now returns "UHC" as encoding name.

"UHC" is the "Unified Hangul Code" (aka Windows-949 or CP949). It is
apparently "mostly" a superset of EUC-KR, so returning UHC for a
strict EUC-KR document should usually not be considered wrong.
Yet, from what I read, EUC-KR has its own way of representing Hangul
syllables that are not available in precomposed form, and this is not
supported in UHC (since the latter has all possible precomposed
syllables), hence the "mostly".
My personal daily experience with Korean documents, though, is that I
encounter a lot of UHC-encoded files, probably because of the
predominance of Microsoft operating systems, which spread this encoding.
So until we have two separate detection machines, let's just report
EUC-KR files as "UHC".
---
 src/nsEUCKRProber.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/src/nsEUCKRProber.h b/src/nsEUCKRProber.h
index 8e09984..954c038 100644
--- a/src/nsEUCKRProber.h
+++ b/src/nsEUCKRProber.h
@@ -51,7 +51,12 @@ public:
   }
   virtual ~nsEUCKRProber(void){delete mCodingSM;}
   nsProbingState HandleData(const char* aBuf, PRUint32 aLen);
-  const char* GetCharSetName() {return "EUC-KR";}
+  /* "Unified Hangul Code", also called "CP949" or "Windows-949", is a
+   * superset of EUC-KR. Returning UHC here is not entirely correct (a
+   * separate prober would be better), but it is acceptable since many
+   * Korean documents are actually encoded in this character set.
+   */
+  const char* GetCharSetName() {return "UHC";}
   nsProbingState GetState(void) {return mState;}
   void      Reset(void);
   float     GetConfidence(void);