From 362e36d1ed9380522e489ca04a11f09be4c00566 Mon Sep 17 00:00:00 2001
From: Jehan
Date: Tue, 17 Nov 2015 16:36:17 +0100
Subject: [PATCH] Add EUC-KR test file.

Contains text taken from Wikipedia on EUC-KR page in Korean.
https://ko.wikipedia.org/wiki/EUC-KR
I added it as a simili-subtitle file because as the original Mozilla
paper says: "The input text may contain extraneous noises which have
no relation to its encoding, e.g. HTML tags, non-native words".
Therefore I feel it is important to have test files a little noisy if
possible, in order to test our resistance to noise in our algorithm.
---
 test/euc-kr.smi | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)
 create mode 100644 test/euc-kr.smi

diff --git a/test/euc-kr.smi b/test/euc-kr.smi
new file mode 100644
index 0000000..1b71cb6
--- /dev/null
+++ b/test/euc-kr.smi
@@ -0,0 +1,16 @@
+ +EUC-KR.smi + + + +

EUC-KR +

EUC-KR은 KS X 1001와 KS X 1003을 사용하는 8비트 문자 인코딩으로, EUC의 일종이며 대표적인 한글 완성형 인코딩이기 때문에 보통 완성형이라고 불린다. +

EUC-KR 인코딩은 다음과 같이 구성된다. +

128보다 작은 바이트에 KS X 1003을 배당한다. +

128보다 크거나 같은 바이트에 KS X 1001을 배당한다. 각 글자는 행과 열에 128을 더한 코드값을 사용하여 2바이트로 표현된다. +

따라서 KS X 1001의 40-27에 배당된 "위"라는 글자는 EUC-KR에서 C0 A7라는 바이트 열로 표현된다. +

KS X 1001에는 한글 채움 문자를 사용하여 규격의 문자 집합에 포함되지 않은 한글을 표현하는 확장 방법이 있지만, 대부분의 경우 이 방법은 EUC-KR에서 사용되지 않고 대신 CP949와 같은 다른 방법을 사용하여 KS X 1001 바깥의 현대 한글을 표현한다. + +