uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-12 06:30:05 +08:00

Jehan 04f9309932 tests: update ISO-8859-15 French test file. The previous technical text about the charsets themselves was not relevant for identifying a language. In particular, the special characters that differ between ISO-8859-1 and ISO-8859-15 were used by themselves, outside any character sequence context. Without language understanding, they could therefore just as well have represented the ISO-8859-15 letters or the ISO-8859-1 symbols at the corresponding codepoints. Replacing with text from this Wikipedia page: https://fr.wikipedia.org/wiki/Œuf_(cuisine) This uses some of the same characters (in particular 'œ') but in contextual character sequences, making it relevant for our algorithm.		2015-11-30 00:19:15 +01:00
bg	Reorganize test files in language subdirectories.	2015-11-17 21:12:39 +01:00
el	Add Greek test files.	2015-11-18 02:57:09 +01:00
en	Add an ASCII test file for English...	2015-11-28 17:49:13 +01:00
fr	tests: update ISO-8859-15 French test file.	2015-11-30 00:19:15 +01:00
he	Add Hebrew test files.	2015-11-18 03:16:18 +01:00
hu	Reorganize test files in language subdirectories.	2015-11-17 21:12:39 +01:00
ja	Add UTF-16 test files without BOM...	2015-11-28 19:50:18 +01:00
ko	Adding UTF-8 file for Korean.	2015-11-18 02:36:33 +01:00
ru	Add some Russian test files.	2015-11-27 18:17:20 +01:00
th	Add Thai test file for UTF-8.	2015-11-18 03:26:34 +01:00
zh	Adding some more test files for Russian and Chinese.	2015-11-18 19:27:38 +01:00
CMakeLists.txt	Add UTF-16 test files without BOM...	2015-11-28 19:50:18 +01:00
uchardet-tests.c	Add automatic testing against every test file.	2015-11-18 18:18:27 +01:00
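
The latest commit message above notes that the characters which differ between ISO-8859-1 and ISO-8859-15 are ambiguous when they appear on their own. The following is a minimal sketch of that point, not code from uchardet: it lists the eight byte values the two charsets decode differently and counts how many of them occur in a buffer. The names `latin_diff`, `decode_latin1`, `decode_latin9` and `count_ambiguous_bytes` are hypothetical helpers introduced only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One byte value that ISO-8859-1 and ISO-8859-15 decode differently.
 * (Hypothetical helper type, not part of uchardet.) */
struct latin_diff {
    uint8_t  byte;    /* raw byte as stored in the file */
    uint32_t latin1;  /* Unicode code point under ISO-8859-1 */
    uint32_t latin9;  /* Unicode code point under ISO-8859-15 */
};

/* The eight positions where the two charsets disagree. */
static const struct latin_diff LATIN_DIFFS[] = {
    { 0xA4, 0x00A4, 0x20AC }, /* currency sign    vs euro sign        */
    { 0xA6, 0x00A6, 0x0160 }, /* broken bar       vs S with caron     */
    { 0xA8, 0x00A8, 0x0161 }, /* diaeresis        vs s with caron     */
    { 0xB4, 0x00B4, 0x017D }, /* acute accent     vs Z with caron     */
    { 0xB8, 0x00B8, 0x017E }, /* cedilla          vs z with caron     */
    { 0xBC, 0x00BC, 0x0152 }, /* one quarter      vs capital OE       */
    { 0xBD, 0x00BD, 0x0153 }, /* one half         vs small oe         */
    { 0xBE, 0x00BE, 0x0178 }, /* three quarters   vs Y with diaeresis */
};

#define LATIN_DIFF_COUNT (sizeof LATIN_DIFFS / sizeof LATIN_DIFFS[0])

/* ISO-8859-1 maps every byte to the code point of the same value. */
uint32_t decode_latin1(uint8_t b)
{
    return b;
}

/* ISO-8859-15 is identical to ISO-8859-1 except at the eight bytes above. */
uint32_t decode_latin9(uint8_t b)
{
    for (size_t i = 0; i < LATIN_DIFF_COUNT; i++) {
        if (LATIN_DIFFS[i].byte == b)
            return LATIN_DIFFS[i].latin9;
    }
    return b;
}

/* Count the bytes in buf whose meaning depends on which charset is assumed. */
size_t count_ambiguous_bytes(const uint8_t *buf, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (decode_latin1(buf[i]) != decode_latin9(buf[i]))
            n++;
    }
    return n;
}
```

Both decodings of such a byte are valid, so a byte like 0xBD alone says nothing about the charset. Only the neighbouring bytes (for example 0xBD followed by "uf", which reads as the French word "œuf" under ISO-8859-15) give a sequence-based detector something to work with, which is the motivation the commit message gives for replacing the test text.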