uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-24 12:44:46 +08:00

Author	SHA1	Message	Date
Jehan	2bade77bf9	tests: update Windows-1250 test file for Hungarian. ISO-8859-2 and Windows-1250 are identical for all letters in the Hungarian alphabet, so for most texts it is not an error to return one charset or the other. What can make the difference is that Windows-1250 has some symbols where ISO-8859-2 has control characters, such as quotes, dashes, and the euro symbol. Since control characters now have a negative impact on confidence, texts with such symbols tend towards a Windows-1250 decision. The new test file has such quote symbols.	2015-12-12 18:12:08 +01:00
Jehan	15afc5c593	test: add a Hungarian Windows-1250 test but skip it for now. Text from: https://hu.wikipedia.org/wiki/Magyar_nyelv	2015-12-03 21:18:55 +01:00
Jehan	683255278d	Re-enable Hungarian language models. Now that we have at least one model for ISO-8859-1, the risk of detecting all ISO-8859-1 texts as ISO-8859-2 is lessened.	2015-12-02 22:24:36 +01:00
Jehan	f4f9fc3f28	test: reenable Windows-1251 test for Russian. Commit 4f1c3ff actually fixed it!	2015-12-02 21:53:27 +01:00
Jehan	a8e9de307b	Add UTF-16 test files without BOM, and disable these tests for now, since uchardet is not yet able to detect UTF-16 without a BOM.	2015-11-28 19:50:18 +01:00
Jehan	005fd98086	Add initial support for French with ISO-8859-1 and ISO-8859-15. Mostly generated with a script from Wikipedia data (only the typical positive ratio is slightly modified). This is a first test before adding my generating script to the main tree.	2015-11-28 02:14:39 +01:00
Jehan	5dcff7b241	Hide away tests known to fail. Some charsets are simply not supported (ex: fr:iso-8859-1), some are temporarily deactivated (ex: hu:iso-8859-2) and some are wrongly detected as closely related charsets. These were broken (or not efficient) from the start, and there is no need to pollute the `make test` output with them, which may make us miss actual regressions when they occur. So let's hide these away for now until we can improve the situation.	2015-11-18 20:02:58 +01:00
Jehan	4b38e68aa2	CMake tests: separate the lang and charset with a colon rather than a hyphen. It makes it easier to read.	2015-11-18 19:42:35 +01:00
Jehan	eb727d3aca	Add automatic testing against every test file.	2015-11-18 18:18:27 +01:00

9 Commits
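
The message of commit 2bade77bf9 says that ISO-8859-2 and Windows-1250 agree on every Hungarian letter and differ mainly where Windows-1250 places symbols on bytes that ISO-8859-2 uses for control characters. The sketch below illustrates that claim with Python's standard codecs (`iso8859_2` and `cp1250`). It is not uchardet code: the helper names and the sample string are hypothetical, chosen only for illustration.

```python
# Illustration only: uses Python's built-in codecs, not uchardet.
# All names below (HUNGARIAN_LETTERS, same_bytes, ...) are hypothetical.

# Accented letters of the Hungarian alphabet, lower and upper case.
HUNGARIAN_LETTERS = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"


def same_bytes(text: str) -> bool:
    """Return True if both charsets encode the text to identical bytes."""
    return text.encode("iso8859_2") == text.encode("cp1250")


def is_c1_control(ch: str) -> bool:
    """Return True for code points in the C1 control range U+0080..U+009F."""
    return 0x80 <= ord(ch) <= 0x9F


# The letters alone cannot distinguish the two charsets.
letters_match = same_bytes(HUNGARIAN_LETTERS)

# Typographic quotes, an en dash and the euro sign exist in Windows-1250 only.
quoted = "„idézet” – 5 €"
raw = quoted.encode("cp1250")

# Reading the same bytes as ISO-8859-2 turns those symbols into C1 controls.
as_latin2 = raw.decode("iso8859_2")
controls = [ch for ch in as_latin2 if is_c1_control(ch)]

print(letters_match)   # whether the letters encode identically
print(len(controls))   # how many symbols became control characters
```

A detector that lowers its confidence when it sees control characters would, on a text like `quoted`, have a reason to prefer Windows-1250, which is the behaviour the commit message describes for the new test file.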