uchardet

coffee/uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-08 01:36:41 +08:00

Commit Graph

Author	SHA1	Message	Date
Jehan	2bade77bf9	tests: update Windows-1250 test file for Hungarian. ISO-8859-2 and Windows-1250 are identical for all letters in the Hungarian alphabet. So for most texts, it is not an error to return one charset or the other. What can make the difference is, for instance, that Windows-1250 has some symbols (quotes, dashes, the euro symbol…) where ISO-8859-2 has control characters. Since control characters have a negative impact on confidence now, texts with such symbols would tend towards a Windows-1250 decision. The new test file has such quote symbols.	2015-12-12 18:12:08 +01:00
Jehan	15afc5c593	test: add a Hungarian Windows-1250 test but skip it for now. Text from: https://hu.wikipedia.org/wiki/Magyar_nyelv	2015-12-03 21:18:55 +01:00
Jehan	0efcdfa546	Reorganize test files in language subdirectories. I realize that the language a text is written in is very important, since it completely changes the character distribution. Our test files should take this into account, and we should create several test files in different languages for encodings used in various languages.	2015-11-17 21:12:39 +01:00
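
The encoding overlap described in commit 2bade77bf9 can be checked with a short sketch. It relies only on Python's built-in `cp1250` and `iso8859_2` codecs and does not reproduce uchardet's detection or confidence logic; the helper names and the sample string are illustrative assumptions, not taken from the repository.

```python
import unicodedata

# Accented letters of the Hungarian alphabet (lower and upper case).
HUNGARIAN_LETTERS = "áéíóöőúüűÁÉÍÓÖŐÚÜŰ"


def same_bytes_in_both(text):
    """True if text encodes to identical bytes in Windows-1250 and ISO-8859-2."""
    return text.encode("cp1250") == text.encode("iso8859_2")


def c1_controls_when_read_as_latin2(data):
    """Decode bytes as ISO-8859-2 and return the C1 control characters found.

    Bytes 0x80-0x9F are control characters in ISO-8859-2, while
    Windows-1250 assigns printable symbols to most of them.
    """
    decoded = data.decode("iso8859_2")
    return [
        ch for ch in decoded
        if ord(ch) >= 0x80 and unicodedata.category(ch) == "Cc"
    ]


# Hypothetical sample: low/high quotes, an en dash and the euro sign,
# all of which exist in Windows-1250 but not in ISO-8859-2.
sample = "„Magyar nyelv” – 5 €".encode("cp1250")

if __name__ == "__main__":
    print(same_bytes_in_both(HUNGARIAN_LETTERS))
    print([hex(ord(ch)) for ch in c1_controls_when_read_as_latin2(sample)])
```

A text made only of letters gives a detector nothing to separate the two charsets, whereas each typographic symbol becomes a control character under the ISO-8859-2 reading, which is the signal the commit message refers to.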


3 Commits