ISO-8859-2 and Windows-1250 are absolutely similar for all letters in
the Hungarian alphabet. So for most texts, it is not an error to return
one charset or the other.
What could make the difference is for instance that Windows-1250 has
some symbols where ISO-8859-2 has control characters, like quotes,
dashes, the euro symbol…
Since control characters have a negative impact on confidence now,
texts with such symbols would tend towards Windows-1250 decision.
The new test file has such quote symbols.
Mostly generated with a script from Wikipedia data (only the typical
positive ratio is slightly modified).
This is a first test before adding my generating script to the main tree.
Some charsets are simply not supported (ex: fr:iso-8859-1), some are
temporarily deactivated (ex: hu:iso-8859-2) and some are wrongly
detected as closely related charsets.
These were broken (or not efficient) from the start, and there is no
need to pollute the `make test` output with these, which may make us
miss when actual regressions will occur. So let's hide these away for
now until we can improve the situation.