For charsets UTF-8, GEORGIAN-ACADEMY and GEORGIAN-PS. The 2 GEORGIAN-*
sets were generated thanks to the new create-table.py script.
Test text comes from the 'ვირზაზუნა' page of Wikipedia in Georgian.
For UTF-8, ISO-8859-1 and WINDOWS-1252 support.
The test for UTF-8 and ISO-8859-1 is taken from the 'Marmota' page on
Wikipedia in Catalan. The test for WINDOWS-1252 is taken from the
'Unió_Europea' page. Since ISO-8859-1 and WINDOWS-1252 are very similar
for most letters (in particular the ones used in Catalan), I
differentiated the tests with a text containing the '€' symbol, which
WINDOWS-1252 encodes at a position that is unused in ISO-8859-1.
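
As an illustration only (this is not part of the test suite), the
difference between the two charsets can be checked with Python's
built-in codecs. The helper name `encodable` and the sample strings are
made up for this sketch; "cp1252" is Python's name for WINDOWS-1252.

```python
def encodable(text, charset):
    """Return True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# Hypothetical sample strings, not the actual test files.
catalan_only = "Unió Europea"
with_euro = "Unió Europea: 10 €"

# Catalan letters produce identical bytes in both charsets.
same_bytes = catalan_only.encode("cp1252") == catalan_only.encode("iso-8859-1")

# The euro sign is a single byte in WINDOWS-1252 and absent from ISO-8859-1.
euro_byte = "€".encode("cp1252")
euro_in_cp1252 = encodable(with_euro, "cp1252")
euro_in_latin1 = encodable(with_euro, "iso-8859-1")
```

A text without '€' (or another character specific to WINDOWS-1252)
gives a detector nothing to tell the two charsets apart, which is why
the WINDOWS-1252 test needs such a character.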
Right now, each time we add support for a new language or charset, there
are too many pieces of code we must remember to edit. The script
script/BuildLangModel.py will now take care of the main parts: listing
the sequence models, listing the generic language models and computing
the numbers for each listing.
Furthermore the script will now end with a TODO list of the parts which
are still to be done manually (2 functions to edit and a CMakeLists).
Finally the script now accepts a list of languages to retrain, rather
than having to be run once per language. It also accepts 2 special
codes: "none", which retrains none of the languages and only
re-generates the generated listings; and "all", which retrains all
models (useful in particular when we change the model formats or usage
and want to regenerate everything).
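
A minimal sketch of how such a language selection could be resolved,
assuming the requested names arrive as a list of strings. The function
`resolve_languages` and the `available` set are hypothetical and are
not taken from script/BuildLangModel.py; how the list is actually
passed on the command line is not described here.

```python
def resolve_languages(requested, available):
    """Return the languages to retrain for a given request.

    `requested` is a list of names which may be the special codes
    "none" or "all"; `available` is the set of known languages.
    Both names are assumptions made for this sketch.
    """
    if requested == ["none"]:
        # Retrain nothing; only the generated listings get rebuilt.
        return []
    if requested == ["all"]:
        # Retrain every known model, in a stable order.
        return sorted(available)
    unknown = [name for name in requested if name not in available]
    if unknown:
        raise ValueError("unknown language(s): " + ", ".join(unknown))
    return list(requested)


# Hypothetical set of languages with a model definition.
available = {"ca", "ka", "fr"}
nothing = resolve_languages(["none"], available)
everything = resolve_languages(["all"], available)
some = resolve_languages(["ka", "ca"], available)
```

In every case the listings would be regenerated afterwards; only the
set of retrained models changes.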