The alphabet was incomplete, so confidence was a bit too low.
For instance, the VISCII test case's confidence rose from 0.643401 to
0.696346, and the UTF-8 test case's rose from 0.863777 to 0.99.
Only the Windows-1258 test case is slightly worse, dropping from
0.532846 to 0.532098, but the overall recognition gain is obvious anyway.
I was planning on adding VISCII support as well, but Python's encode()
method apparently does not support it, so I cannot generate the proper
statistics data with the current version of the script.
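A quick way to check which encodings encode() can use is to query
Python's codec registry. This is a minimal sketch assuming a stock
CPython install with no third-party codecs registered; the helper name
has_codec is hypothetical. codecs.lookup() raises LookupError for
unknown encodings, which is the same failure encode() hits.

```python
import codecs


def has_codec(name: str) -> bool:
    """Return True if Python's codec registry knows the encoding `name`.

    Hypothetical helper: codecs.lookup() raises LookupError for any
    encoding name it cannot resolve, the same error str.encode() raises.
    """
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    # "windows-1258" resolves to the built-in cp1258 codec;
    # "viscii" has no codec in the standard library.
    for name in ("windows-1258", "viscii", "utf-8"):
        print(name, has_codec(name))
```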