This prober comes from MR !1 against the main branch, where it was too
aggressive and could not be merged. On the improved API branch, it no
longer misdetects other tests as Johab.
Also fix it to work with the new API.
Finally, add a Johab/ko unit test.
I realize that the language a text is written in is very important,
since it completely changes the character distribution. Our test files
should take this into account: for each encoding used by several
languages, we should create a test file in each of those languages.