4 Commits

Author SHA1 Message Date
Jehan
e6e51d9fe8 src: all language models now rebuilt after the fix. 2022-12-15 14:31:55 +01:00
Jehan
6bb1b3e101 scripts: all language models rebuilt with the new ratio data. 2022-12-14 20:16:44 +01:00
Jehan
7f386d922e script, src: rebuild the English model.
The previous model was obviously wrong: all letters had the same
probability, even non-ASCII ones! Anyway, this new model makes the unit
tests slightly better, though English detection is still weak (I have
more ideas I want to experiment with to improve it).
2022-12-14 00:36:02 +01:00
Jehan
bfa4b10d4d script, src: add English language model.
English detection is still quite crappy, so I am not adding a unit test yet.
Though I believe the detection is bad mostly because of too much
shortcutting we do to go "fast". I should probably review this
whole part of the logic as well.
2022-12-14 00:24:53 +01:00