uchardet

coffee/uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-08 01:36:41 +08:00

Commit Graph

Author: Jehan
SHA1: 0be80a21db
Message:

script, src: update Norwegian model with the new language features.

As I just rebased my branch about new language detection API, I needed
to re-generate Norwegian language models. Unfortunately it doesn't
detect UTF-8 Norwegian text, though not far off (it detects it as second
candidate with high 91% confidence; beaten by Danish UTF-8 with 94%
confidence unfortunately!).

Note that I also update the alphabet list for Norwegian as there were
too many letters in there (according to Wikipedia at least), so even
when training a model, we had some missing characters in the training
set.

Date: 2022-12-14 00:24:53 +01:00

1 Commit