uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2025-12-07 01:06:40 +08:00

Author	SHA1	Message	Date
Jehan	dc5caa46bc	BuildLangModel: fix hardcoded file names.	2015-11-30 19:18:25 +01:00
Jehan	3e5d37a6b5	BuildLangModel: process pages level per level. I.e. horizontally or "breadth first" rather than vertical tree traversal. This makes sure that all the start pages in particular are searched when using the max_page option.	2015-11-30 19:12:04 +01:00
Jehan	d9d347099e	BuildLangModel: fix some minor comment from a previous spec.	2015-11-30 00:09:23 +01:00
Jehan	192f8de165	BuildLangModel: build models with computed frequent characters count.	2015-11-30 00:04:44 +01:00
Jehan	b64831ff89	BuildLangModel: allow a list of start pages, and add a page with a word with œ in French to make sure we have such words in our stats.	2015-11-29 15:51:23 +01:00
Jehan	dce79a6631	BuildLangModel: the SequenceModel naming must include the language name.	2015-11-29 15:49:56 +01:00
Jehan	c59465adfc	BuildLangModel: save lang model directly in the right directory.	2015-11-29 13:26:10 +01:00
Jehan	290fbd2e2e	BuildLangModel: add the licensing header to generated files.	2015-11-29 02:26:33 +01:00
Jehan	7f290975ba	BuildLangModel: map different cases of the same character together. With the new case_mapping lang property, we can consider upper and lower case versions of the same character as one character. This makes sense in some languages, and would allow some rarer characters (but still in the main alphabet) to enter the frequent character list. For instance 'œ' and 'Œ' in French.	2015-11-29 02:14:48 +01:00
Jehan	00a78faa1d	BuildLangModel: the max_depth should be a script option rather than a language property.	2015-11-29 01:59:28 +01:00
Jehan	274386f424	BuildLangModel: add a --max-page option to limit data size. This is mostly useful for debugging, when we don't want to wait forever to test the script.	2015-11-29 01:42:36 +01:00
Jehan	0314f98ece	BuildLangModel.py: some in-progress script to build language models.	2015-11-29 01:30:04 +01:00
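
The case mapping described in commit 7f290975ba can be illustrated with a minimal sketch. This is not the code from BuildLangModel.py: the function name `count_characters` and its `case_mapping` argument are hypothetical, chosen only to mirror the lang property named in the commit message. It shows how folding 'Œ' into 'œ' pools their counts, which is what could let a rarer letter reach the frequent character list.

```python
from collections import Counter


def count_characters(text, case_mapping=True):
    """Count alphabetic characters in text.

    Hypothetical helper (not from BuildLangModel.py). When case_mapping
    is true, upper and lower case versions of a character are counted
    as the same character.
    """
    counts = Counter()
    for char in text:
        if not char.isalpha():
            continue  # skip spaces, digits and punctuation
        if case_mapping:
            char = char.lower()
        counts[char] += 1
    return counts


sample = "Œuvre et œuf"
merged = count_characters(sample, case_mapping=True)
separate = count_characters(sample, case_mapping=False)
```

With case mapping enabled, both occurrences are counted under 'œ'; without it, 'Œ' and 'œ' are each counted once and compete separately for a place among the frequent characters.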

12 Commits
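
The level-per-level processing described in commit 3e5d37a6b5 can be sketched as a breadth-first traversal over a queue. This is an illustration under stated assumptions, not the script's actual code: `crawl_breadth_first`, `get_links` and the toy link graph are hypothetical names, and only `max_depth` and `max_page` echo options mentioned in the commit messages. The point it shows is that every start page is visited before any linked page, so a low page limit cannot starve a start page.

```python
from collections import deque


def crawl_breadth_first(start_pages, get_links, max_depth=2, max_page=None):
    """Visit pages level by level and return titles in visiting order.

    Hypothetical sketch: get_links is any callable that returns the
    titles linked from a page.
    """
    start = list(dict.fromkeys(start_pages))  # drop duplicates, keep order
    seen = set(start)
    queue = deque((title, 0) for title in start)
    visited = []
    while queue:
        if max_page is not None and len(visited) >= max_page:
            break  # page limit reached
        title, depth = queue.popleft()
        visited.append(title)
        if depth >= max_depth:
            continue  # do not follow links below the depth limit
        for linked in get_links(title):
            if linked not in seen:
                seen.add(linked)
                queue.append((linked, depth + 1))
    return visited


# Toy link graph standing in for real pages (an assumption for the example).
LINKS = {"A": ["A1", "A2"], "B": ["B1"], "A1": ["A1a"]}


def toy_links(title):
    return LINKS.get(title, [])


limited = crawl_breadth_first(["A", "B"], toy_links, max_depth=2, max_page=3)
full = crawl_breadth_first(["A", "B"], toy_links, max_depth=2)
```

With a depth-first traversal and the same limit of three pages, the walk would descend from "A" and never reach start page "B"; the queue-based order visits both start pages first.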