Mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git (synced 2025-12-07 17:26:41 +08:00)
script: update the README.
parent d40e5868d5
commit 419a971e6a
@@ -16,7 +16,7 @@ to recognize French text encoded in ISO-8859-15, but may fail at
 detecting ISO-8859-15 for non-supported languages.
 
 This is why, though less flexible, it also makes uchardet much more
-accurate than other detection system, as well as making it an efficient
+accurate than other detection systems, as well as making it an efficient
 language recognition system.
 Since many single-byte charsets actually share the same layout (or very
 similar ones), it is actually impossible to have an accurate single-byte
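The context of this hunk rests on the claim that many single-byte charsets share the same layout. The sketch below illustrates that claim using Python's standard codecs, not uchardet. It decodes every high byte with ISO-8859-1 and ISO-8859-15 and lists the positions where the two disagree. The helper name `differing_bytes` is hypothetical.

```python
def differing_bytes(codec_a: str, codec_b: str) -> list[int]:
    """Return the byte values 0x80-0xFF that decode to different
    characters under codec_a and codec_b (hypothetical helper)."""
    diffs = []
    for value in range(0x80, 0x100):
        raw = bytes([value])
        # Both codecs here map all 256 byte values, so decode() never fails.
        if raw.decode(codec_a) != raw.decode(codec_b):
            diffs.append(value)
    return diffs


if __name__ == "__main__":
    diffs = differing_bytes("iso-8859-1", "iso-8859-15")
    print(len(diffs), [hex(b) for b in diffs])
    # Ordinary French text is byte-for-byte identical in both charsets.
    sample = "Le café était très bon"
    print(sample.encode("iso-8859-1") == sample.encode("iso-8859-15"))
```

Only eight positions differ (0xA4, 0xA6, 0xA8, 0xB4, 0xB8, 0xBC, 0xBD, 0xBE). As a result, French text without characters such as €, œ, Œ or Ÿ produces identical bytes in both charsets. This is why telling them apart depends on knowing the language rather than on byte values alone.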
@@ -47,7 +47,7 @@ can just run `pip3 install -r requirements.txt`.
 
 Let's say you added (or modified) support for French (`fr`), run:
 
-> ./BuildLangModel.py fr --max-page=100 --max-depth=4
+> ./BuildLangModel.py fr --max-page=200 --max-depth=4
 
 The options can be changed to any value. Bigger values mean the script
 will process more data, so more processing time now, but uchardet may
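When regenerating several models, the command from this hunk can be wrapped in a small driver. The sketch below is a minimal example that assumes `BuildLangModel.py` is executable and run from its own directory, as in the README example. It uses only the `--max-page` and `--max-depth` options shown there, with the new defaults of 200 and 4. The helpers `build_command` and `build_models` are hypothetical and not part of uchardet.

```python
import shlex
import subprocess
from typing import Iterable


def build_command(lang: str, max_page: int = 200, max_depth: int = 4) -> list[str]:
    """Assemble the argument list for one BuildLangModel.py run.

    Assumes the script is executable and invoked from its own directory;
    only the --max-page and --max-depth options from the README are used.
    """
    return [
        "./BuildLangModel.py",
        lang,
        f"--max-page={max_page}",
        f"--max-depth={max_depth}",
    ]


def build_models(langs: Iterable[str], max_page: int = 200, max_depth: int = 4) -> None:
    """Regenerate several language models one after the other (hypothetical helper)."""
    for lang in langs:
        cmd = build_command(lang, max_page, max_depth)
        print("Running:", shlex.join(cmd))
        # check=True raises CalledProcessError on the first failing run.
        subprocess.run(cmd, check=True)
```

For example, `build_models(["fr"], max_page=100)` trades some potential accuracy for a shorter run, following the README's note that bigger values mean more processing time.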
@@ -55,12 +55,11 @@ possibly be more accurate in the end.
 
 ## Updating core code ##
 
-If you were only updating data for a language model, you have nothing
+If you were only updating data for an existing language model, you have nothing
 else to do. Just build `uchardet` again and test it.
 
-If you were creating new models though, you will have to add these in
-src/nsSBCSGroupProber.cpp and src/nsSBCharSetProber.h, and increase the
-value of `NUM_OF_SBCS_PROBERS` in src/nsSBCSGroupProber.h.
+If you were creating new models though, you will have to add the sequence models
+in src/nsSBCSGroupProber.cpp and the language model in src/nsMBCSGroupProber.cpp.
 Finally add the new file in src/CMakeLists.txt.
 
 I will be looking to make this step more straightforward in the future.
|
|||||||
Loading…
x
Reference in New Issue
Block a user