25 Commits

Author SHA1 Message Date
Jehan
06029ec334 src: allow setting a default language in the CLI tool.
The syntax of --weight stays the same with the addition that the
language '*' means setting the default weight.

For instance, if you are sure that your input is either French or
English, you could run:

> uchardet -l -w 'fr:1,en:1,*:0'

(setting the same weight for French and English, and 0 for everything else)
2025-08-08 11:40:10 +02:00
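The default-weight rule described in this commit could be sketched as a lookup with a fallback. This is an illustration only, not the code from the commit: `weight_for` is a hypothetical helper, and the fallback of 1.0 when no '*' entry exists is an assumption.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not part of uchardet): return the weight for `lang`.
// An explicit entry wins; otherwise the '*' entry is used as the default;
// if neither exists, `fallback` is returned (1.0 is an assumed neutral value).
float weight_for(const std::map<std::string, float> &weights,
                 const std::string &lang,
                 float fallback = 1.0f)
{
    std::map<std::string, float>::const_iterator it = weights.find(lang);
    if (it != weights.end())
        return it->second;

    it = weights.find("*");
    if (it != weights.end())
        return it->second;

    return fallback;
}
```

With the weights from the example above ('fr:1,en:1,*:0'), a lookup for "fr" yields 1 while any language not listed, such as "de", yields 0.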
Marcus Nilsson
9699dfce07 Issue #40: Close file when it's no longer needed 2025-06-07 23:35:44 +00:00
Jehan
fb433a57b5 src: add a --language|-l option to the uchardet CLI tool. 2022-12-14 00:24:53 +01:00
Jehan
908f9b8ba7 src, test: rename s/uchardet_get_candidates/uchardet_get_n_candidates/.
This was badly named, as this function does not return candidates but
the number of candidates (to be used with the other API calls).
2022-12-14 00:24:53 +01:00
Jehan
a0bfba3db3 src: add a --weight option to the CLI tool.
Syntax is: lang1:weight1,lang2:weight2…
For instance: `uchardet -wfr:1.1,it:1.05 file.txt` if you think a file
is probably French or maybe Italian.
2022-12-14 00:23:13 +01:00
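As a rough illustration of the `lang1:weight1,lang2:weight2` syntax, the parsing step might look like the sketch below. `parse_weights` is a hypothetical name and the error handling is an assumption; the actual tool may parse the option differently.

```cpp
#include <map>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical helper (not the actual uchardet code): parse a string such as
// "fr:1.1,it:1.05" into a language -> weight map.
std::map<std::string, float> parse_weights(const std::string &spec)
{
    std::map<std::string, float> weights;
    std::istringstream stream(spec);
    std::string item;

    // Split on ',' and then split each item on its first ':'.
    while (std::getline(stream, item, ','))
    {
        const std::string::size_type colon = item.find(':');
        if (colon == std::string::npos || colon == 0 || colon + 1 == item.size())
            throw std::invalid_argument("expected lang:weight, got '" + item + "'");

        // std::stof throws std::invalid_argument if the weight is not a number.
        weights[item.substr(0, colon)] = std::stof(item.substr(colon + 1));
    }
    return weights;
}
```

Calling `parse_weights("fr:1.1,it:1.05")` gives a map with two entries, one for "fr" and one for "it".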
Jehan
f74d602449 src: fix the usage of uchardet tool.
It was displaying -v for both verbose and version options. The new
--verbose short option is actually -V (uppercase).
2022-12-14 00:23:13 +01:00
Jehan
d48ee7abc2 src: uchardet tool now shows the language code in verbose mode. 2022-12-14 00:23:13 +01:00
Jehan
7bc1bc4e0a src: new option --verbose|-V in the uchardet CLI tool.
This new option prints the whole candidate list along with each
candidate's confidence (ordered from highest to lowest).
2022-12-14 00:23:13 +01:00
Lucinda May Phipps
45bd32d102 src/tools/uchardet.cpp: make stuff static 2022-11-29 13:57:31 +00:00
Lucinda May Phipps
383bf118c9 don't use feof 2022-11-29 13:57:31 +00:00
myd7349
5bcbd23acf build: Fix build errors on Windows
- Fix "string no output variable specified" error on UWP

  On UWP, CMAKE_SYSTEM_PROCESSOR may be empty. As a result:
  string(TOLOWER ${CMAKE_SYSTEM_PROCESSOR} TARGET_ARCHITECTURE)
  will be treated as:
  string(TOLOWER TARGET_ARCHITECTURE)
  which, as a result, will cause a CMake error:

  CMake Error at CMakeLists.txt:42 (string):
    string no output variable specified

- Remove unnecessary header inclusions in uchardet.cpp

  These extra inclusions cause build errors on Windows.
2020-04-26 10:08:45 +08:00
Jehan
ef0313046b Also allow uchardet tool to detect encoding of a file named "--".
My previous commit was good except for the very special case of wanting
to analyze a file named "--". This file would be ignored.

With this change, only the first "--" option will be ignored as meaning
"end of option arguments", but any remaining value (another "--"
included) will be considered as a file path.
2020-04-22 21:11:23 +02:00
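The behaviour described in this commit can be sketched as follows. This is a simplified model under stated assumptions, not the tool's actual argument handling: `collect_files` is a hypothetical helper, and real option parsing is reduced to skipping anything that starts with '-' before the separator.

```cpp
#include <string>
#include <vector>

// Hypothetical helper (not the actual uchardet code): return the arguments
// that should be treated as file paths.
std::vector<std::string> collect_files(const std::vector<std::string> &args)
{
    std::vector<std::string> files;
    bool options_ended = false;

    for (const std::string &arg : args)
    {
        if (!options_ended && arg == "--")
        {
            // Only the first "--" is consumed as "end of option arguments".
            options_ended = true;
            continue;
        }
        if (!options_ended && arg.size() > 1 && arg[0] == '-')
            continue; // An option; handling it is out of scope for this sketch.

        // Everything else, including a later "--", is a file path.
        files.push_back(arg);
    }
    return files;
}
```

Under this model, the arguments `-- --` produce a single file path named "--", which is the special case this commit addresses.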
Jehan
4a37dfdf1c Issue #15: support "--" end-of-option. 2020-04-22 21:05:44 +02:00
Jehan
e0b9269849 Fix various other occurrences of bug tracker URL in code/build. 2020-04-22 12:29:41 +02:00
Jehan
771d78b7df Update the URL links: uchardet is now a freedesktop project. 2016-07-20 01:47:50 +02:00
Jehan
248d6dbd35 tools: exit with non-zero value on uchardet error. 2016-01-21 18:16:42 +01:00
Jehan
ba56d91808 Update uchardet URL in various places. 2015-12-03 19:48:29 +01:00
Jehan
d1bc09e4d7 Update authors.
I think I deserved being listed in the authors by now. ;-)
2015-12-03 19:44:13 +01:00
Jehan
0289c2a232 Differentiate ASCII and detection failure.
The lib used to return "" for both properly detected ASCII and
detection failure. And the tool would return "ascii/unknown".
Make a proper distinction between the 2 cases.
2015-11-28 17:04:52 +01:00
Loic Le Loarer
07af96b3a7 Use perror for error report 2015-07-16 01:20:03 +02:00
Loic Le Loarer
1c89a2f8ff Use stdin by default as before 2015-07-16 01:15:08 +02:00
Loic Le Loarer
972d061e90 Allow multiple filenames on the command line 2015-07-16 00:59:58 +02:00
BYVoid
06e65096f1 Add comments on uchardet.h 2011-07-11 15:25:31 +08:00
BYVoid
84284eccf4 Update code from upstream. 2011-07-11 14:42:50 +08:00
BYVoid
331af64156 Add command line interface. 2011-07-10 16:42:38 +08:00