uchardet

mirror of https://gitlab.freedesktop.org/uchardet/uchardet.git synced 2026-02-08 10:47:01 +08:00

Author	SHA1	Message	Date
Martin T. H. Sandsmark	8d15d6b557	make the logfile usable	2022-11-30 19:09:09 +01:00
Jehan	2a04e57c8f	test: update the Maltese / ISO-8859-3 test file. Taken from the page: https://mt.wikipedia.org/wiki/Lingwa_Maltija The old test was fine but had some French words in it, which lowered the confidence for Maltese. Technically it should not be a huge issue in the end, i.e. that if there are enough actual Maltese words, the stats should still weigh in favor of Maltese likeness (which they mostly did anyway), but since I am making some other changes, this was just not enough. In particular I was changing some of the UTF-8 confidence logic and the file ended up detected as UTF-8 (even though it has an illegal sequence and cannot be! Cf. #9). So the real long-term solution is to actually fix our UTF-8 detector, which I'll do at some point, but for the time being, let's have definite non-questionable Maltese in there to simplify testing at this early stage of uchardet rewriting.	2022-11-29 14:59:17 +01:00
Lucinda May Phipps	45bd32d102	src/tools/uchardet.cpp: make stuff static	2022-11-29 13:57:31 +00:00
Lucinda May Phipps	ef19faa8c5	Update uchardet-tests.c	2022-11-29 13:57:31 +00:00
Lucinda May Phipps	383bf118c9	don't use feof	2022-11-29 13:57:31 +00:00
myd7349	143b3fe513	README: update libchardet repository link	2022-08-01 19:38:19 +08:00
andiwand	23a664560b	Issue #27 : fix cmake	2021-12-01 13:49:37 +01:00
Jehan	b3b2bd2721	gitignore: I forgot the 2 executables (CLI tool and test binary).	2021-11-09 14:26:21 +01:00
Jehan	48db2b0800	gitignore: add files generated by the build system. Though it is highly encouraged to do out-of-source builds, it is not strictly forbidden to do in-source builds. So we should ignore the files generated by CMake. Only tested with a Linux build, with both make and ninja backends. I added .dll and .dylib versions (for Windows and macOS respectively), guessing these will be the file names on these platforms, unless mistaken (since untested). As discussed in !10, let's add with this commit files generated by the build system, but not any personal environment files (specific to contributors' environment). If I missed any file name which can be generated by the build system in some platforms, configuration, or condition, let's add them as we discover them.	2021-11-09 14:05:31 +01:00
Pedro López-Cabanillas	d7dad549bd	cmake exported targets. The minimum required cmake version is raised to 3.1, because the exported targets started at that version. The build system creates the exported targets: - The executable uchardet::uchardet - The library uchardet::libuchardet - The static library uchardet::libuchardet_static A downstream project using CMake can find and link the library target directly with cmake (without needing pkg-config) this way: ~~~ project(sample LANGUAGES C) find_package ( uchardet ) if (uchardet_FOUND) add_executable( sample sample.c ) target_link_libraries ( sample PRIVATE uchardet::libuchardet ) endif () ~~~ After installing uchardet in a prefix like "$HOME/uchardet/": cmake -DCMAKE_PREFIX_PATH="$HOME/uchardet/;..." Instead of installing, the build directory can be used directly, for instance: cmake -Duchardet_DIR="$HOME/uchardet-0.1.0/build/" ...	2021-11-09 09:52:15 +00:00
Aaron Madlon-Kay	6f38ab95f5	Mention MacPorts in readme	2021-01-27 06:57:58 +00:00
Jehan	c8a3572cca	Issue #17 : update README. Replace the old link to the science paper by one on archive-mozilla website. Remove the original source link as I can't find any archived version of it (even on archive.org, only the folder structure is saved, not actual files themselves, so it's useless). Also add some history, which is probably a nice touch. Add a link to crossroad to help people who'd want to cross-compile uchardet. Finally add the R binding by Artem Klevtsov and QtAV as reported.	2020-04-29 16:20:00 +02:00
Jehan	472a906844	Issue #16 : "i686" uname not properly detected as x86. This is basically a continuation of an older bug from Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=101033	2020-04-28 20:43:12 +02:00
myd7349	8681fc060e	build: Add uchardet CLI tool building support for MSVC	2020-04-26 08:16:14 +00:00
myd7349	5bcbd23acf	build: Fix build errors on Windows - Fix string no output variables on UWP On UWP, CMAKE_SYSTEM_PROCESSOR may be empty. As a result: string(TOLOWER ${CMAKE_SYSTEM_PROCESSOR} TARGET_ARCHITECTURE) will be treated as: string(TOLOWER TARGET_ARCHITECTURE) which, as a result, will cause a CMake error: CMake Error at CMakeLists.txt:42 (string): string no output variable specified - Remove unnecessary header inclusions in uchardet.cpp These extra inclusions cause build errors on Windows.	2020-04-26 10:08:45 +08:00
Jehan	a49f8ef6ea	doc: update README.maintainer. There is one more step to transform a git tag into a proper "Gitlab release" with the new platform.	2020-04-23 12:32:49 +02:00
Jehan	59f68dbe57	Release: version 0.0.7 v0.0.7	2020-04-23 11:48:58 +02:00
Jehan	98bc2f31ef	Issue #8 : have BuildLangModel.py add ending newline to generated source.	2020-04-22 22:57:25 +02:00
Jehan	44a50c30ee	Issue #8 : no newline at end of file. Not sure if it is in the C++ standard, or was, but apparently some compilers may complain when files don't end with a newline (though neither GCC nor Clang as our CI and my local builds are fine). So here are all our generated source which didn't have such ending newline (hopefully I forgot none). I just loaded them in my vim editor, and resaved them. This was enough to add an ending newline.	2020-04-22 22:53:25 +02:00
Jehan	6c7f32a751	Issue #10 : Crashing sequence with nsSJISProber. uchardet_handle_data() should not try to process data of zero length. Still this is not technically an error to feed empty data to the engine, and I could imagine it could happen especially when done in some automatic process with random input files (which looks like what was happening in the reporter case). So feeding empty data just returns a success without actually doing any processing, allowing to continue the data feed.	2020-04-22 22:11:51 +02:00
Jehan	ef0313046b	Also allow uchardet tool to detect encoding of a file named "--". My previous commit was good except for the very special case of wanting to analyze a file named "--". This file would be ignored. With this change, only the first "--" option will be ignored as meaning "end of option arguments", but any remaining value (another "--" included) will be considered as a file path.	2020-04-22 21:11:23 +02:00
Jehan	4a37dfdf1c	Issue #15 : support "--" end-of-option.	2020-04-22 21:05:44 +02:00
wangqr	ae7acbd0f2	Add dllexport to interface functions. This allows building the DLL on Windows with compilers other than GNU ones. See MR !4.	2020-04-22 18:54:07 +00:00
Artem Klevtsov	2694ba6363	Fix global-buffer-overflow due to EUCTW_TABLE_SIZE	2020-04-22 17:06:40 +00:00
Jehan	81ab1d1da1	gitlab-ci: Adding a Clang build.	2020-04-22 18:04:56 +02:00
Jehan	6afec53adc	gitlab-ci: Windows 32 and 64-bit builds.	2020-04-22 18:00:36 +02:00
Jehan	b5674dbd50	gitlab-ci: first CI build for uchardet. Very simple CI since uchardet is an extremely low/no dependency library. So basically we install CMake in Debian/testing and we are good.	2020-04-22 17:22:23 +02:00
Jehan	e0b9269849	Fix various other occurrences of bug tracker URL in code/build.	2020-04-22 12:29:41 +02:00
Jehan	60bf53c81e	README: update to Gitlab links. Freedesktop moved its infrastructure to Gitlab a while ago.	2020-04-22 00:33:48 +02:00
Jehan	0cfb75724a	README: some small updates.	2020-04-22 00:17:23 +02:00
Jehan	bdfd6116a9	Add a mention about fd.o code of conduct.	2018-09-26 15:12:25 +02:00
Ilya Tumaykin	f136d434f0	build: turn TARGET_ARCHITECTURE into option. Default value is autodetected if not specified by user.	2018-01-21 15:58:13 +01:00
Jehan	95872ef41c	Adding some information about building for Windows.	2017-12-26 03:37:42 +01:00
Jehan	df67ae4fe0	CMake: get rid of some commented code. It says that's for Win32 platform and uses the install prefix as library prefix. But that's not at all the same kind of prefixes! CMAKE_INSTALL_PREFIX expected value is the path to install the lib (what is called the "installation prefix"), whereas CMAKE_*_LIBRARY_PREFIX are the prefix on the file name (usually "lib" on UNIX-like systems). Anyway I don't see a need to change this value. It will be called "libuchardet.dll" on Win32. I don't see the problem. Also this code was already commented out, and compilation and usage for Win32 works just fine without it. :-)	2017-12-24 19:47:05 +01:00
Jehan	cd617d181d	CMake: do not check/set SSE and float-store options on non-x86 targets. Not sure if that's right. I guess we might also find non-x86 machines where floating point computation won't follow IEEE standard as well. But let's do this for now to prevent from useless performance hit.	2017-11-07 00:37:54 +01:00
Jehan	939482ab2b	CMake: slightly improve the configuration option messages. Also add full stops, similarly to CMake default options.	2017-11-06 02:11:20 +01:00
Jehan	77bf71ea36	CMake: rename s/ENABLE_SSE2/CHECK_SSE2/. "ENABLE_SSE2" may be misleading since having it ON does not necessarily mean that SSE2 flags will be actually set. It only means that the support will be checked (then set only when supported). Also adding the warning about possible performance decrease.	2017-11-06 02:07:40 +01:00
Jehan	5996bbd995	Bug 101033 - Testsuite fails on i386. Floating point accuracy may be different depending on the architecture. In particular some architectures may store floating values with different precision, resulting in unreliable results across various machines. It would seem in particular true on older x86 machines without SSE support, which were reported cases. The proposed solution is to test for SSE support and explicitly add the proper flags (even though they are set by default anyway on modern x86). When this is not available (on older machines or simply when not on x86 processors), I replace sse2 flags with -ffloat-store, which forces IEEE floating point definition. The reason why not to always force -ffloat-store is because it seems to decrease performance on some machines. SSE is preferred if available. I also add an ENABLE_SSE2 option on the CMake file to allow builders to use -ffloat-store even though SSE2 may be available on the build machine. This would allow to build portable binaries which can also be installed on older machines.	2017-11-06 01:56:45 +01:00
Jehan	056a5a6e51	README: add some applications having uchardet as dependency. There are likely more (and I know some are planning support) but these are the ones I know of and with support already in.	2017-09-21 00:06:03 +02:00
Jehan	1898847eb6	src: cast value to its proper type. Thanks to Marino Faggiana for reporting it. See: https://github.com/BYVoid/uchardet/issues/37	2017-08-27 13:01:30 +02:00
Jehan	170ef349cf	src: fix some doc comments. s/a instance/an instance/. Unless mistaken, we should use "an" with next word starting with vowel.	2017-08-19 10:46:25 +02:00
Jehan	c049332c41	src: s/detctor/detector/.	2017-08-18 12:03:54 +02:00
Jehan	d9d014742a	README: Gentoo also has a uchardet package. And it is up-to-date with upstream URL at Freedesktop! Good!	2017-05-28 21:13:59 +02:00
Jehan	53f7ad0e0b	Bug 101032 - assignments to nsSMState in nsCodingStateMachine result... ... in unspecified behavior. When compiling with UBSan (-fsanitize=undefined), execution complains: > runtime error: load of value 5, which is not a valid value for type 'nsSMState' Since the machine states depend on every different charset's state machine, it is not possible to simply extend the enum with more generic values. Instead let's just make the state as an unsigned int value and define the 3 generic states as constants.	2017-05-28 20:01:06 +02:00
Jehan	50bc02c0ff	Request C++11 standard project-wide and make it a strong requirement. It is unneeded to do it by target, using the global property CMAKE_CXX_STANDARD instead. Also with CMAKE_CXX_STANDARD_REQUIRED, I make this a strong requirement. The documentation indeed states that the CXX_STANDARD "is treated as optional and may “decay” to a previous standard if the requested is not available". This means that uchardet will likely not be buildable with a compiler with no C++11 support. But I assume this is not a common situation, and probably we should not care about outdated compilers. I remain open to suggestions and disagreement on the topic obviously.	2017-05-28 15:43:44 +02:00
Jehan	1bf198cb0f	Make C++11 the standard used for uchardet. As discussed in bug 101032, it seems like the most common usage nowadays. Let's make a specific choice to avoid different behavior on different builds later on.	2017-05-28 15:32:06 +02:00
Jehan	98bf4d73fd	Bug 101204 - different results with different chunk sizes. ASCII and ISO-8859-1 should not be detected in nsUniversalDetector::HandleData() but in nsUniversalDetector::DataEnd() instead. Otherwise it creates an unwanted shortcut from the first call to uchardet_handle_data() if the input is broken into several pieces and if the first chunk happens to be ASCII (or ASCII + NBSP).	2017-05-28 14:14:48 +02:00
Jehan	50743e16f8	src: minor indentation fix.	2017-05-14 21:35:11 +02:00
Jehan	6cf13f108b	test: output the test file path which we failed to open. Also properly free the string in such case.	2017-05-14 20:29:30 +02:00
Jehan	94b10b9b29	Bug 101030 - Buffer overflow related to ISO2022JP detection in... ... en:ascii and ja:iso-2022-jp tests. I don't know much about this part of the code at this point. Yet I can clearly deduce that the length of the charLenTable is supposed to be the classFactor of the SMModel. Therefore 2 classes were missing in ISO2022JPCharLenTable, hence a buffer overflow happens when trying to reach these. I am not sure of the values I should add there. For now, let's set 0 to both, but adding also a comment so that I can review this code later on, when I will get to read and understand this piece of code in more depth.	2017-05-14 19:49:01 +02:00


277 Commits