fast_float

mirror of https://github.com/fastfloat/fast_float.git synced 2026-07-30 08:16:25 +08:00

Author	SHA1	Message	Date
Daniel Lemire	f6df0f2917	Merge pull request #398 from redis-performance/exp062-063-combo Enable the 4-digit SWAR follow-up on GCC + skip the rounding-mode probe for long mantissas	2026-07-13 15:13:16 -04:00
Daniel Lemire	11c390b4dc	Merge pull request #396 from fastfloat/dependabot/github_actions/github-actions-640176b5ab Bump actions/checkout from 6 to 7 in the github-actions group across 1 directory	2026-07-13 09:09:20 -04:00
dependabot[bot]	1fa146ed1d	Bump actions/checkout in the github-actions group across 1 directory Bumps the github-actions group with 1 update in the / directory: [actions/checkout](https://github.com/actions/checkout). Updates `actions/checkout` from 6 to 7 - [Release notes](https://github.com/actions/checkout/releases) - [Commits](https://github.com/actions/checkout/compare/v6...v7) --- updated-dependencies: - dependency-name: actions/checkout dependency-version: '7' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com>	2026-07-13 00:06:02 +00:00
Daniel Lemire	ce25f0d195	Remove Stars section from README Removed the Stars section and related chart from README.	2026-07-10 09:20:48 -04:00
fcostaoliveira	46b2fbaabb	Reconcile redis-perf/optim: content superseded by upstream #381/#382/#387 EXP-050/052/053 were upstreamed in final form via #381 (integer-scan unroll) and #382 (4-digit follow-up, clang-gated); the #387 span-elision restructure then rewrote the surrounding code. This merge carries the old branch history while the tree = upstream/main (8.2.7) + EXP-062 (ungate 4-digit SWAR on gcc) + EXP-063 (mantissa bound before rounds_to_nearest probe). NOTE: EXP-052's 2x unroll of loop_parse_if_eight_digits was never upstreamed and is dropped here pending post-#387 revalidation (see EXPERIMENTS.md).	2026-07-03 14:44:24 +01:00
fcostaoliveira	5082489e35	EXP-063: test the mode-independent mantissa bound before the rounds_to_nearest probe	2026-07-03 14:34:47 +01:00
fcostaoliveira	f4f36e04f7	EXP-062: enable the 4-digit SWAR fraction follow-up on all compilers (drop __clang__ gate)	2026-07-03 14:34:47 +01:00
Daniel Lemire	8ec5d236e2	Merge pull request #395 from sahvx655-wq/is-space-signed-compare guard is_space against negative signed code units	2026-06-18 16:33:25 -04:00
sahvx655-wq	c539b5399c	guard is_space against negative signed code units	2026-06-18 19:51:56 +05:30
Daniel Lemire	34164f547b	8.2.10 v8.2.10	2026-06-14 09:52:47 -04:00
Daniel Lemire	4eec7bec38	Merge pull request #394 from fastfloat/int-overflow-simdjson-approach Int overflow check with a faster approach	2026-06-14 09:52:09 -04:00
Daniel Lemire	fd970ab05e	updating visual studio	2026-06-13 21:41:53 -04:00
Daniel Lemire	a7249f86ed	replace checked re-parse with O(1) simdjson-style overflow check The previous commit detects multi-wrap u64 overflow at the max_digits boundary by re-parsing the digits through a checked multiply-add loop (O(max_digits)). Replace that with the constant-time check used in simdjson: the leading digit plus a single threshold comparison. For a max_digits-length value, min_safe_u64(base) == base^(max_digits-1) is the smallest such value and also the width of each leading-digit band [d*ms, (d+1)*ms). Since that width is < 2^64, the only band that can straddle 2^64 is d == dmax (the largest leading digit that still fits), and there it straddles at most once, so a single threshold dmax*ms separates wrapped from non-wrapped values. A leading digit above dmax always overflows; below dmax always fits. dmax and the threshold derive from the existing min_safe_u64 table, so no new tables are needed and dmax*ms cannot itself overflow. Add a programmatic, self-verifying test for parse_int_string overflow detection covering bases 2..36, complementing the hand-picked strings added earlier. Every generated input is cross-checked against an independent trusted oracle (a plain 64-bit checked multiply-add); on success the parsed value is also compared exactly and full consumption of the input is asserted. Per base it exercises: - an exact-boundary sweep of the 64 values straddling 2^64 (UINT64_MAX-31 .. 2^64+31), built by walking the digit string; - UINT64_MAX, 2^64 and the all-max-digit value, each also with leading zeros; - random max_digits-length values across every leading digit, with the heaviest sampling on the lead == dmax band that straddles 2^64, and full coverage of lead > dmax (the multi-wrap region the naive min_safe check accepted by mistake); - max_digits-1 (never overflows) and max_digits+1 (always overflows). A small signed (int64_t) section checks the exact INT64_MIN/INT64_MAX limits round-trip and that INT64_MAX+1 / INT64_MIN-1 are rejected in every base.	2026-06-13 21:34:34 -04:00
sahvx655-wq	632cc97b5b	detect uint64 overflow that wraps past min_safe in parse_int_string	2026-06-13 21:21:29 -04:00
Daniel Lemire	8234a89623	8.2.9 v8.2.9	2026-06-11 20:29:24 -04:00
Daniel Lemire	0dce102cb4	Merge pull request #391 from sahvx655-wq/int-fast-path-wide-units reject non-digit wide code units in uint8/uint16 integer fast path	2026-06-11 20:28:10 -04:00
Daniel Lemire	30868f8734	Merge pull request #392 from biojppm/fix/gcc9_compile_error Fix compile error with gcc 9: use of [[unlikely]]	2026-06-11 09:37:38 -04:00
Joao Paulo Magalhaes	8e6edc8ad2	Fix compile error with gcc 9: use of [[unlikely]]	2026-06-10 15:37:26 +01:00
sahvx655-wq	82882b237d	gate uint8/uint16 base-10 fast paths to single-byte code units	2026-06-10 12:12:34 +05:30
Daniel Lemire	937198691a	Merge pull request #389 from correctmost/cm/remove-unreachable-return Remove an unreachable return statement	2026-06-09 11:18:15 -04:00
Daniel Lemire	0352ba3fef	Merge pull request #390 from correctmost/cm/remove-unreachable-block Remove an else if statement that is always false	2026-06-09 11:17:35 -04:00
correctmost	6ae691372f	Remove an else if statement that is always false Commit b334317d added the same std::isnan(v) check as an earlier condition. The warning was reported by cppcheck.	2026-06-09 03:48:54 -04:00
correctmost	8fe7a9405b	Remove an unreachable return statement The redundant statement was reported by cppcheck.	2026-06-09 03:37:58 -04:00
Daniel Lemire	e8ec8e8f34	8.2.8 v8.2.8	2026-06-08 15:29:36 -04:00
Daniel Lemire	c05156ff60	Merge pull request #388 from biojppm/fix/clang_compile_error Fix compile error in clang<10: fails on pragma -Wc++20-extensions	2026-06-08 15:28:45 -04:00
Joao Paulo Magalhaes	23e245f2b3	Fix compile error in clang<10: fails on pragma -Wc++20-extensions This fixes a compile error in all clang versions lower than 10, triggered by the use of the pragma ignore with what is an unknown warning on those compiler versions: ``` /__w/ext/fast_float/include/fast_float/parse_number.h:361:34: error: unknown warning group '-Wc++20-extensions', ignored [-Werror,-Wunknown-pragmas] ``` The fix requires looking at __clang_major__, which is unfortunately different in Apple, so a version dispatch is required.	2026-06-08 12:39:48 +01:00
Daniel Lemire	e0b53eaf63	8.2.7 v8.2.7	2026-06-07 14:14:42 -04:00
Daniel Lemire	3044c9b182	Merge pull request #387 from fastfloat/pr386 Using unlikely markers for PR386	2026-06-07 14:12:38 -04:00
Daniel Lemire	29bd11571b	one too many	2026-06-07 11:19:47 -04:00
Daniel Lemire	b1fbfe932a	silencing -Wc++20-extensions at the point of use solely	2026-06-07 11:18:09 -04:00
Daniel Lemire	520fded4a3	addressing comments by @jwakely	2026-06-06 13:13:49 -04:00
Daniel Lemire	b72e07132c	let us use 'unlikely' hints.	2026-06-05 22:01:27 -04:00
fcostaoliveira	3067491f41	clang-format (clang-format-17 comment reflow + signature wrap; no semantic change)	2026-06-03 09:35:26 +01:00
fcostaoliveira	cb5d9cd9a4	Skip materializing the integer/fraction spans on the hot path parsed_number_string_t carries two span<UC const> members (integer, fraction) that are only read on the rare slow paths (digit_comp, and the >19-significant-digit truncation recompute). Materializing them on every parse forces the ~56/64-byte struct to be written out and marshaled through the by-value return, which shows up as backend/store pressure on the hot path. This adds a runtime `store_spans` flag (default true, so all existing callers are unchanged) to parse_number_string; from_chars_float_advanced parses with it false, attempts the Clinger and Eisel-Lemire fast paths inline, and only re-parses with spans on the two rare slow branches. The re-parse is pushed into a single `fastfloat_noinline` (noinline+cold) helper so the force-inlined hot scanner is emitted once rather than duplicated into the caller (without this the extra inline copies regress some targets, e.g. ARM gcc, by bloating the hot frame and lengthening the loop-carried dependency chain). A runtime flag is used deliberately rather than a template parameter: a template would create a second instantiation of the whole scanner whose icache cost wipes out the gain. Measured (per-parser microbench, median of 5, pinned core), fast_float from_chars <double>/<float>, vs the current tip: - Intel Ice Lake (Xeon 8360Y): +17-19% (gcc), Intel TMA shows backend-bound 26.0% -> 2.2% and retiring 60.3% -> 77.3% on short floats (the eliminated span spill), with -36% pipeline slots. - Intel Cascade Lake (Xeon 6248): +18-22% (gcc), +13-23% (clang). - ARM Neoverse-V2 (Graviton4): +73-196% (gcc), +8-11% (clang) -- the struct spill dominated the gcc hot loop there. Correctness: the full float exhaustive suite (exhaustive32, exhaustive32_64, exhaustive32_midpoint, random64) passes, and a 2^32 sweep is byte-identical to the current tip. Public from_chars / from_chars_advanced / parsed_number_string_t are unchanged.	2026-06-03 09:30:42 +01:00
Daniel Lemire	6258cbc5a1	Merge pull request #380 from fastfloat/dependabot/github_actions/github-actions-0eb558eb98 Bump the github-actions group across 1 directory with 3 updates	2026-06-02 14:02:10 -04:00
Daniel Lemire	254f10ce39	Merge pull request #385 from jwakely/patch-2 Fix spelling	2026-06-02 14:01:41 -04:00
Jonathan Wakely	1b11407da9	Fix spelling Run clang-format to reformat the long lines.	2026-06-02 15:30:37 +01:00
Daniel Lemire	f0ed8cdf52	display the latest version.	2026-06-01 18:28:09 -04:00
Daniel Lemire	cfd12ebcf1	8.2.6 v8.2.6	2026-06-01 18:07:41 -04:00
Daniel Lemire	06f3e27411	Merge pull request #383 from redis-performance/pr/parallel-exhaustive Parallelize the exhaustive float32 sweeps across hardware threads (~75-88x)	2026-06-01 18:07:01 -04:00
fcostaoliveira	b642d9202f	tests: parallelize exhaustive32 and exhaustive32_64 sweeps too Same std::thread split as exhaustive32_midpoint; preserves each test's existing failure behavior (abort for exhaustive32, stop-flag for exhaustive32_64).	2026-06-01 21:09:46 +01:00
Daniel Lemire	ed861322d8	Merge pull request #382 from redis-performance/pr/four-digit-followup Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang)	2026-06-01 15:45:15 -04:00
Daniel Lemire	0f682cd6eb	Merge pull request #381 from redis-performance/pr/integer-scan-unroll Unroll the integer-part digit scan (straight-line for the common 1-5 digit case)	2026-06-01 13:44:06 -04:00
fcostaoliveira	b20c420964	tests: parallelize the exhaustive midpoint sweep across hardware threads	2026-06-01 13:01:10 +01:00
fcostaoliveira	7589a4fea5	Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang) After the 8-digit SWAR block loop, consume a remaining 4-7 digit run in one read4_to_u32 + parse_four_digits_unrolled step instead of byte-by-byte (reusing the existing 4-digit helpers). The parsed result is identical; this is purely a faster way to consume the same digits. Gated to clang: on gcc the extra 4-digit check regresses inputs whose remainder is < 4 digits (e.g. the 17-digit fraction of uniform [0,1] -> -3% on 'random'), because the check becomes pure overhead there; clang does not show that. m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark, from_chars->double, clang 18, base vs patch back-to-back (2 samples): canada.txt +11.7%, mesh.txt +7.4%, random ~flat. No regression.	2026-06-01 11:55:50 +01:00
fcostaoliveira	b64d014e2f	Unroll the integer-part digit scan (straight-line for the common 1-5 digit case) parse_number_string scans the integer part one byte at a time in a while loop, while the fraction already uses the 8-digit SWAR loop. Most integer parts are 1-5 digits, so the loop back-edge dominates. Peel the first five iterations into nested ifs, falling through to the original while for longer runs. Semantics are identical (i = 10*i + digit, advancing p); no behavior change. AWS m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark, from_chars->double. base vs patch measured back-to-back, mean of 2 runs: canada: gcc +3.1%, clang +2.8% mesh: gcc +5.4%, clang +5.1% random: ~flat (1-digit integer part) No regression; gcc and clang agree. Alternatives benchmarked and rejected: reusing loop_parse_if_eight_digits for the integer part regressed 5-8% (integer parts are too short for 8-digit SWAR setup); a counted for(k<5) loop matched on gcc but clang optimized it worse (canada -0.9%). The explicit peel is the only form solidly positive on both compilers.	2026-06-01 09:55:08 +01:00
fcostaoliveira	3ff2c0b894	EXP-053: clang-format (reflow comment + expression wrap; no semantic change) Pre-clear the lint_and_format_check CI gate. clang-format-18 (CI pins 17; LLVM base style is identical for these constructs). Behavior/benchmarks unchanged. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 09:15:26 +01:00
dependabot[bot]	b3ec8d89cf	Bump the github-actions group across 1 directory with 3 updates Bumps the github-actions group with 3 updates in the / directory: [actions/setup-node](https://github.com/actions/setup-node), [mymindstorm/setup-emsdk](https://github.com/mymindstorm/setup-emsdk) and [jidicula/clang-format-action](https://github.com/jidicula/clang-format-action). Updates `actions/setup-node` from 6.3.0 to 6.4.0 - [Release notes](https://github.com/actions/setup-node/releases) - [Commits](`53b83947a5...48b55a011b`) Updates `mymindstorm/setup-emsdk` from 14 to 16 - [Release notes](https://github.com/mymindstorm/setup-emsdk/releases) - [Commits](`6ab9eb1bda...4528d102f7`) Updates `jidicula/clang-format-action` from 4.17.0 to 4.18.0 - [Release notes](https://github.com/jidicula/clang-format-action/releases) - [Commits](`3a18028048...654a770daa`) --- updated-dependencies: - dependency-name: actions/setup-node dependency-version: 6.4.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: github-actions - dependency-name: jidicula/clang-format-action dependency-version: 4.18.0 dependency-type: direct:production update-type: version-update:semver-minor dependency-group: github-actions - dependency-name: mymindstorm/setup-emsdk dependency-version: '16' dependency-type: direct:production update-type: version-update:semver-major dependency-group: github-actions ... Signed-off-by: dependabot[bot] <support@github.com>	2026-06-01 00:15:05 +00:00
fcostaoliveira	a30c1f3d3f	EXP-053: 4-digit SWAR follow-up in loop_parse_if_eight_digits, GCC path (ffc EXP-001) After the 8-digit block loop, consume a remaining 4-7 digit run in one SWAR step (reusing fast_float's existing read4_to_u32 / is_made_of_four_digits_fast / parse_four_digits_unrolled) instead of byte-by-byte. GCC path only: on Clang the follow-up's presence bloated the 2x-unroll codegen and regressed random -6.2%. ARM Graviton4 (canonical fast_float MB/s vs EXP-052): GCC: canada +2.6% (948.1 from 924.0, i/f 248.7->229.7), random/mesh flat Clang: unchanged (EXP-052 path preserved) Correctness: 14/14 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 01:04:30 +01:00
fcostaoliveira	ee84946702	EXP-052: 2x unroll of char loop_parse_if_eight_digits (ported from ffc EXP-044) Clang/AArch64-gated 16-digit-per-iteration unroll of the fraction SWAR loop; eliminates the back-edge for typical 17-digit [0,1] mantissas. GCC keeps the auto-unrolled simple loop. ARM Graviton4 (canonical fast_float MB/s vs EXP-050): Clang: random +2.8% (1365.7 from 1328.8), mesh +1.7%, canada +0.5% GCC: unchanged (#else path) Correctness: 14/14 pass. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-01 00:56:52 +01:00
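
The constant-time overflow check described in commit a7249f86ed can be illustrated for base 10 only. This is a minimal sketch, not fast_float's code: `parse_u64_base10` is a hypothetical helper, and the constants `ms` (10^19) and `dmax` (1) are hard-coded here, whereas the commit says the library derives them from its `min_safe_u64` table for every base. The digits are accumulated with ordinary wrapping arithmetic, and overflow is decided afterwards from the digit count, the leading digit, and one threshold comparison.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>

// Hypothetical helper (illustration only, not part of fast_float):
// parse an unsigned base-10 integer; returns std::nullopt on empty input,
// on a non-digit character, or when the value does not fit in 64 bits.
std::optional<std::uint64_t> parse_u64_base10(std::string_view s) {
  if (s.empty()) return std::nullopt;

  // Leading zeros do not count toward the significant digit length.
  std::size_t pos = 0;
  while (pos < s.size() && s[pos] == '0') ++pos;
  std::string_view digits = s.substr(pos);

  constexpr std::size_t max_digits = 20;                 // digits in UINT64_MAX
  constexpr std::uint64_t ms = 10000000000000000000ULL;  // 10^(max_digits-1)
  constexpr unsigned dmax = 1;  // largest leading digit that can still fit

  // Unsigned arithmetic wraps modulo 2^64; that is well defined in C++.
  std::uint64_t value = 0;
  for (char c : digits) {
    if (c < '0' || c > '9') return std::nullopt;
    value = value * 10 + static_cast<std::uint64_t>(c - '0');
  }

  if (digits.size() < max_digits) return value;         // can never overflow
  if (digits.size() > max_digits) return std::nullopt;  // always overflows

  // Exactly max_digits significant digits: inspect the leading digit.
  unsigned lead = static_cast<unsigned>(digits[0] - '0');
  if (lead > dmax) return std::nullopt;  // band lies entirely above 2^64
  if (lead < dmax) return value;         // band lies entirely below 2^64
  // lead == dmax: the band [dmax*ms, (dmax+1)*ms) straddles 2^64 at most
  // once, so a wrapped result is necessarily smaller than dmax*ms.
  if (value < dmax * ms) return std::nullopt;
  return value;
}
```

With this sketch, "18446744073709551615" parses to UINT64_MAX, "18446744073709551616" is rejected by the threshold comparison, and "99999999999999999999" is rejected by the leading-digit test alone, which is the multi-wrap case the commit message says the earlier check accepted by mistake.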

1046 Commits
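
The 4-digit SWAR follow-up described in commits 7589a4fea5 and f4f36e04f7 can be sketched as below. The names `load4_le`, `is_four_digits`, `parse_four_digits`, and `scan_digits` are hypothetical stand-ins, not the library's `read4_to_u32`, `is_made_of_four_digits_fast`, or `parse_four_digits_unrolled`, whose exact implementations are not shown in this log. The preceding 8-digit block loop is omitted, and the sketch does no overflow checking.

```cpp
#include <cstddef>
#include <cstdint>

// Assemble four bytes in little-endian order, independent of host endianness.
inline std::uint32_t load4_le(const char *p) {
  return static_cast<std::uint32_t>(static_cast<unsigned char>(p[0])) |
         static_cast<std::uint32_t>(static_cast<unsigned char>(p[1])) << 8 |
         static_cast<std::uint32_t>(static_cast<unsigned char>(p[2])) << 16 |
         static_cast<std::uint32_t>(static_cast<unsigned char>(p[3])) << 24;
}

// True when all four bytes are ASCII digits: every high nibble must be 3,
// and adding 6 to every low nibble must not carry into the high nibble.
inline bool is_four_digits(std::uint32_t v) {
  return (v & 0xF0F0F0F0u) == 0x30303030u &&
         ((v + 0x06060606u) & 0xF0F0F0F0u) == 0x30303030u;
}

// Convert four ASCII digits (first digit in the lowest byte) to 0..9999.
inline std::uint32_t parse_four_digits(std::uint32_t v) {
  v -= 0x30303030u;       // each byte now holds a digit value 0..9
  v = v * 10 + (v >> 8);  // byte 0 = 10*d0 + d1, byte 2 = 10*d2 + d3
  return (v & 0xFFu) * 100 + ((v >> 16) & 0xFFu);
}

// Consume a run of digits starting at p and accumulate it into value.
// Returns a pointer just past the last digit consumed.
inline const char *scan_digits(const char *p, const char *end,
                               std::uint64_t &value) {
  // Follow-up step: take four digits at once when they are available.
  if (end - p >= 4) {
    std::uint32_t v = load4_le(p);
    if (is_four_digits(v)) {
      value = value * 10000 + parse_four_digits(v);
      p += 4;
    }
  }
  // Remaining digits are consumed one byte at a time.
  while (p != end && *p >= '0' && *p <= '9') {
    value = value * 10 + static_cast<std::uint64_t>(*p - '0');
    ++p;
  }
  return p;
}
```

The parsed value is the same as with a plain byte-by-byte loop; the only difference is how many digits each step consumes. When fewer than four digits remain, the extra check is pure overhead, which is the cost commit 7589a4fea5 cites for initially gating the step to clang.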