371 Commits

Author SHA1 Message Date
Daniel Lemire
8e3e876b2e Add optional support for digit separators and cpp prefixes (#369)
Rebased onto current main. Adds optional support in from_chars_advanced to
skip a configurable digit separator (e.g. ') and to skip standard cpp prefixes
(0x/0X, 0b/0B) before decimal parsing.

Reconciled with main's straight-line-unroll optimization of the integer-part
scan: the fast unrolled path and loop_parse_if_eight_digits fast path are
preserved for the common no-separator case; separator-aware loops are used only
when a digit separator is configured.

Original work by zaewc (PR #369), squashed during conflict resolution.
2026-06-05 22:18:00 -04:00
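The split described in commit 8e3e876b2e, a tight loop when no separator is configured and a separator-aware loop otherwise, might look roughly like the sketch below. Everything in it is an assumption made for illustration: the names `is_decimal_digit`, `skip_prefix`, and `scan_digits` are hypothetical, `'\0'` is used here to mean "no separator configured", and none of this is the library's actual `from_chars_advanced` interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not the library's API): true for '0'..'9'.
inline bool is_decimal_digit(char c) {
  return static_cast<unsigned char>(c - '0') < 10;
}

// Hypothetical helper: skips one leading 0x/0X/0b/0B prefix, if present.
inline const char *skip_prefix(const char *p, const char *pend) {
  assert(p <= pend);
  if (pend - p >= 2 && p[0] == '0' &&
      (p[1] == 'x' || p[1] == 'X' || p[1] == 'b' || p[1] == 'B')) {
    return p + 2;
  }
  return p;
}

// Accumulates decimal digits into `value` and returns the first unconsumed
// position. Assumption of this sketch: separator == '\0' means that no
// separator is configured.
inline const char *scan_digits(const char *p, const char *pend,
                               char separator, uint64_t &value) {
  assert(p <= pend);
  value = 0;
  if (separator == '\0') {
    // Common case: digit-only loop with no separator checks.
    while (p != pend && is_decimal_digit(*p)) {
      value = 10 * value + static_cast<uint64_t>(*p - '0');
      ++p;
    }
    return p;
  }
  // Separator-aware loop, taken only when a separator is configured.
  while (p != pend) {
    if (is_decimal_digit(*p)) {
      value = 10 * value + static_cast<uint64_t>(*p - '0');
    } else if (*p != separator) {
      break;
    }
    ++p;
  }
  return p;
}
```

With `'\''` as the separator, `scan_digits` reads `1'000'000` as one million; with `'\0'` it stops at the first quote.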
Jonathan Wakely
1b11407da9 Fix spelling
Run clang-format to reformat the long lines.
2026-06-02 15:30:37 +01:00
Daniel Lemire
cfd12ebcf1 8.2.6 2026-06-01 18:07:41 -04:00
Daniel Lemire
ed861322d8
Merge pull request #382 from redis-performance/pr/four-digit-followup
Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang)
2026-06-01 15:45:15 -04:00
fcostaoliveira
7589a4fea5 Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang)
After the 8-digit SWAR block loop, consume a remaining 4-7 digit run in one
read4_to_u32 + parse_four_digits_unrolled step instead of byte-by-byte (reusing
the existing 4-digit helpers). The parsed result is identical; this is purely a
faster way to consume the same digits.

Gated to clang: on gcc the extra 4-digit check regresses inputs whose remainder
is < 4 digits (e.g. the 17-digit fraction of uniform [0,1] -> -3% on 'random'),
because the check becomes pure overhead there; clang does not show that.

m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark,
from_chars->double, clang 18, base vs patch back-to-back (2 samples):
  canada.txt +11.7%, mesh.txt +7.4%, random ~flat. No regression.
2026-06-01 11:55:50 +01:00
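For readers unfamiliar with the technique in commit 7589a4fea5, here is a minimal SWAR sketch of "check four bytes, then convert them in one step". The helper names (`load4_le`, `all_four_digits`, `four_digits_value`, `consume_tail_digits`) are hypothetical stand-ins written for this note; they are not the library's `read4_to_u32` or `parse_four_digits_unrolled`, whose exact form is not shown in the log.

```cpp
#include <cassert>
#include <cstdint>

// Loads four bytes as a little-endian 32-bit word, independent of the
// host byte order.
inline uint32_t load4_le(const char *p) {
  return static_cast<uint32_t>(static_cast<uint8_t>(p[0])) |
         (static_cast<uint32_t>(static_cast<uint8_t>(p[1])) << 8) |
         (static_cast<uint32_t>(static_cast<uint8_t>(p[2])) << 16) |
         (static_cast<uint32_t>(static_cast<uint8_t>(p[3])) << 24);
}

// True when every byte of v is an ASCII digit ('0'..'9').
inline bool all_four_digits(uint32_t v) {
  return (((v + 0x46464646u) | (v - 0x30303030u)) & 0x80808080u) == 0;
}

// Converts four ASCII digits (already validated) to their numeric value.
inline uint32_t four_digits_value(uint32_t v) {
  v -= 0x30303030u;      // each byte now holds a digit 0..9
  v = v * 10 + (v >> 8); // byte 0 = 10*d0 + d1, byte 2 = 10*d2 + d3
  return ((v & 0x00FF00FFu) * (1u + (100u << 16))) >> 16;
}

// Consumes a trailing digit run: one 4-digit step if possible, then the
// remaining digits byte by byte. Returns the first unconsumed position.
inline const char *consume_tail_digits(const char *p, const char *pend,
                                       uint64_t &i) {
  assert(p <= pend);
  if (pend - p >= 4) {
    const uint32_t v = load4_le(p);
    if (all_four_digits(v)) {
      i = i * 10000 + four_digits_value(v);
      p += 4;
    }
  }
  while (p != pend && static_cast<uint8_t>(*p - '0') < 10) {
    i = 10 * i + static_cast<uint64_t>(*p - '0');
    ++p;
  }
  return p;
}
```

The result matches a plain byte-by-byte loop; only the number of steps changes, which is the point made in the commit message.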
fcostaoliveira
b64d014e2f Unroll the integer-part digit scan (straight-line for the common 1-5 digit case)
parse_number_string scans the integer part one byte at a time in a while loop,
while the fraction already uses the 8-digit SWAR loop. Most integer parts are
1-5 digits, so the loop back-edge dominates. Peel the first five iterations into
nested ifs, falling through to the original while for longer runs. Semantics are
identical (i = 10*i + digit, advancing p); no behavior change.

AWS m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark,
from_chars->double. base vs patch measured back-to-back, mean of 2 runs:
  canada: gcc +3.1%, clang +2.8%
  mesh:   gcc +5.4%, clang +5.1%
  random: ~flat (1-digit integer part)
No regression; gcc and clang agree.

Alternatives benchmarked and rejected: reusing loop_parse_if_eight_digits for the
integer part regressed 5-8% (integer parts are too short for 8-digit SWAR setup);
a counted for(k<5) loop matched on gcc but clang optimized it worse (canada -0.9%).
The explicit peel is the only form solidly positive on both compilers.
2026-06-01 09:55:08 +01:00
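The peel described in commit b64d014e2f can be illustrated with the sketch below: five nested ifs handle the common short case in straight-line code, and longer runs fall through to the generic loop. The function and helper names are hypothetical and the code is a simplified illustration, not the actual body of `parse_number_string`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for this sketch.
inline bool is_digit(char c) {
  return static_cast<unsigned char>(c - '0') < 10;
}
inline uint64_t digit_value(char c) { return static_cast<uint64_t>(c - '0'); }

// Scans the integer part starting at p, accumulating i = 10*i + digit.
// Returns the first position that is not a digit.
inline const char *scan_integer_part(const char *p, const char *pend,
                                     uint64_t &i) {
  assert(p <= pend);
  i = 0;
  if (p != pend && is_digit(*p)) {
    i = digit_value(*p++);
    if (p != pend && is_digit(*p)) {
      i = 10 * i + digit_value(*p++);
      if (p != pend && is_digit(*p)) {
        i = 10 * i + digit_value(*p++);
        if (p != pend && is_digit(*p)) {
          i = 10 * i + digit_value(*p++);
          if (p != pend && is_digit(*p)) {
            i = 10 * i + digit_value(*p++);
            // Six or more digits: continue with the generic loop.
            while (p != pend && is_digit(*p)) {
              i = 10 * i + digit_value(*p++);
            }
          }
        }
      }
    }
  }
  return p;
}
```

Inputs with one to five integer digits never reach the loop back-edge, which is where the commit message locates the cost.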
Daniel Lemire
05087a303d 8.2.5 2026-04-16 14:39:03 -04:00
Michael Lippautz
001c04cc8a Remove <algorithm> include and replace std::min with ternary operators
Replaces uses of std::min with ternary operators in ascii_number.h, digit_comparison.h, and float_common.h to remove the dependency on the <algorithm> header in those files.
2026-04-16 17:17:19 +00:00
Michael Lippautz
b063de82c7
Include <algorithm> in float_common.h
`fastfloat_strncasecmp` relies on `std::min`.
2026-04-16 09:51:58 +02:00
Daniel Lemire
18e55e48a8 lint 2026-03-10 17:06:04 -04:00
Daniel Lemire
eb9ab42c0a 8.2.4 2026-03-10 12:10:12 -04:00
Koleman Nix
2606bcdf2f A few inlines 2026-03-07 15:36:09 -05:00
Xisco Fauli
3c6a64b87d fix warning C4702: unreachable code 2026-02-06 11:28:34 +01:00
Daniel Lemire
01ce95dfe4 v8.2.3 2026-02-03 11:27:40 -05:00
sleepingieght
4fa83ccff4 fix early return error in fastfloat_strncasecmp 2026-01-21 19:21:06 +05:30
Daniel Lemire
babb1f3335
Merge pull request #356 from sleepingeight/surya/opt-cmp
optimize fastfloat_strncasecmp
2026-01-18 18:56:05 -05:00
Shikhar
b14e6a466a simpler optimizations
Signed-off-by: Shikhar <shikharish05@gmail.com>
2026-01-02 05:21:01 +05:30
Shikhar
13d4b94183 small fix 2026-01-02 05:21:01 +05:30
Shikhar
d0af1cfdbd optimize uint16 parsing
Signed-off-by: Shikhar <shikharish05@gmail.com>
2026-01-02 05:21:01 +05:30
Daniel Lemire
d5bc4e1b2e
Merge pull request #358 from shikharish/uint8-base-fix
add base check for uint8
2025-12-31 13:44:12 -05:00
Daniel Lemire
97b54ca9e7 v8.2.2 2025-12-31 13:12:46 -05:00
Shikhar
4dc5225797 add base check for uint8 parsing
Signed-off-by: Shikhar <shikharish05@gmail.com>
2025-12-31 22:07:45 +05:30
Shikhar
fb522b66d0 fix endianness bug in uint8 parsing
Signed-off-by: Shikhar <shikharish05@gmail.com>
2025-12-31 21:51:23 +05:30
sleepingieght
4eb0d806fa add specialisations 2025-12-30 20:27:45 +05:30
sleepingieght
265cb849f3 optimise fastfloat_strncasecmp 2025-12-30 01:15:22 +05:30
Daniel Lemire
11ce67e5eb v8.2.1 2025-12-29 11:09:40 -05:00
Daniel Lemire
f4f9da1e6b fix for issue 354 2025-12-29 10:55:20 -05:00
Daniel Lemire
dd77fb5e4c v8.2.0 2025-12-27 12:08:58 -05:00
Daniel Lemire
b4d26ec866 v8.1.1 2025-12-27 12:06:36 -05:00
Pavel Novikov
cb813a7765
fixed UB 2025-12-27 00:15:30 +03:00
Shikhar
780c341359 fix macro
Signed-off-by: Shikhar <shikharish05@gmail.com>
2025-12-25 00:45:51 +05:30
Shikhar
fdb0eddf99 c++14 constexpr
Signed-off-by: Shikhar <shikharish05@gmail.com>
2025-12-25 00:45:51 +05:30
Shikhar
fce0ab61df uint8_t parsing
Signed-off-by: Shikhar <shikharish05@gmail.com>
2025-12-25 00:45:51 +05:30
Raine 'Gravecat' Simmons
9d78a01ff7
Fixed formatting with clang-format 2025-11-22 21:53:37 +00:00
Raine 'Gravecat' Simmons
409d6215b4
Fixes compilation on GCC/MinGW 2025-11-22 16:11:06 +00:00
Pavel Novikov
1ea4d2563e
made function non-template
+fixed a couple of typos
2025-09-30 12:18:29 +03:00
Daniel Lemire
7262d9454e lint 2025-09-29 15:08:24 -04:00
Daniel Lemire
fd98fd6689
specialize for std::float32_t and std::float64_t explicitly
credit: @lemire
2025-09-29 21:43:36 +03:00
Pavel Novikov
197c0ffca7
clang format 2025-09-29 21:43:35 +03:00
Pavel Novikov
13345cab65
added template overload for integer_times_pow10() 2025-09-18 21:29:25 +03:00
Daniel Lemire
88b1e5321c version 8.1.0 2025-09-18 09:38:45 -06:00
Daniel Lemire
2aa6d0ba72
Merge pull request #326 from fastfloat/patch803
Release candidate 8.0.3
2025-09-18 09:37:20 -06:00
Daniel Lemire
0b6d911220 format 2025-09-18 08:30:28 -06:00
Pavel Novikov
7a77227521
minor fix of forward declaration 2025-09-18 17:02:29 +03:00
Daniel Lemire
e20c952456
Merge pull request #320 from toughengineer/int_multiplication_by_power_of_10
Implemented multiplication of integer by a power of 10
2025-09-18 07:48:09 -06:00
Daniel Lemire
bb956b29db release candidate 8.0.3 2025-09-18 07:44:53 -06:00
Daniel Lemire
48fc5404d4 compatibility fix 2025-09-18 07:44:05 -06:00
InvalidUsernameException
9d81c71aef Do not mis-parse certain wide-character emojis as integer
When calling ch_to_digit() with a UTF-16 or UTF-32 code unit, it simply
truncates away any data stored in the non-low byte(s) of the code unit.
It then uses a lookup table to determine whether the low byte
corresponds to an ASCII digit. This is incorrect because as soon as any
bit outside the low byte is set, the number will never correspond to an
ASCII digit anymore.

To fix this, we produce a mask that is all zeroes if any bit outside the
low byte is set in the code unit, all ones otherwise. ANDing this mask
with the original code unit forces the table lookup to return the
sentinel value from the zero-index if any high bit was set and causes
the code unit not to be parsed as integer.

This bug was discovered when loading Mastodon posts inside the Ladybird
browser where some of Mastodon's JavaScript would trigger the code path
that erroneously parsed the emoji as integer. It had the visible effect
that some digits inside the posts would get rendered as one of the
emojis that parsed to that digit. For more details see this issue:
https://github.com/LadybirdBrowser/ladybird/issues/6205

The emojis in the test case are simply all the emojis used on Mastodon
that caused the bug. They can be found here:
06803422da/app/javascript/mastodon/features/emoji/emoji_map.json
2025-09-15 23:12:28 +02:00
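The masking fix described in commit 9d81c71aef can be sketched as follows. This is an illustration under stated assumptions, not the library's `ch_to_digit`: the names `kNotADigit`, `digit_table`, and `to_digit_masked` are hypothetical, the table covers decimal digits only, and the mask is computed with a conditional expression rather than whatever branch-free form the real code uses.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Sentinel stored in every table slot that is not a digit, including slot 0.
constexpr uint8_t kNotADigit = 255;

// Hypothetical 256-entry lookup table: '0'..'9' map to 0..9, everything
// else maps to the sentinel.
inline const std::array<uint8_t, 256> &digit_table() {
  static const std::array<uint8_t, 256> table = [] {
    std::array<uint8_t, 256> t{};
    t.fill(kNotADigit);
    for (int d = 0; d < 10; ++d) {
      t[static_cast<std::size_t>('0' + d)] = static_cast<uint8_t>(d);
    }
    return t;
  }();
  return table;
}

// Maps a code unit to its digit value, or to kNotADigit.
template <typename UC> uint8_t to_digit_masked(UC c) {
  static_assert(sizeof(UC) <= 4, "code units wider than 32 bits unsupported");
  const uint32_t u = static_cast<uint32_t>(c);
  // All ones if no bit outside the low byte is set, all zeroes otherwise.
  const uint32_t mask =
      (u & ~uint32_t(0xFF)) == 0 ? ~uint32_t(0) : uint32_t(0);
  const uint32_t index = u & mask; // 0 (sentinel slot) when a high bit is set
  assert(index < 256);
  return digit_table()[index];
}
```

Without the mask, a code unit such as U+1F437 would be truncated to its low byte 0x37 (`'7'`) and reported as the digit 7; with the mask it lands on slot 0 and returns the sentinel.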
WenLei
6677924083 float_common.h: Support RISC-V 2025-09-11 11:11:30 +08:00
Pavel Novikov
0a230326ab
now finally got the anti-ambiguity overloads right, right? 2025-09-06 02:22:43 +03:00