fast_float

mirror of https://github.com/fastfloat/fast_float.git synced 2026-07-31 08:46:24 +08:00

History

fcostaoliveira 7589a4fea5 Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang)	2026-06-01 11:55:50 +01:00
fast_float	Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang)	2026-06-01 11:55:50 +01:00

fcostaoliveira 7589a4fea5 Add a 4-digit SWAR follow-up to loop_parse_if_eight_digits (clang)

After the 8-digit SWAR block loop, consume a remaining 4-7 digit run in one
read4_to_u32 + parse_four_digits_unrolled step instead of byte-by-byte (reusing
the existing 4-digit helpers). The parsed result is identical; this is purely a
faster way to consume the same digits.
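
The 4-digit step described above can be sketched as follows. This is a minimal illustration, not the library's code: the `sketch_*` names are hypothetical stand-ins for helpers such as read4_to_u32 and parse_four_digits_unrolled, whose real signatures are not shown in this commit message. The sketch assumes a little-endian target and omits the 8-digit block loop that would run first.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: load 4 bytes without alignment requirements.
inline uint32_t sketch_read4(const char *p) {
  uint32_t v;
  std::memcpy(&v, p, sizeof(v));
  return v;
}

// True when all four bytes are ASCII '0'..'9' (endian-independent).
// Each digit byte 0x30..0x39 has high nibble 3, and adding 6 keeps it at 3.
inline bool sketch_is_four_digits(uint32_t v) {
  return ((v & 0xF0F0F0F0u) |
          (((v + 0x06060606u) & 0xF0F0F0F0u) >> 4)) == 0x33333333u;
}

// Convert four ASCII digits to their value. Assumes little-endian layout,
// so the first character sits in the lowest byte.
inline uint32_t sketch_parse_four_digits(uint32_t v) {
  v -= 0x30303030u;        // bytes now hold d0, d1, d2, d3
  v = (v * 10) + (v >> 8); // byte 0 = 10*d0 + d1, byte 2 = 10*d2 + d3
  return ((v & 0xFFu) * 100) + ((v >> 16) & 0xFFu);
}

// Consume a digit run starting at p and accumulate it into i.
// Returns a pointer to the first unconsumed character.
inline const char *sketch_consume_digits(const char *p, const char *pend,
                                         uint64_t &i) {
  // (The existing 8-digit SWAR block loop would run here; omitted.)
  if (pend - p >= 4) {
    uint32_t v = sketch_read4(p);
    if (sketch_is_four_digits(v)) { // one step for a 4-7 digit remainder
      i = i * 10000 + sketch_parse_four_digits(v);
      p += 4;
    }
  }
  // Byte-by-byte tail for the remaining 0-3 digits.
  while (p != pend && *p >= '0' && *p <= '9') {
    i = i * 10 + uint64_t(*p - '0');
    ++p;
  }
  return p;
}
```

With or without the 4-digit step, the accumulated value is the same; the step only replaces four iterations of the byte loop with one load, one check, and a few multiplies.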

Gated to clang: on gcc the extra 4-digit check regresses inputs whose remainder
is shorter than 4 digits, because the check is pure overhead there (e.g. the
17-digit fraction of uniform [0,1] leaves 1 digit after two 8-digit blocks:
-3% on 'random'). clang shows no such regression.
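
One way such a compiler gate could look is sketched below. The macro and function names are hypothetical; the commit message does not state which identifier the patch uses. Note that clang also defines `__GNUC__`, so the test has to be on `__clang__`.

```cpp
// Hypothetical macro name, for illustration only.
#if defined(__clang__)
#define SKETCH_FOUR_DIGIT_FOLLOWUP 1
#else
#define SKETCH_FOUR_DIGIT_FOLLOWUP 0
#endif

// Compile-time query so callers can branch with `if constexpr` or a plain `if`.
constexpr bool sketch_followup_enabled() {
  return SKETCH_FOUR_DIGIT_FOLLOWUP != 0;
}
```

On compilers where the gate is off, the 4-digit check is never emitted, so inputs with a short remainder pay nothing for it.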

m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark,
from_chars->double, clang 18, base vs patch back-to-back (2 samples):
  canada.txt +11.7%, mesh.txt +7.4%, random ~flat. No regression.

2026-06-01 11:55:50 +01:00
