fast_float

mirror of https://github.com/fastfloat/fast_float.git synced 2026-07-31 08:46:24 +08:00


fcostaoliveira b64d014e2f Unroll the integer-part digit scan (straight-line for the common 1-5 digit case)

parse_number_string scans the integer part one byte at a time in a while loop,
while the fraction already uses the 8-digit SWAR loop. Most integer parts are
1-5 digits, so the loop back-edge dominates. Peel the first five iterations into
nested ifs, falling through to the original while for longer runs. Semantics are
identical (i = 10*i + digit, advancing p); no behavior change.
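
The peel described above can be sketched as follows. This is a minimal illustration, not the code from the patch: `is_digit`, `scan_loop`, and `scan_peeled` are hypothetical names, and the real parse_number_string works on fast_float's own character types and helpers. It only shows the shape of the transformation: five nested ifs, then the original while for longer runs.

```cpp
#include <cstdint>

// Hypothetical helper for this sketch (not fast_float's API).
static inline bool is_digit(char c) {
  return static_cast<unsigned char>(c - '0') < 10;
}

// Baseline: one byte per iteration, one loop back-edge per digit.
static const char* scan_loop(const char* p, const char* pend, uint64_t& i) {
  while (p != pend && is_digit(*p)) {
    i = 10 * i + uint64_t(*p - '0');
    ++p;
  }
  return p;
}

// Peeled: the first five digits are handled by straight-line nested ifs.
// Only runs longer than five digits reach the while loop.
static const char* scan_peeled(const char* p, const char* pend, uint64_t& i) {
  if (p != pend && is_digit(*p)) {
    i = 10 * i + uint64_t(*p - '0'); ++p;
    if (p != pend && is_digit(*p)) {
      i = 10 * i + uint64_t(*p - '0'); ++p;
      if (p != pend && is_digit(*p)) {
        i = 10 * i + uint64_t(*p - '0'); ++p;
        if (p != pend && is_digit(*p)) {
          i = 10 * i + uint64_t(*p - '0'); ++p;
          if (p != pend && is_digit(*p)) {
            i = 10 * i + uint64_t(*p - '0'); ++p;
            // Fall through to the original loop for the remaining digits.
            while (p != pend && is_digit(*p)) {
              i = 10 * i + uint64_t(*p - '0');
              ++p;
            }
          }
        }
      }
    }
  }
  return p;
}
```

Both functions perform the same update in the same order, so they return the same end pointer and leave the same value in `i` for any input; the only difference is how many loop back-edges a short integer part executes.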

AWS m8g.metal-24xl (Graviton4), -O3 -march=native, simple_fastfloat_benchmark,
from_chars->double. base vs patch measured back-to-back, mean of 2 runs:
  canada: gcc +3.1%, clang +2.8%
  mesh:   gcc +5.4%, clang +5.1%
  random: ~flat (1-digit integer part)
No regression; gcc and clang agree.

Alternatives benchmarked and rejected: reusing loop_parse_if_eight_digits for the
integer part regressed 5-8% (integer parts are too short for 8-digit SWAR setup);
a counted for(k<5) loop matched on gcc but clang optimized it worse (canada -0.9%).
The explicit peel is the only form solidly positive on both compilers.
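
For comparison, one plausible reading of the rejected "counted for(k<5)" alternative is sketched below. The exact form that was benchmarked is not shown in this commit message, so the function name `scan_counted`, the helper `is_digit`, and the loop structure are all assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical helper for this sketch (not fast_float's API).
static inline bool is_digit(char c) {
  return static_cast<unsigned char>(c - '0') < 10;
}

// Assumed shape of the counted variant: a fixed-trip-count loop that the
// compiler may fully unroll, followed by the generic loop for longer runs.
static const char* scan_counted(const char* p, const char* pend, uint64_t& i) {
  for (int k = 0; k < 5; ++k) {
    if (p == pend || !is_digit(*p)) {
      return p;  // integer part ended within the first five digits
    }
    i = 10 * i + uint64_t(*p - '0');
    ++p;
  }
  // More than five digits: continue one byte at a time.
  while (p != pend && is_digit(*p)) {
    i = 10 * i + uint64_t(*p - '0');
    ++p;
  }
  return p;
}
```

This form relies on the compiler to unroll the counted loop, which fits the note above that gcc and clang handled it differently; the explicit peel writes out the straight-line path by hand instead.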

2026-06-01 09:55:08 +01:00
