InvalidUsernameException 9d81c71aef Do not mis-parse certain wide-character emojis as integer
When ch_to_digit() is called with a UTF-16 or UTF-32 code unit, it simply
truncates away any data stored above the low byte of the code unit.
It then uses a lookup table to determine whether the low byte
corresponds to an ASCII digit. This is incorrect because as soon as any
bit outside the low byte is set, the code unit can no longer be an
ASCII digit.
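
The truncation can be illustrated in isolation. This is only a sketch:
low_byte() is a hypothetical helper, not a function of the library, and
U+1F435 is an arbitrary code point chosen because its low byte happens
to be the ASCII code of '5'.

```cpp
// Hypothetical helper: the narrowing step described above, on its own.
template <typename UC>
constexpr unsigned char low_byte(UC c) {
  return static_cast<unsigned char>(c);
}

// U+1F435 as a UTF-32 code unit: the low byte is 0x35, i.e. ASCII '5'.
static_assert(low_byte(char32_t{0x1F435}) == '5',
              "low byte of 0x1F435 aliases '5'");

// The UTF-16 low surrogate of the same code point (0xDC35) aliases '5' too.
static_assert(low_byte(char16_t{0xDC35}) == '5',
              "low byte of 0xDC35 aliases '5'");

// A genuine digit is unaffected by the narrowing.
static_assert(low_byte(U'5') == '5', "real digit keeps its value");
```

After the narrowing, a byte-indexed table cannot tell these three inputs
apart, which is why the first two would be treated as the digit 5.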

To fix this, we produce a mask that is all zeroes if any bit outside the
low byte is set in the code unit, and all ones otherwise. ANDing this mask
with the original code unit forces the table lookup to return the
sentinel value at index zero if any high bit was set, so the code unit
is not parsed as an integer.
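
A minimal sketch of the masking idea follows. It is not the library's
actual code: make_digit_table() and ch_to_digit_masked() are
hypothetical names, the table is assumed to cover base 10 only, and 255
is assumed as the "not a digit" sentinel.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

// Hypothetical stand-in for the digit table: every entry is the sentinel
// 255 except '0'..'9'. Index 0 therefore holds the sentinel.
inline std::array<std::uint8_t, 256> make_digit_table() {
  std::array<std::uint8_t, 256> table;
  table.fill(255);
  for (int d = 0; d < 10; ++d) {
    table[static_cast<std::size_t>('0' + d)] = static_cast<std::uint8_t>(d);
  }
  return table;
}

template <typename UC>
std::uint8_t ch_to_digit_masked(UC c) {
  // Character types may be signed, so work on the unsigned counterpart.
  using UnsignedUC = typename std::make_unsigned<UC>::type;
  static std::array<std::uint8_t, 256> const table = make_digit_table();

  UnsignedUC const u = static_cast<UnsignedUC>(c);
  // True when no bit outside the low byte is set.
  bool const low_only = (u & ~0xFFull) == 0;
  // 0 - 1 wraps to all ones, 0 - 0 stays all zeroes.
  UnsignedUC const mask = static_cast<UnsignedUC>(
      UnsignedUC{0} - static_cast<UnsignedUC>(low_only));
  // Any high bit forces index 0, which holds the sentinel.
  return table[static_cast<unsigned char>(u & mask)];
}
```

With this sketch, ch_to_digit_masked(U'7') yields 7, while a code unit
such as char32_t{0x1F435} is forced to index zero and yields the
sentinel 255 instead of 5.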

This bug was discovered when loading Mastodon posts inside the Ladybird
browser where some of Mastodon's JavaScript would trigger the code path
that erroneously parsed the emoji as an integer. It had the visible effect
that some digits inside the posts would get rendered as one of the
emojis that parsed to that digit. For more details see this issue:
https://github.com/LadybirdBrowser/ladybird/issues/6205

The emojis in the test case are simply all the emojis used on Mastodon
that caused the bug. They can be found here:
06803422da/app/javascript/mastodon/features/emoji/emoji_map.json
2025-09-15 23:12:28 +02:00
bloat_analysis harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
build_tests formatted code 2024-12-01 16:39:28 +01:00
installation_tests harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
basictest.cpp lint 2025-02-06 20:25:09 -05:00
BUILD.bazel add char8_t test 2024-11-25 15:43:51 +01:00
CMakeLists.txt Merge branch 'main' into P2497R0 2025-09-03 12:04:36 -04:00
example_comma_test.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
example_test.cpp make it build 2024-12-03 23:23:34 +01:00
exhaustive32_64.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
exhaustive32_midpoint.cpp formatted code 2024-12-01 16:39:28 +01:00
exhaustive32.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
fast_int.cpp Do not mis-parse certain wide-character emojis as integer 2025-09-15 23:12:28 +02:00
fixedwidthtest.cpp harmonize ifdef checks 2024-12-01 16:36:45 +01:00
fortran.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
json_fmt.cpp turning json option into macro parameter 2025-03-09 15:13:43 -04:00
long_exhaustive32_64.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
long_exhaustive32.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
long_random64.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
long_test.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
p2497.cpp lint 2025-05-19 18:16:14 -04:00
powersoffive_hardround.cpp formatted code 2024-12-01 16:39:28 +01:00
random64.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
random_string.cpp formatted code 2024-12-01 16:39:28 +01:00
rcppfastfloat_test.cpp harmonize files to use "east const" 2024-11-23 09:46:18 +01:00
short_random_string.cpp formatted code 2024-12-01 16:39:28 +01:00
string_test.cpp formatted code 2024-12-01 16:39:28 +01:00
supported_chars_test.cpp add char8_t test 2024-11-25 15:43:51 +01:00
wide_char_test.cpp add failing test for wide chars 2024-11-21 00:08:55 +01:00