[AArch64] Avoid extracting alpha in ARGB1555ToYRow_NEON

The existing implementation of this kernel uses the ARGB1555TOARGB macro,
which extracts the alpha component and sign-extends it into v3. However,
this particular kernel does not need the alpha component, so we can avoid
calculating it entirely by using the existing RGB555TOARGB macro instead.
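The difference between the two macros can be modelled in scalar C. This is a hedged sketch, not libyuv code: the type and helper names (Argb8, Rgb8, expand5, decode_argb1555, decode_rgb555) are hypothetical, and the bit-replication used to widen 5-bit channels is an assumed expansion. It shows that B, G and R never depend on bit 15, so dropping the alpha extraction cannot change the Y output.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical types for illustration; not libyuv's. */
typedef struct { uint8_t b, g, r, a; } Argb8;
typedef struct { uint8_t b, g, r; } Rgb8;

/* Widen a 5-bit channel to 8 bits by replicating its top 3 bits into
   the low bits (assumed expansion: 0x1f -> 0xff, 0x00 -> 0x00). */
static uint8_t expand5(unsigned v) {
  unsigned x = v & 0x1f;
  return (uint8_t)((x << 3) | (x >> 2));
}

/* Models ARGB1555TOARGB: also extracts bit 15 (alpha) and widens it to
   0x00 or 0xff, which is what sign-extending that bit achieves. */
static Argb8 decode_argb1555(uint16_t p) {
  Argb8 out;
  out.b = expand5(p);
  out.g = expand5(p >> 5);
  out.r = expand5(p >> 10);
  out.a = (uint8_t)(0u - (unsigned)(p >> 15)); /* 0 -> 0x00, 1 -> 0xff */
  return out;
}

/* Models RGB555TOARGB as far as this kernel cares: bit 15 is never read. */
static Rgb8 decode_rgb555(uint16_t p) {
  Rgb8 out = { expand5(p), expand5(p >> 5), expand5(p >> 10) };
  return out;
}

/* Exhaustively checks that B, G and R agree for all 65536 inputs, i.e.
   the alpha work is dead code as far as Y is concerned. */
static void check_rgb_unaffected_by_alpha(void) {
  for (uint32_t v = 0; v <= 0xffff; ++v) {
    Argb8 full = decode_argb1555((uint16_t)v);
    Rgb8 rgb = decode_rgb555((uint16_t)v);
    assert(full.b == rgb.b && full.g == rgb.g && full.r == rgb.r);
  }
}
```

Since the colour channels are bit-for-bit identical, the only effect of the switch is skipping the alpha shift and narrow, which is where the runtime saving comes from.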

Runtime reductions observed for ARGB1555ToYRow_NEON (no noticeable
improvement on Cortex-A510):

Cortex-A55: -3.6%
Cortex-A76: -20.9%
 Cortex-X2: -15.1%

Bug: libyuv:976
Change-Id: I2cf2729c8297c53dcd32d0df28e64d4d5c7f6def
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5509200
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
George Steed 2024-03-23 19:46:37 +00:00 committed by Frank Barchard
parent 7c122e8859
commit c5f9583b1c


@@ -3012,7 +3012,7 @@ void ARGB1555ToYRow_NEON(const uint8_t* src_argb1555,
"1: \n"
"ld1 {v0.16b}, [%0], #16 \n" // load 8 ARGB1555 pixels.
"subs %w2, %w2, #8 \n" // 8 processed per loop.
- ARGB1555TOARGB
+ RGB555TOARGB
"umull v3.8h, v0.8b, v4.8b \n" // B
"prfm pldl1keep, [%0, 448] \n"
"umlal v3.8h, v1.8b, v5.8b \n" // G
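The hunk shows only the start of the luma computation: a widening multiply of B (umull) followed by a widening multiply-accumulate of G (umlal) into the 16-bit lanes of v3. A scalar model of one 8-pixel iteration might look like the sketch below. The R term, the rounding and the +16 offset lie outside the hunk, so the weights 25/129/66 (common BT.601 limited-range integer weights) and the narrowing step are assumptions, and y_from_rgb8 is a hypothetical name. Note that alpha never enters the calculation.

```c
#include <stdint.h>

/* Assumed BT.601 limited-range integer weights (scaled by 256). */
enum { kWeightB = 25, kWeightG = 129, kWeightR = 66 };

/* Worst case: all channels 255 gives 220 * 255, plus the rounding term.
   This must fit in a 16-bit lane for the umull/umlal scheme to work. */
_Static_assert(220 * 255 + 0x80 <= 0xffff, "16-bit lanes suffice");

/* Hypothetical scalar model of one loop iteration over 8 lanes. */
static void y_from_rgb8(const uint8_t b[8], const uint8_t g[8],
                        const uint8_t r[8], uint8_t y[8]) {
  for (int i = 0; i < 8; ++i) {
    uint16_t acc = (uint16_t)(b[i] * kWeightB);  /* umull: B * 25      */
    acc = (uint16_t)(acc + g[i] * kWeightG);     /* umlal: += G * 129  */
    acc = (uint16_t)(acc + r[i] * kWeightR);     /* assumed: += R * 66 */
    /* Assumed narrowing: round, drop 8 fractional bits, add 16. */
    y[i] = (uint8_t)(((acc + 0x80) >> 8) + 16);
  }
}
```

With these weights, white maps to 235 and black to 16, the usual limited-range luma bounds.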