From c5f9583b1c2b8ebf66ecb2ef978a295ab62bfd3f Mon Sep 17 00:00:00 2001
From: George Steed
Date: Sat, 23 Mar 2024 19:46:37 +0000
Subject: [PATCH] [AArch64] Avoid extracting alpha in ARGB1555ToYRow_NEON

The existing implementation of this kernel uses the ARGB1555TOARGB macro
which extracts and sign-extends the alpha component into v3, however
this particular kernel does not need the alpha component. We can avoid
calculating the alpha component completely by using the existing
RGB555TOARGB macro, so use that instead.

Reduction in runtimes observed for ARGB1555ToYRow_NEON (no noticeable
improvement observed on Cortex-A510):

 Cortex-A55:  -3.6%
 Cortex-A76: -20.9%
  Cortex-X2: -15.1%

Bug: libyuv:976
Change-Id: I2cf2729c8297c53dcd32d0df28e64d4d5c7f6def
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5509200
Reviewed-by: Frank Barchard
Reviewed-by: Justin Green
---
 source/row_neon64.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/source/row_neon64.cc b/source/row_neon64.cc
index f01cc6172..dcf901773 100644
--- a/source/row_neon64.cc
+++ b/source/row_neon64.cc
@@ -3012,7 +3012,7 @@ void ARGB1555ToYRow_NEON(const uint8_t* src_argb1555,
       "1: \n"
       "ld1 {v0.16b}, [%0], #16 \n" // load 8 ARGB1555 pixels.
       "subs %w2, %w2, #8 \n" // 8 processed per loop.
-      ARGB1555TOARGB
+      RGB555TOARGB
       "umull v3.8h, v0.8b, v4.8b \n" // B
       "prfm pldl1keep, [%0, 448] \n"
       "umlal v3.8h, v1.8b, v5.8b \n" // G