George Steed 1eae2efbc7 [AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBShadeRow_NEON
The use of LD4 and ST4 to de-interleave ARGB color channels is
unnecessary here since we can just adjust the scale multiplicand to
match the interleaved layout. LD4 and ST4 are known to perform poorly on
some micro-architectures so using LD1 and ST1 here should be preferred.
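
The idea can be illustrated in portable C. This is only a sketch under stated assumptions, not the actual Neon assembly from row_neon64.cc: the function names, the 8-pixel block size, and the simplified `(v * s) >> 8` arithmetic are all hypothetical stand-ins chosen for illustration. The first function mimics the LD4/ST4 approach (split into channel planes, scale each plane, re-interleave). The second mimics the LD1/ST1 approach, where the per-channel scale factors are repeated in pixel order so the data never needs to be de-interleaved.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified per-channel scale, assumed for illustration only.
   libyuv's real ARGBShadeRow arithmetic differs. */
static inline uint8_t scale_channel(uint8_t v, uint8_t s) {
  return (uint8_t)(((unsigned)v * s) >> 8);
}

/* LD4/ST4 style: de-interleave 8 pixels into four channel planes,
   scale each plane by its own factor, then re-interleave on store.
   width must be a non-negative multiple of 8 (hypothetical restriction). */
void shade_deinterleaved(const uint8_t* src, uint8_t* dst, int width,
                         const uint8_t scale[4]) {
  assert(width >= 0 && width % 8 == 0);
  for (int x = 0; x < width; x += 8) {
    uint8_t ch[4][8];
    for (int i = 0; i < 8; ++i)       /* "LD4": split channels */
      for (int c = 0; c < 4; ++c)
        ch[c][i] = src[(x + i) * 4 + c];
    for (int c = 0; c < 4; ++c)       /* one factor per plane */
      for (int i = 0; i < 8; ++i)
        ch[c][i] = scale_channel(ch[c][i], scale[c]);
    for (int i = 0; i < 8; ++i)       /* "ST4": merge channels */
      for (int c = 0; c < 4; ++c)
        dst[(x + i) * 4 + c] = ch[c][i];
  }
}

/* LD1/ST1 style: keep the bytes in memory order and instead lay the
   scale factors out in the same repeating 4-byte pattern as the pixels. */
void shade_interleaved(const uint8_t* src, uint8_t* dst, int width,
                       const uint8_t scale[4]) {
  assert(width >= 0 && width % 8 == 0);
  uint8_t pattern[32];
  for (int j = 0; j < 32; ++j)
    pattern[j] = scale[j & 3];        /* factors in interleaved order */
  for (int x = 0; x < width; x += 8)
    for (int j = 0; j < 32; ++j)      /* plain contiguous load/store */
      dst[x * 4 + j] = scale_channel(src[x * 4 + j], pattern[j]);
}
```

Both functions produce byte-identical output for the same input, which is why the de-interleaving step can be dropped when the operation is purely element-wise per channel.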

Reduction in runtime for ARGBShadeRow_NEON:

  Cortex-A55: -19.9%
 Cortex-A510: -50.8%
  Cortex-A76: -36.0%
   Cortex-X2: -46.4%

Bug: libyuv:976
Change-Id: I10a0e6a0a62242826d39b1e963063770f084226a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5494093
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-04-30 00:48:35 +00:00
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc MT2T Warning fixes for fuchsia 2022-12-06 19:54:40 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc [AArch64] Add Neon dot-product implementation of HammingDistance 2024-04-26 18:39:00 +00:00
compare_neon.cc Scale by even factor low level row function 2020-11-03 21:25:18 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon dot-product implementation of HammingDistance 2024-04-26 18:39:00 +00:00
convert_argb.cc [AArch64] Add SVE implementation for I422AlphaToARGBRow 2024-04-29 18:54:07 +00:00
convert_from_argb.cc [AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow 2024-04-09 03:09:36 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Remove M420 and refactor NV12ToI420 2020-05-26 18:48:00 +00:00
convert_to_i420.cc Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x. 2021-07-19 22:22:22 +00:00
convert.cc [AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow 2024-04-09 03:09:36 +00:00
cpu_id.cc [AArch64] getauxval(AT_HWCAP{,2}) feature detection, attempt #2 2024-04-25 21:26:31 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
rotate_any.cc Remove MMI support 2022-01-26 08:41:33 +00:00
rotate_argb.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
rotate_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
rotate_gcc.cc Transpose 4x4 for SSE2 and AVX2 2023-03-03 17:46:23 +00:00
rotate_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
rotate_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon64.cc MergeUV_AVX512BW for I420ToNV12 2023-02-13 20:14:57 +00:00
rotate_neon.cc GCC warning fix for MT2T 2023-03-16 06:57:20 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
row_any.cc [AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow 2024-04-09 03:09:36 +00:00
row_common.cc [RVV] Support AR64ToAB64 and RGBA-family color conversions 2023-09-05 22:44:48 +00:00
row_gcc.cc YUY2ToARGB use ymm6/7 for shuffle constants 2024-01-22 21:47:23 +00:00
row_lasx.cc AVX10 cpuid detect added 2024-01-10 00:08:22 +00:00
row_lsx.cc Fix compilation errors. 2024-01-03 19:15:56 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc [AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBShadeRow_NEON 2024-04-30 00:48:35 +00:00
row_neon.cc [AArch64] Replace instances of ORR with MOV where possible 2024-04-25 20:48:16 +00:00
row_rvv.cc [RVV] Support AR64ToAB64 and RGBA-family color conversions 2023-09-05 22:44:48 +00:00
row_sve.cc [AArch64] Avoid extraneous CMP in I{444,422}ToARGBRow_SVE2 impl 2024-04-29 18:56:22 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc UVScale down by 2 fix for C and optimize for NEON 2023-04-12 22:49:20 +00:00
scale_argb.cc [AArch64] Add SVE implementation for I422ToARGBRow 2024-04-27 18:26:11 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows 2022-10-21 19:35:17 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc [AArch64] Replace instances of ORR with MOV where possible 2024-04-25 20:48:16 +00:00
scale_neon.cc UVScale down by 2 fix for C and optimize for NEON 2023-04-12 22:49:20 +00:00
scale_rgb.cc RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_rvv.cc Add HAS_SCALEARGBROWDOWNEVEN_RVV marco and disable it by default 2023-12-07 22:54:23 +00:00
scale_uv.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00