George Steed 1c31461771 [AArch64] Add Neon dot-product implementation for ARGBGrayRow
We can use dot product instructions to apply the coefficients without
needing to use LD4 deinterleaving load instructions, and then TBL to mix
in the original alpha component. This is significantly faster on some
micro-architectures where LD4 instructions are known to be slow compared
to normal loads.

Reduction in cycle counts observed compared to existing Neon code:

 Cortex-A55: -12.6%
Cortex-A510: -48.6%
 Cortex-A76: -39.7%
Cortex-A720: -52.3%
  Cortex-X1: -63.5%
  Cortex-X2: -67.0%

Bug: b/42280946
Change-Id: I3641785e74873438acc00d675f5bc490dfa95b50
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5785972
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-09-16 04:31:11 +00:00
..
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc [AArch64] Add SVE2 implementation of RGB24ToARGBRow 2024-07-08 20:12:05 +00:00
convert_from_argb.cc [AArch64] Add I8MM implementation of ARGBToUV444Row 2024-07-16 17:32:52 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Remove M420 and refactor NV12ToI420 2020-05-26 18:48:00 +00:00
convert_to_i420.cc Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x. 2021-07-19 22:22:22 +00:00
convert.cc Convert16To8Row_AVX512BW using vpmovuswb 2024-08-15 20:13:33 +00:00
cpu_id.cc [AArch64] Enable SME feature detection on Apple Silicon 2024-07-08 16:19:27 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc [AArch64] Add Neon dot-product implementation for ARGBGrayRow 2024-09-16 04:31:11 +00:00
rotate_any.cc [AArch64] Fix rotate by odd sizes 2024-07-15 18:13:31 +00:00
rotate_argb.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_sme.cc [AArch64] Add SME implementation of TransposeUVWxH 2024-07-19 12:15:40 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc Rotate use NULL for C compatability 2024-07-23 18:02:47 +00:00
row_any.cc Convert16To8Row_AVX512BW using vpmovuswb 2024-08-15 20:13:33 +00:00
row_common.cc Fix -Wundef warnings 2024-08-02 17:39:59 +00:00
row_gcc.cc Convert16To8Row_AVX512BW using vpmovuswb 2024-08-15 20:13:33 +00:00
row_lasx.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_lsx.cc [AArch64] Fix SVE/SME vector length printing in cpuid 2024-07-02 19:44:41 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc [AArch64] Add Neon dot-product implementation for ARGBGrayRow 2024-09-16 04:31:11 +00:00
row_neon.cc Fix -Wmissing-prototypes warnings 2024-08-12 19:08:24 +00:00
row_rvv.cc Fix -Wmissing-prototypes warnings 2024-08-12 19:08:24 +00:00
row_sve.cc Remove the ' separators in hex integer constants 2024-08-14 20:50:28 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc [AArch64] Unroll ScaleRowDown4_NEON 2024-09-16 04:30:04 +00:00
scale_argb.cc [AArch64] Add SVE implementation for I422ToARGBRow 2024-04-27 18:26:11 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc [AArch64] Unroll ScaleRowDown4_NEON 2024-09-16 04:30:04 +00:00
scale_neon.cc scale_neon.cc: Fix -Wmissing-prototypes warnings 2024-08-13 03:50:51 +00:00
scale_rgb.cc RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_uv.cc Fix -Wunused-parameter warnings in release builds 2024-08-14 20:49:37 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc [AArch64] Unroll ScaleRowDown4_NEON 2024-09-16 04:30:04 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00