George Steed 432d186116 [AArch64] Add Neon dot-product implementation for ARGBSepiaRow
We can use the dot product instructions to apply the coefficients
directly without the need for LD4 de-interleaving load instructions,
since these are known to be slow on some micro-architectures.

ST4 is also known to be slow on more modern micro-architectures, however
avoiding this is left for a future SVE implementation where we can make
use of interleaving-narrowing instructions.

Reduction in cycle counts observed compared to existing Neon code:

 Cortex-A55:  -5.8%
Cortex-A510: -18.9%
 Cortex-A76: -21.8%
Cortex-A720: -30.2%
  Cortex-X1: -28.6%
  Cortex-X2: -23.4%

Bug: b/42280946
Change-Id: I5887559649cc805a810d867b652c85d48285657d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5790970
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-09-16 04:31:35 +00:00
..
basic_types.h Disable old int types by default. 2018-07-09 21:16:47 +00:00
compare_row.h [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
compare.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_argb.h YUY2ToARGBMatrix and UYVYToARGBMatrix added to allow any color matrix 2024-01-19 21:21:37 +00:00
convert_from_argb.h MM21ToYUY2 and ABGRToJ420 conversion 2022-08-16 22:07:38 +00:00
convert_from.h Add 10/12 bit YUV To YUV functions 2021-02-25 23:16:54 +00:00
convert.h Implement I010ToNV12 conversion 2024-08-06 17:36:13 +00:00
cpu_id.h Fix a -Wundef warning on macOS with Apple silicon 2024-08-14 22:10:43 +00:00
loongson_intrinsics.h RAWToJ400 faster version for ARM 2022-03-18 07:22:36 +00:00
macros_msa.h Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
mjpeg_decoder.h add YUV24 and AYUV formats 2019-03-05 02:53:56 +00:00
planar_functions.h Note stride params of HalfFloatPlane are in bytes 2024-08-12 20:17:23 +00:00
rotate_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_row.h Convert16To8Row_AVX512BW using vpmovuswb 2024-08-15 20:13:33 +00:00
rotate.h Add 10 bit rotate methods. 2023-01-04 21:10:01 +00:00
row.h [AArch64] Add Neon dot-product implementation for ARGBSepiaRow 2024-09-16 04:31:35 +00:00
scale_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_rgb.h RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_row.h Fix -Wmissing-prototypes warnings 2024-08-12 19:08:24 +00:00
scale_uv.h add yuvconvstants util 2021-02-12 19:45:16 +00:00
scale.h Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
version.h ScalePlane crash fix for 3/4 scaling 2024-09-13 01:20:39 +00:00
video_common.h Add support for AR64 format 2021-03-13 20:55:21 +00:00