George Steed 1c31461771 [AArch64] Add Neon dot-product implementation for ARGBGrayRow
We can use dot-product instructions to apply the coefficients without
needing LD4 deinterleaving loads, and then use TBL to mix in the
original alpha component. This is significantly faster on some
micro-architectures where LD4 instructions are known to be slow compared
to normal loads.
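
As a rough illustration of the idea, the following portable C sketch models what one 32-bit lane of a 4-way unsigned dot product computes for a single ARGB pixel, followed by a byte select that keeps the original alpha. It is a scalar model, not the Neon code from this change. The helper names (`dot4_u8`, `argb_gray_row_dot_model`), the coefficient values (29, 150, 77, with 0 for alpha) and the `+128 >> 8` rounding are assumptions made for the example.

```c
#include <stdint.h>

/* Scalar model of one lane of a 4-way unsigned dot product:
 * the sum of four u8 * u8 products, accumulated in 32 bits. */
static uint32_t dot4_u8(const uint8_t a[4], const uint8_t b[4]) {
  uint32_t sum = 0;
  for (int i = 0; i < 4; ++i) {
    sum += (uint32_t)a[i] * (uint32_t)b[i];
  }
  return sum;
}

/* Hypothetical helper: converts a row of ARGB pixels (stored B, G, R, A in
 * memory) to gray while preserving alpha. The coefficients are assumed
 * values; the zero weight on alpha lets a plain contiguous load feed the
 * dot product without deinterleaving the channels first. */
static void argb_gray_row_dot_model(const uint8_t* src_argb,
                                    uint8_t* dst_argb,
                                    int width) {
  static const uint8_t kCoeffs[4] = {29, 150, 77, 0}; /* B, G, R, A */
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src_argb + 4 * x;
    uint8_t* d = dst_argb + 4 * x;
    /* Weighted sum with rounding; the weights sum to 256, so the result
     * always fits in 8 bits after the shift. */
    uint8_t y = (uint8_t)((dot4_u8(p, kCoeffs) + 128) >> 8);
    /* Byte select (the role a table lookup plays in a vector version):
     * gray goes to B, G and R, and alpha is copied from the source. */
    d[0] = y;
    d[1] = y;
    d[2] = y;
    d[3] = p[3];
  }
}
```

In a vector implementation, each 32-bit lane of the accumulator would hold the weighted sum for one pixel, so several pixels from an ordinary 16-byte load can be processed at once.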

Reduction in cycle counts observed compared to existing Neon code:

 Cortex-A55: -12.6%
Cortex-A510: -48.6%
 Cortex-A76: -39.7%
Cortex-A720: -52.3%
  Cortex-X1: -63.5%
  Cortex-X2: -67.0%

Bug: b/42280946
Change-Id: I3641785e74873438acc00d675f5bc490dfa95b50
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5785972
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-09-16 04:31:11 +00:00
basic_types.h Disable old int types by default. 2018-07-09 21:16:47 +00:00
compare_row.h [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
compare.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_argb.h YUY2ToARGBMatrix and UYVYToARGBMatrix added to allow any color matrix 2024-01-19 21:21:37 +00:00
convert_from_argb.h MM21ToYUY2 and ABGRToJ420 conversion 2022-08-16 22:07:38 +00:00
convert_from.h Add 10/12 bit YUV To YUV functions 2021-02-25 23:16:54 +00:00
convert.h Implement I010ToNV12 conversion 2024-08-06 17:36:13 +00:00
cpu_id.h Fix a -Wundef warning on macOS with Apple silicon 2024-08-14 22:10:43 +00:00
loongson_intrinsics.h RAWToJ400 faster version for ARM 2022-03-18 07:22:36 +00:00
macros_msa.h Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
mjpeg_decoder.h add YUV24 and AYUV formats 2019-03-05 02:53:56 +00:00
planar_functions.h Note stride params of HalfFloatPlane are in bytes 2024-08-12 20:17:23 +00:00
rotate_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_row.h Convert16To8Row_AVX512BW using vpmovuswb 2024-08-15 20:13:33 +00:00
rotate.h Add 10 bit rotate methods. 2023-01-04 21:10:01 +00:00
row.h [AArch64] Add Neon dot-product implementation for ARGBGrayRow 2024-09-16 04:31:11 +00:00
scale_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_rgb.h RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_row.h Fix -Wmissing-prototypes warnings 2024-08-12 19:08:24 +00:00
scale_uv.h add yuvconvstants util 2021-02-12 19:45:16 +00:00
scale.h Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
version.h ScalePlane crash fix for 3/4 scaling 2024-09-13 01:20:39 +00:00
video_common.h Add support for AR64 format 2021-03-13 20:55:21 +00:00