George Steed 6c70eb2819 [AArch64] Add Neon impls for I{210,410}ToAR30Row_NEON
There are existing x86 implementations for these kernels, but not for
AArch64, so add them.

Reduction in runtimes, compared to the existing C code compiled with
LLVM 17:

 I210ToAR30Row on Cortex-A55: -43.8%
I210ToAR30Row on Cortex-A510: -27.0%
 I210ToAR30Row on Cortex-A76: -50.4%
 I410ToAR30Row on Cortex-A55: -44.3%
I410ToAR30Row on Cortex-A510: -17.5%
 I410ToAR30Row on Cortex-A76: -57.2%

Co-authored-by: Cosmina Dunca <cosmina.dunca@arm.com>
Bug: libyuv:976
Change-Id: Ib5fb9b2ce6ef06ec76ecd8473be5fe76d2622fbc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5593931
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-06-03 22:46:12 +00:00
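For orientation, here is roughly what these row kernels compute. This is a minimal scalar sketch, not libyuv's code, and it rests on three assumptions.

- **Input.** I210 is 10-bit 4:2:2 planar YUV, and I410 is 10-bit 4:4:4, with one `uint16_t` per sample.
- **AR30 layout.** AR30 is taken to be a 32-bit word with B in bits 0-9, G in bits 10-19, R in bits 20-29 and alpha in bits 30-31.
- **Colour matrix.** It uses approximate BT.601 limited-range coefficients. libyuv instead passes its matrix in through a `YuvConstants` struct.

The names `yuv10_to_ar30_row_sketch` and `chroma_shift` are hypothetical. The `chroma_shift` parameter covers both kernels: set it to 1 for I210 (each U/V sample is shared by two pixels) and to 0 for I410.

```c
#include <stdint.h>

/* Round and clamp one channel value to the 10-bit range [0, 1023]. */
static uint32_t clamp10(double v) {
  if (v <= 0.0) return 0;
  if (v >= 1023.0) return 1023;
  return (uint32_t)(v + 0.5);
}

/* Assumed AR30 packing: B in bits 0-9, G in 10-19, R in 20-29,
   alpha forced to opaque (0b11) in bits 30-31. */
static uint32_t pack_ar30(uint32_t r, uint32_t g, uint32_t b) {
  return b | (g << 10) | (r << 20) | 0xC0000000u;
}

/* Hypothetical scalar row conversion from 10-bit planar YUV to AR30.
   chroma_shift = 1: I210-style 4:2:2 (chroma index is x / 2).
   chroma_shift = 0: I410-style 4:4:4 (chroma index is x).
   Coefficients are approximate BT.601 limited-range values (an assumption). */
void yuv10_to_ar30_row_sketch(const uint16_t* src_y, const uint16_t* src_u,
                              const uint16_t* src_v, uint32_t* dst_ar30,
                              int width, int chroma_shift) {
  const double y_scale = 1023.0 / 876.0; /* map Y 64..940 onto 0..1023 */
  for (int x = 0; x < width; ++x) {
    double y = y_scale * ((double)src_y[x] - 64.0);
    double u = (double)src_u[x >> chroma_shift] - 512.0; /* centre chroma */
    double v = (double)src_v[x >> chroma_shift] - 512.0;
    double r = y + 1.596 * v;
    double g = y - 0.391 * u - 0.813 * v;
    double b = y + 2.018 * u;
    dst_ar30[x] = pack_ar30(clamp10(r), clamp10(g), clamp10(b));
  }
}
```

For example, take Y = {64, 940} with neutral chroma U = V = 512 and `chroma_shift = 1`. The sketch produces 0xC0000000 (opaque black) and 0xFFFFFFFF (opaque white).

The Neon kernels in this commit process many pixels per iteration. This loop handles one pixel at a time and only shows the per-pixel arithmetic.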
compare_common.cc
compare_gcc.cc
compare_msa.cc
compare_neon64.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
compare_neon.cc
compare_win.cc
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc [AArch64] Add Neon impls for I{210,410}ToAR30Row_NEON 2024-06-03 22:46:12 +00:00
convert_from_argb.cc [AArch64] Add SVE2 implementation of ARGBToRGB565Row 2024-05-31 17:42:27 +00:00
convert_from.cc
convert_jpeg.cc
convert_to_argb.cc
convert_to_i420.cc
convert.cc [AArch64] Add SVE2 implementations for ARGBToUVRow and similar 2024-05-01 19:46:43 +00:00
cpu_id.cc [AArch64] Impose feature dependencies in detection code 2024-05-21 07:21:49 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc
planar_functions.cc Remove unneeded #ifdef HAVE_JPEG code 2024-05-09 23:02:18 +00:00
rotate_any.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_argb.cc
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_neon64.cc [AArch64] Use ST2 to avoid TRN step in TransposeWx16_NEON 2024-05-31 08:27:05 +00:00
rotate_neon.cc [Arm] Clean up rotate_neon.cc kernels 2024-06-03 22:23:40 +00:00
rotate_win.cc
rotate.cc [AArch64] Remove unused code from TransposeUVWx8_NEON 2024-05-27 21:52:56 +00:00
row_any.cc [AArch64] Add Neon impls for I{210,410}ToAR30Row_NEON 2024-06-03 22:46:12 +00:00
row_common.cc
row_gcc.cc YUY2ToARGB use ymm6/7 for shuffle constants 2024-01-22 21:47:23 +00:00
row_lasx.cc AVX10 cpuid detect added 2024-01-10 00:08:22 +00:00
row_lsx.cc Fix compilation errors. 2024-01-03 19:15:56 +00:00
row_msa.cc
row_neon64.cc [AArch64] Add Neon impls for I{210,410}ToAR30Row_NEON 2024-06-03 22:46:12 +00:00
row_neon.cc [AArch64] Replace instances of ORR with MOV where possible 2024-04-25 20:48:16 +00:00
row_rvv.cc
row_sve.cc [AArch64] Add SVE2 implementation of ARGBToRGB565Row 2024-05-31 17:42:27 +00:00
row_win.cc
scale_any.cc
scale_argb.cc [AArch64] Add SVE implementation for I422ToARGBRow 2024-04-27 18:26:11 +00:00
scale_common.cc
scale_gcc.cc
scale_lsx.cc
scale_msa.cc
scale_neon64.cc [AArch64] Replace instances of ORR with MOV where possible 2024-04-25 20:48:16 +00:00
scale_neon.cc
scale_rgb.cc
scale_rvv.cc
scale_uv.cc
scale_win.cc
scale.cc
test.sh
video_common.cc