George Steed 96bbdb53ed [AArch64] Add SVE2 implementation of I422ToRGBARow
This is almost identical to the existing I422ToARGBRow_SVE2 kernel, we
just need to interleave differently for the output.

The RGBA format actually saves us an instruction compared to ARGB since
there is no need to merge in the alpha component, we can just replace
the odd elements of the alpha vector itself during the narrowing.

Also rename some existing macros to make more sense when distinguishing
between ARGB and RGBA.

Reductions in runtime observed compared to the existing Neon code:

Cortex-A510: -27.0%
Cortex-A720:  -5.3%
  Cortex-X2: -14.7%

Bug: libyuv:973
Change-Id: I1e12ff608ee49c25b918097007e16d87b39cb067
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5593797
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-06-04 18:18:07 +00:00
..
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc MT2T Warning fixes for fuchsia 2022-12-06 19:54:40 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
compare_neon.cc Scale by even factor low level row function 2020-11-03 21:25:18 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc [AArch64] Add SVE2 implementation of I422ToRGBARow 2024-06-04 18:18:07 +00:00
convert_from_argb.cc [AArch64] Add SVE2 implementation of ARGBToRGB565DitherRow 2024-06-03 23:15:04 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Remove M420 and refactor NV12ToI420 2020-05-26 18:48:00 +00:00
convert_to_i420.cc Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x. 2021-07-19 22:22:22 +00:00
convert.cc [AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row 2024-06-04 18:18:07 +00:00
cpu_id.cc [AArch64] Impose feature dependencies in detection code 2024-05-21 07:21:49 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc Remove unneeded #ifdef HAVE_JPEG code 2024-05-09 23:02:18 +00:00
rotate_any.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_argb.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Transpose 4x4 for SSE2 and AVX2 2023-03-03 17:46:23 +00:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_neon64.cc [AArch64] Use ST2 to avoid TRN step in TransposeWx16_NEON 2024-05-31 08:27:05 +00:00
rotate_neon.cc [Arm] Clean up rotate_neon.cc kernels 2024-06-03 22:23:40 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc [AArch64] Remove unused code from TransposeUVWx8_NEON 2024-05-27 21:52:56 +00:00
row_any.cc [AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row 2024-06-04 18:18:07 +00:00
row_common.cc [RVV] Support AR64ToAB64 and RGBA-family color conversions 2023-09-05 22:44:48 +00:00
row_gcc.cc YUY2ToARGB use ymm6/7 for shuffle constants 2024-01-22 21:47:23 +00:00
row_lasx.cc AVX10 cpuid detect added 2024-01-10 00:08:22 +00:00
row_lsx.cc Fix compilation errors. 2024-01-03 19:15:56 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc [AArch64] Use full Neon vectors in ARGB4444ToARGBRow_NEON 2024-06-03 22:52:33 +00:00
row_neon.cc [AArch64] Replace instances of ORR with MOV where possible 2024-04-25 20:48:16 +00:00
row_rvv.cc [RVV] Support AR64ToAB64 and RGBA-family color conversions 2023-09-05 22:44:48 +00:00
row_sve.cc [AArch64] Add SVE2 implementation of I422ToRGBARow 2024-06-04 18:18:07 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc UVScale down by 2 fix for C and optimize for NEON 2023-04-12 22:49:20 +00:00
scale_argb.cc [AArch64] Add SVE implementation for I422ToARGBRow 2024-04-27 18:26:11 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows 2022-10-21 19:35:17 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc [AArch64] Replace instances of ORR with MOV where possible 2024-04-25 20:48:16 +00:00
scale_neon.cc UVScale down by 2 fix for C and optimize for NEON 2023-04-12 22:49:20 +00:00
scale_rgb.cc RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_rvv.cc Add HAS_SCALEARGBROWDOWNEVEN_RVV marco and disable it by default 2023-12-07 22:54:23 +00:00
scale_uv.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00