George Steed 11ff6067a5 [AArch64] Add SVE2 implementation of RAWToRGB24Row
There is no nice way of forming the TBL permute indices here since we
are operating on sets of three bytes at a time, so instead load the
appropriate indices from a static array. We can make use of SVE
predication to ensure we are operating on a multiple of three bytes for
the load/store instructions rather than needing to make use of more
expensive LD3 or ST3 instructions.

Reduction in runtime observed compared to the existing Neon
implementation:

Cortex-A510: -39.2%
Cortex-A720: -34.5%
  Cortex-X2: -31.0%

Bug: libyuv:973
Change-Id: I68560bde7a529e5cec150b0e9d3ffe4341038fb8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631543
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-07-08 15:55:14 +00:00
..
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc [AArch64] Add SVE2 implementations for RAWTo{ARGB,RGBA}Row 2024-07-06 22:40:15 +00:00
convert_from_argb.cc [AArch64] Add SVE2 implementation of ARGBToRGB565DitherRow 2024-06-03 23:15:04 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Remove M420 and refactor NV12ToI420 2020-05-26 18:48:00 +00:00
convert_to_i420.cc Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x. 2021-07-19 22:22:22 +00:00
convert.cc [AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row 2024-06-04 18:18:07 +00:00
cpu_id.cc [AArch64] Add SME feature detection on Linux 2024-06-08 23:34:22 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc [AArch64] Add SVE2 implementation of RAWToRGB24Row 2024-07-08 15:55:14 +00:00
rotate_any.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_argb.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_sme.cc [AArch64] Add initial build system support for SME 2024-06-08 23:32:41 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc [AArch64] Remove unused code from TransposeUVWx8_NEON 2024-05-27 21:52:56 +00:00
row_any.cc [AArch64] Add P{210,410}To{ARGB,AR30}Row_NEON 2024-07-06 22:37:08 +00:00
row_common.cc [RVV] Support AR64ToAB64 and RGBA-family color conversions 2023-09-05 22:44:48 +00:00
row_gcc.cc [AArch64] Fix SVE/SME vector length printing in cpuid 2024-07-02 19:44:41 +00:00
row_lasx.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_lsx.cc [AArch64] Fix SVE/SME vector length printing in cpuid 2024-07-02 19:44:41 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc [AArch64] Add P{210,410}To{ARGB,AR30}Row_NEON 2024-07-06 22:37:08 +00:00
row_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_sve.cc [AArch64] Add SVE2 implementation of RAWToRGB24Row 2024-07-08 15:55:14 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc UVScale down by 2 fix for C and optimize for NEON 2023-04-12 22:49:20 +00:00
scale_argb.cc [AArch64] Add SVE implementation for I422ToARGBRow 2024-04-27 18:26:11 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_rgb.cc RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_uv.cc Disable RVV ScaleDownBy4 if compiler option is not enabled 2024-06-18 01:52:40 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00