George Steed 949cb623bf Add SVE2 and SME implementations of I444ToRGB24Row
Move the READYUV444_SVE_2X and I444TORGB_SVE_2X macros to row_sve.h so
they are usable in both SVE2 and SME implementations, and use them to
add new I444ToRGB24Row implementations for SVE2 and SME. We need to use
the unrolled versions here to use the ST3B interleaving store
instructions, since there is no partial vector version of this store
instruction.

Reduction in time taken observed for the new SVE2 implementation,
compared to the existing Neon implementation:

Cortex-A510: -57.6%
Cortex-A520: -38.1%
Cortex-A710: -15.5%
Cortex-A715:  -9.2%
Cortex-A720:  -9.2%
  Cortex-X2: -25.8%
  Cortex-X3: -26.2%
  Cortex-X4: -23.2%
Cortex-X925: -17.8%

Change-Id: I6acd0b798a35e5352d4fad664769f12d3d938ed7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6530646
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-05-22 13:33:06 -07:00
..
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc ARGBToUV 64 bit use ymm8 for shuffler 2025-05-12 15:09:40 -07:00
compare_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
compare_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc Add SVE2 and SME implementations of I444ToRGB24Row 2025-05-22 13:33:06 -07:00
convert_from_argb.cc Add Neon I8MM implementations of ARGB to UV and variants 2025-05-12 11:14:00 -07:00
convert_from.cc Sub sampling conversions use CopyPlane for Y channel 2025-01-02 13:34:11 -08:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Apply clang format 2025-01-02 13:31:20 -08:00
convert_to_i420.cc Apply clang format 2025-01-02 13:31:20 -08:00
convert.cc Add Neon I8MM implementations of ARGB to UV and variants 2025-05-12 11:14:00 -07:00
cpu_id.cc Detect SME without SVE dependency 2025-03-31 17:27:40 -07:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc [AArch64] Add SVE2 and SME implementations for Convert8To8Row 2025-01-28 15:53:26 -08:00
rotate_any.cc [AArch64] Fix rotate by odd sizes 2024-07-15 18:13:31 +00:00
rotate_argb.cc Apply clang format 2025-01-02 13:31:20 -08:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
rotate_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
rotate_sme.cc [AArch64] Re-enable SME only for Linux and new versions of Clang 2024-09-23 09:29:53 +00:00
rotate_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
rotate.cc [AArch64] Add SME implementation of CopyRow 2024-12-12 03:02:07 -08:00
row_any.cc Add Neon I8MM implementations of ARGB to UV and variants 2025-05-12 11:14:00 -07:00
row_common.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
row_gcc.cc ARGBToUV 64 bit use ymm8 for shuffler 2025-05-12 15:09:40 -07:00
row_lasx.cc Fix unified sources build for LoongArch LASX 2025-04-01 09:48:19 -07:00
row_lsx.cc Fix unified sources build for LoongArch LASX 2025-04-01 09:48:19 -07:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc ARGBToUV 64 bit use ymm8 for shuffler 2025-05-12 15:09:40 -07:00
row_neon.cc ARGBToUV allow 32 bit x86 build 2025-04-28 12:11:00 -07:00
row_rvv.cc Apply clang format 2025-01-02 13:31:20 -08:00
row_sme.cc Add SVE2 and SME implementations of I444ToRGB24Row 2025-05-22 13:33:06 -07:00
row_sve.cc Add SVE2 and SME implementations of I444ToRGB24Row 2025-05-22 13:33:06 -07:00
row_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
scale_any.cc [AArch64] Unroll and use TBL in ScaleRowDown34_NEON 2024-09-16 15:37:27 +00:00
scale_argb.cc RVV disable 64 bit elements and vcombine_v 2025-03-25 12:51:25 -07:00
scale_common.cc [AArch64] Add SME implementations of InterpolateRow{,_16,_16To8} 2024-12-12 03:03:41 -08:00
scale_gcc.cc ARGBToUV 64 bit use ymm8 for shuffler 2025-05-12 15:09:40 -07:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
scale_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
scale_rgb.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_rvv.cc RVV disable 64 bit elements and vcombine_v 2025-03-25 12:51:25 -07:00
scale_sme.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_uv.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
scale.cc J420ToI420 using planar 8 bit scaling 2025-01-22 02:50:24 -08:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00