George Steed c4a0c8d34a [AArch64] Add SVE2 and SME implementations for Convert8To8Row
SVE can make use of the UMULH instruction to avoid needing separate
widening multiply and narrowing steps for the scale application.
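
As a rough scalar illustration of the point above (not the code from this
change), the sketch below compares a widen/multiply/narrow sequence with a
single multiply-high on 16-bit lanes. The function names, the 8.8 fixed-point
scale format, and the trick of placing the pixel in the top byte of a 16-bit
lane are assumptions made for the example. The actual Convert8To8Row kernels
may arrange the data differently and also handle details omitted here.

```cpp
#include <cstdint>

// Hypothetical two-step form: widen to 32 bits, multiply, then narrow
// by shifting. Assumes `scale` is an 8.8 fixed-point factor.
inline uint16_t ScaleWidenNarrow(uint8_t src, uint16_t scale) {
  uint32_t wide = static_cast<uint32_t>(src) * scale;  // widening multiply
  return static_cast<uint16_t>(wide >> 8);             // narrowing step
}

// Scalar model of an unsigned multiply-high on one 16-bit lane:
// returns the upper 16 bits of the 32-bit product.
inline uint16_t MulHigh16(uint16_t a, uint16_t b) {
  return static_cast<uint16_t>((static_cast<uint32_t>(a) * b) >> 16);
}

// Hypothetical one-step form: with the pixel in the top byte of the lane,
// ((src << 8) * scale) >> 16 equals (src * scale) >> 8 exactly.
inline uint16_t ScaleMulHigh(uint8_t src, uint16_t scale) {
  return MulHigh16(static_cast<uint16_t>(src << 8), scale);
}

// Checks that both forms agree for every 8-bit input at a given scale.
inline bool AllMatch(uint16_t scale) {
  for (int v = 0; v < 256; ++v) {
    uint8_t src = static_cast<uint8_t>(v);
    if (ScaleWidenNarrow(src, scale) != ScaleMulHigh(src, scale)) {
      return false;
    }
  }
  return true;
}
```

In a vector implementation the multiply-high form would keep every lane at
16 bits throughout, which is presumably where the saving comes from: no
separate widening and narrowing instructions are needed around the multiply.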

Reduction in runtime for Convert8To8Row_SVE2 observed compared to the
existing Neon implementation:

        Cortex-A510: -13.2%
        Cortex-A520: -16.4%
        Cortex-A710: -37.1%
        Cortex-A715: -38.5%
        Cortex-A720: -38.4%
          Cortex-X2: -33.2%
          Cortex-X3: -31.8%
          Cortex-X4: -31.8%
        Cortex-X925: -13.9%

Change-Id: I17c0cb81661c5fbce786b47cdf481549cfdcbfc7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6207692
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-01-28 15:53:26 -08:00
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc Apply clang format 2025-01-02 13:31:20 -08:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Apply clang format 2025-01-02 13:31:20 -08:00
compare_neon.cc Apply clang format 2025-01-02 13:31:20 -08:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc [AArch64] Port YUVToRGB color conversions to SME 2024-12-12 03:07:54 -08:00
convert_from_argb.cc [AArch64] Add SME implementation of MergeUVRow{,_16} 2024-12-12 01:16:19 -08:00
convert_from.cc Sub sampling conversions use CopyPlane for Y channel 2025-01-02 13:34:11 -08:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Apply clang format 2025-01-02 13:31:20 -08:00
convert_to_i420.cc Apply clang format 2025-01-02 13:31:20 -08:00
convert.cc J420ToI420 using planar 8 bit scaling 2025-01-22 02:50:24 -08:00
cpu_id.cc J420ToI420 using planar 8 bit scaling 2025-01-22 02:50:24 -08:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc [AArch64] Add SVE2 and SME implementations for Convert8To8Row 2025-01-28 15:53:26 -08:00
rotate_any.cc [AArch64] Fix rotate by odd sizes 2024-07-15 18:13:31 +00:00
rotate_argb.cc Apply clang format 2025-01-02 13:31:20 -08:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Apply clang format 2025-01-02 13:31:20 -08:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Apply clang format 2025-01-02 13:31:20 -08:00
rotate_neon.cc Apply clang format 2025-01-02 13:31:20 -08:00
rotate_sme.cc [AArch64] Re-enable SME only for Linux and new versions of Clang 2024-09-23 09:29:53 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc [AArch64] Add SME implementation of CopyRow 2024-12-12 03:02:07 -08:00
row_any.cc J420ToI420 AVX2 2025-01-27 11:23:44 -08:00
row_common.cc J420ToI420 using planar 8 bit scaling 2025-01-22 02:50:24 -08:00
row_gcc.cc J420ToI420 AVX2 2025-01-27 11:23:44 -08:00
row_lasx.cc Apply clang format 2025-01-02 13:31:20 -08:00
row_lsx.cc Apply clang format 2025-01-02 13:31:20 -08:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc aarch32 J420ToI420 2025-01-22 13:47:09 -08:00
row_neon.cc aarch32 J420ToI420 2025-01-22 13:47:09 -08:00
row_rvv.cc Apply clang format 2025-01-02 13:31:20 -08:00
row_sme.cc [AArch64] Add SVE2 and SME implementations for Convert8To8Row 2025-01-28 15:53:26 -08:00
row_sve.cc [AArch64] Add SVE2 and SME implementations for Convert8To8Row 2025-01-28 15:53:26 -08:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc [AArch64] Unroll and use TBL in ScaleRowDown34_NEON 2024-09-16 15:37:27 +00:00
scale_argb.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_common.cc [AArch64] Add SME implementations of InterpolateRow{,_16,_16To8} 2024-12-12 03:03:41 -08:00
scale_gcc.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc J420ToI420 using planar 8 bit scaling 2025-01-22 02:50:24 -08:00
scale_neon.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_rgb.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_rvv.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_sme.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_uv.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc J420ToI420 using planar 8 bit scaling 2025-01-22 02:50:24 -08:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00