George Steed 2c32b689e4 [AArch64] Improve instruction interleaving in READI212_SVE
The existing instruction arrangement is sub-optimal on little cores
because it places dependent instructions next to each other, so
spread them out to improve performance.
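To see why spreading out dependent instructions helps, consider a toy model of a single-issue, in-order core. Little cores such as the Cortex-A510 and Cortex-A520 issue instructions in order, but this model is an illustrative assumption and not a description of either pipeline. In the model, an instruction cannot issue until its source registers are ready. Placing a consumer directly after its producer therefore stalls the pipeline, while interleaving two independent chains hides part of the latency. The latencies, the register numbers, and the names `Insn`, `simulate_in_order`, `kGrouped` and `kInterleaved` are hypothetical and are not part of libyuv.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model of a single-issue, in-order core (an illustrative assumption,
 * not a model of any real Cortex-A510/A520 pipeline). Each instruction
 * writes one register and reads up to two; -1 means "no source operand". */
#define NUM_REGS 8

typedef struct {
  int dst;     /* destination register */
  int src0;    /* first source register, or -1 */
  int src1;    /* second source register, or -1 */
  int latency; /* cycles until dst can be consumed */
} Insn;

/* Returns the cycle at which the last result becomes available. An
 * instruction issues at least one cycle after its predecessor and stalls
 * until all of its source registers are ready. */
static int simulate_in_order(const Insn* prog, size_t n) {
  int ready[NUM_REGS] = {0};
  int issue = -1;
  int finish = 0;
  for (size_t i = 0; i < n; ++i) {
    const Insn* in = &prog[i];
    assert(in->dst >= 0 && in->dst < NUM_REGS);
    assert(in->src0 < NUM_REGS && in->src1 < NUM_REGS);
    int t = issue + 1;
    if (in->src0 >= 0 && ready[in->src0] > t) t = ready[in->src0];
    if (in->src1 >= 0 && ready[in->src1] > t) t = ready[in->src1];
    issue = t;
    ready[in->dst] = t + in->latency;
    if (ready[in->dst] > finish) finish = ready[in->dst];
  }
  return finish;
}

/* Two independent load -> op chains (hypothetical latencies: load 3 cycles,
 * ALU op 2 cycles), first with each consumer right after its producer. */
static const Insn kGrouped[] = {
    {0, -1, -1, 3}, /* load r0 */
    {1, 0, -1, 2},  /* r1 = op(r0): stalls waiting for the load */
    {2, -1, -1, 3}, /* load r2 */
    {3, 2, -1, 2},  /* r3 = op(r2): stalls again */
};

/* The same four instructions with the two chains interleaved. */
static const Insn kInterleaved[] = {
    {0, -1, -1, 3}, /* load r0 */
    {2, -1, -1, 3}, /* load r2 issues while r0 is still in flight */
    {1, 0, -1, 2},  /* r1 = op(r0) */
    {3, 2, -1, 2},  /* r3 = op(r2) */
};
```

Under these assumed latencies, `simulate_in_order(kGrouped, 4)` returns 9 and `simulate_in_order(kInterleaved, 4)` returns 6: the same four instructions finish three cycles earlier when the chains are interleaved. Real cores add effects this model ignores (multiple issue, forwarding paths, load/store port limits), so it only shows the direction of the effect, not its size.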

No significant change is observed on bigger cores, but little cores show
small improvements, except for the *Alpha* kernels, which regress
slightly.

Change in observed runtime relative to the previous SVE implementation
(negative is faster; (!) marks a regression):

                   | Cortex-A510 | Cortex-A520
I210AlphaToARGBRow |   (!) +7.0% |   (!) +6.8%
     I210ToAR30Row |      -10.3% |       -9.9%
     I210ToARGBRow |       -2.4% |       -2.3%
     I212ToAR30Row |      -10.3% |       -9.9%
     I212ToARGBRow |       -2.4% |       -2.3%

Change-Id: I626942ce02c4610cfac1ea4f8e7890653ee4324f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6067150
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-12-03 21:50:47 +00:00
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc Fix bugs in ARGBAttenuateRow_LASX/LSX function 2024-11-30 23:09:04 +00:00
convert_from_argb.cc [AArch64] Add I8MM implementation of ARGBToUV444Row 2024-07-16 17:32:52 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Make functions that malloc check for ubsan math overflow 2024-10-08 21:08:34 +00:00
convert_to_i420.cc Make functions that malloc check for ubsan math overflow 2024-10-08 21:08:34 +00:00
convert.cc [AArch64] Use full Neon vectors in RGB565To{ARGB,UV,Y}Row_NEON 2024-09-16 04:35:47 +00:00
cpu_id.cc Add CopyPlane_Unaligned, _Any and _Invert tests/benchmarks CpuId test 2024-11-19 23:53:05 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
rotate_any.cc [AArch64] Fix rotate by odd sizes 2024-07-15 18:13:31 +00:00
rotate_argb.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_sme.cc [AArch64] Re-enable SME only for Linux and new versions of Clang 2024-09-23 09:29:53 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
row_any.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
row_common.cc Change ARGBMultiplyRow_C to match Neon 2024-09-23 21:48:33 +00:00
row_gcc.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
row_lasx.cc Fix bugs in ARGBAttenuateRow_LASX/LSX function 2024-11-30 23:09:04 +00:00
row_lsx.cc Fix bugs in ARGBAttenuateRow_LASX/LSX function 2024-11-30 23:09:04 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
row_neon.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
row_rvv.cc Fix -Wmissing-prototypes warnings 2024-08-12 19:08:24 +00:00
row_sme.cc [AArch64] Add SME implementation of I444ToARGBRow 2024-10-29 18:10:23 +00:00
row_sve.cc [AArch64] Improve instruction interleaving in READI212_SVE 2024-12-03 21:50:47 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc [AArch64] Unroll and use TBL in ScaleRowDown34_NEON 2024-09-16 15:37:27 +00:00
scale_argb.cc [AArch64] Add SME implementation of I422ToARGBRow 2024-10-29 05:49:28 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc [AArch64] Add Neon implementation of ScaleRowDown2Linear_16 2024-11-25 21:10:26 +00:00
scale_neon.cc scale_neon.cc: Fix -Wmissing-prototypes warnings 2024-08-13 03:50:51 +00:00
scale_rgb.cc Make functions that malloc check for ubsan math overflow 2024-10-08 21:08:34 +00:00
scale_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_sme.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
scale_uv.cc [AArch64] Add SME implementation of ScaleUVRowDown2Box 2024-11-12 18:30:30 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc [AArch64] Add Neon implementation of ScaleRowDown2Linear_16 2024-11-25 21:10:26 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00