George Steed 03a935493d [AArch64] Simplify predicate width calculations
Several of the existing SVE kernels used calculations of the form:

        remainder = (width & (vl - 1)) == 0 ? vl : width & (vl - 1);

This is because the SVE code as originally contributed always used the
predicated tail for the final iteration, even when the width was an exact
multiple of the vector length.
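The following is a scalar C model of that original scheme. It is a sketch, not libyuv code. `copy_lanes` and `copy_row_old` are hypothetical names, and a "predicated store" is modelled as copying the first `active` lanes. `vl` is assumed to be a power of two, so that `width & (vl - 1)` equals `width % vl`. The tail always runs, so the ternary forces it to cover `vl` lanes rather than zero when the width is an exact multiple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a predicated store: only the first 'active'
 * lanes of the vector are written. */
static void copy_lanes(const uint8_t* src, uint8_t* dst, int active) {
  memcpy(dst, src, (size_t)active);
}

/* Old scheme (sketch): the main loop stops before the last block and the
 * predicated tail ALWAYS runs, so the tail must cover 1..vl lanes.
 * Assumes width > 0 and vl is a power of two. Returns the tail's
 * active-lane count. */
static int copy_row_old(const uint8_t* src, uint8_t* dst, int width, int vl) {
  assert(width > 0 && vl > 0 && (vl & (vl - 1)) == 0);
  /* Compare + conditional select: an exact multiple gives a full tail. */
  int remainder = (width & (vl - 1)) == 0 ? vl : (width & (vl - 1));
  int x = 0;
  /* Main body: full vectors, all lanes active. */
  for (; x < width - remainder; x += vl) {
    copy_lanes(src + x, dst + x, vl);
  }
  /* Tail: unconditionally executed with 'remainder' active lanes. */
  copy_lanes(src + x, dst + x, remainder);
  return remainder;
}
```

For a width of 32 and `vl` of 16, the main loop runs once and the tail handles the second full vector. For a width of 35, the main loop runs twice and the tail handles 3 lanes.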

In the current code, when the width is an exact multiple of the vector
length, the fully-predicated main loop instead processes the entire width
and the tail is skipped. Because the tail never runs in that case, the
special case handled by the ternary can no longer occur, and the remainder
calculation can simply be:

        remainder = width & (vl - 1);

This avoids the need for a compare and conditional select in the
function prologue.
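The same scalar model, restructured along the lines of the current scheme, is sketched below. This is again not libyuv code: `copy_lanes` and `copy_row_new` are hypothetical, and `vl` is assumed to be a power of two. The tail is entered only if the main loop leaves elements unprocessed, and in that case `width & (vl - 1)` is guaranteed to be non-zero. The value the mask produces for an exact multiple (zero) is therefore never used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a predicated store: only the first 'active'
 * lanes of the vector are written. */
static void copy_lanes(const uint8_t* src, uint8_t* dst, int active) {
  memcpy(dst, src, (size_t)active);
}

/* New scheme (sketch): the main loop consumes every full vector and the
 * tail is skipped when nothing is left over. Assumes width > 0 and vl is
 * a power of two. Returns the remainder (0 means the tail was skipped). */
static int copy_row_new(const uint8_t* src, uint8_t* dst, int width, int vl) {
  assert(width > 0 && vl > 0 && (vl & (vl - 1)) == 0);
  /* Plain mask: no compare or conditional select needed. */
  int remainder = width & (vl - 1);
  int x = 0;
  /* Main body: full vectors, all lanes active. */
  for (; x < width - remainder; x += vl) {
    copy_lanes(src + x, dst + x, vl);
  }
  /* Tail: only reached when elements remain, so remainder is 1..vl-1. */
  if (x < width) {
    assert(remainder > 0);
    copy_lanes(src + x, dst + x, remainder);
  }
  return remainder;
}
```

With a width of 32 and `vl` of 16, the main loop covers both vectors and the tail is never entered. With a width of 35, the tail runs with 3 active lanes, exactly as in the old scheme.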

Change-Id: Ia73f5f8bc66fad6bea64439dc2beeaccb54622d2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6067151
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-12-03 21:54:32 +00:00
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc Fix bugs in ARGBAttenuateRow_LASX/LSX function 2024-11-30 23:09:04 +00:00
convert_from_argb.cc [AArch64] Add I8MM implementation of ARGBToUV444Row 2024-07-16 17:32:52 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Make functions that malloc check for ubsan math overflow 2024-10-08 21:08:34 +00:00
convert_to_i420.cc Make functions that malloc check for ubsan math overflow 2024-10-08 21:08:34 +00:00
convert.cc [AArch64] Use full Neon vectors in RGB565To{ARGB,UV,Y}Row_NEON 2024-09-16 04:35:47 +00:00
cpu_id.cc Add CopyPlane_Unaligned, _Any and _Invert tests/benchmarksCpuId test 2024-11-19 23:53:05 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
rotate_any.cc [AArch64] Fix rotate by odd sizes 2024-07-15 18:13:31 +00:00
rotate_argb.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_sme.cc [AArch64] Re-enable SME only for Linux and new versions of Clang 2024-09-23 09:29:53 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
row_any.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
row_common.cc Change ARGBMultiplyRow_C to match Neon 2024-09-23 21:48:33 +00:00
row_gcc.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
row_lasx.cc Fix bugs in ARGBAttenuateRow_LASX/LSX function 2024-11-30 23:09:04 +00:00
row_lsx.cc Fix bugs in ARGBAttenuateRow_LASX/LSX function 2024-11-30 23:09:04 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
row_neon.cc HalfFloat fix SigIll on aarch64 2024-11-22 22:08:00 +00:00
row_rvv.cc Fix -Wmissing-prototypes warnings 2024-08-12 19:08:24 +00:00
row_sme.cc [AArch64] Add SME implementation of I444ToARGBRow 2024-10-29 18:10:23 +00:00
row_sve.cc [AArch64] Simplify predicate width calculations 2024-12-03 21:54:32 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc [AArch64] Unroll and use TBL in ScaleRowDown34_NEON 2024-09-16 15:37:27 +00:00
scale_argb.cc [AArch64] Add SME implementation of I422ToARGBRow 2024-10-29 05:49:28 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc [AArch64] Add Neon implementation of ScaleRowDown2Linear_16 2024-11-25 21:10:26 +00:00
scale_neon.cc scale_neon.cc: Fix -Wmissing-prototypes warnings 2024-08-13 03:50:51 +00:00
scale_rgb.cc Make functions that malloc check for ubsan math overflow 2024-10-08 21:08:34 +00:00
scale_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_sme.cc CpuId test FSMR - Fast Short Rep Movsb 2024-11-18 17:56:45 +00:00
scale_uv.cc [AArch64] Add SME implementation of ScaleUVRowDown2Box 2024-11-12 18:30:30 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc [AArch64] Add Neon implementation of ScaleRowDown2Linear_16 2024-11-25 21:10:26 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00