George Steed c1fe5663f5 [AArch64] Use full vectors in ARGB4444To{Y,UV}Row_NEON
The existing ARGB4444TORGB macro only makes use of 64 bit wide vectors
rather than the full 128 bits available, so unroll it to allow us to
process more data per instruction.
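
As a rough illustration of the per-pixel work involved, here is a scalar C sketch of an ARGB4444 to ARGB expansion. The function names are hypothetical and this is not the Neon macro itself. It assumes the usual little-endian ARGB4444 layout (byte 0 holds G in the high nibble and B in the low nibble, byte 1 holds A and R) and widens each 4-bit channel to 8 bits by nibble replication. A 64-bit vector of bytes covers 8 such lanes and a 128-bit vector covers 16, which is where the gain from unrolling comes from.

```c
#include <stdint.h>

// Hypothetical scalar model: expand one ARGB4444 pixel (2 bytes) into
// 4 bytes in B, G, R, A order. Each 4-bit channel is widened to 8 bits
// by replicating the nibble (0xN -> 0xNN).
static void Argb4444PixelToArgb(const uint8_t src[2], uint8_t dst[4]) {
  uint8_t b = src[0] & 0x0f;
  uint8_t g = src[0] >> 4;
  uint8_t r = src[1] & 0x0f;
  uint8_t a = src[1] >> 4;
  dst[0] = (uint8_t)((b << 4) | b);
  dst[1] = (uint8_t)((g << 4) | g);
  dst[2] = (uint8_t)((r << 4) | r);
  dst[3] = (uint8_t)((a << 4) | a);
}

// Hypothetical row helper: a vector kernel would handle 8 (64-bit) or
// 16 (128-bit) of these pixels per instruction; this loop does one at a time.
static void Argb4444RowToArgb(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) {
    Argb4444PixelToArgb(src + 2 * i, dst + 4 * i);
  }
}
```

For example, the two source bytes 0x21, 0x43 expand to 0x11, 0x22, 0x33, 0x44.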

For ARGB4444ToUVRow_NEON we already have enough data available each
iteration to make use of full vectors, but for ARGB4444ToYRow_NEON we
also need to adjust the "any" kernel to allow us to process 16 elements
per iteration.
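
The role of an "any" kernel can be sketched in plain C as follows. This is a minimal illustration under assumptions, not the code in row_any.cc: the names `ToyRow_Block16` and `ToyRow_Any` are hypothetical, and the per-pixel transform is a placeholder rather than the real Y computation. The point is only that a kernel which consumes 16 pixels per iteration needs a wrapper that runs the whole blocks directly and pads the remainder through a temporary buffer.

```c
#include <stdint.h>
#include <string.h>

enum { kBlock = 16 };  // pixels per kernel iteration (assumed from the text)

// Stand-in for a vector kernel: only valid when width is a multiple of
// kBlock. Reads 2 bytes per pixel and writes 1 byte per pixel. The
// transform (green nibble replicated to 8 bits) is a placeholder.
static void ToyRow_Block16(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; x += kBlock) {
    for (int i = 0; i < kBlock; ++i) {
      dst[x + i] = (uint8_t)((src[2 * (x + i)] >> 4) * 0x11);
    }
  }
}

// Hypothetical "any" wrapper: accepts an arbitrary width.
static void ToyRow_Any(const uint8_t* src, uint8_t* dst, int width) {
  uint8_t tmp_in[kBlock * 2];
  uint8_t tmp_out[kBlock];
  int rem = width & (kBlock - 1);  // leftover pixels, 0..15
  int full = width - rem;          // largest multiple of kBlock <= width
  if (full > 0) {
    ToyRow_Block16(src, dst, full);
  }
  if (rem > 0) {
    // Pad the tail to a full block so the kernel never reads or writes
    // past the end of the caller's buffers.
    memset(tmp_in, 0, sizeof(tmp_in));
    memcpy(tmp_in, src + 2 * full, (size_t)rem * 2);
    ToyRow_Block16(tmp_in, tmp_out, kBlock);
    memcpy(dst + full, tmp_out, (size_t)rem);
  }
}
```

With a width of 19, for instance, the first 16 pixels go straight through the block kernel and the last 3 are handled via the padded temporary block, so nothing beyond the 19th output byte is touched.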

Reduction in runtimes observed compared to the existing Neon kernels:

            | ARGB4444ToUVRow | ARGB4444ToYRow
 Cortex-A55 |          -27.8% |         -34.6%
Cortex-A510 |          -37.0% |         -44.4%
 Cortex-A76 |          -40.2% |         -22.0%
Cortex-A720 |          -33.4% |         -35.5%
  Cortex-X1 |          -34.1% |         -19.7%
  Cortex-X2 |          -32.1% |         -26.3%

Bug: libyuv:976
Change-Id: I08f6286bab0ebf5e24d5d5803f8c45ec6ba776ee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631541
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-07-10 23:12:43 +00:00
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
compare_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
compare.cc [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
convert_argb.cc [AArch64] Add SVE2 implementation of RGB24ToARGBRow 2024-07-08 20:12:05 +00:00
convert_from_argb.cc [AArch64] Add SVE2 implementations of ARGBTo{RAW,RGB24}Row 2024-07-08 20:27:54 +00:00
convert_from.cc Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Remove M420 and refactor NV12ToI420 2020-05-26 18:48:00 +00:00
convert_to_i420.cc Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x. 2021-07-19 22:22:22 +00:00
convert.cc [AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row 2024-06-04 18:18:07 +00:00
cpu_id.cc [AArch64] Enable SME feature detection on Apple Silicon 2024-07-08 16:19:27 +00:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc [AArch64] Add SVE2 implementation of RAWToRGB24Row 2024-07-08 15:55:14 +00:00
rotate_any.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_argb.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_msa.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
rotate_neon64.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
rotate_sme.cc [AArch64] Add initial build system support for SME 2024-06-08 23:32:41 +00:00
rotate_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
rotate.cc [AArch64] Remove unused code from TransposeUVWx8_NEON 2024-05-27 21:52:56 +00:00
row_any.cc [AArch64] Use full vectors in ARGB4444To{Y,UV}Row_NEON 2024-07-10 23:12:43 +00:00
row_common.cc [RVV] Support AR64ToAB64 and RGBA-family color conversions 2023-09-05 22:44:48 +00:00
row_gcc.cc [AArch64] Fix SVE/SME vector length printing in cpuid 2024-07-02 19:44:41 +00:00
row_lasx.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_lsx.cc [AArch64] Fix SVE/SME vector length printing in cpuid 2024-07-02 19:44:41 +00:00
row_msa.cc Fix Bugs on mips platform V2. 2022-03-01 13:16:31 +00:00
row_neon64.cc [AArch64] Use full vectors in ARGB4444To{Y,UV}Row_NEON 2024-07-10 23:12:43 +00:00
row_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
row_sve.cc [AArch64] Add SVE2 implementations of ARGBTo{RAW,RGB24}Row 2024-07-08 20:27:54 +00:00
row_win.cc Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
scale_any.cc UVScale down by 2 fix for C and optimize for NEON 2023-04-12 22:49:20 +00:00
scale_argb.cc [AArch64] Add SVE implementation for I422ToARGBRow 2024-04-27 18:26:11 +00:00
scale_common.cc Fix warnings for missing prototypes 2023-06-30 17:46:56 +00:00
scale_gcc.cc cpuid show vector length on ARM and RISCV 2024-07-02 18:10:56 +00:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc [AArch64] Rework data loading in ScaleARGBFilterCols_NEON 2024-07-10 23:10:43 +00:00
scale_neon.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_rgb.cc RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_rvv.cc Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
scale_uv.cc Disable RVV ScaleDownBy4 if compiler option is not enabled 2024-06-18 01:52:40 +00:00
scale_win.cc Switch win32 to row_gcc for clangcl. 2021-04-22 19:32:32 +00:00
scale.cc malloc return 1 for failures and assert for internal functions 2023-12-04 22:55:20 +00:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00