mirror of
https://chromium.googlesource.com/libyuv/libyuv
synced 2025-12-07 17:26:49 +08:00
Most micro-architectures seem to prefer an additional ZIP1 instruction
in READYUV422 to needing a lane-indexed LD1 load instruction.
We introduce a new macro to handle the YUV to RGB conversion where the U
and V components are in separate vectors. This avoids causing a slowdown
for the UV-interleaved input format kernels (NV12 and NV21) where we do
not want to separate them.
Reduction in runtime for selected kernels on Cortex cores (no
performance difference observed on Cortex-A55):
A510 A76 A720 X1 X2
I422AlphaToARGBRow_NEON -4.3% -7.3% -10.1% -4.0% -4.4%
I422ToARGB1555Row_NEON -4.5% +0.4% -7.9% -4.8% -3.9%
I422ToARGB4444Row_NEON -7.7% -2.6% -4.1% -1.9% -1.3%
I422ToARGBRow_NEON -3.7% -2.9% -10.2% -3.8% -4.4%
I422ToRGB24Row_NEON -5.9% +5.4% -3.2% -4.3% -4.3%
I422ToRGB565Row_NEON -4.8% -2.8% -8.5% -3.8% -4.6%
I422ToRGBARow_NEON -3.7% +4.6% -10.5% -3.0% -4.5%
I444AlphaToARGBRow_NEON -3.5% +2.7% -3.7% -5.0% -8.2%
I444ToARGBRow_NEON -1.8% -15.1% -3.5% -6.5% -8.1%
I444ToRGB24Row_NEON -2.0% -6.8% +0.1% -4.7% +1.2%
There are a few cases which are slower on Cortex-A76, but significant
speedups elsewhere.
Bug: libyuv:976
Change-Id: Ib3b4ef81f7bfc1d7ff9c4c24aef9ad86741410ff
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465580
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
|
||
|---|---|---|
| .. | ||
| compare_common.cc | ||
| compare_gcc.cc | ||
| compare_msa.cc | ||
| compare_neon64.cc | ||
| compare_neon.cc | ||
| compare_win.cc | ||
| compare.cc | ||
| convert_argb.cc | ||
| convert_from_argb.cc | ||
| convert_from.cc | ||
| convert_jpeg.cc | ||
| convert_to_argb.cc | ||
| convert_to_i420.cc | ||
| convert.cc | ||
| cpu_id.cc | ||
| mjpeg_decoder.cc | ||
| mjpeg_validate.cc | ||
| planar_functions.cc | ||
| rotate_any.cc | ||
| rotate_argb.cc | ||
| rotate_common.cc | ||
| rotate_gcc.cc | ||
| rotate_lsx.cc | ||
| rotate_msa.cc | ||
| rotate_neon64.cc | ||
| rotate_neon.cc | ||
| rotate_win.cc | ||
| rotate.cc | ||
| row_any.cc | ||
| row_common.cc | ||
| row_gcc.cc | ||
| row_lasx.cc | ||
| row_lsx.cc | ||
| row_msa.cc | ||
| row_neon64.cc | ||
| row_neon.cc | ||
| row_rvv.cc | ||
| row_sve.cc | ||
| row_win.cc | ||
| scale_any.cc | ||
| scale_argb.cc | ||
| scale_common.cc | ||
| scale_gcc.cc | ||
| scale_lsx.cc | ||
| scale_msa.cc | ||
| scale_neon64.cc | ||
| scale_neon.cc | ||
| scale_rgb.cc | ||
| scale_rvv.cc | ||
| scale_uv.cc | ||
| scale_win.cc | ||
| scale.cc | ||
| test.sh | ||
| video_common.cc | ||