libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-07-31 08:46:21 +08:00

George Steed 5d694bec38 [AArch64] Replace UQSHRN{,2} pair by UZP2 in YUVTORGB	2024-03-14 20:04:46 +00:00

The existing Neon code makes use of a pair of UQSHRN and UQSHRN2 instructions to extract the top half of a widened multiply result. These instructions would ordinarily saturate, however saturation can never happen in this case since we are shifting by 16 to get the top half of each element, the top bits remain as-is.

We could move this to using a slightly simpler non-saturating shift, however in this case it is simpler and faster to just use UZP2 to extract the top half of each 32-bit lane directly.

Reduction in runtime for selected kernels:

Kernel                  | Cortex-A55 | Cortex-A76 | Cortex-X2
I400ToARGBRow_NEON      | -9.4%      | -14.9%     | -13.9%
I422AlphaToARGBRow_NEON | -7.9%      | -11.4%     | -11.5%
I422ToARGB1555Row_NEON  | -7.3%      | -17.2%     | -14.7%
I422ToARGB4444Row_NEON  | -7.6%      | -17.9%     | -13.7%
I422ToARGBRow_NEON      | -8.2%      | -9.8%      | -11.9%
I422ToRGB24Row_NEON     | -8.0%      | -13.3%     | -12.8%
I422ToRGB565Row_NEON    | -7.5%      | -15.1%     | -14.6%
I422ToRGBARow_NEON      | -8.3%      | -13.1%     | -12.2%
I444AlphaToARGBRow_NEON | -8.3%      | -7.6%      | -12.7%
I444ToARGBRow_NEON      | -8.6%      | -3.5%      | -13.5%
I444ToRGB24Row_NEON     | -8.5%      | -7.8%      | -13.4%
NV12ToARGBRow_NEON      | -8.8%      | -1.4%      | -12.0%
NV12ToRGB24Row_NEON     | -8.5%      | -11.5%     | -12.3%
NV12ToRGB565Row_NEON    | -7.9%      | -15.0%     | -15.7%
NV21ToARGBRow_NEON      | -8.7%      | -1.6%      | -12.3%
NV21ToRGB24Row_NEON     | -8.4%      | -11.5%     | -12.0%
UYVYToARGBRow_NEON      | -8.8%      | -8.9%      | -11.9%
YUY2ToARGBRow_NEON      | -8.7%      | -10.8%     | -13.3%

Bug: libyuv:976
Change-Id: I6c505fe722e5f91f93718b85fe881ad056d8602d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366653
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
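
The equivalence this commit relies on can be illustrated with a portable scalar model. This is only a sketch and is not code from row_neon64.cc: the type aliases and the functions NarrowSaturatingShift16, AsU16Lanes and Uzp2 are hypothetical names introduced here. It assumes the usual lane numbering, in which 16-bit element 2i is the low half and element 2i+1 is the high half of 32-bit lane i.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>

// Hypothetical stand-ins for 128-bit vector registers.
using U32x4 = std::array<uint32_t, 4>;
using U16x8 = std::array<uint16_t, 8>;

// Scalar model of UQSHRN (fills the low four lanes) followed by UQSHRN2
// (fills the high four lanes), both with a shift of 16. The clamp to 0xFFFF
// models the saturation; after a 32-bit value is shifted right by 16 it
// already fits in 16 bits, so the clamp never changes the result.
U16x8 NarrowSaturatingShift16(const U32x4& lo, const U32x4& hi) {
  U16x8 out{};
  for (int i = 0; i < 4; ++i) {
    out[i] = static_cast<uint16_t>(std::min<uint32_t>(lo[i] >> 16, 0xFFFFu));
    out[i + 4] =
        static_cast<uint16_t>(std::min<uint32_t>(hi[i] >> 16, 0xFFFFu));
  }
  return out;
}

// View four 32-bit lanes as eight 16-bit lanes (assumed lane numbering:
// element 2i is the low half of lane i, element 2i+1 is the high half).
U16x8 AsU16Lanes(const U32x4& v) {
  U16x8 out{};
  for (int i = 0; i < 4; ++i) {
    out[2 * i] = static_cast<uint16_t>(v[i] & 0xFFFFu);
    out[2 * i + 1] = static_cast<uint16_t>(v[i] >> 16);
  }
  return out;
}

// Scalar model of UZP2 on 16-bit lanes: the odd-indexed elements of a,
// followed by the odd-indexed elements of b.
U16x8 Uzp2(const U16x8& a, const U16x8& b) {
  U16x8 out{};
  for (int i = 0; i < 4; ++i) {
    out[i] = a[2 * i + 1];
    out[i + 4] = b[2 * i + 1];
  }
  return out;
}
```

Under these assumptions, NarrowSaturatingShift16(lo, hi) and Uzp2(AsU16Lanes(lo), AsU16Lanes(hi)) return the same eight values for any inputs, which is why a single UZP2 can stand in for the two narrowing shifts.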
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	MT2T Warning fixes for fuchsia	2022-12-06 19:54:40 +00:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	Scale by even factor low level row function	2020-11-03 21:25:18 +00:00
compare_neon.cc	Scale by even factor low level row function	2020-11-03 21:25:18 +00:00
compare_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
compare.cc	MT2T Warning fixes for fuchsia	2022-12-06 19:54:40 +00:00
convert_argb.cc	YUY2ToARGBMatrix and UYVYToARGBMatrix added to allow any color matrix	2024-01-19 21:21:37 +00:00
convert_from_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
convert_from.cc	Change ScalePlane,ScalePlane_16,... to return int	2023-11-03 23:53:24 +00:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Remove M420 and refactor NV12ToI420	2020-05-26 18:48:00 +00:00
convert_to_i420.cc	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.	2021-07-19 22:22:22 +00:00
convert.cc	I444ToI420 and I422ToI420 check U and V pointers and return -1 if NULL.	2024-01-18 21:56:11 +00:00
cpu_id.cc	Revert "AMX detect OS support for linux kernel"	2024-02-29 00:33:29 +00:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_any.cc	Remove MMI support	2022-01-26 08:41:33 +00:00
rotate_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
rotate_gcc.cc	Transpose 4x4 for SSE2 and AVX2	2023-03-03 17:46:23 +00:00
rotate_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
rotate_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
rotate_neon64.cc	MergeUV_AVX512BW for I420ToNV12	2023-02-13 20:14:57 +00:00
rotate_neon.cc	GCC warning fix for MT2T	2023-03-16 06:57:20 +00:00
rotate_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
rotate.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
row_any.cc	Optimize the following 19 functions with LSX in row_lsx.cc.	2023-05-19 18:55:58 +00:00
row_common.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_gcc.cc	YUY2ToARGB use ymm6/7 for shuffle constants	2024-01-22 21:47:23 +00:00
row_lasx.cc	AVX10 cpuid detect added	2024-01-10 00:08:22 +00:00
row_lsx.cc	Fix compilation errors.	2024-01-03 19:15:56 +00:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	[AArch64] Replace UQSHRN{,2} pair by UZP2 in YUVTORGB	2024-03-14 20:04:46 +00:00
row_neon.cc	ARGBAttenuate use (a + b + 255) >> 8	2023-06-16 21:37:53 +00:00
row_rvv.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_win.cc	Fix tidy warning that uint32_t dither4 should not be const	2023-06-02 00:42:02 +00:00
scale_any.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_argb.cc	Add HAS_SCALEARGBROWDOWNEVEN_RVV marco and disable it by default	2023-12-07 22:54:23 +00:00
scale_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
scale_gcc.cc	ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows	2022-10-21 19:35:17 +00:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	Add missing memory/cc clobbers to AArch64 Neon kernels	2024-03-04 10:22:51 +00:00
scale_neon.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_rgb.cc	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24	2022-03-19 01:44:06 +00:00
scale_rvv.cc	Add HAS_SCALEARGBROWDOWNEVEN_RVV marco and disable it by default	2023-12-07 22:54:23 +00:00
scale_uv.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
scale_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
scale.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
test.sh	Optimze ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00