libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-01-01 03:12:16 +08:00

George Steed e52007eff9 [AArch64] Add SVE2 implementation for I444ToARGBRow	2024-04-09 03:11:01 +00:00

Being able to use SVE2 functionality for these kernels has a number of performance wins compared to the existing Neon code:

* For the Y component calculation we are able to use UMULH, versus the existing UMULL x2 + UZP2 sequence in Neon.
* For the RGBTORGBA8 calculation we are able to take advantage of interleaving narrowing instructions, allowing us to use ST2 rather than ST4 for the store. This is a big performance win on some micro-architectures where ST4 is costly.
* The use of predication means we do not need to add "any" kernels; we can simply rerun the calculation with a not-full predicate for the final iteration.

To avoid the overhead of generating a predicate register on every iteration we duplicate the loop body and only generate a predicate on the final iteration of the loop. This costs a small amount on the final iteration but should still be significantly quicker than the overhead of a function call needed by the "any" cases.

Duplicating the loop body to reduce the use of the WHILELT instruction improves little-core performance by ~12% by itself but has negligible impact on other micro-architectures.

Reduction in runtime for the new SVE2 implementation compared to the existing Neon implementation on selected micro-architectures:

Cortex-A510: -36.5%
Cortex-A720: -17.3%
Cortex-X2: -11.3%

Bug: libyuv:973
Change-Id: I2a485f0dfa077a56f96b80a667ad38bbea47b4b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424739
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
..
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	MT2T Warning fixes for fuchsia	2022-12-06 19:54:40 +00:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	Scale by even factor low level row function	2020-11-03 21:25:18 +00:00
compare_neon.cc	Scale by even factor low level row function	2020-11-03 21:25:18 +00:00
compare_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
compare.cc	MT2T Warning fixes for fuchsia	2022-12-06 19:54:40 +00:00
convert_argb.cc	[AArch64] Add SVE2 implementation for I444ToARGBRow	2024-04-09 03:11:01 +00:00
convert_from_argb.cc	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow	2024-04-09 03:09:36 +00:00
convert_from.cc	Change ScalePlane,ScalePlane_16,... to return int	2023-11-03 23:53:24 +00:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Remove M420 and refactor NV12ToI420	2020-05-26 18:48:00 +00:00
convert_to_i420.cc	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.	2021-07-19 22:22:22 +00:00
convert.cc	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow	2024-04-09 03:09:36 +00:00
cpu_id.cc	[AArch64] Enable detection of additional architecture features	2024-04-05 17:48:22 +00:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_any.cc	Remove MMI support	2022-01-26 08:41:33 +00:00
rotate_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
rotate_gcc.cc	Transpose 4x4 for SSE2 and AVX2	2023-03-03 17:46:23 +00:00
rotate_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
rotate_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
rotate_neon64.cc	MergeUV_AVX512BW for I420ToNV12	2023-02-13 20:14:57 +00:00
rotate_neon.cc	GCC warning fix for MT2T	2023-03-16 06:57:20 +00:00
rotate_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
rotate.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
row_any.cc	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow	2024-04-09 03:09:36 +00:00
row_common.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_gcc.cc	YUY2ToARGB use ymm6/7 for shuffle constants	2024-01-22 21:47:23 +00:00
row_lasx.cc	AVX10 cpuid detect added	2024-01-10 00:08:22 +00:00
row_lsx.cc	Fix compilation errors.	2024-01-03 19:15:56 +00:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow	2024-04-09 03:09:36 +00:00
row_neon.cc	ARGBAttenuate use (a + b + 255) >> 8	2023-06-16 21:37:53 +00:00
row_rvv.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_sve.cc	[AArch64] Add SVE2 implementation for I444ToARGBRow	2024-04-09 03:11:01 +00:00
row_win.cc	Fix tidy warning that uint32_t dither4 should not be const	2023-06-02 00:42:02 +00:00
scale_any.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_argb.cc	Add HAS_SCALEARGBROWDOWNEVEN_RVV macro and disable it by default	2023-12-07 22:54:23 +00:00
scale_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
scale_gcc.cc	ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows	2022-10-21 19:35:17 +00:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	Add missing memory/cc clobbers to AArch64 Neon kernels	2024-03-04 10:22:51 +00:00
scale_neon.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_rgb.cc	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24	2022-03-19 01:44:06 +00:00
scale_rvv.cc	Add HAS_SCALEARGBROWDOWNEVEN_RVV macro and disable it by default	2023-12-07 22:54:23 +00:00
scale_uv.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
scale_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
scale.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
test.sh	Optimize ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00