libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-06-17 17:36:39 +08:00

George Steed 3d66e94fb5	2025-06-12 14:10:44 -07:00

[AArch64] Improve ARGBToUVRow_SVE2 and related kernels

This commit reworks the implementation of ARGBToUVMatrixRow_SVE2, using an approach similar to that recently used in 61bdaee13a701d2b52c6dc943ccc5c888077a591. In particular we can rework these SVE2 implementations to use 8-bit dot-product instructions instead of 16-bit, allowing us to process more data in a single vector.

To ensure that the input values fit in 8-bits, negate the UV constants arrays passed to the kernel and undo the now-unnecessary flipping of the middle two component values.

This commit mostly reverses the performance inversion where the Neon I8MM implementation was previously faster than the SVE2 implementation. The reduction in runtime observed compared to the existing Neon I8MM implementation is now:

Cortex-A510: +5.6% (!)
Cortex-A520: -3.0%
Cortex-A710: -12.6%
Cortex-A715: -10.9%
Cortex-A720: -10.8%
Cortex-X2: -3.8%
Cortex-X3: -10.3%
Cortex-X4: -9.5%
Cortex-X925: -6.7%

Change-Id: I30253976dc8e3651cfb5fd39b63a6763975d41e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6640990
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
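
The commit message above says that negating the UV constants lets them fit in 8 bits. The scalar sketch below is one plausible reading of why that helps, not the kernel's actual code. The coefficient values, the bias, and the function names are illustrative assumptions and are not taken from libyuv. The idea shown is that a weight of +128 does not fit in a signed 8-bit lane, whereas the negated set (-128, +85, +43) does, and subtracting the dot product from the bias gives the same result.

```cpp
#include <cstdint>

// Hypothetical full-range U weights at a fixed-point scale of 256.
// Illustrative values only; they are not copied from libyuv.
constexpr int kUB = 128, kUG = -85, kUR = -43;
// Hypothetical bias: 128 << 8, chosen so the result stays within 0..255.
constexpr int kBias = 0x8000;

// Reference form: +128 is out of range for int8_t, so the weights
// are held in 16-bit values here.
inline uint8_t UFromWideWeights(uint8_t b, uint8_t g, uint8_t r) {
  const int16_t w[3] = {kUB, kUG, kUR};
  const int dot = w[0] * b + w[1] * g + w[2] * r;
  return static_cast<uint8_t>((kBias + dot) >> 8);
}

// Negated form: -128, +85 and +43 all fit in int8_t, so an
// unsigned-by-signed 8-bit dot product could consume them directly.
// The sign flip is undone by subtracting from the bias.
inline uint8_t UFromNegatedWeights(uint8_t b, uint8_t g, uint8_t r) {
  const int8_t w[3] = {-kUB, -kUG, -kUR};
  const int dot = w[0] * b + w[1] * g + w[2] * r;
  return static_cast<uint8_t>((kBias - dot) >> 8);
}
```

Both functions perform the same integer arithmetic, so they agree for every 8-bit input; only the storage width of the weights differs.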
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	ARGBToUV 64 bit use ymm8 for shuffler	2025-05-12 15:09:40 -07:00
compare_neon.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
compare_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
compare.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
convert_argb.cc	Add SVE2 and SME implementations of I422ToAR30Row	2025-05-27 11:39:00 -07:00
convert_from_argb.cc	Add Neon I8MM implementations of ARGB to UV and variants	2025-05-12 11:14:00 -07:00
convert_from.cc	Sub sampling conversions use CopyPlane for Y channel	2025-01-02 13:34:11 -08:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Apply clang format	2025-01-02 13:31:20 -08:00
convert_to_i420.cc	Apply clang format	2025-01-02 13:31:20 -08:00
convert.cc	Add Neon I8MM implementations of ARGB to UV and variants	2025-05-12 11:14:00 -07:00
cpu_id.cc	Detect SME without SVE dependency	2025-03-31 17:27:40 -07:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	Add Neon implementation of Convert8To16Row	2025-05-29 13:37:48 -07:00
rotate_any.cc	[AArch64] Fix rotate by odd sizes	2024-07-15 18:13:31 +00:00
rotate_argb.cc	Apply clang format	2025-01-02 13:31:20 -08:00
rotate_common.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_gcc.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
rotate_lsx.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_msa.cc	cpuid show vector length on ARM and RISCV	2024-07-02 18:10:56 +00:00
rotate_neon64.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
rotate_neon.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
rotate_sme.cc	[AArch64] Re-enable SME only for Linux and new versions of Clang	2024-09-23 09:29:53 +00:00
rotate_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
rotate.cc	[AArch64] Add SME implementation of CopyRow	2024-12-12 03:02:07 -08:00
row_any.cc	ubsan compliant '_any' functions using ptrdiff_t for pointer math	2025-06-10 15:01:52 -07:00
row_common.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
row_gcc.cc	ARGBToUV 64 bit use ymm8 for shuffler	2025-05-12 15:09:40 -07:00
row_lasx.cc	Fix unified sources build for LoongArch LASX	2025-04-01 09:48:19 -07:00
row_lsx.cc	Fix unified sources build for LoongArch LASX	2025-04-01 09:48:19 -07:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	TestI400LargeSize test __x86_64__, _M_X64, or __aarch64__	2025-06-10 15:53:02 -07:00
row_neon.cc	ARGBToUV allow 32 bit x86 build	2025-04-28 12:11:00 -07:00
row_rvv.cc	Apply clang format	2025-01-02 13:31:20 -08:00
row_sme.cc	Add SVE2 and SME implementations of I422ToAR30Row	2025-05-27 11:39:00 -07:00
row_sve.cc	[AArch64] Improve ARGBToUVRow_SVE2 and related kernels	2025-06-12 14:10:44 -07:00
row_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
scale_any.cc	[AArch64] Unroll and use TBL in ScaleRowDown34_NEON	2024-09-16 15:37:27 +00:00
scale_argb.cc	RVV disable 64 bit elements and vcombine_v	2025-03-25 12:51:25 -07:00
scale_common.cc	[AArch64] Add SME implementations of InterpolateRow{,_16,_16To8}	2024-12-12 03:03:41 -08:00
scale_gcc.cc	ARGBToUV 64 bit use ymm8 for shuffler	2025-05-12 15:09:40 -07:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
scale_neon.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
scale_rgb.cc	Apply clang format	2025-01-02 13:31:20 -08:00
scale_rvv.cc	RVV disable 64 bit elements and vcombine_v	2025-03-25 12:51:25 -07:00
scale_sme.cc	Apply clang format	2025-01-02 13:31:20 -08:00
scale_uv.cc	Apply clang format	2025-01-02 13:31:20 -08:00
scale_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
scale.cc	J420ToI420 using planar 8 bit scaling	2025-01-22 02:50:24 -08:00
test.sh	Optimze ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00