libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-01-01 03:12:16 +08:00


George Steed 367dd50755 [AArch64] Add SVE2 impls for {UYVY,YUY2}ToARGBRow		2024-06-13 22:06:46 +00:00

This is mostly similar to the existing NV{12,21}ToARGBRow_SVE2 kernels, except that all of the YUV components are read from the same interleaved input array. We load four-byte elements and then use TBL to de-interleave the UV components. Unlike the NV{12,21} cases, we need to de-interleave bytes rather than widened 16-bit elements.

Since we already need a TBL instruction, it would ordinarily be possible to perform the zero-extension from bytes to 16-bit elements by setting the index for every other byte to be out of range. Such an approach does not work in SVE, since at a vector length of 2048 bits all possible byte values (0-255) are valid indices into the vector. We instead get around this by rewriting the I4XXTORGB_SVE macro to perform widening multiplies that operate on the low byte of each 16-bit UV element instead of the full value, which eliminates the need for a zero-extension.

Observed reductions in runtimes compared to the existing Neon code:

            | UYVYToARGBRow | YUY2ToARGBRow
Cortex-A510 |        -30.2% |        -30.2%
Cortex-A720 |         -4.8% |         -4.7%
Cortex-X2   |         -9.6% |        -10.1%

Bug: libyuv:973
Change-Id: I841a049aba020d0517563d24d2f14f4d1221ebc6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5622132
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
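To make the index-range argument in the commit message concrete, the following is a scalar C model, not the actual SVE2 kernel. It assumes YUY2 macropixels laid out as Y0 U Y1 V and UYVY as U Y0 V Y1, and it models TBL so that any lane whose index is at or beyond the vector length in bytes reads as zero. The helper names (`tbl_u8`, `build_uv_indices`, `lane16`, `mul_low_byte`) are hypothetical and are not part of libyuv, and nothing here reproduces the real I4XXTORGB_SVE macro.

```c
#include <stdint.h>

/* Hypothetical scalar model of a byte-wise TBL: lanes whose index is
 * >= vl_bytes read as zero, all others select table[idx].
 * `table` must have at least vl_bytes readable bytes. */
static void tbl_u8(const uint8_t* table, const uint8_t* idx, uint8_t* out,
                   unsigned vl_bytes, unsigned lanes) {
  for (unsigned i = 0; i < lanes; ++i) {
    out[i] = (idx[i] < vl_bytes) ? table[idx[i]] : 0;
  }
}

/* Byte offsets of U and V within one 4-byte macropixel (two pixels). */
enum { YUY2_U = 1, YUY2_V = 3, UYVY_U = 0, UYVY_V = 2 };

/* Builds TBL indices that gather {U0, pad, V0, pad, U1, pad, ...}, so each
 * chroma byte lands in the low byte of a little-endian 16-bit lane.
 * `pad` is meant to be an out-of-range index; 0xFF is the largest a byte holds. */
static void build_uv_indices(uint8_t* idx, unsigned macropixels,
                             uint8_t u_off, uint8_t v_off, uint8_t pad) {
  for (unsigned m = 0; m < macropixels; ++m) {
    idx[4 * m + 0] = (uint8_t)(4 * m + u_off);
    idx[4 * m + 1] = pad;
    idx[4 * m + 2] = (uint8_t)(4 * m + v_off);
    idx[4 * m + 3] = pad;
  }
}

/* Reads lane i of a little-endian 16-bit view of a byte vector. */
static uint16_t lane16(const uint8_t* v, unsigned i) {
  return (uint16_t)(v[2 * i] | (v[2 * i + 1] << 8));
}

/* Widening multiply that only looks at the low byte of a 16-bit lane, so
 * whatever the high byte holds never affects the product. */
static uint32_t mul_low_byte(uint16_t lane, uint16_t coeff) {
  return (uint32_t)(uint8_t)(lane & 0xFF) * coeff;
}
```

With a 16-byte (128-bit) vector, a pad index of 0xFF is out of range, so the high byte of every 16-bit lane comes out as zero and the chroma values are correctly zero-extended. With a 256-byte (2048-bit) vector, 0xFF is a valid index, so the high byte picks up whatever sits at `table[255]`. `mul_low_byte` still produces the right product in that case, which is the property the commit relies on when it switches to multiplies that read only the low byte.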
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	MT2T Warning fixes for fuchsia	2022-12-06 19:54:40 +00:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
compare_neon.cc	Scale by even factor low level row function	2020-11-03 21:25:18 +00:00
compare_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
compare.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
convert_argb.cc	[AArch64] Add SVE2 impls for {UYVY,YUY2}ToARGBRow	2024-06-13 22:06:46 +00:00
convert_from_argb.cc	[AArch64] Add SVE2 implementation of ARGBToRGB565DitherRow	2024-06-03 23:15:04 +00:00
convert_from.cc	Change ScalePlane,ScalePlane_16,... to return int	2023-11-03 23:53:24 +00:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Remove M420 and refactor NV12ToI420	2020-05-26 18:48:00 +00:00
convert_to_i420.cc	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.	2021-07-19 22:22:22 +00:00
convert.cc	[AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row	2024-06-04 18:18:07 +00:00
cpu_id.cc	[AArch64] Add SME feature detection on Linux	2024-06-08 23:34:22 +00:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	[AArch64] Add I8MM implementation of ARGBColorMatrixRow	2024-06-12 16:17:59 +00:00
rotate_any.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_common.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_gcc.cc	Transpose 4x4 for SSE2 and AVX2	2023-03-03 17:46:23 +00:00
rotate_lsx.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_msa.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_neon64.cc	[AArch64] Use ST2 to avoid TRN step in TransposeWx16_NEON	2024-05-31 08:27:05 +00:00
rotate_neon.cc	[Arm] Clean up rotate_neon.cc kernels	2024-06-03 22:23:40 +00:00
rotate_sme.cc	[AArch64] Add initial build system support for SME	2024-06-08 23:32:41 +00:00
rotate_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
rotate.cc	[AArch64] Remove unused code from TransposeUVWx8_NEON	2024-05-27 21:52:56 +00:00
row_any.cc	[AArch64] Remove redundant semicolons after ANY41CT	2024-06-08 23:33:54 +00:00
row_common.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_gcc.cc	YUY2ToARGB use ymm6/7 for shuffle constants	2024-01-22 21:47:23 +00:00
row_lasx.cc	AVX10 cpuid detect added	2024-01-10 00:08:22 +00:00
row_lsx.cc	Fix compilation errors.	2024-01-03 19:15:56 +00:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	[AArch64] Add I8MM implementation of ARGBColorMatrixRow	2024-06-12 16:17:59 +00:00
row_neon.cc	[Arm][AArch64] Remove unused ARGBToUVJ444Row_NEON definition	2024-06-10 18:36:31 +00:00
row_rvv.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_sve.cc	[AArch64] Add SVE2 impls for {UYVY,YUY2}ToARGBRow	2024-06-13 22:06:46 +00:00
row_win.cc	Fix tidy warning that uint32_t dither4 should not be const	2023-06-02 00:42:02 +00:00
scale_any.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_argb.cc	[AArch64] Add SVE implementation for I422ToARGBRow	2024-04-27 18:26:11 +00:00
scale_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
scale_gcc.cc	ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows	2022-10-21 19:35:17 +00:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	[AArch64] Replace instances of ORR with MOV where possible	2024-04-25 20:48:16 +00:00
scale_neon.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_rgb.cc	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24	2022-03-19 01:44:06 +00:00
scale_rvv.cc	Add HAS_SCALEARGBROWDOWNEVEN_RVV marco and disable it by default	2023-12-07 22:54:23 +00:00
scale_uv.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
scale_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
scale.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
test.sh	Optimze ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00