libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-06-18 01:46:11 +08:00


George Steed	4ad050b5ec	[AArch64] Unroll {I422,I422Alpha}ToARGBRow_SVE2	2024-07-19 15:55:59 +00:00

Since the UV components are duplicated in I422, we end up wasting half of the vector bandwidth processing the same elements twice. By unrolling the kernel to process two vectors of Y per iteration, we can fill a whole vector of U/V components.

Rather than packing RGBA components into pairs during the narrowing, we now just narrow into individual component vectors and use ST4B instead. This by itself is slower on some micro-architectures like Cortex-A510, but the benefit from unrolling significantly outweighs this.

            | I422AlphaToARGBRow_SVE2 | I422ToARGBRow_SVE2
Cortex-A510 |                  -46.2% |             -48.8%
Cortex-A720 |                  -20.8% |             -21.0%
Cortex-X2   |                  -11.3% |              -7.5%
Cortex-X4   |                  -15.4% |             -15.5%

Bug: libyuv:973
Change-Id: I69389c4279861f7a460ae0c28186f023c728c4e8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5725173
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
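The scalar sketch below shows the data ratio the unrolling exploits. In I422, each U/V sample is shared by two horizontally adjacent Y samples. Each outer iteration therefore consumes two "vectors" of Y and one full "vector" of U and V.

The following are illustrative assumptions, not the SVE2 kernel or libyuv's `YuvConstants`:
- the lane count `kLanes`;
- the common BT.601 limited-range fixed-point coefficients;
- every function name.

The four per-pixel byte stores play the role that ST4B plays for whole vectors: four component vectors are written back interleaved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical lane count standing in for one vector of U/V bytes. */
enum { kLanes = 8 };

static uint8_t clamp255(int v) { return (uint8_t)(v < 0 ? 0 : v > 255 ? 255 : v); }

/* Assumed BT.601 limited-range conversion; writes B, G, R, A (libyuv "ARGB" memory order). */
static void yuv_to_bgra(uint8_t y, uint8_t u, uint8_t v, uint8_t* dst) {
  int c = (y - 16) * 298;
  int d = u - 128;
  int e = v - 128;
  dst[0] = clamp255((c + 516 * d + 128) >> 8);           /* B */
  dst[1] = clamp255((c - 100 * d - 208 * e + 128) >> 8); /* G */
  dst[2] = clamp255((c + 409 * e + 128) >> 8);           /* R */
  dst[3] = 255;                                          /* opaque A */
}

/* Each outer iteration reads 2*kLanes Y samples and a full kLanes of U and V,
   mirroring "two vectors of Y share one full vector of U/V". */
static void i422_to_argb_row_sketch(const uint8_t* src_y, const uint8_t* src_u,
                                    const uint8_t* src_v, uint8_t* dst_argb,
                                    int width) {
  assert(width % (2 * kLanes) == 0); /* tail handling omitted in this sketch */
  for (int x = 0; x < width; x += 2 * kLanes) {
    for (int i = 0; i < kLanes; ++i) {
      uint8_t u = src_u[x / 2 + i];
      uint8_t v = src_v[x / 2 + i];
      /* Pixels x+2i and x+2i+1 use the same chroma sample. */
      yuv_to_bgra(src_y[x + 2 * i], u, v, dst_argb + 4 * (x + 2 * i));
      yuv_to_bgra(src_y[x + 2 * i + 1], u, v, dst_argb + 4 * (x + 2 * i + 1));
    }
  }
}
```

A vector kernel that loaded only one Y vector per iteration would need just half a vector of U/V. This is the wasted bandwidth the commit message describes.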
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
compare_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
compare_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
compare.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
convert_argb.cc	[AArch64] Add SVE2 implementation of RGB24ToARGBRow	2024-07-08 20:12:05 +00:00
convert_from_argb.cc	[AArch64] Add I8MM implementation of ARGBToUV444Row	2024-07-16 17:32:52 +00:00
convert_from.cc	Change ScalePlane,ScalePlane_16,... to return int	2023-11-03 23:53:24 +00:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Remove M420 and refactor NV12ToI420	2020-05-26 18:48:00 +00:00
convert_to_i420.cc	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.	2021-07-19 22:22:22 +00:00
convert.cc	[AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row	2024-06-04 18:18:07 +00:00
cpu_id.cc	[AArch64] Enable SME feature detection on Apple Silicon	2024-07-08 16:19:27 +00:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	[AArch64] Add SVE2 implementation of RAWToRGB24Row	2024-07-08 15:55:14 +00:00
rotate_any.cc	[AArch64] Fix rotate by odd sizes	2024-07-15 18:13:31 +00:00
rotate_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_common.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_gcc.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
rotate_lsx.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_msa.cc	cpuid show vector length on ARM and RISCV	2024-07-02 18:10:56 +00:00
rotate_neon64.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
rotate_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
rotate_sme.cc	[AArch64] Add SME implementation of TransposeUVWxH	2024-07-19 12:15:40 +00:00
rotate_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
rotate.cc	[AArch64] Add SME implementation of TransposeUVWxH	2024-07-19 12:15:40 +00:00
row_any.cc	[AArch64] Add I8MM implementation of ARGBToUV444Row	2024-07-16 17:32:52 +00:00
row_common.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_gcc.cc	[AArch64] Fix SVE/SME vector length printing in cpuid	2024-07-02 19:44:41 +00:00
row_lasx.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
row_lsx.cc	[AArch64] Fix SVE/SME vector length printing in cpuid	2024-07-02 19:44:41 +00:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	[AArch64] Add I8MM implementation of ARGBToUV444Row	2024-07-16 17:32:52 +00:00
row_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
row_rvv.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
row_sve.cc	[AArch64] Unroll {I422,I422Alpha}ToARGBRow_SVE2	2024-07-19 15:55:59 +00:00
row_win.cc	Fix tidy warning that uint32_t dither4 should not be const	2023-06-02 00:42:02 +00:00
scale_any.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_argb.cc	[AArch64] Add SVE implementation for I422ToARGBRow	2024-04-27 18:26:11 +00:00
scale_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
scale_gcc.cc	cpuid show vector length on ARM and RISCV	2024-07-02 18:10:56 +00:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	[AArch64] Rework data loading in ScaleARGBFilterCols_NEON	2024-07-10 23:10:43 +00:00
scale_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
scale_rgb.cc	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24	2022-03-19 01:44:06 +00:00
scale_rvv.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
scale_uv.cc	Disable RVV ScaleDownBy4 if compiler option is not enabled	2024-06-18 01:52:40 +00:00
scale_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
scale.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
test.sh	Optimize ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00
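The compare.cc entry refers to a Neon implementation of HashDjb2. As a reference point, the sketch below gives the classic djb2 recurrence `hash = hash * 33 + byte`, conventionally seeded with 5381. It also gives a block form that folds four bytes at once using precomputed powers of 33. The block form is one standard way to make the recurrence vector-friendly. Whether libyuv's Neon kernel uses exactly this scheme is an assumption, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte-at-a-time djb2: hash = hash * 33 + src[i] (mod 2^32). */
static uint32_t djb2_sketch(const uint8_t* src, size_t count, uint32_t seed) {
  assert(src != NULL || count == 0);
  uint32_t hash = seed;
  for (size_t i = 0; i < count; ++i) {
    hash = hash * 33u + src[i]; /* same as (hash << 5) + hash + src[i] */
  }
  return hash;
}

/* Same hash, folding 4 bytes per step with powers of 33:
   h' = h*33^4 + b0*33^3 + b1*33^2 + b2*33 + b3. */
static uint32_t djb2_fold4_sketch(const uint8_t* src, size_t count, uint32_t seed) {
  assert(count % 4 == 0); /* tail handling omitted */
  uint32_t hash = seed;
  for (size_t i = 0; i < count; i += 4) {
    hash = hash * 1185921u + src[i] * 35937u + src[i + 1] * 1089u +
           src[i + 2] * 33u + src[i + 3];
  }
  return hash;
}
```

All arithmetic wraps modulo 2^32 in both forms, so they agree for any input whose length is a multiple of four.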
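Two entries above involve packed-RGB row conversions: RAWToRGB24Row (planar_functions.cc) and RGB24ToARGBRow (convert_argb.cc). The scalar sketch below assumes libyuv's little-endian naming convention, with these memory orders:
- RAW stores R, G, B;
- RGB24 stores B, G, R;
- ARGB stores B, G, R, A.

The function names are hypothetical. The sketch only shows the byte shuffles that SVE2 kernels would perform on whole vectors.

```c
#include <assert.h>
#include <stdint.h>

/* RAW (R,G,B in memory) -> RGB24 (B,G,R in memory): swap bytes 0 and 2. */
static void raw_to_rgb24_row_sketch(const uint8_t* src_raw, uint8_t* dst_rgb24,
                                    int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    uint8_t r = src_raw[0], g = src_raw[1], b = src_raw[2]; /* read first: in-place safe */
    dst_rgb24[0] = b;
    dst_rgb24[1] = g;
    dst_rgb24[2] = r;
    src_raw += 3;
    dst_rgb24 += 3;
  }
}

/* RGB24 (B,G,R) -> ARGB (B,G,R,A in memory): widen 3 bytes to 4, alpha = 255. */
static void rgb24_to_argb_row_sketch(const uint8_t* src_rgb24, uint8_t* dst_argb,
                                     int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst_argb[0] = src_rgb24[0]; /* B */
    dst_argb[1] = src_rgb24[1]; /* G */
    dst_argb[2] = src_rgb24[2]; /* R */
    dst_argb[3] = 255;          /* opaque alpha */
    src_rgb24 += 3;
    dst_argb += 4;
  }
}
```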
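Several entries (convert_from_argb.cc, row_any.cc, row_neon64.cc) mention an I8MM implementation of ARGBToUV444Row. "UV444" means one U and one V per pixel, with no subsampling. Each chroma value is a 4-element dot product of unsigned pixel bytes with signed 8-bit coefficients. Arm's FEAT_I8MM mixed-sign dot-product instructions (such as USDOT) accumulate exactly this shape into 32-bit lanes.

The following are illustrative assumptions and not necessarily libyuv's exact constants:
- the BT.601-style coefficients;
- the `0x8080` bias and rounding;
- the function names.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned-by-signed 4-byte dot product, the operation I8MM accelerates. */
static int dot4(const uint8_t px[4], const int8_t coeff[4]) {
  int acc = 0;
  for (int k = 0; k < 4; ++k) acc += px[k] * coeff[k];
  return acc;
}

/* Pixels in libyuv "ARGB" memory order: B, G, R, A. Coefficients are assumed. */
static void argb_to_uv444_row_sketch(const uint8_t* src_argb, uint8_t* dst_u,
                                     uint8_t* dst_v, int width) {
  static const int8_t kU[4] = {112, -74, -38, 0}; /* B, G, R, A weights */
  static const int8_t kV[4] = {-18, -94, 112, 0};
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    const uint8_t* px = src_argb + 4 * x;
    /* Weights sum to zero, so grey maps to 128; the bias keeps results in 0..255. */
    dst_u[x] = (uint8_t)((dot4(px, kU) + 0x8080) >> 8);
    dst_v[x] = (uint8_t)((dot4(px, kV) + 0x8080) >> 8);
  }
}
```

Because each coefficient set sums to zero, any grey pixel yields U = V = 128. The positive and negative weights each sum to 112, so the biased result always stays within a byte.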
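The rotate.cc and rotate_sme.cc entries mention an SME implementation of TransposeUVWxH. The scalar sketch below shows what such a transpose computes. The source is a block of interleaved U/V byte pairs (`width` pairs per row, `height` rows). The output is two planes, each the transpose of one component. The parameter layout loosely follows libyuv's stride-based row functions, but the function itself is a hypothetical reference, not libyuv's C fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Deinterleave UV pairs and transpose: input row j, pair i -> output row i, column j. */
static void transpose_uv_sketch(const uint8_t* src, int src_stride,
                                uint8_t* dst_a, int dst_stride_a,
                                uint8_t* dst_b, int dst_stride_b,
                                int width, int height) {
  assert(width >= 0 && height >= 0);
  for (int i = 0; i < width; ++i) {    /* each column of UV pairs */
    for (int j = 0; j < height; ++j) { /* each input row */
      dst_a[i * dst_stride_a + j] = src[j * src_stride + 2 * i];     /* U */
      dst_b[i * dst_stride_b + j] = src[j * src_stride + 2 * i + 1]; /* V */
    }
  }
}
```

Vector and matrix-tile implementations compute the same mapping. They load several rows at once and write whole transposed tiles rather than single bytes.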