libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-07-31 08:46:21 +08:00


George Steed 5bac99fe09 [AArch64] Rework data loading in ScaleARGBFilterCols_NEON		2024-07-10 23:10:43 +00:00

The existing code makes use of lane-indexed LD2 instructions to load the input data however this creates a strong dependency chain between consecutive load instructions. We can reduce this dependency chain by instead loading two vectors with wider lane-indexed LD1 instructions and then performing a permute to unzip the data.

We can also avoid the need for a complex sequence of DUP + EXT instructions by using TBL to permute the data exactly as we want it.

Reduction in runtimes observed compared to the existing Neon implementation:

Cortex-A55: =0.0%
Cortex-A510: -44.2%
Cortex-A520: -47.6%
Cortex-A76: -45.8%
Cortex-A715: -58.3%
Cortex-A720: -58.4%
Cortex-X1: -66.7%
Cortex-X2: -68.0%
Cortex-X3: -67.9%
Cortex-X4: -70.0%

Change-Id: I8a1d1fe08d8a2ddb0b86d4a44f0d49b69ab03ece
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5683126
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
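The load change described in the commit message above can be illustrated with a portable scalar model. This is only a sketch, not the code in scale_neon64.cc: the function names, the `idx` array of source pixel offsets, and the exact lane order are assumptions made for illustration. It models four output pixels, each needing two adjacent 32-bit ARGB source pixels, and shows that loading each pair as one 64-bit lane and then unzipping yields the same "left" and "right" vectors as a de-interleaving two-register lane load.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a lane-indexed LD2 with 32-bit elements: each load
 * writes one lane in BOTH destination vectors, so every load depends on the
 * previous contents of both registers. */
static void load_pairs_ld2_model(const uint32_t* src, const int idx[4],
                                 uint32_t left[4], uint32_t right[4]) {
  for (int lane = 0; lane < 4; ++lane) {
    left[lane] = src[idx[lane]];       /* pixel at the sample position */
    right[lane] = src[idx[lane] + 1];  /* its right-hand neighbour */
  }
}

/* Hypothetical model of 64-bit lane-indexed LD1 loads into two vectors,
 * followed by an unzip permute. Each load touches only one vector. */
static void load_pairs_ld1_unzip_model(const uint32_t* src, const int idx[4],
                                       uint32_t left[4], uint32_t right[4]) {
  uint32_t v0[4], v1[4]; /* two 128-bit vectors viewed as 4 x 32-bit */

  /* Each memcpy stands in for one 64-bit lane load (two adjacent pixels). */
  memcpy(&v0[0], &src[idx[0]], 8);
  memcpy(&v0[2], &src[idx[1]], 8);
  memcpy(&v1[0], &src[idx[2]], 8);
  memcpy(&v1[2], &src[idx[3]], 8);

  /* Unzip: even 32-bit elements of v0:v1 form "left", odd ones "right". */
  left[0] = v0[0];  left[1] = v0[2];  left[2] = v1[0];  left[3] = v1[2];
  right[0] = v0[1]; right[1] = v0[3]; right[2] = v1[1]; right[3] = v1[3];
}
```

Calling both functions with the same `src` and `idx` fills `left` and `right` identically. The difference the commit message points to is in how the loads depend on one another, which a scalar model cannot measure.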
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
compare_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
compare_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
compare.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
convert_argb.cc	[AArch64] Add SVE2 implementation of RGB24ToARGBRow	2024-07-08 20:12:05 +00:00
convert_from_argb.cc	[AArch64] Add SVE2 implementations of ARGBTo{RAW,RGB24}Row	2024-07-08 20:27:54 +00:00
convert_from.cc	Change ScalePlane,ScalePlane_16,... to return int	2023-11-03 23:53:24 +00:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Remove M420 and refactor NV12ToI420	2020-05-26 18:48:00 +00:00
convert_to_i420.cc	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.	2021-07-19 22:22:22 +00:00
convert.cc	[AArch64] Add SVE2 implementations for AYUVTo{UV,VU}Row	2024-06-04 18:18:07 +00:00
cpu_id.cc	[AArch64] Enable SME feature detection on Apple Silicon	2024-07-08 16:19:27 +00:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	[AArch64] Add SVE2 implementation of RAWToRGB24Row	2024-07-08 15:55:14 +00:00
rotate_any.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_common.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_gcc.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
rotate_lsx.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_msa.cc	cpuid show vector length on ARM and RISCV	2024-07-02 18:10:56 +00:00
rotate_neon64.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
rotate_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
rotate_sme.cc	[AArch64] Add initial build system support for SME	2024-06-08 23:32:41 +00:00
rotate_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
rotate.cc	[AArch64] Remove unused code from TransposeUVWx8_NEON	2024-05-27 21:52:56 +00:00
row_any.cc	[AArch64] Use full vectors in ARGB1555To{Y,UV}Row_NEON	2024-07-10 23:09:53 +00:00
row_common.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_gcc.cc	[AArch64] Fix SVE/SME vector length printing in cpuid	2024-07-02 19:44:41 +00:00
row_lasx.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
row_lsx.cc	[AArch64] Fix SVE/SME vector length printing in cpuid	2024-07-02 19:44:41 +00:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	[AArch64] Use full vectors in ARGB1555To{Y,UV}Row_NEON	2024-07-10 23:09:53 +00:00
row_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
row_rvv.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
row_sve.cc	[AArch64] Add SVE2 implementations of ARGBTo{RAW,RGB24}Row	2024-07-08 20:27:54 +00:00
row_win.cc	Fix tidy warning that uint32_t dither4 should not be const	2023-06-02 00:42:02 +00:00
scale_any.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_argb.cc	[AArch64] Add SVE implementation for I422ToARGBRow	2024-04-27 18:26:11 +00:00
scale_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
scale_gcc.cc	cpuid show vector length on ARM and RISCV	2024-07-02 18:10:56 +00:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	[AArch64] Rework data loading in ScaleARGBFilterCols_NEON	2024-07-10 23:10:43 +00:00
scale_neon.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
scale_rgb.cc	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24	2022-03-19 01:44:06 +00:00
scale_rvv.cc	Add volatile for gcc inline to avoid being removed	2024-07-02 01:25:24 +00:00
scale_uv.cc	Disable RVV ScaleDownBy4 if compiler option is not enabled	2024-06-18 01:52:40 +00:00
scale_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
scale.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
test.sh	Optimze ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00