libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-01-01 03:12:16 +08:00

George Steed d0c28db56c [AArch64] Optimize Merge{ARGB,XRGB}16To8Row_NEON	2024-05-21 07:55:03 +00:00

Rather than shifting the data into the low half of each lane and then using a saturating narrowing operation, we can do the saturation as part of a shift into the highest half of the lane and then use a simpler TRN2 instruction to extract pairs of high halves into full vectors.

This also has the nice advantage of allowing us to use ST2 rather than ST4 for storing the result, since ST4 is known to be slow on some micro-architectures.

Reduction in runtimes observed for the two kernels:

            | MergeARGB16To8Row_NEON | MergeXRGB16To8Row_NEON
Cortex-A55  |                  -8.0% |                 -12.2%
Cortex-A510 |                 -29.9% |                 -31.4%
Cortex-A76  |                 -29.0% |                 -32.0%
Cortex-X2   |                 -33.5% |                 -43.4%

Bug: libyuv:976
Change-Id: I9da3beedc27ab43527b3642aa6d4decf3b5b6683
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5509198
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
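
The equivalence this commit message relies on can be checked with a scalar model of a single 16-bit lane. The sketch below is not libyuv code: the function names are hypothetical, and `depth` (the number of significant bits per 16-bit input sample, assumed to be between 8 and 16) is an assumption introduced for illustration. It models the older sequence (shift down, then saturating narrow) and the newer one (saturating shift up, then keep the high byte, which is the part TRN2 extracts in the vector code) and counts inputs where the two disagree.

```c
#include <stdint.h>

/* Hypothetical scalar model of one 16-bit lane; not taken from libyuv.
 * `depth` is an assumed parameter: significant bits per sample, 8..16. */

/* Older sequence: shift the sample down into the low half of the lane,
 * then saturate while narrowing to 8 bits. */
static uint8_t shift_then_saturating_narrow(uint16_t v, int depth) {
  uint16_t shifted = (uint16_t)(v >> (depth - 8));
  return shifted > 255 ? 255 : (uint8_t)shifted;
}

/* Newer sequence: saturating shift up so the result sits in the high half
 * of the lane, then simply keep the high byte. */
static uint8_t saturating_shift_then_high_byte(uint16_t v, int depth) {
  uint32_t shifted = (uint32_t)v << (16 - depth);
  if (shifted > 0xFFFFu) {
    shifted = 0xFFFFu; /* unsigned saturation to the 16-bit lane maximum */
  }
  return (uint8_t)(shifted >> 8);
}

/* Compare both sequences over every possible 16-bit input value and
 * return how many inputs produce different 8-bit results. */
static long count_mismatches(int depth) {
  long mismatches = 0;
  for (uint32_t v = 0; v <= 0xFFFFu; ++v) {
    uint8_t a = shift_then_saturating_narrow((uint16_t)v, depth);
    uint8_t b = saturating_shift_then_high_byte((uint16_t)v, depth);
    if (a != b) {
      ++mismatches;
    }
  }
  return mismatches;
}
```

Under these assumptions, `count_mismatches(depth)` returning 0 for a given depth means the two sequences agree on every 16-bit input, including out-of-range samples that must clamp to 255. The model says nothing about performance; the runtime figures above come from the commit message only.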
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	MT2T Warning fixes for fuchsia	2022-12-06 19:54:40 +00:00
compare_msa.cc	use unix line endings	2018-06-20 23:19:59 +00:00
compare_neon64.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
compare_neon.cc	Scale by even factor low level row function	2020-11-03 21:25:18 +00:00
compare_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
compare.cc	[AArch64] Add Neon implementation of HashDjb2	2024-05-01 19:37:31 +00:00
convert_argb.cc	Remove unneeded #ifdef HAVE_JPEG code	2024-05-09 23:02:18 +00:00
convert_from_argb.cc	[AArch64] Add Neon implementations for {ARGB,ABGR}ToAR30Row	2024-05-21 07:35:07 +00:00
convert_from.cc	Change ScalePlane,ScalePlane_16,... to return int	2023-11-03 23:53:24 +00:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Remove M420 and refactor NV12ToI420	2020-05-26 18:48:00 +00:00
convert_to_i420.cc	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.	2021-07-19 22:22:22 +00:00
convert.cc	[AArch64] Add SVE2 implementations for ARGBToUVRow and similar	2024-05-01 19:46:43 +00:00
cpu_id.cc	[AArch64] Impose feature dependencies in detection code	2024-05-21 07:21:49 +00:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	Remove unneeded #ifdef HAVE_JPEG code	2024-05-09 23:02:18 +00:00
rotate_any.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_argb.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
rotate_common.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_gcc.cc	Transpose 4x4 for SSE2 and AVX2	2023-03-03 17:46:23 +00:00
rotate_lsx.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_msa.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_neon64.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_neon.cc	GCC warning fix for MT2T	2023-03-16 06:57:20 +00:00
rotate_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
rotate.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
row_any.cc	[AArch64] Add Neon implementations for {ARGB,ABGR}ToAR30Row	2024-05-21 07:35:07 +00:00
row_common.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_gcc.cc	YUY2ToARGB use ymm6/7 for shuffle constants	2024-01-22 21:47:23 +00:00
row_lasx.cc	AVX10 cpuid detect added	2024-01-10 00:08:22 +00:00
row_lsx.cc	Fix compilation errors.	2024-01-03 19:15:56 +00:00
row_msa.cc	Fix Bugs on mips platform V2.	2022-03-01 13:16:31 +00:00
row_neon64.cc	[AArch64] Optimize Merge{ARGB,XRGB}16To8Row_NEON	2024-05-21 07:55:03 +00:00
row_neon.cc	[AArch64] Replace instances of ORR with MOV where possible	2024-04-25 20:48:16 +00:00
row_rvv.cc	[RVV] Support AR64ToAB64 and RGBA-family color conversions	2023-09-05 22:44:48 +00:00
row_sve.cc	[AArch64] Fix naming in ARGBToUVMatrixRow_SVE2 etc constants	2024-05-03 17:25:14 +00:00
row_win.cc	Fix tidy warning that uint32_t dither4 should not be const	2023-06-02 00:42:02 +00:00
scale_any.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_argb.cc	[AArch64] Add SVE implementation for I422ToARGBRow	2024-04-27 18:26:11 +00:00
scale_common.cc	Fix warnings for missing prototypes	2023-06-30 17:46:56 +00:00
scale_gcc.cc	ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows	2022-10-21 19:35:17 +00:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_msa.cc	Switch to C99 types	2018-01-23 19:16:05 +00:00
scale_neon64.cc	[AArch64] Replace instances of ORR with MOV where possible	2024-04-25 20:48:16 +00:00
scale_neon.cc	UVScale down by 2 fix for C and optimize for NEON	2023-04-12 22:49:20 +00:00
scale_rgb.cc	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24	2022-03-19 01:44:06 +00:00
scale_rvv.cc	Add HAS_SCALEARGBROWDOWNEVEN_RVV marco and disable it by default	2023-12-07 22:54:23 +00:00
scale_uv.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
scale_win.cc	Switch win32 to row_gcc for clangcl.	2021-04-22 19:32:32 +00:00
scale.cc	malloc return 1 for failures and assert for internal functions	2023-12-04 22:55:20 +00:00
test.sh	Optimze ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00