libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-08-01 09:16:25 +08:00

Frank Barchard e034c41661 Port ARGBToUVMatrixRow from AVX2 to AVX512BW	2026-04-14 16:15:31 -07:00

Benchmark on Icelake Xeon
Now AVX512BW: [ OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2: [ OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)

- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.

Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
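
The commit message above mentions two mechanisms: "Any" wrappers that let a fixed-width kernel handle arbitrary row widths (with `MASK=63`), and runtime selection of a kernel from a CPU flag. The following is a minimal sketch of that general pattern, not the libyuv code. All names (`InvertRow_C`, `InvertRow_Wide`, `InvertRow_Any`, `PickInvertRow`, the `has_wide_simd` flag) are hypothetical, the "wide" kernel is plain scalar C++ standing in for a 64-byte SIMD routine, and the byte-invert operation was chosen only because it is easy to check.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical portable fallback: works for any width.
static void InvertRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = static_cast<uint8_t>(255 - src[x]);
  }
}

// Hypothetical stand-in for a wide SIMD kernel. A real kernel would process
// 64 bytes per iteration; here the restriction is only modelled by the assert.
static void InvertRow_Wide(const uint8_t* src, uint8_t* dst, int width) {
  assert(width % 64 == 0);  // Kernel contract: whole 64-byte blocks only.
  for (int x = 0; x < width; ++x) {
    dst[x] = static_cast<uint8_t>(255 - src[x]);
  }
}

// "Any" wrapper: runs the wide kernel on the largest multiple of 64, then
// handles the remainder through zero-padded temporary buffers.
static void InvertRow_Any(const uint8_t* src, uint8_t* dst, int width) {
  const int kMask = 63;        // Block size minus one.
  const int n = width & ~kMask;  // Pixels covered by whole blocks.
  const int r = width & kMask;   // Leftover pixels (0..63).
  if (n > 0) {
    InvertRow_Wide(src, dst, n);
  }
  if (r > 0) {
    uint8_t vin[64] = {0};  // One full block of input, zero padded.
    uint8_t vout[64];       // One full block of output.
    std::memcpy(vin, src + n, static_cast<size_t>(r));
    InvertRow_Wide(vin, vout, 64);
    std::memcpy(dst + n, vout, static_cast<size_t>(r));  // Keep valid bytes only.
  }
}

using RowFn = void (*)(const uint8_t*, uint8_t*, int);

// Runtime selection: fall back to C without the CPU feature, use the wrapper
// when the feature is present, and skip the wrapper when the width is already
// a multiple of the block size.
static RowFn PickInvertRow(bool has_wide_simd, int width) {
  if (!has_wide_simd) {
    return InvertRow_C;
  }
  return (width % 64 == 0) ? InvertRow_Wide : InvertRow_Any;
}
```

A caller would pick the function once per image, for example `RowFn fn = PickInvertRow(has_wide_simd, width);`, and then invoke `fn(src_row, dst_row, width)` for every row. Sizing the temporary buffers to one full block is what lets the wide kernel run on the tail without reading or writing past the end of the caller's row.
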
compare_common.cc	clang-tidy applied	2021-04-01 21:42:47 +00:00
compare_gcc.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
compare_neon64.cc	Add hybrid detect for Intel laptop cpus	2025-06-13 13:22:54 -07:00
compare_neon.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
compare_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
compare.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
convert_argb.cc	Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions	2026-04-13 18:28:37 -07:00
convert_from_argb.cc	Port ARGBToUVMatrixRow from AVX2 to AVX512BW	2026-04-14 16:15:31 -07:00
convert_from.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
convert_jpeg.cc	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.	2020-10-28 20:41:59 +00:00
convert_to_argb.cc	Apply clang format	2025-01-02 13:31:20 -08:00
convert_to_i420.cc	Apply clang format	2025-01-02 13:31:20 -08:00
convert.cc	Port ARGBToUVMatrixRow from AVX2 to AVX512BW	2026-04-14 16:15:31 -07:00
cpu_id.cc	Replace strtok_r with strchr in RISC-V CPU capability detection	2026-04-10 12:33:43 -07:00
mjpeg_decoder.cc	Add AMXINT8 cpu detect	2024-02-15 21:44:47 +00:00
mjpeg_validate.cc	Update to r1732 for more robust jpeg	2019-07-01 22:32:36 +00:00
planar_functions.cc	Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions	2026-04-13 18:28:37 -07:00
rotate_any.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
rotate_argb.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
rotate_common.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_gcc.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
rotate_lsx.cc	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON	2024-05-21 07:46:42 +00:00
rotate_neon64.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
rotate_neon.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
rotate_sme.cc	[AArch64] Re-enable SME only for Linux and new versions of Clang	2024-09-23 09:29:53 +00:00
rotate_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
rotate.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
row_any.cc	Port ARGBToUVMatrixRow from AVX2 to AVX512BW	2026-04-14 16:15:31 -07:00
row_common.cc	Add Gemini implementation for NEON32 RGB to YUV matrix operations	2026-03-23 16:30:44 -07:00
row_gcc.cc	Port ARGBToUVMatrixRow from AVX2 to AVX512BW	2026-04-14 16:15:31 -07:00
row_lasx.cc	Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions	2026-04-13 18:28:37 -07:00
row_lsx.cc	Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions	2026-04-13 18:28:37 -07:00
row_neon64.cc	[AArch64] Fix compilation due to incorrect register constraint	2025-08-05 11:23:20 -07:00
row_neon.cc	Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions	2026-04-13 18:28:37 -07:00
row_rvv.cc	RVV: Enable RVV on GCC	2026-01-06 11:16:24 -08:00
row_sme.cc	Convert8To16 use VPSRLW instead of VPMULHUW for better lunarlake performance	2025-08-04 12:42:50 -07:00
row_sve.cc	Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions	2026-04-13 18:28:37 -07:00
row_win.cc	Convert8To16 use VPSRLW instead of VPMULHUW for better lunarlake performance	2025-08-04 12:42:50 -07:00
scale_any.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
scale_argb.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
scale_common.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
scale_gcc.cc	ARGBToUV 64 bit use ymm8 for shuffler	2025-05-12 15:09:40 -07:00
scale_lsx.cc	DetilePlane and unittest for NEON	2022-01-31 20:05:55 +00:00
scale_neon64.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
scale_neon.cc	Apply format with no code changes	2025-02-24 23:57:01 -08:00
scale_rgb.cc	Apply clang format	2025-01-02 13:31:20 -08:00
scale_rvv.cc	RVV: Enable RVV on GCC	2026-01-06 11:16:24 -08:00
scale_sme.cc	Apply clang format	2025-01-02 13:31:20 -08:00
scale_uv.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
scale_win.cc	ARGBToJ444 use 256 for fixed point scale UV	2025-02-27 13:04:15 -08:00
scale.cc	Deprecate MIPS and MSA support.	2025-10-16 12:20:40 -07:00
test.sh	Optimze ABGRToI420 for AVX2	2020-06-04 18:24:45 +00:00
video_common.cc	Lint cleanup after C99 change CL	2018-01-24 19:16:03 +00:00