Mirror of https://chromium.googlesource.com/libyuv/libyuv (synced 2026-04-30 19:09:18 +08:00)
Benchmark on Icelake Xeon:

- Now AVX512BW: [ OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
- Was AVX2: [ OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)

Changes:

- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.

Bug: 477295731

R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06

Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257

Reviewed-by: richard winterton <rrwinterton@gmail.com>

Reviewed-by: Dale Curtis <dalecurtis@chromium.org>

Commit-Queue: Frank Barchard <fbarchard@chromium.org>

| File | | |
|---|---|---|
| compare_common.cc | ||
| compare_gcc.cc | ||
| compare_neon64.cc | ||
| compare_neon.cc | ||
| compare_win.cc | ||
| compare.cc | ||
| convert_argb.cc | ||
| convert_from_argb.cc | ||
| convert_from.cc | ||
| convert_jpeg.cc | ||
| convert_to_argb.cc | ||
| convert_to_i420.cc | ||
| convert.cc | ||
| cpu_id.cc | ||
| mjpeg_decoder.cc | ||
| mjpeg_validate.cc | ||
| planar_functions.cc | ||
| rotate_any.cc | ||
| rotate_argb.cc | ||
| rotate_common.cc | ||
| rotate_gcc.cc | ||
| rotate_lsx.cc | ||
| rotate_neon64.cc | ||
| rotate_neon.cc | ||
| rotate_sme.cc | ||
| rotate_win.cc | ||
| rotate.cc | ||
| row_any.cc | ||
| row_common.cc | ||
| row_gcc.cc | ||
| row_lasx.cc | ||
| row_lsx.cc | ||
| row_neon64.cc | ||
| row_neon.cc | ||
| row_rvv.cc | ||
| row_sme.cc | ||
| row_sve.cc | ||
| row_win.cc | ||
| scale_any.cc | ||
| scale_argb.cc | ||
| scale_common.cc | ||
| scale_gcc.cc | ||
| scale_lsx.cc | ||
| scale_neon64.cc | ||
| scale_neon.cc | ||
| scale_rgb.cc | ||
| scale_rvv.cc | ||
| scale_sme.cc | ||
| scale_uv.cc | ||
| scale_win.cc | ||
| scale.cc | ||
| test.sh | ||
| video_common.cc | ||
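
The last change in the commit message above (a `-1` multiplier reused across `vpmaddwd` dot products, with the subtraction replaced by an add) rests on a simple identity: `bias - (a0 + a1)` equals `bias + (a0 * -1 + a1 * -1)`. The following is a minimal scalar sketch of that identity in C, not the code in `source/row_gcc.cc`. The helper names (`all_ones_words`, `madd_pair`, `bias_minus_sum`, `bias_plus_negated_sum`, `count_mismatches`) and the bias value `0x8080` are hypothetical and chosen only for illustration. The sketch models one 32-bit lane and ignores the 16-bit wraparound behavior of `vpaddw`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Model of the all-ones register. vpternlogd with immediate 0xFF sets every
// bit of the destination, so each signed 16-bit word reads as -1.
static void all_ones_words(int16_t out[2]) {
  const uint32_t ones = 0xFFFFFFFFu;
  memcpy(out, &ones, sizeof(ones));
}

// Scalar model of one 32-bit lane of vpmaddwd: multiply two pairs of signed
// 16-bit words and add the two products.
static int32_t madd_pair(const int16_t a[2], const int16_t b[2]) {
  return (int32_t)a[0] * b[0] + (int32_t)a[1] * b[1];
}

// Reference form: subtract the pair sum from a bias.
static int32_t bias_minus_sum(int32_t bias, const int16_t a[2]) {
  return bias - ((int32_t)a[0] + (int32_t)a[1]);
}

// Rewritten form: the dot product with -1 already yields the negated sum,
// so the subtraction becomes an addition.
static int32_t bias_plus_negated_sum(int32_t bias, const int16_t a[2]) {
  int16_t minus_one[2];
  all_ones_words(minus_one);
  assert(minus_one[0] == -1 && minus_one[1] == -1);
  return bias + madd_pair(a, minus_one);
}

// Counts sample inputs where the two forms disagree (hypothetical bias).
static int count_mismatches(void) {
  int mismatches = 0;
  for (int x = -1024; x <= 1024; x += 37) {
    for (int y = -1024; y <= 1024; y += 41) {
      const int16_t a[2] = {(int16_t)x, (int16_t)y};
      if (bias_minus_sum(0x8080, a) != bias_plus_negated_sum(0x8080, a)) {
        ++mismatches;
      }
    }
  }
  return mismatches;
}
```

Calling `count_mismatches()` from a test harness should return 0 for the sampled inputs. The likely appeal of the rewrite is that one all-ones constant can serve every dot product, rather than loading a separate multiplier for each.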