mirror of
https://chromium.googlesource.com/libyuv/libyuv
synced 2026-04-30 19:09:18 +08:00
I have ported the call sites of ARGBToYRow_AVX2 to detect ARGBToYRow_AVX512BW at runtime and use it when available.
Summary of the changes:
1. Source Modifications: In both source/convert.cc and source/convert_from_argb.cc, I located every site where ARGBToYRow_AVX2
(which operates on 32 pixels at a time) was conditionally selected.
2. AVX512BW Detection: Immediately after each of those blocks, I added a check for kCpuHasAVX512BW. If the CPU flag is present, the logic
selects ARGBToYRow_Any_AVX512BW by default and switches to the aligned ARGBToYRow_AVX512BW when the width is aligned to 64.
3. Profiling: After building the tests (doyuv3x), I validated the change using perfyuv3 ARGBToNV12_Opt | cat. The test
ran to completion, and the performance profile showed ARGBToYRow_AVX512BW executing (~18% of CPU cycles)
in place of the AVX2 kernel previously used for Y row extraction.
With HAS_ARGBTOYROW_AVX512BW defined, all conversion paths that select ARGBToYRow_AVX2 can now use ARGBToYRow_AVX512BW when the
processor flags allow it.
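The selection order described in the summary can be sketched as follows. This is a minimal, self-contained illustration and not the code from source/convert.cc: the enum RowKernel, the flag constants kFlagAVX2 and kFlagAVX512BW, and the helpers IsAligned and SelectARGBToYRow are hypothetical names introduced here, and the alignment value of 64 is taken from the description above and assumed to apply to the pixel width.

```cpp
#include <cassert>

// Hypothetical labels for the kernels named in the commit message.
// The real code assigns function pointers instead of enum values.
enum RowKernel {
  kRowC,            // portable C fallback
  kRowAnyAVX2,      // ARGBToYRow_Any_AVX2: any width
  kRowAVX2,         // ARGBToYRow_AVX2: width aligned to 32
  kRowAnyAVX512BW,  // ARGBToYRow_Any_AVX512BW: any width
  kRowAVX512BW      // ARGBToYRow_AVX512BW: width aligned to 64 (assumed)
};

// Hypothetical CPU feature bits standing in for the library's CPU flags
// (kCpuHasAVX512BW is the flag named in the commit message).
constexpr int kFlagAVX2 = 1 << 0;
constexpr int kFlagAVX512BW = 1 << 1;

// True when value is a multiple of alignment (alignment must be a power of 2).
inline bool IsAligned(int value, int alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (value & (alignment - 1)) == 0;
}

// Picks a kernel in the order the commit message describes: the AVX2 block
// runs first, and the AVX512BW block that follows overrides its choice.
RowKernel SelectARGBToYRow(int cpu_flags, int width) {
  RowKernel kernel = kRowC;
  if (cpu_flags & kFlagAVX2) {
    kernel = kRowAnyAVX2;
    if (IsAligned(width, 32)) {
      kernel = kRowAVX2;
    }
  }
  if (cpu_flags & kFlagAVX512BW) {
    kernel = kRowAnyAVX512BW;
    if (IsAligned(width, 64)) {
      kernel = kRowAVX512BW;
    }
  }
  return kernel;
}
```

Under these assumptions, SelectARGBToYRow(kFlagAVX2 | kFlagAVX512BW, 1280) picks kRowAVX512BW because 1280 is a multiple of 64, while a width of 1000 picks kRowAnyAVX512BW. Placing the AVX512BW check after the AVX2 check means the wider kernel wins whenever both flags are set, without touching the existing AVX2 block.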
R=richard, rrwinterton@gmail.com
Change-Id: Iad811e12d301f5621e6f6d039105420861ade43e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7760779
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
| File | ||
|---|---|---|
| compare_common.cc | ||
| compare_gcc.cc | ||
| compare_neon64.cc | ||
| compare_neon.cc | ||
| compare_win.cc | ||
| compare.cc | ||
| convert_argb.cc | ||
| convert_from_argb.cc | ||
| convert_from.cc | ||
| convert_jpeg.cc | ||
| convert_to_argb.cc | ||
| convert_to_i420.cc | ||
| convert.cc | ||
| cpu_id.cc | ||
| mjpeg_decoder.cc | ||
| mjpeg_validate.cc | ||
| planar_functions.cc | ||
| rotate_any.cc | ||
| rotate_argb.cc | ||
| rotate_common.cc | ||
| rotate_gcc.cc | ||
| rotate_lsx.cc | ||
| rotate_neon64.cc | ||
| rotate_neon.cc | ||
| rotate_sme.cc | ||
| rotate_win.cc | ||
| rotate.cc | ||
| row_any.cc | ||
| row_common.cc | ||
| row_gcc.cc | ||
| row_lasx.cc | ||
| row_lsx.cc | ||
| row_neon64.cc | ||
| row_neon.cc | ||
| row_rvv.cc | ||
| row_sme.cc | ||
| row_sve.cc | ||
| row_win.cc | ||
| scale_any.cc | ||
| scale_argb.cc | ||
| scale_common.cc | ||
| scale_gcc.cc | ||
| scale_lsx.cc | ||
| scale_neon64.cc | ||
| scale_neon.cc | ||
| scale_rgb.cc | ||
| scale_rvv.cc | ||
| scale_sme.cc | ||
| scale_uv.cc | ||
| scale_win.cc | ||
| scale.cc | ||
| test.sh | ||
| video_common.cc | ||