Frank Barchard 679e851f65 Convert16To8Row_AVX512BW using vpmovuswb
- avx2 is pack/perm is mutating order
- cvt method maintains channel order on avx512

Sapphire Rapids

Benchmark of 640x360 on Sapphire Rapids
AVX512BW
[       OK ] LibYUVConvertTest.I010ToNV12_Opt (3547 ms)
[       OK ] LibYUVConvertTest.P010ToNV12_Opt (3186 ms)

AVX2
[       OK ] LibYUVConvertTest.I010ToNV12_Opt (4000 ms)
[       OK ] LibYUVConvertTest.P010ToNV12_Opt (3190 ms)

SSE2
[       OK ] LibYUVConvertTest.I010ToNV12_Opt (5433 ms)
[       OK ] LibYUVConvertTest.P010ToNV12_Opt (4840 ms)

Skylake Xeon
Now vpmovuswb
[       OK ] LibYUVConvertTest.I010ToNV12_Opt (7946 ms)
[       OK ] LibYUVConvertTest.P010ToNV12_Opt (7071 ms)

Was vpackuswb
[       OK ] LibYUVConvertTest.I010ToNV12_Opt (7684 ms)
[       OK ] LibYUVConvertTest.P010ToNV12_Opt (7059 ms)

Switch from vpunpcklwd to vpbroadcastw for scale value parameter
Was
vpunpcklwd  %%xmm2,%%xmm2,%%xmm2
vbroadcastss %%xmm2,%%ymm2

Now
vpbroadcastw %%xmm2,%%ymm2

Bug: 357439226, 357721018
Change-Id: Ifc9c82ab70dba58af6efa0f57f5f7a344014652e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5787040
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2024-08-15 20:13:33 +00:00
..
libyuv Convert16To8Row_AVX512BW using vpmovuswb 2024-08-15 20:13:33 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00