Frank Barchard 88b050f337 MergeUV AVX512BW use assembly
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V

AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2   MergeUVPlane_Opt (945 ms)
AVX2   MergeUVPlane_16_Opt (2167 ms)

Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2   MergeUVPlane_Opt (943 ms)
SSE2   MergeUVPlane_Opt (973 ms)
AVX2   MergeUVPlane_16_Opt (2102 ms)

Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-22 21:19:08 +00:00
..
libyuv MergeUV AVX512BW use assembly 2023-02-22 21:19:08 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00