Frank Barchard dfaf210a19 I420ToRAW and I420ToRGB24 1 pass AVX2
Replaced the 2-pass conversion (I420 -> ARGB -> RGB24/RAW) with a
    1-pass AVX2 implementation that writes RGB24/RAW directly. This avoids
    the intermediate stack buffer and significantly reduces memory bandwidth.
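
The difference between the two approaches can be illustrated with a scalar sketch. This is not the AVX2 code from the commit: `YuvToRgbSketch`, `TwoPassRowSketch`, and `OnePassRowSketch` are hypothetical names, and the BT.601 limited-range fixed-point coefficients are an assumption chosen for illustration rather than libyuv's actual constants or rounding. The sketch assumes RGB24 is stored as B, G, R in memory.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rgb {
  uint8_t r, g, b;
};

static uint8_t Clamp255(int v) {
  return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Hypothetical BT.601 limited-range conversion in 8.8 fixed point.
// Assumption for illustration only; not libyuv's coefficients.
static Rgb YuvToRgbSketch(uint8_t y, uint8_t u, uint8_t v) {
  const int c = 298 * (y - 16);
  const int d = u - 128;
  const int e = v - 128;
  return {Clamp255((c + 409 * e + 128) / 256),
          Clamp255((c - 100 * d - 208 * e + 128) / 256),
          Clamp255((c + 516 * d + 128) / 256)};
}

// Two passes: convert to a temporary ARGB row, then drop the alpha byte.
static void TwoPassRowSketch(const uint8_t* y, const uint8_t* u,
                             const uint8_t* v, uint8_t* rgb24, int width) {
  std::vector<uint8_t> argb(4 * static_cast<size_t>(width));  // intermediate
  for (int x = 0; x < width; ++x) {
    const Rgb p = YuvToRgbSketch(y[x], u[x / 2], v[x / 2]);
    argb[4 * x + 0] = p.b;
    argb[4 * x + 1] = p.g;
    argb[4 * x + 2] = p.r;
    argb[4 * x + 3] = 255;
  }
  for (int x = 0; x < width; ++x) {
    rgb24[3 * x + 0] = argb[4 * x + 0];
    rgb24[3 * x + 1] = argb[4 * x + 1];
    rgb24[3 * x + 2] = argb[4 * x + 2];
  }
}

// One pass: write the 3-byte pixels directly, with no intermediate row.
static void OnePassRowSketch(const uint8_t* y, const uint8_t* u,
                             const uint8_t* v, uint8_t* rgb24, int width) {
  for (int x = 0; x < width; ++x) {
    const Rgb p = YuvToRgbSketch(y[x], u[x / 2], v[x / 2]);
    rgb24[3 * x + 0] = p.b;
    rgb24[3 * x + 1] = p.g;
    rgb24[3 * x + 2] = p.r;
  }
}
```

Both functions produce identical output for the same input row; the one-pass version simply skips the 4-bytes-per-pixel temporary and the second loop over it.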

    Implemented `I422ToRGB24Row_AVX2` in:
    - `row_gcc.cc`: Inline assembly for GCC/Clang.
    - `row_win.cc`: C++ intrinsics for MSVC (also verified with Clang).

    Relaxed the width alignment requirement from 32 pixels to 16 pixels
    in `convert_argb.cc` and `row_any.cc`. This allows the optimized AVX2
    path to be used for more common video resolutions.
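
The effect of the alignment change can be sketched as a dispatch decision on the row width. The names below (`RowPath`, `IsAlignedSketch`, `SelectRowPathSketch`) are hypothetical and are not taken from `convert_argb.cc`; only the 32 -> 16 change comes from the commit message. The sketch assumes that widths failing the alignment check fall back to a wrapper that handles the leftover pixels separately.

```cpp
// Hypothetical dispatch sketch; not the actual libyuv selection code.
enum class RowPath { kFullAvx2, kAnyWrapper };

// True when value is a multiple of alignment (alignment must be a power of 2).
constexpr bool IsAlignedSketch(int value, int alignment) {
  return (value & (alignment - 1)) == 0;
}

// Picks the full-width AVX2 row function only when the width is aligned.
constexpr RowPath SelectRowPathSketch(int width, int required_alignment) {
  return IsAlignedSketch(width, required_alignment) ? RowPath::kFullAvx2
                                                    : RowPath::kAnyWrapper;
}

// 1920 is a multiple of 32, so it qualified before and after the change.
static_assert(SelectRowPathSketch(1920, 32) == RowPath::kFullAvx2, "");
// 720 = 45 * 16 is not a multiple of 32, so it qualifies only at 16.
static_assert(SelectRowPathSketch(720, 32) == RowPath::kAnyWrapper, "");
static_assert(SelectRowPathSketch(720, 16) == RowPath::kFullAvx2, "");
```

Under these assumptions, a 720-pixel-wide frame moves from the fallback wrapper to the full AVX2 path, while a width such as 854 still needs the wrapper in both cases.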

    Performance results (1080p, 100 iterations):
    - C Reference: ~18.5 ms
    - AVX2 2-Pass (Baseline): ~412 us (~45x speedup)
    - AVX2 1-Pass (GCC Assembly): ~411 us (~45x speedup)
    - AVX2 1-Pass (Intrinsics): ~365 us (~50x speedup, 11% faster than asm)

    Test: libyuv_unittest --gunit_filter=*I420ToRGB24*
    Test: libyuv_unittest --gunit_filter=*I420ToRAW*

Bug: 42280902
Change-Id: I07c0505c95410ea16a6218c858844791a11ef073
2026-06-08 19:58:13 -07:00
..
libyuv I420ToRAW and I420ToRGB24 1 pass AVX2 2026-06-08 19:58:13 -07:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00