Frank Barchard 03d8b0990b I420ToRAW and I420ToRGB24 1 pass AVX2
Replaced the 2-pass conversion (I420 -> ARGB -> RGB24/RAW) with a
    1-pass AVX2 implementation that writes RGB24/RAW directly. This avoids
    the intermediate stack buffer and reduces memory bandwidth.

    Implemented `I422ToRGB24Row_AVX2` in:
    - `row_gcc.cc`: Inline assembly for GCC/Clang.
    - `row_win.cc`: C++ intrinsics for MSVC (also verified with Clang).

    Relaxed the width alignment requirement from 32 pixels to 16 pixels
    in `convert_argb.cc` and `row_any.cc`. This allows the optimized AVX2
    path to be used for more common video resolutions.
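
    As a rough illustration of why the smaller alignment helps, the sketch below splits a row into the part a SIMD loop can cover and the leftover tail. The helper names (`SplitRow`, `TakesFullSimdPath`) are hypothetical and are not libyuv functions, and the example widths (720, 1920, 1366) are assumptions chosen for illustration rather than values taken from this change.

```cpp
// Result of splitting a row of `width` pixels into a SIMD-covered part
// and a leftover tail that needs separate handling.
struct RowSplit {
  int simd_pixels;
  int tail_pixels;
};

// Hypothetical helper: `step` is the number of pixels per SIMD iteration
// and must be a power of two.
static RowSplit SplitRow(int width, int step) {
  const int tail = width & (step - 1);
  return {width - tail, tail};
}

// True when the whole row can be handled by the SIMD loop with no tail.
static bool TakesFullSimdPath(int width, int step) {
  return SplitRow(width, step).tail_pixels == 0;
}
```

    Under these assumptions, a 720-pixel-wide row leaves a 16-pixel tail with a 32-pixel step but no tail with a 16-pixel step, while a 1920-pixel-wide row has no tail in either case.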

    Performance results (1080p, 100 iterations):
    - C Reference: ~18.5 ms
    - AVX2 2-Pass (Baseline): ~412 us (~45x speedup)
    - AVX2 1-Pass (GCC Assembly): ~411 us (~45x speedup)
    - AVX2 1-Pass (Intrinsics): ~365 us (~50x speedup, 11% faster than asm)

    Test: libyuv_unittest --gunit_filter=*I420ToRGB24*
    Test: libyuv_unittest --gunit_filter=*I420ToRAW*

Bug: 42280902
Change-Id: I07c0505c95410ea16a6218c858844791a11ef073
2026-06-08 19:33:58 -07:00
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
compare_neon64.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
compare_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
compare_win.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
compare.cc Don't coalesce rows if width*height would overflow 2026-05-29 11:57:47 -07:00
convert_argb.cc I420ToRAW and I420ToRGB24 1 pass AVX2 2026-06-08 19:33:58 -07:00
convert_from_argb.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
convert_from.cc Fix integer overflow when flipping negative height 2026-06-03 16:17:37 -07:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc ConvertToARGB: compute buffer offsets in ptrdiff_t 2026-06-05 18:38:42 -07:00
convert_to_i420.cc Fix int negation overflow in ConvertToARGB/I420 2026-06-05 12:34:38 -07:00
convert.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
cpu_id.cc Replace strtok_r with strchr in RISC-V CPU capability detection 2026-04-10 12:33:43 -07:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
rotate_any.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
rotate_argb.cc Fix integer overflow when flipping negative height 2026-06-03 16:17:37 -07:00
rotate_common.cc Remove redundant #include <stddef.h> 2026-05-28 17:10:22 -07:00
rotate_gcc.cc Use ptrdiff_t for buffer offsets 2026-04-28 18:21:42 -07:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_neon64.cc Fix integer overflow in multiplications of stride 2026-05-28 14:12:37 -07:00
rotate_neon.cc Fix integer overflow in multiplications of stride 2026-05-28 14:12:37 -07:00
rotate_sme.cc [AArch64] Re-enable SME only for Linux and new versions of Clang 2024-09-23 09:29:53 +00:00
rotate_win.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
rotate.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_any.cc I420ToRAW and I420ToRGB24 1 pass AVX2 2026-06-08 19:33:58 -07:00
row_common.cc I420ToRAW and I420ToRGB24 1 pass AVX2 2026-06-08 19:33:58 -07:00
row_gcc.cc I420ToRAW and I420ToRGB24 1 pass AVX2 2026-06-08 19:33:58 -07:00
row_lasx.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_lsx.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_neon64.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_neon.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_rvv.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_sme.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_sve.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
row_win.cc I420ToRAW and I420ToRGB24 1 pass AVX2 2026-06-08 19:33:58 -07:00
scale_any.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
scale_argb.cc Fix integer overflow when flipping negative height 2026-06-03 16:17:37 -07:00
scale_common.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
scale_gcc.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_neon64.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
scale_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
scale_rgb.cc Fix integer overflow when flipping negative height 2026-06-03 16:17:37 -07:00
scale_rvv.cc Replace RAWToY/RGB24ToY with RGBToYMatrix 2026-04-21 17:11:14 -07:00
scale_sme.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_uv.cc Validate int param is not INT_MIN before negating 2026-06-04 21:55:57 -07:00
scale_win.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
scale.cc BGRAToI420 use BgraConstants for a direct conversion using AVX512BW 2026-06-08 12:21:47 -07:00
test.sh Optimze ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00