Frank Barchard 9f13b2814d add RGBToYMatrixRow_AVX2
Adds RGBToYMatrixRow_AVX2, which reads 24-bit RGB values by loading 3 vectors instead of 4 and permuting them into 4 ARGB vectors before conversion.
Also adds RGBToYMatrixRow_Opt and RGBToYMatrixRow_2Step_Opt to convert_argb_test.cc to benchmark the direct AVX2 conversion against a 2-step approach.
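
The load pattern described above can be sketched in portable scalar C++. This is an illustration only, not the code from this change: the names Expand24To32, Luma and RowToY are hypothetical, the byte order and the opaque alpha value are assumptions, and the 77/150/29 weights used in the usage note are an assumed set of 8-bit fixed-point luma coefficients. The point is the arithmetic: 32 pixels at 3 bytes each fill exactly three 32-byte vectors (96 bytes), and widening them to 4 bytes each fills four (128 bytes), so only three loads are needed per 32 pixels.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

constexpr std::size_t kVec = 32;     // bytes in one 256-bit vector
constexpr std::size_t kPixels = 32;  // pixels handled per iteration

using Vec = std::array<std::uint8_t, kVec>;

// Hypothetical helper: load 3 vectors (96 bytes = 32 packed 3-byte pixels)
// and spread them into 4 vectors of 8 four-byte pixels each.
// A real AVX2 kernel would do this with byte shuffles and lane permutes;
// here the same data movement is written as plain loops.
std::array<Vec, 4> Expand24To32(const std::uint8_t* src) {
  std::array<Vec, 3> in;
  for (std::size_t i = 0; i < 3; ++i) {
    std::memcpy(in[i].data(), src + i * kVec, kVec);  // three loads, not four
  }
  std::array<Vec, 4> out;
  for (std::size_t p = 0; p < kPixels; ++p) {
    for (std::size_t c = 0; c < 3; ++c) {
      const std::size_t s = p * 3 + c;  // byte offset in the packed input
      out[p / 8][(p % 8) * 4 + c] = in[s / kVec][s % kVec];
    }
    out[p / 8][(p % 8) * 4 + 3] = 255;  // assumed opaque alpha
  }
  return out;
}

// Weighted sum of the three colour bytes in 8-bit fixed point with rounding.
std::uint8_t Luma(const std::uint8_t* px, const std::array<int, 3>& k) {
  return static_cast<std::uint8_t>(
      (k[0] * px[0] + k[1] * px[1] + k[2] * px[2] + 128) >> 8);
}

// Convert one row of packed 3-byte pixels to 8-bit luma, 32 pixels at a time.
void RowToY(const std::uint8_t* src, std::uint8_t* dst, int width,
            const std::array<int, 3>& k) {
  assert(width % static_cast<int>(kPixels) == 0);  // sketch: no tail handling
  for (int x = 0; x < width; x += static_cast<int>(kPixels)) {
    const std::array<Vec, 4> argb = Expand24To32(src + x * 3);
    for (std::size_t p = 0; p < kPixels; ++p) {
      dst[x + p] = Luma(&argb[p / 8][(p % 8) * 4], k);
    }
  }
}
```

With weights that sum to 256, such as the assumed {77, 150, 29}, a grey pixel maps to its own value, which makes a convenient sanity check for the permute.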

./libyuv_test '--gunit_filter=*RAWToJ400_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1

AMD Zen 5
Was LibYUVConvertTest.RAWToJ400_Opt (757 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (699 ms)

Intel Skylake
Was LibYUVConvertTest.RAWToJ400_Opt (1705 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (1426 ms)

Bug: 477295731
Change-Id: I29866baf4ad5fe7a3725e4a01f2fe24649510a7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7777325
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-20 12:52:44 -07:00
..
compare_common.cc clang-tidy applied 2021-04-01 21:42:47 +00:00
compare_gcc.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
compare_neon64.cc Add hybrid detect for Intel laptop cpus 2025-06-13 13:22:54 -07:00
compare_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
compare_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
compare.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
convert_argb.cc Fix typo 2026-04-16 14:27:40 -07:00
convert_from_argb.cc add RGBToYMatrixRow_AVX2 2026-04-20 12:52:44 -07:00
convert_from.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
convert_jpeg.cc PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. 2020-10-28 20:41:59 +00:00
convert_to_argb.cc Apply clang format 2025-01-02 13:31:20 -08:00
convert_to_i420.cc Apply clang format 2025-01-02 13:31:20 -08:00
convert.cc add RGBToYMatrixRow_AVX2 2026-04-20 12:52:44 -07:00
cpu_id.cc Replace strtok_r with strchr in RISC-V CPU capability detection 2026-04-10 12:33:43 -07:00
mjpeg_decoder.cc Add AMXINT8 cpu detect 2024-02-15 21:44:47 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions 2026-04-13 18:28:37 -07:00
rotate_any.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
rotate_argb.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
rotate_common.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_gcc.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
rotate_lsx.cc [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate_neon64.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
rotate_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
rotate_sme.cc [AArch64] Re-enable SME only for Linux and new versions of Clang 2024-09-23 09:29:53 +00:00
rotate_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
rotate.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
row_any.cc add RGBToYMatrixRow_AVX2 2026-04-20 12:52:44 -07:00
row_common.cc add RGBToYMatrixRow_AVX2 2026-04-20 12:52:44 -07:00
row_gcc.cc add RGBToYMatrixRow_AVX2 2026-04-20 12:52:44 -07:00
row_lasx.cc Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions 2026-04-13 18:28:37 -07:00
row_lsx.cc Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions 2026-04-13 18:28:37 -07:00
row_neon64.cc [AArch64] Fix compilation due to incorrect register constraint 2025-08-05 11:23:20 -07:00
row_neon.cc Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions 2026-04-13 18:28:37 -07:00
row_rvv.cc ARGBToUVMatrixRow_RVV replace vlseg8 with vlseg4, 2026-04-17 15:04:45 -07:00
row_sme.cc Convert8To16 use VPSRLW instead of VPMULHUW for better lunarlake performance 2025-08-04 12:42:50 -07:00
row_sve.cc Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions 2026-04-13 18:28:37 -07:00
row_win.cc row_win.cc rewrite into intrinsics 2026-04-15 19:53:16 -07:00
scale_any.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
scale_argb.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
scale_common.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
scale_gcc.cc ARGBToUV 64 bit use ymm8 for shuffler 2025-05-12 15:09:40 -07:00
scale_lsx.cc DetilePlane and unittest for NEON 2022-01-31 20:05:55 +00:00
scale_neon64.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
scale_neon.cc Apply format with no code changes 2025-02-24 23:57:01 -08:00
scale_rgb.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_rvv.cc ARGBToUVMatrixRow_RVV replace vlseg8 with vlseg4, 2026-04-17 15:04:45 -07:00
scale_sme.cc Apply clang format 2025-01-02 13:31:20 -08:00
scale_uv.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
scale_win.cc ARGBToJ444 use 256 for fixed point scale UV 2025-02-27 13:04:15 -08:00
scale.cc Deprecate MIPS and MSA support. 2025-10-16 12:20:40 -07:00
test.sh Optimize ABGRToI420 for AVX2 2020-06-04 18:24:45 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00