1931 Commits

Author SHA1 Message Date
Frank Barchard
d23308a2a7 add bmm detect and vdpphps in util/cpuid
Bug: None
Change-Id: I9954f96a74e653e3ecd3fbeba533299fa8e57d95
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7914867
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-09 14:52:48 -07:00
Frank Barchard
3bdb3b94ca I420ToRAW use 2 step AVX512
On Icelake
Was AVX2
I420ToRAW_Opt (283 ms)
  67.55%  I422ToARGBRow_AVX2
  26.46%  ARGBToRGB24Row_AVX2

Now AVX512VBMI
I420ToRAW_Opt (238 ms)
  73.08%  I422ToARGBRow_AVX512BW
  21.59%  ARGBToRGB24Row_AVX512VBMI

Bug: 42280902
Change-Id: I9d4d21faed30c529a5e593819f103be115709f37
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7909924
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-08 14:32:13 -07:00
Frank Barchard
4be798d7c5 BGRAToI420 use BgraConstants for a direct conversion using AVX512BW
row win (msvc)
Was C/SSSE3
BGRAToARGB_Opt (594 ms)
BGRAToARGB_Endswap_Opt (609 ms)
BGRAToI420_Opt (122 ms)

Now AVX2
BGRAToARGB_Opt (100 ms)
BGRAToARGB_Endswap_Opt (99 ms)
BGRAToI420_Opt (115 ms)

Clang/GCC AVX512BW
BGRAToARGB_Opt (86 ms)
BGRAToARGB_Endswap_Opt (91 ms)
BGRAToI420_Opt (110 ms)


Bug: 42280902
Change-Id: I52cb2b0cacea8f2f0b138ec3cc521185dbef8595
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7905821
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-08 12:21:47 -07:00
Frank Barchard
e14b0e2c60 RGB565ToARGB use AVX2 instead of SSE2
Now AVX2/AVX512
ARGB4444ToI420_Opt (204 ms)
RGB565ToI420_Opt (211 ms)
ARGB1555ToI420_Opt (231 ms)
RAWToI420_Opt (197 ms)
RGB24ToI420_Opt (197 ms)

Was SSE2/AVX2
ARGB4444ToI420_Opt (276 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (332 ms)
RAWToI420_Opt (237 ms)
RGB24ToI420_Opt (232 ms)

Bug: libyuv:508639302
Change-Id: I2005189d1b6af15cb5ebef1f6d66b426fa9df8eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891416
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-02 18:28:02 -07:00
Frank Barchard
3c5fa6ef27 [libyuv] Replace hardcoded RGB to YUV functions with Matrix variants
Removes non-matrix implementations for RGB24, RAW, RGB565, ARGB1555,
and ARGB4444 conversions. Introduces RGBToYMatrixRow, RGBToUVMatrixRow,
and equivalent functions for 16-bit and 24-bit formats. These functions
utilize a 2-step conversion internally (to ARGB, then to YUV) inside
row_common.cc for C, AVX2, and NEON, allowing the high-level
convert.cc logic to execute in a single pass using ArgbConstants.
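A scalar sketch of the 2-step idea for one row. It assumes libyuv's byte orders (RGB24 is B,G,R in memory; ARGB is B,G,R,A in memory) and uses the classic BT.601 limited-range luma approximation. The function names, the scratch width, and the formula are illustrative assumptions, not the code in row_common.cc.

```c
#include <assert.h>
#include <stdint.h>

enum { kMaxScratchWidth = 4096 }; /* hypothetical per-row scratch limit */

/* Step 1: expand RGB24 (B,G,R in memory) to ARGB (B,G,R,A in memory). */
static void Rgb24ToArgbRowSketch(const uint8_t* src_rgb24, uint8_t* dst_argb,
                                 int width) {
  for (int x = 0; x < width; ++x) {
    dst_argb[4 * x + 0] = src_rgb24[3 * x + 0]; /* B */
    dst_argb[4 * x + 1] = src_rgb24[3 * x + 1]; /* G */
    dst_argb[4 * x + 2] = src_rgb24[3 * x + 2]; /* R */
    dst_argb[4 * x + 3] = 255;                  /* opaque alpha */
  }
}

/* Step 2: BT.601 limited-range luma; 0x1080 = 16 * 256 + 128 adds the
 * +16 offset and rounding, so results fall in [16, 235]. */
static void ArgbToYRowSketch(const uint8_t* src_argb, uint8_t* dst_y,
                             int width) {
  for (int x = 0; x < width; ++x) {
    int b = src_argb[4 * x + 0];
    int g = src_argb[4 * x + 1];
    int r = src_argb[4 * x + 2];
    dst_y[x] = (uint8_t)((66 * r + 129 * g + 25 * b + 0x1080) >> 8);
  }
}

/* Runs both steps on one row through a small ARGB scratch buffer. */
static void Rgb24ToYRow2StepSketch(const uint8_t* src_rgb24, uint8_t* dst_y,
                                   int width) {
  uint8_t scratch[kMaxScratchWidth * 4];
  assert(width > 0 && width <= kMaxScratchWidth);
  Rgb24ToArgbRowSketch(src_rgb24, scratch, width);
  ArgbToYRowSketch(scratch, dst_y, width);
}
```

Running both steps per row keeps the intermediate ARGB data to a single cache-resident row, which is presumably what lets the caller stay a single pass over the image.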

Benchmark on Zen4
Test: libyuv_unittest --gtest_filter=*RGB*ToI420*

Was BT.601-only
ARGBToI420_Opt (115 ms)
ARGB4444ToI420_Opt (190 ms)
RGB565ToI420_Opt (194 ms)
ARGB1555ToI420_Opt (207 ms)
RGB24ToI420_Opt (143 ms)
RGBAToI420_Opt (167 ms)
28.07% ARGBToUVMatrixRow_AVX512BW
19.65% ARGBToYMatrixRow_AVX512BW
11.32% RGBAToUVRow_SSSE3
10.24% ARGB1555ToARGBRow_SSE2
 8.56% ARGB4444ToARGBRow_SSE2
 8.47% RGB565ToARGBRow_SSE2
 4.17% RGBAToYRow_AVX512BW
 4.04% RGB24ToARGBRow_AVX512BW

Now Matrix
ARGBToI420_Opt (124 ms)
ARGB4444ToI420_Opt (287 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (324 ms)
RGB24ToI420_Opt (236 ms)
RGBAToI420_Opt (126 ms)
29.74% ARGBToUVMatrixRow_AVX2
14.58% ARGB1555ToARGBRow_SSE2
12.59% RGB565ToARGBRow_SSE2
11.32% ARGB4444ToARGBRow_SSE2
 9.35% ARGBToYMatrixRow_AVX2
 8.45% RGB24ToARGBRow_SSSE3
 5.56% ARGBToYMatrixRow_AVX512BW
 1.37% ARGBToUVMatrixRow_Any_AVX2
 0.74% ARGBToYMatrixRow_Any_AVX2
 0.49% ARGB4444ToARGBRow_Any_SSE2
 0.46% RGB565ToARGBRow_Any_SSE2
 0.39% ARGB1555ToARGBRow_Any_SSE2
 0.28% RGB24ToARGBRow_Any_SSSE3
 0.11% ARGB4444ToYMatrixRow_AVX2
 0.09% RGB565ToUVMatrixRow_AVX2
 0.09% RGB565ToYMatrixRow_AVX2
 0.07% RGBToYMatrixRow_AVX2
 0.05% ARGB1555ToUVMatrixRow_AVX2
 0.04% ARGB1555ToYMatrixRow_AVX2
 0.03% RGBToUVMatrixRow_AVX2
 0.02% ARGB4444ToUVMatrixRow_AVX2

Bug: libyuv:508639302
Change-Id: I362c0cfe4c86ee1f3ffb569fa4f784b84148f11a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891045
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-01 14:04:07 -07:00
Wan-Teh Chang
d2c6dd5e6a Fix integer overflow in two convert functions
Fix integer overflow in buffer allocation size calculations in the
align_buffer_64() macro and the I422ToNV21() and
Android420ToARGBMatrix() functions.
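A minimal sketch of an overflow-safe allocation size, assuming the size is a row width times a row count plus alignment slack. The helper names and the 63-byte slack for 64-byte alignment are assumptions for illustration; this is not the align_buffer_64() macro itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: width * height as size_t, or 0 if either input is
 * non-positive or the product would exceed SIZE_MAX (possible on 32-bit). */
static size_t CheckedPlaneSize(int width, int height) {
  if (width <= 0 || height <= 0) {
    return 0;
  }
  if ((size_t)width > SIZE_MAX / (size_t)height) {
    return 0; /* the product would wrap around */
  }
  return (size_t)width * (size_t)height;
}

/* Hypothetical allocator: plane bytes plus 63 bytes of slack so the caller
 * can round the pointer up to a 64-byte boundary. NULL on overflow. */
static void* AllocPlaneSketch(int width, int height) {
  size_t size = CheckedPlaneSize(width, height);
  if (size == 0 || size > SIZE_MAX - 63) {
    return NULL;
  }
  return malloc(size + 63);
}
```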

Based on a CL autogenerated by MendIt (go/androidmendit):
https://googleplex-android-review.googlesource.com/c/platform/external/libyuv/+/39981732

Bug: 511821134
Change-Id: Ie1728c3ad337d460d9b85979489a817cc97e3bf3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-05-29 19:26:14 -07:00
Wan-Teh Chang
c98edcc8dc Don't coalesce rows if width*height would overflow
Audit all occurrences of "width *= height;" in the libyuv source code.
Make sure height > 0 and (ptrdiff_t)width * height <= INT_MAX before
executing width *= height.
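A sketch of the guarded row coalescing, using a hypothetical helper name. When both strides equal the width, the plane is contiguous and can be processed as one long row. This version widens to int64_t so the check itself cannot overflow on 32-bit targets, where ptrdiff_t is only 32 bits.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Coalesce a contiguous plane into one row only if width * height fits
 * in an int. Leaves all values untouched otherwise. */
static void MaybeCoalesceRowsSketch(int* width, int* height, int* src_stride,
                                    int* dst_stride) {
  int w = *width;
  int h = *height;
  if (*src_stride == w && *dst_stride == w && h > 0 &&
      (int64_t)w * h <= INT_MAX) {
    *width = w * h; /* safe: checked above */
    *height = 1;
    *src_stride = 0;
    *dst_stride = 0;
  }
}
```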

Bug: chromium:517339758
Change-Id: I143a41c66492a6e4c48b6aa2a1c4a2ae974ceeb1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7883816
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-05-29 11:57:47 -07:00
Frank Barchard
e449eb2172 J400ToARGB switch from SSE2 to AVX2
- port for row_win
- remove unused HAS_ macros

Was C/SSE2
MSVC  J400ToARGB_Opt (1967 ms)
Clang J400ToARGB_Opt (568 ms)

Now AVX2
MSVC  J400ToARGB_Opt (411 ms)
Clang J400ToARGB_Opt (418 ms)

Test: libyuv_unittest --gtest_filter=*J400ToARGB*
Bug: libyuv:508639302

Change-Id: Ifdfb026832b708b61f55477250cc5ee52449f421
TAG=agy
CONV=186608fc-966a-4ea7-bf57-9fe07cc1383c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7877368
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
2026-05-28 21:24:32 -07:00
Frank Barchard
9d98aaefe7 InterpolateRow for Visual C
- remove InterpolateRow_SSSE3
- optimize ARGBToUV444MatrixRow_AVX2 to use unsigned pixels

5.7x faster on AMD Zen4

Was C
TestInterpolatePlane (144 ms)
TestInterpolatePlane_16 (142 ms)

Now AVX2
TestInterpolatePlane (25 ms)
TestInterpolatePlane_16 (48 ms)

Was signed
ARGBToJ444_Opt (157 ms)
Now unsigned
ARGBToJ444_Opt (155 ms)

Bug: None
Change-Id: I903109668ff9cfedaddad1ad75411393b3226f41
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7856498
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-18 17:28:46 -07:00
Frank Barchard
9f751100d2 InterpolateRow_16_AVX2 for row_gcc
On AMD Zen4
Was C
TestInterpolatePlane_16 (143 ms)
Now AVX2
TestInterpolatePlane_16 (48 ms)

Was
I210ToI420_Opt (87 ms)
 35.60% InterpolateRow_16To8_AVX2
 31.03% Convert16To8Row_AVX512BW
 21.35% Convert16To8Row_AVX2

Now
I210ToI420_Opt (69 ms)
 37.57% Convert16To8Row_AVX512BW
 32.69% InterpolateRow_16_AVX2
  7.18% Convert16To8Row_AVX2
  5.23% InterpolateRow_16To8_AVX2

Bug: None
Change-Id: Ica9b9c5dbd847068ae076b682c487e1753d3c812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7855648
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-18 14:29:36 -07:00
Frank Barchard
cda55fcf53 Mirror AVX2 functions for Visual C
Bug: libyuv:42280902
Change-Id: Iabbec9af3a4f4dd89294e60145823c7fc4dd6ec6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7843378
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-15 15:05:31 -07:00
Frank Barchard
dd8b46630a ARGBToUV444MatrixRow_AVX2 intrinsics for Visual C
Was C
LibYUVConvertTest.ARGBToI444_Opt (1027 ms)

Now AVX2
LibYUVConvertTest.ARGBToI444_Opt (310 ms)

Bug: libyuv:508639302
Change-Id: I0bc7f5c5b72160d24226a98d5fddb184a004ed00
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7841655
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-12 14:19:58 -07:00
Frank Barchard
cb061d0378 Unittests use ASSERT instead of EXPECT
Bug: libyuv:508639302
Change-Id: I22c35e08f3b6db1a656192877c1fb1bf4e96d6f5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7838659
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-11 19:10:47 -07:00
Frank Barchard
e23282704f ARGBToYRow_AVX512BW preserve XMM6-XMM15 due to Windows stack alignment
Bug: 505124541
Change-Id: Id5ae539f57b314980182bec76a788e33273b2392
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7835639
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-11 13:12:22 -07:00
Frank Barchard
4b4e68b372 ABGRToJ420 call ARGBToI420Matrix
- Standardize libyuv ARGB-family (ARGB, ABGR, RGBA, BGRA) to YUV conversion by utilizing the generic MatrixRow architecture and explicit ArgbConstants.
- Consolidated ARGBToI420, ABGRToI420, BGRAToI420, and RGBAToI420 as wrappers for ARGBToI420Matrix.
- Refactored ABGRToJ420, ABGRToJ422, and ABGRToI422 to use generic matrix functions.
- Added matrix-based versions for NV21, I400, YUY2, and UYVY.
- Updated RAW and RGB24 to I420/I422/I444 dispatchers to use MatrixRow logic and explicit constants.
- Fixed parameter swap bugs in ARGBToI422, ARGBToJ422, and ABGRToJ422.
- Fixed a bug in the generic C implementation of matrix row functions ensuring all 4 channels are processed correctly for all ARGB-family formats.
- Moved kShuffleAARRGGBB in row_gcc.cc to the top of the libyuv namespace for visibility.
- Cleaned up redundant format-specific row implementations.

Bug: libyuv:42280902
Change-Id: I67ffa4c476abc0d2dcc4650510d7bda91b65988e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7830291
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-08 15:23:30 -07:00
Frank Barchard
4aacbbdfb4 Refactored RGB/RAW to YUV color conversion functions to use generic Matrix-based functions parameterized by ArgbConstants.
This consolidation standardizes conversion logic, improves code
maintainability, and provides flexible support for various color spaces
(e.g., BT.601, JPEG full range).

Key Modifications:
 - Function Consolidation: Refactored several high-level conversion functions into lightweight wrappers around generic Matrix variants:
     - ARGBToI420 → ARGBToI420Matrix
     - ARGBToI444 → ARGBToI444Matrix
     - ARGBToI422 → ARGBToI422Matrix
     - ARGBToNV12 → ARGBToNV12Matrix
     - RAWToJ400, RGB24ToJ400 → RGBToI400Matrix
     - RAWToI444, RAWToJ444 → RGBToI444Matrix
 - 2-Pass Conversions: Updated RGB565ToI420, ARGB1555ToI420, and ARGB4444ToI420 to utilize 2-pass conversions via RGBToI420Matrix.
 - Standardization: Refactored ARGBToNV21, ARGBToYUY2, and ARGBToUYVY to use parameterized matrix row functions (ARGBToYMatrixRow,
   ARGBToUVMatrixRow).
 - Legacy Cleanup: Replaced legacy calls to ARGBToYJRow with the parameterized ARGBToYMatrixRow in the ARGBSobelize helper.
 - Internal Integration: Included libyuv/convert_from_argb.h in planar_functions.cc and ensured all new matrix symbols are properly
   declared/exported (LIBYUV_API).

Bug: libyuv:42280902
Change-Id: Ied5fd9899767427e3a03cdcfbeaff3e9d502374a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7822033
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-06 20:02:47 -07:00
Frank Barchard
561a9780e2 YUV to RGB avoid avx assist
Here are the functions flagged for mixing both SSE and AVX (or AVX-512)
instructions, which can trigger an AVX transition/assist performance
penalty:

Libyuv Functions addressed in this CL
   * I422ToARGBRow_AVX512BW
   * HalfFloatRow_SSE2

Not addressed:
   * ScaleFilterCols_SSSE3

Bug: libyuv:509681367
Change-Id: I8ced6065dfe0c516d05857086393782c8590062a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7814945
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-05 12:57:55 -07:00
Frank Barchard
2143edfa7a ARGBToUVMatrixRow_NEON arm32 reimplemented for GCC
Bug: libyuv:508639302
Change-Id: Ib120373d799c66926a64c980873034be262d8848
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7810481
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
2026-05-04 11:38:45 -07:00
Frank Barchard
f2ac6db694 RAWToNV21 using SME, SVE, I8MM or Neon
Pixel 9 Now SVE2 2 pass LibYUVConvertTest.RAWToNV21_Opt (364 ms)
 31.76% libyuv::ARGBToUVMatrixRow_SVE_SC()
 30.38% RAWToARGBRow_SVE2
 26.81% ARGBToYMatrixRow_NEON_DotProd
  3.26% MergeUVRow_NEON

Was NEON 1 pass LibYUVConvertTest.RAWToJNV21_Opt (295 ms)
 44.14% RAWToYJRow_NEON
 41.91% RAWToUVJRow_NEON
  5.11% MergeUVRow_NEON

Clang on Intel Skylake
clang              [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (301 ms)
Visual C (row_win) [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (2056 ms)

clang              [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (275 ms)
Visual C           [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (365 ms)

Bug: libyuv:42280902
Change-Id: Iaba558ebe96ce6b9881ee9335ba72b8aac390cde
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7802432
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-29 13:11:04 -07:00
Wan-Teh Chang
a7849e8a5e Fix yi * src_stride overflow in ScalePlaneVertical
Fix int overflow of yi * src_stride in ScalePlaneVertical(),
ScalePlaneVertical_16(), and ScalePlaneVertical_16To8() by casting the
operand src_stride to ptrdiff_t.
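A minimal sketch of the cast, with hypothetical helper names. Without the cast, yi * src_stride is an int product and can overflow for large planes even on 64-bit builds; with it, the multiply happens in pointer-width arithmetic. On 32-bit targets ptrdiff_t is as narrow as int, so the cast only widens on 64-bit builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Byte offset of row yi; yi is promoted to ptrdiff_t by the cast operand. */
static ptrdiff_t RowOffsetSketch(int yi, int src_stride) {
  return yi * (ptrdiff_t)src_stride;
}

/* Pointer to the first byte of row yi of a plane. */
static const uint8_t* RowPointerSketch(const uint8_t* src, int yi,
                                       int src_stride) {
  return src + RowOffsetSketch(yi, src_stride);
}
```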

Adapted from the patches by Victor Miura <vmiura@google.com>.

Bug: 505814332
Change-Id: I4a4751041a213f7208b01eb18c43c9e196a36261
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796558
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-04-28 12:34:12 -07:00
Frank Barchard
4afb965416 RAWToARGB use AVX512BW
Bug: libyuv:42280902
Change-Id: I7a80fd64d97b6d411316819df0fd917d609a173b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7787163
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-22 16:56:46 -07:00
Frank Barchard
bd2c4c76ec RAWToARGB AVX512VBMI
Bug: libyuv:42280902
Change-Id: I1c7f432f004079357a00515785bc524c459ed4b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7787160
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-22 14:48:29 -07:00
Frank Barchard
d445250d8b Replace RAWToY/RGB24ToY with RGBToYMatrix
Bug: libyuv:42280902
Change-Id: I6ddebd492036c416550fc045eb39493dea73246b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7784094
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-21 17:11:14 -07:00
Frank Barchard
81f698829b Add RGBToNV21Matrix function
- implement wrappers with RAW, RGB24, NV21 and JNV21 to call it.

Zen5
Was [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
Reason: the new code uses 1 pass for RAWToY but 2 passes (RAWToARGB, then ARGBToUV) for UV. A 1-pass RGBToUV is needed.

Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-20 18:03:34 -07:00
Frank Barchard
9f13b2814d add RGBToYMatrixRow_AVX2
Adds RGBToYMatrixRow_AVX2 which reads 24 bit RGB values by reading 3 vectors instead of 4 and permutes them into 4 ARGB vectors before conversion.
Also adds RGBToYMatrixRow_Opt and RGBToYMatrixRow_2Step_Opt to convert_argb_test.cc to benchmark and compare the direct AVX2 conversion vs a 2-step approach.
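The "3 vectors instead of 4" arithmetic: 32 RGB24 pixels occupy 96 bytes (3 AVX2 registers) and expand to 128 bytes of ARGB (4 registers). Below is a scalar model of one 128-bit lane of that expansion using pshufb semantics; the mask is illustrative, not the one in the commit, and the AVX2 code also needs cross-lane permutes because vpshufb shuffles within each 128-bit lane.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of SSSE3 pshufb: output byte i is src[mask[i] & 15], or 0
 * when bit 7 of mask[i] is set. */
static void ShuffleBytes16(const uint8_t src[16], const uint8_t mask[16],
                           uint8_t dst[16]) {
  for (int i = 0; i < 16; ++i) {
    dst[i] = (mask[i] & 0x80) ? 0 : src[mask[i] & 15];
  }
}

/* Illustrative mask: B,G,R of 4 RGB24 pixels (12 bytes) go to the low 3
 * bytes of 4 ARGB pixels; each alpha byte is zeroed. */
static const uint8_t kRgb24ToArgbMaskSketch[16] = {
    0, 1, 2, 0x80, 3, 4, 5, 0x80, 6, 7, 8, 0x80, 9, 10, 11, 0x80};

/* Expands 4 pixels, then ORs 0xff into each alpha byte. */
static void Rgb24ToArgb4Sketch(const uint8_t src[16], uint8_t dst[16]) {
  ShuffleBytes16(src, kRgb24ToArgbMaskSketch, dst);
  for (int i = 3; i < 16; i += 4) {
    dst[i] |= 0xff; /* opaque alpha */
  }
}
```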

./libyuv_test '--gunit_filter=*RAWToJ400_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1

AMD Zen 5
Was LibYUVConvertTest.RAWToJ400_Opt (757 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (699 ms)

Intel Skylake
Was LibYUVConvertTest.RAWToJ400_Opt (1705 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (1426 ms)

Bug: 477295731
Change-Id: I29866baf4ad5fe7a3725e4a01f2fe24649510a7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7777325
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-20 12:52:44 -07:00
Frank Barchard
ddc6764d13 ARGBToUVMatrixRow_RVV: replace vlseg8 with vlseg4,
implement horizontal paired adds and accumulation to improve
performance on SiFive x280, and fix the remainder logic to use valid
vlseg4 loads. Adds TestARGBToUVRow_Any to test odd-width remainder
handling.
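A scalar sketch of what a UV row computes, assuming BT.601 limited-range coefficients and ARGB stored as B,G,R,A in memory. It does not model the RVV segment loads. The names are hypothetical, and averaging the last column vertically for an odd width is an assumed remainder policy, shown because remainder handling is what TestARGBToUVRow_Any targets.

```c
#include <assert.h>
#include <stdint.h>

/* BT.601 limited-range chroma. The 0x8080 bias (128.5 * 256) keeps the
 * numerator non-negative for 8-bit inputs; results fall in [16, 240]. */
static uint8_t RgbToUSketch(int r, int g, int b) {
  return (uint8_t)((112 * b - 74 * g - 38 * r + 0x8080) >> 8);
}
static uint8_t RgbToVSketch(int r, int g, int b) {
  return (uint8_t)((112 * r - 94 * g - 18 * b + 0x8080) >> 8);
}

/* One U and one V per 2x2 block of ARGB pixels. */
static void ArgbToUVRowSketch(const uint8_t* src_argb, int src_stride_argb,
                              uint8_t* dst_u, uint8_t* dst_v, int width) {
  const uint8_t* r0 = src_argb;
  const uint8_t* r1 = src_argb + src_stride_argb;
  int x = 0;
  for (; x + 1 < width; x += 2) {
    const uint8_t* p = r0 + 4 * x;
    const uint8_t* q = r1 + 4 * x;
    /* Rounded average of 4 samples: (sum + 2) >> 2. */
    int b = (p[0] + p[4] + q[0] + q[4] + 2) >> 2;
    int g = (p[1] + p[5] + q[1] + q[5] + 2) >> 2;
    int r = (p[2] + p[6] + q[2] + q[6] + 2) >> 2;
    dst_u[x / 2] = RgbToUSketch(r, g, b);
    dst_v[x / 2] = RgbToVSketch(r, g, b);
  }
  if (width & 1) { /* odd remainder: average the last column vertically */
    const uint8_t* p = r0 + 4 * x;
    const uint8_t* q = r1 + 4 * x;
    int b = (p[0] + q[0] + 1) >> 1;
    int g = (p[1] + q[1] + 1) >> 1;
    int r = (p[2] + q[2] + 1) >> 1;
    dst_u[x / 2] = RgbToUSketch(r, g, b);
    dst_v[x / 2] = RgbToVSketch(r, g, b);
  }
}
```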

Also fixes a build break for non-RVV compilations by ensuring all RVV
functions and their closing cplusplus braces are correctly wrapped in
#if !defined(LIBYUV_DISABLE_RVV).

Also adds NV12ToNV21 as a macro alias for NV21ToNV12 in
planar_functions.h, as the conversion is bidirectional (swapping byte
pairs in the interleaved chroma plane). (Patch from
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7762904)
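Why one function can serve both directions: swapping each interleaved chroma byte pair is its own inverse. A minimal sketch of the chroma row, with a hypothetical name; width counts UV pairs, and the Y plane would be copied unchanged.

```c
#include <assert.h>
#include <stdint.h>

/* U,V -> V,U (NV12 -> NV21) or V,U -> U,V (NV21 -> NV12). Both bytes are
 * read before either is written, so src and dst may be the same buffer. */
static void SwapUVRowSketch(const uint8_t* src_uv, uint8_t* dst_vu,
                            int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    uint8_t u = src_uv[2 * x + 0];
    uint8_t v = src_uv[2 * x + 1];
    dst_vu[2 * x + 0] = v;
    dst_vu[2 * x + 1] = u;
  }
}
```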

Bug: libyuv:42280902
Change-Id: If2d6cbb3e232d63d43e32aba33fa9b2eee8190e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7772164
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-17 15:04:45 -07:00
Frank Barchard
ace7c4573c Add ARGBToUV444MatrixRow_RVV, ARGBToUVMatrixRow_RVV, and wrappers
This change implements ARGBToUV444MatrixRow_RVV, ARGBToUVMatrixRow_RVV,
and their wrappers (ARGBToUVRow_RVV, ARGBToUVJRow_RVV, etc.) using RVV
intrinsics, mirroring the NEON/AVX2 designs. It wires them into the
build and dispatch systems.

LIBYUV_RVV_HAS_TUPLE_TYPE is always true on new compilers. This macro
has been removed, assuming it is true everywhere, reducing the amount of
code in row_rvv.cc, scale_rvv.cc, and row.h.

Tested via: ~/bin/doyuv3v && ~/bin/runyuv3v TestARGBToI444Matrix
~/bin/doyuv3av

Bug: libyuv:42280902
Change-Id: I36d305386b297d69023c068aa9c62ab6b2ad039c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7769956
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-16 20:52:43 -07:00
Frank Barchard
94644361b4 row_win.cc rewrite into intrinsics
- remove inline asm which was only for 32 bit
- add ARGBToYMatrixRow_AVX2
- add gn flag libyuv_enable_rowwin=true

Example of building with GN and Ninja:

Without the new flag:
  gn gen out/Release "--args=is_debug=false"
  ninja -C out/Release

With the new flag:
 gn gen out/Release "--args=is_debug=false libyuv_enable_rowwin=true"
 ninja -C out/Release

Bug: libyuv:42280806, 477295731, libyuv:42280902, libyuv:439628764
R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: I451bf814622fba690005c02fbf5816819c6a08c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7765790
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-15 19:53:16 -07:00
Frank Barchard
e034c41661 Port ARGBToUVMatrixRow from AVX2 to AVX512BW
Benchmark on Icelake Xeon
Now AVX512BW:
[       OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2:
[       OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)

- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.
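The arithmetic behind the last bullet, modeled in scalar C. This checks only the identity (a 16-bit lane with all bits set, as vpternlogd with immediate 0xff produces, is -1, so offset - sum equals offset + (-sum)); the register layout and exact instruction sequence of the commit are not reproduced, and the names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 32-bit lane of vpmaddwd: two signed 16-bit inputs
 * times two signed 16-bit multipliers, products summed. */
static int32_t MaddPairSketch(int16_t a0, int16_t a1, int16_t m0, int16_t m1) {
  return (int32_t)a0 * m0 + (int32_t)a1 * m1;
}

/* With an all-ones (-1) multiplier the lane holds the negated pair sum,
 * so offset - (a0 + a1) becomes a plain addition. */
static int32_t OffsetMinusPairSumSketch(int32_t offset, int16_t a0,
                                        int16_t a1) {
  const int16_t kMinusOne = -1; /* every bit set in two's complement */
  return offset + MaddPairSketch(a0, a1, kMinusOne, kMinusOne);
}
```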

Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-14 16:15:31 -07:00
Frank Barchard
59ca5d8074 Fix parameter names and comments for ARGB/BGRA/RGBA/ABGR functions
In all functions that start with ARGB, BGRA, RGBA or ABGR in the include/libyuv/ headers, make sure the parameter variable name has the same 4 letters, but lower case, and the comment before the function should have the same matching name. Then make sure the implementation in source/ folder has the same variable names.

Change-Id: Idadbbbb993156eea16e318719f4888cb3bed5f6a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7760057
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-13 18:28:37 -07:00
Frank Barchard
893eacf9b4 ARGBToY for AVX512
- add ARGBToYMatrixRow_AVX512BW
- refactor SSE and AVX to use Matrix functions, making old functions
  call the new ones.

Zen5 1280x720
Was AVX2   LibYUVConvertTest.ARGBToI444_Opt (1125 ms)
Now AVX512 LibYUVConvertTest.ARGBToI444_Opt (641 ms)

Details by Gemini:
  1. Created 3 new Matrix functions:
    Added ARGBToYMatrixRow_SSSE3, ARGBToYMatrixRow_AVX2, and
    ARGBToYMatrixRow_AVX512BW to source/row_gcc.cc. These take the
    const struct ArgbConstants* c parameter similarly to
    ARGBToUV444MatrixRow_*. The x86 vector instructions dynamically
    calculate the needed values using the properties of the constants
    struct, including using vpmaddwd inside the AVX512 code to offset
    the lack of a native vphaddw.

  2. Replaced Old Functions with Wrappers:
    Modified the existing implementations of ARGBToYRow_SSSE3,
    ARGBToYJRow_SSSE3, ABGRToYRow_SSSE3, ABGRToYJRow_SSSE3,
    RGBAToYRow_SSSE3, RGBAToYJRow_SSSE3, BGRAToYRow_SSSE3 (and their
    _AVX2 equivalents) in source/row_gcc.cc to act as inline wrappers
    calling the new ARGBToYMatrixRow_* functions, passing the right
    matrix parameters (e.g. &kArgbI601Constants, &kArgbJPEGConstants,
    &kAbgrI601Constants).

  3. Added row_any.cc Handlers:
    Added ANY11MC definitions to source/row_any.cc to autogenerate
    ARGBToYMatrixRow_Any_SSSE3, ARGBToYMatrixRow_Any_AVX2, and
    ARGBToYMatrixRow_Any_AVX512BW which safely handles non-aligned
    tails.

  4. Updated include/libyuv/row.h:
    Updated the headers with the proper void declarations for all newly
    generated Matrix and Any_ variants. Also defined
    HAS_ARGBTOYROW_AVX512BW in the CPU macros.

  5. Tested the Implementations:
    Compiled and tested on Linux x86, which resulted in all tests passing
    cleanly. Also successfully completed all Windows 32-bit build checks
    ensuring 32-bit regression prevention without issues.

Bug: 477295731
Change-Id: I4f5eec9a961e24a9d760d0a1c0810fb5e29a0bd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7759494
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-13 17:26:07 -07:00
Frank Barchard
4f4e1ac553 Fix 2 failing golden tests
- Add ifdef for LIBYUV_UNLIMITED_DATA

Fixed by Gemini, given only instructions for building and running the test and a request to fix it.

Bug: libyuv:353545922
Change-Id: I117a25b75b9616ee2ce6122aa163c2085ed4dc7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7742120
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-09 11:51:13 -07:00
Sam Maier
e3ceea1e67 Forward-declare ArgbConstants in convert.h to fix visibility error
The libyuv into Chromium roller is currently broken, see bug 500795092.

This change adds a forward declaration for struct ArgbConstants in
include/libyuv/convert.h. This resolves a -Wvisibility error where the
struct was being declared within a function prototype, making it
invisible outside that scope and breaking automated binding generation
(e.g., for crabbyavif).

Verified building crabbyavif_libyuv_bindings locally and this patch
fixed it.

Bug: 500795092
Change-Id: Ie0126650ab346940f4610bd4d2e8a5b3ef9ce103
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7739974
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-09 08:53:56 -07:00
Frank Barchard
4c3d7d517a ARGBToUV444 for AVX512
1.27x faster on AMD Zen5 (Turin)

Now AVX512
perf record ./libyuv_test '--gunit_filter=*ARGBToI444_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1

[       OK ] LibYUVConvertTest.ARGBToI444_Opt (1071 ms)
Overhead  Symbol
  53.49%  ARGBToYRow_AVX2
  44.70%  ARGBToUV444Row_AVX512BW

Was AVX2
[       OK ] LibYUVConvertTest.ARGBToI444_Opt (1369 ms)
  61.06%  ARGBToUV444Row_AVX2
  37.67%  ARGBToYRow_AVX2

Bug:  libyuv:42280902
Change-Id: I306fbac656d6f7834ce1559e86d01eb34931ec3c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7738362
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-08 19:25:41 -07:00
Dale Curtis
1170363ce5 Add Gemini implementation for NEON32 RGB to YUV matrix operations
These are about 25% faster than the C versions.

Bug: libyuv:42280902

Change-Id: I8b298670ee5f3ed5db35527fc41d6d9a51b020a1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7573682
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
2026-03-23 16:30:44 -07:00
Dale Curtis
b1cacfb38f Unify X86/X64 versions of ARGBToI4xxMatrix functions
Change-Id: Iead13414414543e5f10ba9ba47a6ceaeb3113dee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7562443
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2026-03-18 16:27:07 -07:00
Dale Curtis
f69a479f04 Add ARGBToNV12Matrix implementation
This one reuses the SIMD implementations for MergeUVRow_ from the
existing ARGBToNV12 functions.

Bug: libyuv:42280902
Change-Id: If0a4be133d657ed0262f29fdd568dac90b49636c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564317
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
2026-03-18 16:26:59 -07:00
Dale Curtis
2c21d57319 Add ABGR versions of the ArgbConstants structures
This allows for ABGR conversion using the same methods
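A sketch of why separate constants are enough: one row function, with weights indexed by byte position in memory, serves both orders. libyuv "ARGB" is B,G,R,A in memory and "ABGR" is R,G,B,A. The struct below is hypothetical and is not the real ArgbConstants layout; the weights are the BT.601 limited-range luma approximation.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants: luma weight per memory byte, plus bias. */
struct YConstantsSketch {
  int16_t kY[4];
  int32_t bias;
};

static const struct YConstantsSketch kArgbI601Sketch = {{25, 129, 66, 0},
                                                        0x1080};
static const struct YConstantsSketch kAbgrI601Sketch = {{66, 129, 25, 0},
                                                        0x1080};

/* Same code path for both byte orders; only the constants differ. */
static void YMatrixRowSketch(const uint8_t* src, uint8_t* dst_y, int width,
                             const struct YConstantsSketch* c) {
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src + 4 * x;
    int32_t sum = c->kY[0] * p[0] + c->kY[1] * p[1] + c->kY[2] * p[2] +
                  c->kY[3] * p[3] + c->bias;
    dst_y[x] = (uint8_t)(sum >> 8);
  }
}
```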

Bug: libyuv:42280902
Change-Id: I5566e3150b30573a2326a900ce31ab095f8935f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564316
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2026-03-17 17:28:51 -07:00
Dale Curtis
30809ff64a Add ARGBToI4xxMatrix variants
This was implemented by Gemini followed by manual review and some
tweaking for style. The 601 and JPEG constants are fully verified
against the existing non-matrix implementations. On x86 the C-only
versions appear to be about 25% slower than the optimized ones.

Bug: libyuv:42280902
Change-Id: Ia5b7cb499bad5c76faec53f36086ebb18f2b530f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512030
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
2026-03-04 10:55:06 -08:00
Frank Barchard
900da61d3c Experimental SVE FMMLA detect
Detect whether the Arm CPU supports the FMMLA instruction.

Bug: None
Change-Id: Ia7b83bf2735ddeeb8a85da44177e708c34e4b1fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7085486
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-10-27 14:34:55 -07:00
Frank Barchard
500f45652c Fix for ARM32 build when built with __SOFTFP__
planar_test.cc was
  Error: selected processor does not support `vmrs r3,fpscr' in ARM mode
  Error: selected processor does not support `vmsr fpscr,r3' in ARM mode

Bug: None
Change-Id: I2ee0e7191c372277901c94e29d9ed91bbac71af2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7063737
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-10-20 11:54:25 -07:00
Mark Zhuang
e237e8d7fb RVV: Enable some functions for intrinsic >= v1.0
According to the README of rvv-intrinsic-doc,
Clang 19 and GCC 14 support the v1.0 version.
However, __riscv_v_intrinsic is 12000 on Clang 19,
so Clang >= 20 is needed to test this patch.
I tested it with Clang 21.
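A sketch of a compile-time gate for this, assuming the rvv-intrinsic-doc version encoding major * 1000000 + minor * 1000 + revision (so v0.12 reports 12000, as on Clang 19, and v1.0 reports 1000000). The macro and function names are hypothetical.

```c
#include <assert.h>

/* Runtime form of the check, for illustration. */
static int RvvIntrinsicAtLeastV1Sketch(long version) {
  return version >= 1000000L;
}

/* Compile-time form: the macro is absent on non-RISC-V compilers. */
#if defined(__riscv_v_intrinsic) && __riscv_v_intrinsic >= 1000000
#define RVV_INTRINSIC_V1_SKETCH 1
#else
#define RVV_INTRINSIC_V1_SKETCH 0
#endif
```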

Change-Id: I0e75efcdab3e7bc0ce1acd19eca3568b47c84cbf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6995438
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-10-17 11:44:14 -07:00
Wan-Teh Chang
fcd7060e0d Bump LIBYUV_VERSION for removal of MIPS support
Bump LIBYUV_VERSION to 1921. Missed in
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7045953.

Bug: 434383432
Change-Id: If51122f1b744718551b0b601ead7cacb8c46c20d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7050411
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-10-16 13:32:52 -07:00
Frank Barchard
2b4453d46f Deprecate MIPS and MSA support.
- Remove *_msa.cc source files
- Update build files
- Update header references, planar ifdefs for row functions
- Update documentation on supported platforms
- Version bumped to 1921
- clang-format applied

Bug: 434383432
Change-Id: I072d6aac4956f0ed668e64614ac8557612171f76
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7045953
Reviewed-by: Justin Green <greenjustin@google.com>
2025-10-16 12:20:40 -07:00
Frank Barchard
94417b9d21 Pass rgbconstants via struct pointer instead of elements with m
Now 66 instructions
SYM ARGBToUVRow_SSSE3:
62ccd0: BASE       push ebp
62ccd1: BASE       mov ebp, esp
62ccd3: BASE       push ebx
62ccd4: BASE       push edi
62ccd5: BASE       push esi
62ccd6: BASE       and esp, 0xfffffffc
62ccd9: BASE       sub esp, 0xc
62ccdc: BASE       call 0x62cce1 <ARGBToUVRow_SSSE3+0x11>
62cce1: BASE       pop eax
62cce2: BASE       add eax, 0xe1c27
62cce8: BASE       mov ecx, dword ptr [ebp+0xc]
62cceb: BASE       mov edx, dword ptr [ebp+0x8]
62ccee: BASE       mov esi, dword ptr [ebp+0x10]
62ccf1: BASE       mov edi, dword ptr [ebp+0x18]
62ccf4: BASE       mov dword ptr [esp+0x8], edi
62ccf8: BASE       mov edi, dword ptr [ebp+0x14]
62ccfb: BASE       lea ebx, ptr [eax-0x5ecf88]
62cd01: SSE2       movdqa xmm4, xmmword ptr [ebx]
62cd05: SSE2       movdqa xmm5, xmmword ptr [ebx+0x10]
62cd0a: SSE2       pcmpeqb xmm6, xmm6
62cd0e: SSSE3      pabsb xmm6, xmm6
62cd13: SSE2       movdqa xmm7, xmmword ptr [eax-0x5ecfa8]
62cd1b: BASE       sub edi, esi

62cd1d: SSE2       movdqu xmm0, xmmword ptr [edx]
62cd21: SSE2       movdqu xmm1, xmmword ptr [edx+0x10]
62cd26: SSE2       movdqu xmm2, xmmword ptr [edx+ecx*1]
62cd2b: SSE2       movdqu xmm3, xmmword ptr [edx+ecx*1+0x10]
62cd31: SSSE3      pshufb xmm0, xmm7
62cd36: SSSE3      pshufb xmm1, xmm7
62cd3b: SSSE3      pshufb xmm2, xmm7
62cd40: SSSE3      pshufb xmm3, xmm7
62cd45: SSSE3      pmaddubsw xmm0, xmm6
62cd4a: SSSE3      pmaddubsw xmm1, xmm6
62cd4f: SSSE3      pmaddubsw xmm2, xmm6
62cd54: SSSE3      pmaddubsw xmm3, xmm6
62cd59: SSE2       paddw xmm0, xmm2
62cd5d: SSE2       paddw xmm1, xmm3
62cd61: SSE2       pxor xmm2, xmm2
62cd65: SSE2       psrlw xmm0, 0x1
62cd6a: SSE2       psrlw xmm1, 0x1
62cd6f: SSE2       pavgw xmm0, xmm2
62cd73: SSE2       pavgw xmm1, xmm2
62cd77: SSE2       packuswb xmm0, xmm1
62cd7b: SSE2       movdqa xmm2, xmm6
62cd7f: SSE2       psllw xmm2, 0xf
62cd84: SSE2       movdqa xmm1, xmm0
62cd88: SSSE3      pmaddubsw xmm1, xmm5
62cd8d: SSSE3      pmaddubsw xmm0, xmm4
62cd92: SSSE3      phaddw xmm0, xmm1
62cd97: SSE2       psubw xmm2, xmm0
62cd9b: SSE2       psrlw xmm2, 0x8
62cda0: SSE2       packuswb xmm2, xmm2
62cda4: SSE2       movd dword ptr [esi], xmm2
62cda8: SSE2       pshufd xmm2, xmm2, 0x55
62cdad: SSE2       movd dword ptr [esi+edi*1], xmm2
62cdb2: BASE       lea edx, ptr [edx+0x20]
62cdb5: BASE       lea esi, ptr [esi+0x4]
62cdb8: BASE       sub dword ptr [esp+0x8], 0x8
62cdbd: BASE       jnle 0x62cd1d <ARGBToUVRow_SSSE3+0x4d>

62cdc3: BASE       lea esp, ptr [ebp-0xc]
62cdc6: BASE       pop esi
62cdc7: BASE       pop edi
62cdc8: BASE       pop ebx
62cdc9: BASE       pop ebp
62cdca: BASE       ret

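In the loop above, pmaddubsw against xmm6 (bytes of 1) sums horizontal byte pairs and paddw adds the second row, giving a 16-bit sum s of four samples. psrlw 1 followed by pavgw with a zeroed register then divides by 4. Because pavgw computes (a + b + 1) >> 1, the pair works out to ((s >> 1) + 1) >> 1. The sketch below models those two instructions in C and checks exhaustively that they equal round-to-nearest (s + 2) >> 2 over every possible sum. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the two SSE2 steps applied to each 16-bit 2x2 sum s:
   psrlw x, 1     -> s >> 1
   pavgw x, zero  -> (x + 0 + 1) >> 1   (pavgw rounds up) */
static uint16_t PsrlwThenPavgwZero(uint16_t s) {
  assert(s <= 4 * 255); /* sum of four 8-bit samples */
  uint16_t half = (uint16_t)(s >> 1);
  return (uint16_t)((half + 0 + 1) >> 1);
}

/* Compare against (s + 2) >> 2 for every sum four bytes can produce.
   Returns 1 if all 1021 cases match. */
static int RoundingMatchesDivideBy4(void) {
  for (int s = 0; s <= 4 * 255; ++s) {
    if (PsrlwThenPavgwZero((uint16_t)s) != (uint16_t)((s + 2) >> 2)) {
      return 0;
    }
  }
  return 1;
}
```
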
Was 68 instructions
ARGBToUVRow_SSSE3:
62ccd0: BASE       push ebp
62ccd1: BASE       mov ebp, esp
62ccd3: BASE       push edi
62ccd4: BASE       push esi
62ccd5: BASE       and esp, 0xfffffff0
62ccd8: BASE       sub esp, 0x30
62ccdb: BASE       call 0x62cce0 <ARGBToUVRow_SSSE3+0x10>
62cce0: BASE       pop eax
62cce1: BASE       add eax, 0xe1c28
62cce7: BASE       mov ecx, dword ptr [ebp+0xc]
62ccea: BASE       mov edx, dword ptr [ebp+0x8]
62cced: BASE       mov esi, dword ptr [ebp+0x10]
62ccf0: BASE       mov edi, dword ptr [ebp+0x18]
62ccf3: BASE       mov dword ptr [esp+0xc], edi
62ccf7: BASE       mov edi, dword ptr [ebp+0x14]
62ccfa: SSE        movaps xmm0, xmmword ptr [eax-0x5ecf88]
62cd01: SSE        movaps xmmword ptr [esp+0x20], xmm0
62cd06: SSE        movaps xmm0, xmmword ptr [eax-0x5ecf78]
62cd0d: SSE        movaps xmmword ptr [esp+0x10], xmm0
62cd12: SSE2       movdqa xmm4, xmmword ptr [esp+0x20]
62cd18: SSE2       movdqa xmm5, xmmword ptr [esp+0x10]
62cd1e: SSE2       pcmpeqb xmm6, xmm6
62cd22: SSSE3      pabsb xmm6, xmm6
62cd27: SSE2       movdqa xmm7, xmmword ptr [eax-0x5ecfa8]
62cd2f: BASE       sub edi, esi

62cd31: SSE2       movdqu xmm0, xmmword ptr [edx]
62cd35: SSE2       movdqu xmm1, xmmword ptr [edx+0x10]
62cd3a: SSE2       movdqu xmm2, xmmword ptr [edx+ecx*1]
62cd3f: SSE2       movdqu xmm3, xmmword ptr [edx+ecx*1+0x10]
62cd45: SSSE3      pshufb xmm0, xmm7
62cd4a: SSSE3      pshufb xmm1, xmm7
62cd4f: SSSE3      pshufb xmm2, xmm7
62cd54: SSSE3      pshufb xmm3, xmm7
62cd59: SSSE3      pmaddubsw xmm0, xmm6
62cd5e: SSSE3      pmaddubsw xmm1, xmm6
62cd63: SSSE3      pmaddubsw xmm2, xmm6
62cd68: SSSE3      pmaddubsw xmm3, xmm6
62cd6d: SSE2       paddw xmm0, xmm2
62cd71: SSE2       paddw xmm1, xmm3
62cd75: SSE2       pxor xmm2, xmm2
62cd79: SSE2       psrlw xmm0, 0x1
62cd7e: SSE2       psrlw xmm1, 0x1
62cd83: SSE2       pavgw xmm0, xmm2
62cd87: SSE2       pavgw xmm1, xmm2
62cd8b: SSE2       packuswb xmm0, xmm1
62cd8f: SSE2       movdqa xmm2, xmm6
62cd93: SSE2       psllw xmm2, 0xf
62cd98: SSE2       movdqa xmm1, xmm0
62cd9c: SSSE3      pmaddubsw xmm1, xmm5
62cda1: SSSE3      pmaddubsw xmm0, xmm4
62cda6: SSSE3      phaddw xmm0, xmm1
62cdab: SSE2       psubw xmm2, xmm0
62cdaf: SSE2       psrlw xmm2, 0x8
62cdb4: SSE2       packuswb xmm2, xmm2
62cdb8: SSE2       movd dword ptr [esi], xmm2
62cdbc: SSE2       pshufd xmm2, xmm2, 0x55
62cdc1: SSE2       movd dword ptr [esi+edi*1], xmm2
62cdc6: BASE       lea edx, ptr [edx+0x20]
62cdc9: BASE       lea esi, ptr [esi+0x4]
62cdcc: BASE       sub dword ptr [esp+0xc], 0x8
62cdd1: BASE       jnle 0x62cd31 <ARGBToUVRow_SSSE3+0x61>

62cdd7: BASE       lea esp, ptr [ebp-0x8]
62cdda: BASE       pop esi
62cddb: BASE       pop edi
62cddc: BASE       pop ebp
62cddd: BASE       ret
62cdde: BASE       int3
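Both listings run the same loop body. Only the prologue differs: the old version copies the two coefficient vectors through stack slots ([esp+0x20], [esp+0x10]) before loading xmm4/xmm5, while the new one loads them directly from [ebx] and [ebx+0x10]. Below is a scalar C model of what one pass of that loop computes, read from the instructions. The 2x2 average feeds pmaddubsw with signed coefficients, then phaddw, then 0x8000 minus the result in 16-bit arithmetic, then psrlw 8.

The model rests on several assumptions:
- The pshufb mask in xmm7 groups each pixel's channel with its right neighbour's.
- The coefficient values are hypothetical BT.601 limited-range numbers, stored negated to fit the 0x8000 - dot step.
- The names (UVConstantsSketch, kBT601Sketch, ARGBToUVRowSketch) are hypothetical, not libyuv's.

Like the new code, the sketch passes constants through a struct pointer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constants, not libyuv's real tables. Per-pixel byte order is
   libyuv ARGB as stored in memory: B, G, R, A. Coefficients are BT.601
   limited-range values scaled by 256 and stored negated, so that
   0x8000 - dot (as in the listing) yields 128 + (positive expression) / 256. */
typedef struct {
  int8_t kRGBToU[4]; /* the listing loads 16 bytes; one pixel's worth here */
  int8_t kRGBToV[4];
} UVConstantsSketch;

static const UVConstantsSketch kBT601Sketch = {
    {-112, 74, 38, 0}, /* U ~ 128 + (112*B - 74*G - 38*R) / 256 */
    {18, 94, -112, 0}, /* V ~ 128 + (112*R - 94*G - 18*B) / 256 */
};

/* pmaddubsw (by 1) + paddw + psrlw 1 + pavgw 0: rounded 2x2 average. */
static uint8_t Avg2x2(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  uint16_t s = (uint16_t)(a + b + c + d);
  return (uint8_t)(((s >> 1) + 1) >> 1);
}

/* pmaddubsw (unsigned avg x signed coef) + phaddw, then psubw from 0x8000
   with 16-bit wraparound, psrlw 8, packuswb. */
static uint8_t ApplyCoefs(const uint8_t avg[4], const int8_t k[4]) {
  int dot = 0;
  for (int i = 0; i < 4; ++i) dot += avg[i] * k[i];
  uint16_t t = (uint16_t)(0x8000 - dot);
  return (uint8_t)(t >> 8);
}

/* Reads two rows of `width` ARGB pixels (width even) and writes width / 2
   U and V samples. Constants arrive through a struct pointer. */
static void ARGBToUVRowSketch(const uint8_t* src_argb, int src_stride,
                              uint8_t* dst_u, uint8_t* dst_v, int width,
                              const UVConstantsSketch* c) {
  assert(width % 2 == 0);
  const uint8_t* row1 = src_argb + src_stride;
  for (int x = 0; x < width; x += 2) {
    const uint8_t* p0 = src_argb + x * 4;
    const uint8_t* p1 = row1 + x * 4;
    uint8_t avg[4];
    for (int ch = 0; ch < 4; ++ch) {
      avg[ch] = Avg2x2(p0[ch], p0[ch + 4], p1[ch], p1[ch + 4]);
    }
    dst_u[x / 2] = ApplyCoefs(avg, c->kRGBToU);
    dst_v[x / 2] = ApplyCoefs(avg, c->kRGBToV);
  }
}
```

With these hypothetical coefficients, a gray 2x2 block maps to U = V = 128, and pure blue maps to U = 239, V = 110.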
Bug: 444157316

Change-Id: Iad044f851359f5b052091c7bdab9b96946fc3682
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6987370
Reviewed-by: Justin Green <greenjustin@google.com>
2025-09-29 12:34:36 -07:00
Frank Barchard
7155afc5ca ARGBToUV AVX2 for x86 32 bit
- Reduce to 10 ymm registers; 2 constants are generated on the fly
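This message does not say which constants are generated on the fly. The ARGBToUVRow_SSSE3 listing in commit 94417b9d21 shows two that fit the description, both built from registers rather than loaded from memory:
- pcmpeqb xmm6, xmm6 followed by pabsb gives bytes of 1, used for the pairwise sums.
- movdqa xmm2, xmm6 followed by psllw xmm2, 0xf gives 0x8000 in every word, used as the bias.

The sketch below assumes the AVX2 version does the same in each 128-bit lane. It models one lane in portable C to show the bit arithmetic. The type and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* One 128-bit register modeled as 16 little-endian bytes. */
typedef struct { uint8_t b[16]; } Vec128Sketch;

/* pcmpeqb x, x: every byte compares equal to itself -> 0xFF. */
static Vec128Sketch AllOnesBytes(void) {
  Vec128Sketch v;
  memset(v.b, 0xFF, sizeof v.b);
  return v;
}

/* pabsb: absolute value of each byte read as int8 (0xFF = -1 -> 1). */
static Vec128Sketch AbsBytes(Vec128Sketch v) {
  for (int i = 0; i < 16; ++i) {
    int s = (int8_t)v.b[i];
    v.b[i] = (uint8_t)(s < 0 ? -s : s);
  }
  return v;
}

/* psllw x, n: shift each 16-bit word left, discarding bits shifted out.
   (Counts of 16 or more, which zero the word in hardware, are excluded.) */
static Vec128Sketch ShiftLeftWords(Vec128Sketch v, int n) {
  assert(n >= 0 && n < 16);
  for (int i = 0; i < 8; ++i) {
    uint16_t w = (uint16_t)(v.b[2 * i] | (v.b[2 * i + 1] << 8));
    w = (uint16_t)(w << n);
    v.b[2 * i] = (uint8_t)(w & 0xFF);
    v.b[2 * i + 1] = (uint8_t)(w >> 8);
  }
  return v;
}
```

Each word of the bytes-of-1 vector is 0x0101. Shifting it left by 15 keeps only the low bit, which lands in bit 15 and gives 0x8000. This costs two ALU instructions and no load or memory-resident constant.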

Change-Id: Ib25a0cf7c93e5048270735410ccf6723b3949454
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6967319
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-09-18 13:14:45 -07:00
Frank Barchard
142db12947 ARGBToUV use AVX2 for 64 bit x86
Skylake
Was ARGBToJ420_Opt (312 ms)
Now ARGBToJ420_Opt (242 ms)

Icelake
Was ARGBToJ420_Opt (302 ms)
Now ARGBToJ420_Opt (220 ms)

AMD Zen3 on Windows
Was ARGBToJ420_Opt (305 ms)
Now ARGBToJ420_Opt (216 ms)
32 bit x86 uses SSE
Now ARGBToJ420_Opt (326 ms)

llvm-mca analysis of the new AVX, SSE and old AVX versions
https://godbolt.org/z/37bdazWYr

Bug: None
Change-Id: I72f5504407751e164c3558aebe836dd15223d65f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6957477
Reviewed-by: Justin Green <greenjustin@google.com>
2025-09-17 14:39:53 -07:00
Mark Zhuang
b33794a586 RVV: Don't disable all RVV optimizations when RVV >= v0.12
They had been disabled since Patch v2 of
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6385788
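One way to express an "RVV >= v0.12" check in C is to compare the compiler's RVV intrinsics version macro against an encoded threshold. The sketch below assumes the encoding documented for `__riscv_v_intrinsic` in the RVV intrinsics specification: major * 1000000 + minor * 1000 + revision, so v0.11 is 11000 and v0.12 is 12000. The helper names and the gate are hypothetical. This message does not say which kernels the change re-enables.

```c
#include <assert.h>

/* Assumption: __riscv_v_intrinsic is encoded as
   major * 1000000 + minor * 1000 + revision (v0.12 -> 12000). */
static long RvvIntrinsicVersion(int major, int minor, int revision) {
  assert(major >= 0 && minor >= 0 && minor < 1000);
  assert(revision >= 0 && revision < 1000);
  return (long)major * 1000000L + (long)minor * 1000L + revision;
}

/* Hypothetical gate: true when the intrinsics version is v0.12 or newer. */
static int RvvAtLeastV012(long intrinsic_version) {
  return intrinsic_version >= RvvIntrinsicVersion(0, 12, 0);
}

/* Version seen by this translation unit; 0 when not targeting RVV. */
static long CompiledRvvIntrinsicVersion(void) {
#if defined(__riscv_v_intrinsic)
  return __riscv_v_intrinsic;
#else
  return 0;
#endif
}
```
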

Change-Id: Id30a62c8f164830204dde02a443f5e4f04d757db
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6953818
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-09-16 18:17:02 -07:00
Frank Barchard
a61882c049 ARGBToUV AVX2 for x86_64
Icelake
Was SSSE3+SSSE3 ARGBToJ420_Opt (356 ms)
Was SSSE3+AVX2  ARGBToJ420_Opt (301 ms)
Now AVX2+AVX2   ARGBToJ420_Opt (227 ms)

Change-Id: I2cb427bc164b225b3ad4c5f43c09d6da6ca496d5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6943036
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-09-16 11:33:54 -07:00
Frank Barchard
0f795672ae Reduce ARGBToUV SSSE3 register usage to fix clang build error on x64
Bug: 444157316
Change-Id: I2ae9f3dbfb373bb874a3d9699987f7d5b63f2610
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6937665
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-09-10 18:40:06 -07:00