1522 Commits

Author SHA1 Message Date
Frank Barchard
3bdb3b94ca I420ToRAW use 2 step AVX512
On Icelake
Was AVX2
I420ToRAW_Opt (283 ms)
  67.55%  I422ToARGBRow_AVX2
  26.46%  ARGBToRGB24Row_AVX2

Now AVX512VBMI
I420ToRAW_Opt (238 ms)
  73.08%  I422ToARGBRow_AVX512BW
  21.59%  ARGBToRGB24Row_AVX512VBMI

Bug: 42280902
Change-Id: I9d4d21faed30c529a5e593819f103be115709f37
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7909924
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-08 14:32:13 -07:00
Frank Barchard
4be798d7c5 BGRAToI420 use BgraConstants for a direct conversion using AVX512BW
row win (msvc)
Was C/SSSE3
BGRAToARGB_Opt (594 ms)
BGRAToARGB_Endswap_Opt (609 ms)
BGRAToI420_Opt (122 ms)

Now AVX2
BGRAToARGB_Opt (100 ms)
BGRAToARGB_Endswap_Opt (99 ms)
BGRAToI420_Opt (115 ms)

Clang/GCC AVX512BW
BGRAToARGB_Opt (86 ms)
BGRAToARGB_Endswap_Opt (91 ms)
BGRAToI420_Opt (110 ms)


Bug: 42280902
Change-Id: I52cb2b0cacea8f2f0b138ec3cc521185dbef8595
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7905821
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-08 12:21:47 -07:00
Frank Barchard
e14b0e2c60 RGB565ToARGB use AVX2 instead of SSE2
Now AVX2/AVX512
ARGB4444ToI420_Opt (204 ms)
RGB565ToI420_Opt (211 ms)
ARGB1555ToI420_Opt (231 ms)
RAWToI420_Opt (197 ms)
RGB24ToI420_Opt (197 ms)

Was SSE2/AVX2
ARGB4444ToI420_Opt (276 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (332 ms)
RAWToI420_Opt (237 ms)
RGB24ToI420_Opt (232 ms)

Bug: libyuv:508639302
Change-Id: I2005189d1b6af15cb5ebef1f6d66b426fa9df8eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891416
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-02 18:28:02 -07:00
Frank Barchard
3c5fa6ef27 [libyuv] Replace hardcoded RGB to YUV functions with Matrix variants
Removes non-matrix implementations for RGB24, RAW, RGB565, ARGB1555,
and ARGB4444 conversions. Introduces RGBToYMatrixRow, RGBToUVMatrixRow,
and equivalent functions for 16-bit and 24-bit formats. These functions
utilize a 2-step conversion internally (to ARGB, then to YUV) inside
row_common.cc for C, AVX2, and NEON, allowing the high-level
convert.cc logic to execute in a single pass using ArgbConstants.
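
A minimal C sketch of the 2-step row idea described above (16-bit pixel to ARGB, then ARGB to Y). The function names, the 64-pixel chunk size, and the scalar code are illustrative assumptions, not libyuv's implementation; the luma weights are the common 8-bit fixed-point BT.601 limited-range approximation.

```c
#include <stdint.h>

/* Hypothetical helper: expand one RGB565 pixel into B, G, R, A bytes. */
static void Rgb565ToArgbPixel(uint16_t v, uint8_t* argb) {
  uint8_t b = (uint8_t)(v & 0x1f);
  uint8_t g = (uint8_t)((v >> 5) & 0x3f);
  uint8_t r = (uint8_t)(v >> 11);
  argb[0] = (uint8_t)((b << 3) | (b >> 2)); /* replicate top bits into low bits */
  argb[1] = (uint8_t)((g << 2) | (g >> 4));
  argb[2] = (uint8_t)((r << 3) | (r >> 2));
  argb[3] = 255;
}

/* Hypothetical helper: BT.601 limited-range luma from B, G, R bytes. */
static uint8_t ArgbToY601(const uint8_t* argb) {
  return (uint8_t)((25 * argb[0] + 129 * argb[1] + 66 * argb[2] + 0x1080) >> 8);
}

/* Two steps per chunk: RGB565 -> temporary ARGB buffer -> Y.
 * The caller sees a single pass over the row. */
void Rgb565ToYRow_TwoStep(const uint16_t* src, uint8_t* dst_y, int width) {
  enum { kChunk = 64 }; /* assumed chunk size, keeps the temp buffer small */
  uint8_t tmp[kChunk * 4];
  while (width > 0) {
    int n = width < kChunk ? width : kChunk;
    for (int i = 0; i < n; ++i) {
      Rgb565ToArgbPixel(src[i], tmp + i * 4);
    }
    for (int i = 0; i < n; ++i) {
      dst_y[i] = ArgbToY601(tmp + i * 4);
    }
    src += n;
    dst_y += n;
    width -= n;
  }
}
```

Keeping the intermediate ARGB data in a small on-stack buffer is one way the extra step can stay cache-resident; the benchmark numbers in this commit show the 2-step path still costs more than a direct conversion.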

Benchmark on Zen4
Test: libyuv_unittest --gtest_filter=*RGB*ToI420*

Was BT.601-only
ARGBToI420_Opt (115 ms)
ARGB4444ToI420_Opt (190 ms)
RGB565ToI420_Opt (194 ms)
ARGB1555ToI420_Opt (207 ms)
RGB24ToI420_Opt (143 ms)
RGBAToI420_Opt (167 ms)
28.07% ARGBToUVMatrixRow_AVX512BW
19.65% ARGBToYMatrixRow_AVX512BW
11.32% RGBAToUVRow_SSSE3
10.24% ARGB1555ToARGBRow_SSE2
 8.56% ARGB4444ToARGBRow_SSE2
 8.47% RGB565ToARGBRow_SSE2
 4.17% RGBAToYRow_AVX512BW
 4.04% RGB24ToARGBRow_AVX512BW

Now Matrix
ARGBToI420_Opt (124 ms)
ARGB4444ToI420_Opt (287 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (324 ms)
RGB24ToI420_Opt (236 ms)
RGBAToI420_Opt (126 ms)
29.74% ARGBToUVMatrixRow_AVX2
14.58% ARGB1555ToARGBRow_SSE2
12.59% RGB565ToARGBRow_SSE2
11.32% ARGB4444ToARGBRow_SSE2
 9.35% ARGBToYMatrixRow_AVX2
 8.45% RGB24ToARGBRow_SSSE3
 5.56% ARGBToYMatrixRow_AVX512BW
 1.37% ARGBToUVMatrixRow_Any_AVX2
 0.74% ARGBToYMatrixRow_Any_AVX2
 0.49% ARGB4444ToARGBRow_Any_SSE2
 0.46% RGB565ToARGBRow_Any_SSE2
 0.39% ARGB1555ToARGBRow_Any_SSE2
 0.28% RGB24ToARGBRow_Any_SSSE3
 0.11% ARGB4444ToYMatrixRow_AVX2
 0.09% RGB565ToUVMatrixRow_AVX2
 0.09% RGB565ToYMatrixRow_AVX2
 0.07% RGBToYMatrixRow_AVX2
 0.05% ARGB1555ToUVMatrixRow_AVX2
 0.04% ARGB1555ToYMatrixRow_AVX2
 0.03% RGBToUVMatrixRow_AVX2
 0.02% ARGB4444ToUVMatrixRow_AVX2

Bug: libyuv:508639302
Change-Id: I362c0cfe4c86ee1f3ffb569fa4f784b84148f11a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891045
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-01 14:04:07 -07:00
Wan-Teh Chang
d2c6dd5e6a Fix integer overflow in two convert functions
Fix integer overflow in buffer allocation size calculations in the
align_buffer_64() macro and the I422ToNV21() and
Android420ToARGBMatrix() functions.

Based on a CL autogenerated by MendIt (go/androidmendit):
https://googleplex-android-review.googlesource.com/c/platform/external/libyuv/+/39981732

Bug: 511821134
Change-Id: Ie1728c3ad337d460d9b85979489a817cc97e3bf3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-05-29 19:26:14 -07:00
Wan-Teh Chang
c98edcc8dc Don't coalesce rows if width*height would overflow
Audit all occurrences of "width *= height;" in the libyuv source code.
Make sure height > 0 and (ptrdiff_t)width * height <= INT_MAX before
executing width *= height.
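
A hedged sketch of such a guard in C. The helper name and signature are hypothetical. Note that this sketch uses a division-based check instead of the (ptrdiff_t) widening named above, so that it does not depend on ptrdiff_t being wider than int.

```c
#include <limits.h>

/* Hypothetical helper: coalesce rows into one long row only when the
 * planes are contiguous and width * height fits in an int.
 * Returns 1 if the rows were coalesced, 0 if the inputs were left alone. */
int CoalesceRows(int* width, int* height, int* src_stride, int* dst_stride) {
  if (*width > 0 && *height > 0 &&
      *src_stride == *width && *dst_stride == *width &&
      *width <= INT_MAX / *height) { /* same as width * height <= INT_MAX */
    *width *= *height;
    *height = 1;
    *src_stride = 0;
    *dst_stride = 0;
    return 1;
  }
  return 0;
}
```

When the check fails the caller simply processes the image row by row, which is slower but never multiplies past INT_MAX.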

Bug: chromium:517339758
Change-Id: I143a41c66492a6e4c48b6aa2a1c4a2ae974ceeb1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7883816
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-05-29 11:57:47 -07:00
Frank Barchard
e449eb2172 J400ToARGB switch from SSE2 to AVX2
- port for row_win
- remove unused HAS_ macros

Was C/SSE2
MSVC  J400ToARGB_Opt (1967 ms)
Clang J400ToARGB_Opt (568 ms)

Now AVX2
MSVC  J400ToARGB_Opt (411 ms)
Clang J400ToARGB_Opt (418 ms)

Test: libyuv_unittest --gtest_filter=*J400ToARGB*
Bug: libyuv:508639302

Change-Id: Ifdfb026832b708b61f55477250cc5ee52449f421
TAG=agy
CONV=186608fc-966a-4ea7-bf57-9fe07cc1383c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7877368
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
2026-05-28 21:24:32 -07:00
Frank Barchard
9d98aaefe7 InterpolateRow for Visual C
- remove InterpolateRow_SSSE3
- optimize ARGBToUV444MatrixRow_AVX2 to use unsigned pixels

5.7x faster on AMD Zen4

Was C
TestInterpolatePlane (144 ms)
TestInterpolatePlane_16 (142 ms)

Now AVX2
TestInterpolatePlane (25 ms)
TestInterpolatePlane_16 (48 ms)

Was signed
ARGBToJ444_Opt (157 ms)
Now unsigned
ARGBToJ444_Opt (155 ms)

Bug: None
Change-Id: I903109668ff9cfedaddad1ad75411393b3226f41
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7856498
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-18 17:28:46 -07:00
Frank Barchard
9f751100d2 InterpolateRow_16_AVX2 for row_gcc
On AMD Zen4
Was C
TestInterpolatePlane_16 (143 ms)
Now AVX2
TestInterpolatePlane_16 (48 ms)

Was
I210ToI420_Opt (87 ms)
 35.60% InterpolateRow_16To8_AVX2
 31.03% Convert16To8Row_AVX512BW
 21.35% Convert16To8Row_AVX2

Now
I210ToI420_Opt (69 ms)
 37.57% Convert16To8Row_AVX512BW
 32.69% InterpolateRow_16_AVX2
  7.18% Convert16To8Row_AVX2
  5.23% InterpolateRow_16To8_AVX2

Bug: None
Change-Id: Ica9b9c5dbd847068ae076b682c487e1753d3c812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7855648
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-18 14:29:36 -07:00
Frank Barchard
cda55fcf53 Mirror AVX2 functions for Visual C
Bug: libyuv:42280902
Change-Id: Iabbec9af3a4f4dd89294e60145823c7fc4dd6ec6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7843378
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-15 15:05:31 -07:00
Frank Barchard
dd8b46630a ARGBToUV444MatrixRow_AVX2 intrinsics for Visual C
Was C
LibYUVConvertTest.ARGBToI444_Opt (1027 ms)

Now AVX2
LibYUVConvertTest.ARGBToI444_Opt (310 ms)

Bug: libyuv:508639302
Change-Id: I0bc7f5c5b72160d24226a98d5fddb184a004ed00
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7841655
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-12 14:19:58 -07:00
Frank Barchard
cb061d0378 Unittests use ASSERT instead of EXPECT
Bug: libyuv:508639302
Change-Id: I22c35e08f3b6db1a656192877c1fb1bf4e96d6f5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7838659
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-11 19:10:47 -07:00
Frank Barchard
e23282704f ARGBToYRow_AVX512BW preserve XMM6-XMM15 due to Windows stack alignment
Bug: 505124541
Change-Id: Id5ae539f57b314980182bec76a788e33273b2392
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7835639
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-11 13:12:22 -07:00
Frank Barchard
4b4e68b372 ABGRToJ420 call ARGBToI420Matrix
- Standardize libyuv ARGB-family (ARGB, ABGR, RGBA, BGRA) to YUV conversion by utilizing the generic MatrixRow architecture and explicit ArgbConstants.
- Consolidated ARGBToI420, ABGRToI420, BGRAToI420, and RGBAToI420 as wrappers for ARGBToI420Matrix.
- Refactored ABGRToJ420, ABGRToJ422, and ABGRToI422 to use generic matrix functions.
- Added matrix-based versions for NV21, I400, YUY2, and UYVY.
- Updated RAW and RGB24 to I420/I422/I444 dispatchers to use MatrixRow logic and explicit constants.
- Fixed parameter swap bugs in ARGBToI422, ARGBToJ422, and ABGRToJ422.
- Fixed a bug in the generic C implementation of matrix row functions ensuring all 4 channels are processed correctly for all ARGB-family formats.
- Moved kShuffleAARRGGBB in row_gcc.cc to the top of the libyuv namespace for visibility.
- Cleaned up redundant format-specific row implementations.

Bug: libyuv:42280902
Change-Id: I67ffa4c476abc0d2dcc4650510d7bda91b65988e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7830291
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-08 15:23:30 -07:00
Frank Barchard
4aacbbdfb4 Refactored RGB/RAW to YUV color conversion functions to use generic Matrix-based functions parameterized by ArgbConstants.
This consolidation standardizes conversion logic, improves code
maintainability, and provides flexible support for various color spaces
(e.g., BT.601, JPEG full range).

Key Modifications:
 - Function Consolidation: Refactored several high-level conversion functions into lightweight wrappers around generic Matrix variants:
     - ARGBToI420 → ARGBToI420Matrix
     - ARGBToI444 → ARGBToI444Matrix
     - ARGBToI422 → ARGBToI422Matrix
     - ARGBToNV12 → ARGBToNV12Matrix
     - RAWToJ400, RGB24ToJ400 → RGBToI400Matrix
     - RAWToI444, RAWToJ444 → RGBToI444Matrix
 - 2-Pass Conversions: Updated RGB565ToI420, ARGB1555ToI420, and ARGB4444ToI420 to utilize 2-pass conversions via RGBToI420Matrix.
 - Standardization: Refactored ARGBToNV21, ARGBToYUY2, and ARGBToUYVY to use parameterized matrix row functions (ARGBToYMatrixRow,
   ARGBToUVMatrixRow).
 - Legacy Cleanup: Replaced legacy calls to ARGBToYJRow with the parameterized ARGBToYMatrixRow in the ARGBSobelize helper.
 - Internal Integration: Included libyuv/convert_from_argb.h in planar_functions.cc and ensured all new matrix symbols are properly
   declared/exported (LIBYUV_API).
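
To illustrate what "parameterized by constants" means here, a scalar C sketch follows. The struct YCoeffs and the function ArgbToYRow_Sketch are hypothetical; the layout of libyuv's real ArgbConstants may differ. The coefficient values are the widely used 8-bit fixed-point approximations for BT.601 limited range and JPEG full range.

```c
#include <stdint.h>

/* Hypothetical coefficient set; not libyuv's actual ArgbConstants layout. */
struct YCoeffs {
  uint8_t kb, kg, kr; /* weights applied to the B, G, R bytes */
  uint16_t bias;      /* rounding term plus output offset, scaled by 256 */
};

static const struct YCoeffs kBt601Limited = {25, 129, 66, 0x1080};
static const struct YCoeffs kJpegFull = {29, 150, 77, 0x0080};

/* One row function serves every color space: only the constants change. */
void ArgbToYRow_Sketch(const uint8_t* src_argb, uint8_t* dst_y, int width,
                       const struct YCoeffs* c) {
  for (int i = 0; i < width; ++i) {
    int b = src_argb[i * 4 + 0];
    int g = src_argb[i * 4 + 1];
    int r = src_argb[i * 4 + 2];
    dst_y[i] = (uint8_t)((c->kb * b + c->kg * g + c->kr * r + c->bias) >> 8);
  }
}
```

With this shape, a wrapper such as a BT.601 or JPEG entry point reduces to a one-line call that passes the matching constants, which is the consolidation this commit describes.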

Bug: libyuv:42280902
Change-Id: Ied5fd9899767427e3a03cdcfbeaff3e9d502374a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7822033
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-06 20:02:47 -07:00
Frank Barchard
561a9780e2 YUV to RGB avoid avx assist
Here are the functions flagged for mixing both SSE and AVX (or AVX-512)
instructions, which can trigger an AVX transition/assist performance
penalty:

Libyuv Functions addressed in this CL
   * I422ToARGBRow_AVX512BW
   * HalfFloatRow_SSE2

Not addressed:
   * ScaleFilterCols_SSSE3

Bug: libyuv:509681367
Change-Id: I8ced6065dfe0c516d05857086393782c8590062a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7814945
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-05 12:57:55 -07:00
Frank Barchard
f2ac6db694 RAWToNV21 using SME, SVE, I8MM or Neon
Pixel 9 Now SVE2 2 pass LibYUVConvertTest.RAWToNV21_Opt (364 ms)
 31.76% libyuv::ARGBToUVMatrixRow_SVE_SC()
 30.38% RAWToARGBRow_SVE2
 26.81% ARGBToYMatrixRow_NEON_DotProd
  3.26% MergeUVRow_NEON

Was NEON 1 pass LibYUVConvertTest.RAWToJNV21_Opt (295 ms)
 44.14% RAWToYJRow_NEON
 41.91% RAWToUVJRow_NEON
  5.11% MergeUVRow_NEON

Clang on Intel Skylake
clang              [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (301 ms)
visual c (row_win) [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (2056 ms)

clang              [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (275 ms)
visual c           [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (365 ms)

Bug: libyuv:42280902
Change-Id: Iaba558ebe96ce6b9881ee9335ba72b8aac390cde
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7802432
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-29 13:11:04 -07:00
Wan-Teh Chang
a7849e8a5e Fix yi * src_stride overflow in ScalePlaneVertical
Fix int overflow of yi * src_stride in ScalePlaneVertical(),
ScalePlaneVertical_16(), and ScalePlaneVertical_16To8() by casting the
operand src_stride to ptrdiff_t.
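
A small C sketch of the cast, with hypothetical helper names. Casting one operand to ptrdiff_t makes the multiplication happen in ptrdiff_t rather than int, so on a 64-bit target a large row index times a large stride no longer wraps.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte offset of row yi, computed in ptrdiff_t.
 * Without the cast, yi * src_stride is evaluated as int and can overflow. */
ptrdiff_t RowOffset(int yi, int src_stride) {
  return yi * (ptrdiff_t)src_stride;
}

/* Hypothetical helper: pointer to the start of row yi. */
const uint8_t* RowPtr(const uint8_t* src, int yi, int src_stride) {
  return src + RowOffset(yi, src_stride);
}
```

On targets where ptrdiff_t is only 32 bits the cast does not widen anything, so callers there still have to bound the image size.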

Adapted from the patches by Victor Miura <vmiura@google.com>.

Bug: 505814332
Change-Id: I4a4751041a213f7208b01eb18c43c9e196a36261
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796558
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-04-28 12:34:12 -07:00
Frank Barchard
bd2c4c76ec RAWToARGB AVX512VBMI
Bug: libyuv:42280902
Change-Id: I1c7f432f004079357a00515785bc524c459ed4b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7787160
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-22 14:48:29 -07:00
Frank Barchard
d445250d8b Replace RAWToY/RGB24ToY with RGBToYMatrix
Bug: libyuv:42280902
Change-Id: I6ddebd492036c416550fc045eb39493dea73246b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7784094
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-21 17:11:14 -07:00
Frank Barchard
81f698829b Add RGBToNV21Matrix function
- implement wrappers with RAW, RGB24, NV21 and JNV21 to call it.

Zen5
Was [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
Reason: the new code uses 1 pass for RAWToY but 2 passes (RAWToARGB, then ARGBToUV) for UV. It needs a 1-pass RGBToUV.

Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-20 18:03:34 -07:00
Frank Barchard
ddc6764d13 ARGBToUVMatrixRow_RVV replace vlseg8 with vlseg4
Implements horizontal paired adds and accumulation to improve
performance on SiFive x280, and fixes the remainder logic to use valid
vlseg4 loads. Adds TestARGBToUVRow_Any to test odd-width remainder
handling.

Also fixes a build break for non-RVV compilations by ensuring all RVV
functions and their closing cplusplus braces are correctly wrapped in
#if !defined(LIBYUV_DISABLE_RVV).

Also adds NV12ToNV21 as a macro alias for NV21ToNV12 in
planar_functions.h, as the conversion is bidirectional (swapping byte
pairs in the interleaved chroma plane). (Patch from
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7762904)
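
Why one function can serve both directions is easy to see in a scalar C sketch. The function name is hypothetical and this is not libyuv's implementation: swapping the bytes of each chroma pair is its own inverse, so applying it to NV12 chroma yields NV21 chroma and applying it again restores the original.

```c
#include <stdint.h>

/* Hypothetical sketch: swap each interleaved chroma byte pair.
 * UVUV... becomes VUVU..., and VUVU... becomes UVUV... */
void SwapUVRow_Sketch(const uint8_t* src_uv, uint8_t* dst_vu, int pairs) {
  for (int i = 0; i < pairs; ++i) {
    uint8_t first = src_uv[2 * i];
    uint8_t second = src_uv[2 * i + 1];
    /* Both bytes are read before either is written, so src and dst
     * may be the same buffer. */
    dst_vu[2 * i] = second;
    dst_vu[2 * i + 1] = first;
  }
}
```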

Bug: libyuv:42280902
Change-Id: If2d6cbb3e232d63d43e32aba33fa9b2eee8190e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7772164
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-17 15:04:45 -07:00
Frank Barchard
94644361b4 row_win.cc rewrite into intrinsics
- remove inline asm which was only for 32 bit
- add ARGBToYMatrixRow_AVX2
- add gn flag libyuv_enable_rowwin=true

Example of building with GN and Ninja:

Without the new flag:
  gn gen out/Release "--args=is_debug=false"
  ninja -C out/Release

With the new flag:
 gn gen out/Release "--args=is_debug=false libyuv_enable_rowwin=true"
 ninja -C out/Release

Bug: libyuv:42280806, 477295731, libyuv:42280902, libyuv:439628764
R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: I451bf814622fba690005c02fbf5816819c6a08c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7765790
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-15 19:53:16 -07:00
Frank Barchard
e034c41661 Port ARGBToUVMatrixRow from AVX2 to AVX512BW
Benchmark on Icelake Xeon
Now AVX512BW:
[       OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2:
[       OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)

- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.

Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-14 16:15:31 -07:00
Frank Barchard
893eacf9b4 ARGBToY for AVX512
- add ARGBToYMatrixRow_AVX512BW
- refactor SSE and AVX to use Matrix functions, making old functions
  call the new ones.

Zen5 1280x720
Was AVX2   LibYUVConvertTest.ARGBToI444_Opt (1125 ms)
Now AVX512 LibYUVConvertTest.ARGBToI444_Opt (641 ms)

Details by Gemini:
  1. Created 3 new Matrix functions:
    Added ARGBToYMatrixRow_SSSE3, ARGBToYMatrixRow_AVX2, and
    ARGBToYMatrixRow_AVX512BW to source/row_gcc.cc. These take the
    const struct ArgbConstants* c parameter similarly to
    ARGBToUV444MatrixRow_*. The x86 vector instructions dynamically
    calculate the needed values using the properties of the constants
    struct, including using vpmaddwd inside the AVX512 code to offset
    the lack of a native vphaddw.

  2. Replaced Old Functions with Wrappers:
    Modified the existing implementations of ARGBToYRow_SSSE3,
    ARGBToYJRow_SSSE3, ABGRToYRow_SSSE3, ABGRToYJRow_SSSE3,
    RGBAToYRow_SSSE3, RGBAToYJRow_SSSE3, BGRAToYRow_SSSE3 (and their
    _AVX2 equivalents) in source/row_gcc.cc to act as inline wrappers
    calling the new ARGBToYMatrixRow_* functions, passing the right
    matrix parameters (e.g. &kArgbI601Constants, &kArgbJPEGConstants,
    &kAbgrI601Constants).

  3. Added row_any.cc Handlers:
    Added ANY11MC definitions to source/row_any.cc to autogenerate
    ARGBToYMatrixRow_Any_SSSE3, ARGBToYMatrixRow_Any_AVX2, and
    ARGBToYMatrixRow_Any_AVX512BW which safely handles non-aligned
    tails.

  4. Updated include/libyuv/row.h:
    Updated the headers with the proper void declarations for all newly
    generated Matrix and Any_ variants. Also defined
    HAS_ARGBTOYROW_AVX512BW in the CPU macros.

  5. Tested the Implementations:
    Compiled and tested on Linux x86, which resulted in all tests passing
    cleanly. Also successfully completed all Windows 32-bit build checks
    ensuring 32-bit regression prevention without issues.

Bug: 477295731
Change-Id: I4f5eec9a961e24a9d760d0a1c0810fb5e29a0bd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7759494
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-13 17:26:07 -07:00
Frank Barchard
4f4e1ac553 Fix 2 failing golden tests
- Add ifdef for LIBYUV_UNLIMITED_DATA

Fixed by Gemini after being told how to build and run the test and asked to fix it.

Bug: libyuv:353545922
Change-Id: I117a25b75b9616ee2ce6122aa163c2085ed4dc7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7742120
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-09 11:51:13 -07:00
Frank Barchard
4c3d7d517a ARGBToUV444 for AVX512
1.27x faster on AMD Zen5 (turin)

Now AVX512
perf record ./libyuv_test '--gunit_filter=*ARGBToI444_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1

[       OK ] LibYUVConvertTest.ARGBToI444_Opt (1071 ms)
Overhead  Symbol
  53.49%  ARGBToYRow_AVX2
  44.70%  ARGBToUV444Row_AVX512BW

Was AVX2
[       OK ] LibYUVConvertTest.ARGBToI444_Opt (1369 ms)
  61.06%  ARGBToUV444Row_AVX2
  37.67%  ARGBToYRow_AVX2

Bug:  libyuv:42280902
Change-Id: I306fbac656d6f7834ce1559e86d01eb34931ec3c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7738362
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-08 19:25:41 -07:00
Jordan
917276084a Set Update Mechanism: Manual
This CL sets the Update Mechanism to Manual in README files.

Bug: 445311061
Change-Id: I4df6c5815b85c04b047b39b4352ba43789702d26
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512992
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Owners-Override: Jordan Brown <rop@google.com>
2026-01-28 00:04:45 -08:00
Frank Barchard
500f45652c Fix for ARM32 build when built with __SOFTFP__
planar_test.cc was
  Error: selected processor does not support `vmrs r3,fpscr' in ARM mode
  Error: selected processor does not support `vmsr fpscr,r3' in ARM mode

Bug: None
Change-Id: I2ee0e7191c372277901c94e29d9ed91bbac71af2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7063737
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-10-20 11:54:25 -07:00
Frank Barchard
2b4453d46f Deprecate MIPS and MSA support.
- Remove *_msa.cc source files
- Update build files
- Update header references, planar ifdefs for row functions
- Update documentation on supported platforms
- Version bumped to 1921
- clang-format applied

Bug: 434383432
Change-Id: I072d6aac4956f0ed668e64614ac8557612171f76
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7045953
Reviewed-by: Justin Green <greenjustin@google.com>
2025-10-16 12:20:40 -07:00
Frank Barchard
94417b9d21 Pass rgbconstants via struct pointer instead of elements with m
Now 66 instructions
SYM ARGBToUVRow_SSSE3:
62ccd0: BASE       push ebp
62ccd1: BASE       mov ebp, esp
62ccd3: BASE       push ebx
62ccd4: BASE       push edi
62ccd5: BASE       push esi
62ccd6: BASE       and esp, 0xfffffffc
62ccd9: BASE       sub esp, 0xc
62ccdc: BASE       call 0x62cce1 <ARGBToUVRow_SSSE3+0x11>
62cce1: BASE       pop eax
62cce2: BASE       add eax, 0xe1c27
62cce8: BASE       mov ecx, dword ptr [ebp+0xc]
62cceb: BASE       mov edx, dword ptr [ebp+0x8]
62ccee: BASE       mov esi, dword ptr [ebp+0x10]
62ccf1: BASE       mov edi, dword ptr [ebp+0x18]
62ccf4: BASE       mov dword ptr [esp+0x8], edi
62ccf8: BASE       mov edi, dword ptr [ebp+0x14]
62ccfb: BASE       lea ebx, ptr [eax-0x5ecf88]
62cd01: SSE2       movdqa xmm4, xmmword ptr [ebx]
62cd05: SSE2       movdqa xmm5, xmmword ptr [ebx+0x10]
62cd0a: SSE2       pcmpeqb xmm6, xmm6
62cd0e: SSSE3      pabsb xmm6, xmm6
62cd13: SSE2       movdqa xmm7, xmmword ptr [eax-0x5ecfa8]
62cd1b: BASE       sub edi, esi

62cd1d: SSE2       movdqu xmm0, xmmword ptr [edx]
62cd21: SSE2       movdqu xmm1, xmmword ptr [edx+0x10]
62cd26: SSE2       movdqu xmm2, xmmword ptr [edx+ecx*1]
62cd2b: SSE2       movdqu xmm3, xmmword ptr [edx+ecx*1+0x10]
62cd31: SSSE3      pshufb xmm0, xmm7
62cd36: SSSE3      pshufb xmm1, xmm7
62cd3b: SSSE3      pshufb xmm2, xmm7
62cd40: SSSE3      pshufb xmm3, xmm7
62cd45: SSSE3      pmaddubsw xmm0, xmm6
62cd4a: SSSE3      pmaddubsw xmm1, xmm6
62cd4f: SSSE3      pmaddubsw xmm2, xmm6
62cd54: SSSE3      pmaddubsw xmm3, xmm6
62cd59: SSE2       paddw xmm0, xmm2
62cd5d: SSE2       paddw xmm1, xmm3
62cd61: SSE2       pxor xmm2, xmm2
62cd65: SSE2       psrlw xmm0, 0x1
62cd6a: SSE2       psrlw xmm1, 0x1
62cd6f: SSE2       pavgw xmm0, xmm2
62cd73: SSE2       pavgw xmm1, xmm2
62cd77: SSE2       packuswb xmm0, xmm1
62cd7b: SSE2       movdqa xmm2, xmm6
62cd7f: SSE2       psllw xmm2, 0xf
62cd84: SSE2       movdqa xmm1, xmm0
62cd88: SSSE3      pmaddubsw xmm1, xmm5
62cd8d: SSSE3      pmaddubsw xmm0, xmm4
62cd92: SSSE3      phaddw xmm0, xmm1
62cd97: SSE2       psubw xmm2, xmm0
62cd9b: SSE2       psrlw xmm2, 0x8
62cda0: SSE2       packuswb xmm2, xmm2
62cda4: SSE2       movd dword ptr [esi], xmm2
62cda8: SSE2       pshufd xmm2, xmm2, 0x55
62cdad: SSE2       movd dword ptr [esi+edi*1], xmm2
62cdb2: BASE       lea edx, ptr [edx+0x20]
62cdb5: BASE       lea esi, ptr [esi+0x4]
62cdb8: BASE       sub dword ptr [esp+0x8], 0x8
62cdbd: BASE       jnle 0x62cd1d <ARGBToUVRow_SSSE3+0x4d>

62cdc3: BASE       lea esp, ptr [ebp-0xc]
62cdc6: BASE       pop esi
62cdc7: BASE       pop edi
62cdc8: BASE       pop ebx
62cdc9: BASE       pop ebp
62cdca: BASE       ret

Was 68 instructions
ARGBToUVRow_SSSE3:
62ccd0: BASE       push ebp
62ccd1: BASE       mov ebp, esp
62ccd3: BASE       push edi
62ccd4: BASE       push esi
62ccd5: BASE       and esp, 0xfffffff0
62ccd8: BASE       sub esp, 0x30
62ccdb: BASE       call 0x62cce0 <ARGBToUVRow_SSSE3+0x10>
62cce0: BASE       pop eax
62cce1: BASE       add eax, 0xe1c28
62cce7: BASE       mov ecx, dword ptr [ebp+0xc]
62ccea: BASE       mov edx, dword ptr [ebp+0x8]
62cced: BASE       mov esi, dword ptr [ebp+0x10]
62ccf0: BASE       mov edi, dword ptr [ebp+0x18]
62ccf3: BASE       mov dword ptr [esp+0xc], edi
62ccf7: BASE       mov edi, dword ptr [ebp+0x14]
62ccfa: SSE        movaps xmm0, xmmword ptr [eax-0x5ecf88]
62cd01: SSE        movaps xmmword ptr [esp+0x20], xmm0
62cd06: SSE        movaps xmm0, xmmword ptr [eax-0x5ecf78]
62cd0d: SSE        movaps xmmword ptr [esp+0x10], xmm0
62cd12: SSE2       movdqa xmm4, xmmword ptr [esp+0x20]
62cd18: SSE2       movdqa xmm5, xmmword ptr [esp+0x10]
62cd1e: SSE2       pcmpeqb xmm6, xmm6
62cd22: SSSE3      pabsb xmm6, xmm6
62cd27: SSE2       movdqa xmm7, xmmword ptr [eax-0x5ecfa8]
62cd2f: BASE       sub edi, esi

62cd31: SSE2       movdqu xmm0, xmmword ptr [edx]
62cd35: SSE2       movdqu xmm1, xmmword ptr [edx+0x10]
62cd3a: SSE2       movdqu xmm2, xmmword ptr [edx+ecx*1]
62cd3f: SSE2       movdqu xmm3, xmmword ptr [edx+ecx*1+0x10]
62cd45: SSSE3      pshufb xmm0, xmm7
62cd4a: SSSE3      pshufb xmm1, xmm7
62cd4f: SSSE3      pshufb xmm2, xmm7
62cd54: SSSE3      pshufb xmm3, xmm7
62cd59: SSSE3      pmaddubsw xmm0, xmm6
62cd5e: SSSE3      pmaddubsw xmm1, xmm6
62cd63: SSSE3      pmaddubsw xmm2, xmm6
62cd68: SSSE3      pmaddubsw xmm3, xmm6
62cd6d: SSE2       paddw xmm0, xmm2
62cd71: SSE2       paddw xmm1, xmm3
62cd75: SSE2       pxor xmm2, xmm2
62cd79: SSE2       psrlw xmm0, 0x1
62cd7e: SSE2       psrlw xmm1, 0x1
62cd83: SSE2       pavgw xmm0, xmm2
62cd87: SSE2       pavgw xmm1, xmm2
62cd8b: SSE2       packuswb xmm0, xmm1
62cd8f: SSE2       movdqa xmm2, xmm6
62cd93: SSE2       psllw xmm2, 0xf
62cd98: SSE2       movdqa xmm1, xmm0
62cd9c: SSSE3      pmaddubsw xmm1, xmm5
62cda1: SSSE3      pmaddubsw xmm0, xmm4
62cda6: SSSE3      phaddw xmm0, xmm1
62cdab: SSE2       psubw xmm2, xmm0
62cdaf: SSE2       psrlw xmm2, 0x8
62cdb4: SSE2       packuswb xmm2, xmm2
62cdb8: SSE2       movd dword ptr [esi], xmm2
62cdbc: SSE2       pshufd xmm2, xmm2, 0x55
62cdc1: SSE2       movd dword ptr [esi+edi*1], xmm2
62cdc6: BASE       lea edx, ptr [edx+0x20]
62cdc9: BASE       lea esi, ptr [esi+0x4]
62cdcc: BASE       sub dword ptr [esp+0xc], 0x8
62cdd1: BASE       jnle 0x62cd31 <ARGBToUVRow_SSSE3+0x61>

62cdd7: BASE       lea esp, ptr [ebp-0x8]
62cdda: BASE       pop esi
62cddb: BASE       pop edi
62cddc: BASE       pop ebp
62cddd: BASE       ret
62cdde: BASE       int3
BUG=444157316

Change-Id: Iad044f851359f5b052091c7bdab9b96946fc3682
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6987370
Reviewed-by: Justin Green <greenjustin@google.com>
2025-09-29 12:34:36 -07:00
Frank Barchard
142db12947 ARGBToUV use AVX2 for 64 bit x86
Skylake
Was ARGBToJ420_Opt (312 ms)
Now ARGBToJ420_Opt (242 ms)

Icelake
Was ARGBToJ420_Opt (302 ms)
Now ARGBToJ420_Opt (220 ms)

AMD Zen3 on Windows
Was ARGBToJ420_Opt (305 ms)
Now ARGBToJ420_Opt (216 ms)
32 bit x86 uses SSE
Now ARGBToJ420_Opt (326 ms)

MCA analysis of new AVX, SSE and old AVX
https://godbolt.org/z/37bdazWYr

Bug: None
Change-Id: I72f5504407751e164c3558aebe836dd15223d65f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6957477
Reviewed-by: Justin Green <greenjustin@google.com>
2025-09-17 14:39:53 -07:00
Frank Barchard
a61882c049 ARGBToUV AVX2 for x86_64
Icelake
Was SSSE3+SSSE3 ARGBToJ420_Opt (356 ms)
Was SSSE3+AVX2  ARGBToJ420_Opt (301 ms)
Now AVX2+AVX2   ARGBToJ420_Opt (227 ms)

Change-Id: I2cb427bc164b225b3ad4c5f43c09d6da6ca496d5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6943036
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-09-16 11:33:54 -07:00
Frank Barchard
0f795672ae Reduce ARGBToUV SSSE3 register usage for clang build error on x64
Bug: 444157316
Change-Id: I2ae9f3dbfb373bb874a3d9699987f7d5b63f2610
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6937665
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-09-10 18:40:06 -07:00
Frank Barchard
d71cda1bb0 Rollback util cpuid hybrid detect due to android build errors
Bug: 438241552
Change-Id: Ie56aa7296e796e44e63d0dd913120b897b12cc9b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6843504
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-08-12 14:13:24 -07:00
Frank Barchard
cdd3bae848 TestI400LargeSize fix for warning message build error
- change %ld to %zd for size_t printf warnings
- disable TestI400LargeSize when disabling SLOW_TESTS
- disable cpuid tests that read proc/cpuinfo test data files
- add ifdef around timers to allow hexagon build
- remove faulty hybrid detect
- remove old mips LIBYUV_DISABLE_DSPR2 reference in gyp build
- apply clang-format

Bug: 434382656
Change-Id: Id74812e6ef29d4a8d0ff967a9189d249b80816d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6812825
Reviewed-by: Jeremy Leconte <jleconte@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-08-01 12:03:11 -07:00
Frank Barchard
3ff31b2a5f Make LibYUVConvertTest.TestI400LargeSize skip test on low end arm cpu
- detect lack of dot product instruction to infer the cpu is low end
- only run the test on higher end arm

Bug: 416842099
Change-Id: Idd2dd16a624bbba280cf531644440024b12f7ecf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6804632
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2025-07-31 02:41:17 -07:00
Frank Barchard
6f729fbe65 ARGBToUV SSE use average of 4 pixels
- Was using avgb twice for non-exact and C for exact.
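
The difference between the two approaches can be shown with a scalar C sketch. The function names are hypothetical, and the +2 rounding in the exact version is an assumption about what "average of 4 pixels" means here. A byte average that rounds up, applied twice, rounds twice and can land one above the single rounded 4-pixel average.

```c
#include <stdint.h>

/* Rounding-up byte average, the semantics of the x86 pavgb instruction. */
static uint8_t Avg2RoundUp(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Non-exact: average pairs, then average the two results (rounds twice). */
uint8_t Avg4Cascaded(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return Avg2RoundUp(Avg2RoundUp(a, b), Avg2RoundUp(c, d));
}

/* Exact: sum all four pixels and round once. */
uint8_t Avg4Exact(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}
```

For the 2x2 block (0, 1, 0, 0) the cascaded version returns 1 while the single-rounding version returns 0; for many inputs, such as (10, 20, 30, 40), both agree.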

On Skylake Xeon:

Now SSE3
ARGBToJ420_Opt (326 ms)

Was
Exact C
ARGBToJ420_Opt (871 ms)
Not exact AVX2
ARGBToJ420_Opt (237 ms)
Not exact SSSE3
ARGBToJ420_Opt (312 ms)

Bug: 381138208
Change-Id: I6d1081bb52e36f06736c0c6575fa82bb2268629b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6629605
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ben Weiss <bweiss@google.com>
2025-06-17 11:55:27 -07:00
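Note on the averaging change in 6f729fbe65: the x86 byte-average instruction (pavgb) computes a rounding average of two values, so averaging a 2x2 block with two cascaded averages can round up twice. The scalar sketch below shows how that differs from a single rounding average of four samples. The function names are hypothetical and this is not the libyuv row code.

```c
#include <stdint.h>

/* Rounding average of two bytes, the per-lane operation of pavgb. */
static uint8_t avg2_round(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Two cascaded rounding averages over a 2x2 block.
 * Each stage rounds up, so the result can be 1 too high. */
static uint8_t avg4_cascaded(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return avg2_round(avg2_round(a, b), avg2_round(c, d));
}

/* Single rounding average of all four samples. */
static uint8_t avg4_exact(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}
```

For the samples 0, 1, 0, 0 the cascaded form returns 1 while the four-sample form returns 0.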
Frank Barchard
889613683a Add hybrid detect for Intel laptop cpus
- Add +i8mm build option for sve ARGBToUV which uses usdot
- util/cpuid Get cpu count (windows, macos, linux)
- For each x86 cpu, detect hybrid (e-core)
- Includes a comment fix for ubsan unittest
- Bump version
- Apply clang format to util/*.c as well as all *.cc/*.h

Bug: 424637372
Change-Id: I08310e18051fff62c9e4e4a10d1e4361871119ac
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6635640
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-06-13 13:22:54 -07:00
Frank Barchard
4ac0a3ae3d ubsan compliant '_any' functions using ptrdiff_t for pointer math
Bug: 416842099
Change-Id: I1e3c7bc1b363c11baeb3b529ee78e5ac8878c359
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6634217
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-06-10 15:01:52 -07:00
Frank Barchard
0853c9353f ARGBToUV 64 bit use ymm8 for shuffler
Bug: 381138208
Change-Id: I5e69bc1610bd6269bf9a4113e729cf307dd36f60
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6536833
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-05-12 15:09:40 -07:00
Frank Barchard
9f9b5cf660 ARGBToUV allow 32 bit x86 build
- make width loop count on stack
- set YMM constants in its own asm block
- make struct for shuffle and add constants
- disable clang format on row_neon.cc function

Bug: 413781394
Change-Id: I263f6862cb7589dc31ac65d118f7ebeb65dbb24a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6495259
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-04-28 12:11:00 -07:00
Frank Barchard
23d416d6f3 Detect SME without SVE dependency
Bug: None
Change-Id: Ibe29488e893a493699ea3fae1a1a54a4fff5969c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6418571
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-03-31 17:27:40 -07:00
Frank Barchard
5f284054cb RVV disable 64 bit elements and vcombine_v
Bug: 405451074
Change-Id: I8e4437be92934b3c367c94d867d7967c32747260
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6385788
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-03-25 12:51:25 -07:00
Jordan
0fd4581a51 Updating license id for libyuv
Bug: b/358504615

Change-Id: I93fecd22c16df8949a8ebe85aabe539c0231985e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6275535
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-03-18 16:37:39 -07:00
Frank Barchard
c060118bea ARGBToJ444 use 256 for fixed point scale UV
- use negative coefficients for UV to allow -128
- change shift to truncate instead of round for UV
- adapt all row_gcc RGB to UV into matrix functions
- add -DLIBYUV_ENABLE_ROWWIN to allow clang on Windows to use row_win.cc

Bug: 381138208
Change-Id: I6016062c859faf147a8a2cdea6c09976cbf2963c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6277710
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-02-27 13:04:15 -08:00
Frank Barchard
61354d2671 ARGBToUV Matrix for AVX2 and SSSE3
- Round before shifting to 8 bit to match NEON
- RAWToARGB use unaligned loads and port to AVX2

Was C/SSSE3/AVX2
ARGBToI444_Opt (343 ms)
ARGBToJ444_Opt (677 ms)
RAWToI444_Opt (405 ms)
RAWToJ444_Opt (803 ms)

Now AVX2
ARGBToI444_Opt (283 ms)
ARGBToJ444_Opt (284 ms)
RAWToI444_Opt (316 ms)
RAWToJ444_Opt (339 ms)

Profile Now AVX2
  38.31%  ARGBToUVJ444Row_AVX2
  32.31%  RAWToARGBRow_AVX2
  23.99%  ARGBToYJRow_AVX2

Profile Was C/SSSE3/AVX2
    73.15%  ARGBToUVJ444Row_C
    15.74%  RAWToARGBRow_SSSE3
     8.87%  ARGBToYJRow_AVX2

Bug: 381138208
Change-Id: I696b2d83435bc985aa38df831e01ff1a658da56e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6231592
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Ben Weiss <bweiss@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-02-10 18:36:18 -08:00
Frank Barchard
d32d19ccf2 UV subsample on ARM use rounding average of 4 pixels
Performance on Samsung S22 Exynos (SVE2+I8MM+DOTPROD+Neon)
AArch64
ARGBToI400_Opt (168 ms)
ARGBToJ400_Opt (103 ms)
ABGRToJ400_Opt (81 ms)
RGBAToJ400_Opt (82 ms)
RGB24ToJ400_Opt (176 ms)
RAWToJ400_Opt (176 ms)
ABGRToI420_Opt (258 ms)
ARGBToI420_Opt (259 ms)
ARGBToI422_Opt (403 ms)
ARGBToI444_Opt (213 ms)
ARGBToJ420_Opt (257 ms)
ARGBToJ422_Opt (403 ms)
ARGBToJ444_Opt (214 ms)
ABGRToJ420_Opt (255 ms)
ABGRToJ422_Opt (399 ms)
ARGB4444ToI420_Opt (285 ms)
RGB565ToI420_Opt (316 ms)
ARGB1555ToI420_Opt (324 ms)
BGRAToI420_Opt (260 ms)
RAWToI420_Opt (303 ms)
RAWToI444_Opt (303 ms)
RAWToJ420_Opt (335 ms)
RAWToJ444_Opt (308 ms)
RGB24ToI420_Opt (372 ms)
RGB24ToJ420_Opt (365 ms)
RGBAToI420_Opt (259 ms)

AArch32 (Neon)
ARGBToI400_Opt (496 ms)
ARGBToJ400_Opt (478 ms)
ABGRToJ400_Opt (483 ms)
RGBAToJ400_Opt (493 ms)
RGB24ToJ400_Opt (343 ms)
RAWToJ400_Opt (341 ms)
ABGRToI420_Opt (993 ms)
ARGBToI420_Opt (992 ms)
ARGBToI422_Opt (1503 ms)
ARGBToI444_Opt (1257 ms)
ARGBToJ420_Opt (1006 ms)
ARGBToJ422_Opt (1521 ms)
ARGBToJ444_Opt (1267 ms)
ABGRToJ420_Opt (1002 ms)
ABGRToJ422_Opt (1504 ms)
ARGB4444ToI420_Opt (1180 ms)
RGB565ToI420_Opt (1112 ms)
ARGB1555ToI420_Opt (1115 ms)
BGRAToI420_Opt (993 ms)
RAWToI420_Opt (703 ms)
RAWToI444_Opt (1717 ms)
RAWToJ420_Opt (704 ms)
RAWToJ444_Opt (1739 ms)
RGB24ToI420_Opt (703 ms)
RGB24ToJ420_Opt (703 ms)
RGBAToI420_Opt (993 ms)

Bug: 381138208
Change-Id: I33728d5237f357362b0bfc509a9ebe6fe46f45d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6228987
Reviewed-by: Ben Weiss <bweiss@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-02-04 15:19:19 -08:00
Frank Barchard
5a9a6ea936 Add RAWToI444
Skylake Xeon
  RAWToI444_Opt (433 ms)
  RAWToJ444_Opt (1781 ms)
  ARGBToI444_Opt (352 ms)
  ARGBToJ444_Opt (1577 ms)

Samsung S22 Exynos
  ARGBToI444_Opt (283 ms)
  ARGBToJ444_Opt (209 ms)
  RAWToI444_Opt (294 ms)
  RAWToJ444_Opt (293 ms)

Profiling on Samsung S22 Exynos
  37.62%  ARGBToUV444Row_NEON_I8MM
  29.42%  RAWToARGBRow_SVE2
  19.61%  ARGBToYRow_NEON_DotProd

Passing different --libyuv_cpu_info=N values lets us compare each ISA:
C           1  RAWToI444_Opt (781 ms)
NEON      511  RAWToI444_Opt (757 ms)
NEONDOT  1023  RAWToI444_Opt (571 ms)
NEONI8MM 2047  RAWToI444_Opt (334 ms)
SVE2     8191  RAWToI444_Opt (307 ms)

Bug: 390247964
Change-Id: I0316fedd32222588455afa751f5b854f46bce024
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6223658
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-02-03 16:13:03 -08:00
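Note on the --libyuv_cpu_info values in 5a9a6ea936: each N in the table (1, 511, 1023, 2047, 8191) is one less than a power of two, so each row enables every feature bit below a cutoff and the next row adds the bits above it. The sketch below only checks that arithmetic. Which CPU feature each bit stands for is defined by libyuv's CPU flag constants and is not restated here. The helper names are hypothetical.

```c
#include <stdint.h>

/* Returns 1 when mask has the form 2^k - 1 (all bits below k set). */
static int is_low_bit_mask(uint32_t mask) {
  return mask != 0 && (mask & (mask + 1)) == 0;
}

/* Number of set bits in mask. */
static int bit_count(uint32_t mask) {
  int n = 0;
  for (; mask != 0; mask >>= 1) {
    n += (int)(mask & 1u);
  }
  return n;
}

/* Bits enabled by `wider` that `narrower` leaves off. */
static uint32_t added_bits(uint32_t narrower, uint32_t wider) {
  return wider & ~narrower;
}
```

For example, added_bits(511, 1023) is 0x200, the single bit that separates the NEON row from the NEONDOT row, and added_bits(2047, 8191) is 0x1800, which is two bits.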
Frank Barchard
c1bac9e6a5 RAWToJ444 and ARGBToJ444
- ARGBToJ444 implements ARGBToUVJ444Row_C
- RAWToJ444 implemented as 2 steps - RAWToARGB and ARGBToJ444

libyuv_test '--gunit_filter=*R*To?444_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
(with bit exact off)

Samsung S23
RAWToJ444_Opt (437 ms)
ARGBToJ444_Opt (337 ms)
ARGBToI444_Opt (196 ms)

Skylake Xeon
RAWToJ444_Opt (1699 ms)
ARGBToJ444_Opt (1559 ms)
ARGBToI444_Opt (346 ms)

Bug: 390247964
Change-Id: Id1b1b45a5e4512ab50830aadf62f780fbe631575
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6207845
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-01-29 15:18:38 -08:00
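Note on the 2 step approach in c1bac9e6a5: a two-step row conversion first expands the packed source row into a temporary ARGB row and then runs an existing ARGB row function on it. The scalar sketch below illustrates that structure for a luma plane only. It assumes RAW stores bytes as R, G, B and ARGB stores bytes as B, G, R, A, and it uses full-range luma weights 77, 150, 29 (out of 256) as an illustrative assumption. All function names are hypothetical and this is not the libyuv implementation.

```c
#include <stdint.h>

/* Step 1: expand packed R,G,B bytes to B,G,R,A bytes with opaque alpha. */
static void raw_to_argb_row(const uint8_t* src_raw, uint8_t* dst_argb,
                            int width) {
  for (int x = 0; x < width; ++x) {
    dst_argb[4 * x + 0] = src_raw[3 * x + 2]; /* B */
    dst_argb[4 * x + 1] = src_raw[3 * x + 1]; /* G */
    dst_argb[4 * x + 2] = src_raw[3 * x + 0]; /* R */
    dst_argb[4 * x + 3] = 255;                /* A */
  }
}

/* Step 2 (illustrative): full-range luma from an ARGB row,
 * 8 bit fixed point with rounding. Weights are an assumption. */
static void argb_to_yj_row(const uint8_t* src_argb, uint8_t* dst_y,
                           int width) {
  for (int x = 0; x < width; ++x) {
    int b = src_argb[4 * x + 0];
    int g = src_argb[4 * x + 1];
    int r = src_argb[4 * x + 2];
    dst_y[x] = (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
  }
}

/* Two-step row: RAW -> temporary ARGB row -> luma.
 * row_argb must hold at least width * 4 bytes. */
static void raw_to_yj_row_2step(const uint8_t* src_raw, uint8_t* row_argb,
                                uint8_t* dst_y, int width) {
  raw_to_argb_row(src_raw, row_argb, width);
  argb_to_yj_row(row_argb, dst_y, width);
}
```

The cost of the intermediate row is one extra pass over memory per row; the benefit is that every optimized ARGB row function becomes available to the packed format without a dedicated kernel.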