2061 Commits

Author SHA1 Message Date
Frank Barchard
d23308a2a7 add bmm detect and vdpphps in util/cpuid
Bug: None
Change-Id: I9954f96a74e653e3ecd3fbeba533299fa8e57d95
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7914867
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-09 14:52:48 -07:00
Frank Barchard
3bdb3b94ca I420ToRAW use 2 step AVX512
On Icelake
Was AVX2
I420ToRAW_Opt (283 ms)
  67.55%  I422ToARGBRow_AVX2
  26.46%  ARGBToRGB24Row_AVX2

Now AVX512VBMI
I420ToRAW_Opt (238 ms)
  73.08%  I422ToARGBRow_AVX512BW
  21.59%  ARGBToRGB24Row_AVX512VBMI

Bug: 42280902
Change-Id: I9d4d21faed30c529a5e593819f103be115709f37
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7909924
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-08 14:32:13 -07:00
Frank Barchard
4be798d7c5 BGRAToI420 use BgraConstants for a direct conversion using AVX512BW
row win (msvc)
Was C/SSSE3
BGRAToARGB_Opt (594 ms)
BGRAToARGB_Endswap_Opt (609 ms)
BGRAToI420_Opt (122 ms)

Now AVX2
BGRAToARGB_Opt (100 ms)
BGRAToARGB_Endswap_Opt (99 ms)
BGRAToI420_Opt (115 ms)

Clang/GCC AVX512BW
BGRAToARGB_Opt (86 ms)
BGRAToARGB_Endswap_Opt (91 ms)
BGRAToI420_Opt (110 ms)


Bug: 42280902
Change-Id: I52cb2b0cacea8f2f0b138ec3cc521185dbef8595
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7905821
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-08 12:21:47 -07:00
Wan-Teh Chang
95eedb9687 ConvertToARGB: compute buffer offsets in ptrdiff_t
Also validate crop_x, crop_y, crop_width, crop_height and make sure the
crop region stays inside the source rectangle.

Change-Id: I68748e14b21307b262d8b283147bce5ace8108d2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7904591
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-06-05 18:38:42 -07:00
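A minimal sketch of the kind of crop validation that commit 95eedb9687 describes. The helper name and the exact rules (for example, rejecting negative crop heights) are assumptions for illustration, not libyuv's actual code.

```cpp
#include <cassert>

// Hypothetical helper: returns true when the crop rectangle lies entirely
// inside a src_width x src_height source image.
bool CropIsInsideSource(int src_width, int src_height,
                        int crop_x, int crop_y,
                        int crop_width, int crop_height) {
  if (src_width <= 0 || src_height <= 0) return false;
  if (crop_x < 0 || crop_y < 0) return false;
  if (crop_width <= 0 || crop_height <= 0) return false;
  // Written as subtractions of positive values so that no intermediate
  // result can overflow int (crop_x + crop_width could).
  if (crop_x > src_width - crop_width) return false;
  if (crop_y > src_height - crop_height) return false;
  return true;
}
```

Comparing `crop_x` against `src_width - crop_width` instead of testing `crop_x + crop_width <= src_width` keeps the check itself free of signed overflow.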
Wan-Teh Chang
ccd415101d Fix int negation overflow in ConvertToARGB/I420
Fix int negation overflow in ConvertToARGB() and ConvertToI420().

Change-Id: Ia8e1f1a2994962a0372f4c31f6cc9c8972d8a954
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7904588
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-06-05 12:34:38 -07:00
Wan-Teh Chang
f722313c74 Validate int param is not INT_MIN before negating
Validate that an int parameter is not equal to INT_MIN before negating
it.

Remove redundant src_width > 32768 || src_height > 32768 checks in
callers of ScalePlane(), ScalePlane_16(), ScalePlane_12(), and
UVScale().

Change UVScale() to validate its parameters in the same way as
ScalePlane(), ScalePlane_16(), and ScalePlane_12().

Change-Id: I64e03257cf090760030c966b49c4d23e4cec25e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7902889
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-06-04 21:55:57 -07:00
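The INT_MIN check in commit f722313c74 can be sketched as follows. `NormalizeHeight` is a hypothetical helper, not a libyuv function; it only illustrates why the check has to come before the negation.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: converts a signed height (negative meaning "flip
// vertically") into a positive height plus a flip flag.
// Returns false for 0 and for INT_MIN, because -INT_MIN is not representable
// in int and negating it is undefined behavior.
bool NormalizeHeight(int height, int* abs_height, bool* flipped) {
  assert(abs_height != nullptr && flipped != nullptr);
  if (height == 0 || height == INT_MIN) return false;
  *flipped = height < 0;
  *abs_height = *flipped ? -height : height;
  return true;
}
```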
Wan-Teh Chang
826ab02fcc Remove __attribute__((no_sanitize("cfi-icall")))
Remove __attribute__((no_sanitize("cfi-icall"))) from
ARGBToUVMatrixRow_AVX2(). This breaks MSVC compilation, and no other
libyuv function is marked with this attribute.

Change-Id: I2bb6a688e296dd4acff325c5bd750573a577f246
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7904777
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-06-04 18:04:11 -07:00
Wan-Teh Chang
62fffa9eeb Fix integer overflow when flipping negative height
Treat height == INT_MIN as invalid. Omit the explicit height == INT_MIN
check if we disallow height < -32768.

Perform multiplications of stride in the ptrdiff_t type.

Add checks for invalid width and height to some functions.

Bug: 518806561
Change-Id: I5e39fffed7f806852a8758d4b59df919839c0a3b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891415
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-06-03 16:17:37 -07:00
Frank Barchard
e14b0e2c60 RGB565ToARGB use AVX2 instead of SSE2
Now AVX2/AVX512
ARGB4444ToI420_Opt (204 ms)
RGB565ToI420_Opt (211 ms)
ARGB1555ToI420_Opt (231 ms)
RAWToI420_Opt (197 ms)
RGB24ToI420_Opt (197 ms)

Was SSE2/AVX2
ARGB4444ToI420_Opt (276 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (332 ms)
RAWToI420_Opt (237 ms)
RGB24ToI420_Opt (232 ms)

Bug: libyuv:508639302
Change-Id: I2005189d1b6af15cb5ebef1f6d66b426fa9df8eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891416
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-02 18:28:02 -07:00
Wan-Teh Chang
06cc67fd2f Don't ignore UVCopy() and UVCopy_16() return value
Change-Id: I9d7944da60bf73ec6a578a43540c5a247ad00417
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891418
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-06-01 16:28:08 -07:00
Frank Barchard
3c5fa6ef27 [libyuv] Replace hardcoded RGB to YUV functions with Matrix variants
Removes non-matrix implementations for RGB24, RAW, RGB565, ARGB1555,
and ARGB4444 conversions. Introduces RGBToYMatrixRow, RGBToUVMatrixRow,
and equivalent functions for 16-bit and 24-bit formats. These functions
utilize a 2-step conversion internally (to ARGB, then to YUV) inside
row_common.cc for C, AVX2, and NEON, allowing the high-level
convert.cc logic to execute in a single pass using ArgbConstants.

Benchmark on Zen4
Test: libyuv_unittest --gtest_filter=*RGB*ToI420*

Was BT.601-only
ARGBToI420_Opt (115 ms)
ARGB4444ToI420_Opt (190 ms)
RGB565ToI420_Opt (194 ms)
ARGB1555ToI420_Opt (207 ms)
RGB24ToI420_Opt (143 ms)
RGBAToI420_Opt (167 ms)
28.07% ARGBToUVMatrixRow_AVX512BW
19.65% ARGBToYMatrixRow_AVX512BW
11.32% RGBAToUVRow_SSSE3
10.24% ARGB1555ToARGBRow_SSE2
 8.56% ARGB4444ToARGBRow_SSE2
 8.47% RGB565ToARGBRow_SSE2
 4.17% RGBAToYRow_AVX512BW
 4.04% RGB24ToARGBRow_AVX512BW

Now Matrix
ARGBToI420_Opt (124 ms)
ARGB4444ToI420_Opt (287 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (324 ms)
RGB24ToI420_Opt (236 ms)
RGBAToI420_Opt (126 ms)
29.74% ARGBToUVMatrixRow_AVX2
14.58% ARGB1555ToARGBRow_SSE2
12.59% RGB565ToARGBRow_SSE2
11.32% ARGB4444ToARGBRow_SSE2
 9.35% ARGBToYMatrixRow_AVX2
 8.45% RGB24ToARGBRow_SSSE3
 5.56% ARGBToYMatrixRow_AVX512BW
 1.37% ARGBToUVMatrixRow_Any_AVX2
 0.74% ARGBToYMatrixRow_Any_AVX2
 0.49% ARGB4444ToARGBRow_Any_SSE2
 0.46% RGB565ToARGBRow_Any_SSE2
 0.39% ARGB1555ToARGBRow_Any_SSE2
 0.28% RGB24ToARGBRow_Any_SSSE3
 0.11% ARGB4444ToYMatrixRow_AVX2
 0.09% RGB565ToUVMatrixRow_AVX2
 0.09% RGB565ToYMatrixRow_AVX2
 0.07% RGBToYMatrixRow_AVX2
 0.05% ARGB1555ToUVMatrixRow_AVX2
 0.04% ARGB1555ToYMatrixRow_AVX2
 0.03% RGBToUVMatrixRow_AVX2
 0.02% ARGB4444ToUVMatrixRow_AVX2

Bug: libyuv:508639302
Change-Id: I362c0cfe4c86ee1f3ffb569fa4f784b84148f11a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891045
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-01 14:04:07 -07:00
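A scalar sketch of the 2-step idea in commit 3c5fa6ef27: expand a packed 16-bit pixel to 8-bit ARGB, then apply a coefficient set to get Y. The struct and function names are hypothetical, and the coefficients are the commonly used 8-bit fixed-point BT.601 limited-range luma weights, assumed here rather than read from libyuv's ArgbConstants.

```cpp
#include <cassert>
#include <cstdint>

struct Argb { uint8_t b, g, r, a; };

// Hypothetical stand-in for a matrix-constants struct (Y row only).
struct YCoefficients { int r, g, b, bias; };

// Assumed BT.601 limited-range weights; bias is 16.5 in 8.8 fixed point.
constexpr YCoefficients kBt601Limited = {66, 129, 25, 0x1080};

// Step 1: expand RGB565 to 8 bits per channel by bit replication.
Argb Rgb565ToArgb(uint16_t p) {
  const int r5 = (p >> 11) & 0x1f;
  const int g6 = (p >> 5) & 0x3f;
  const int b5 = p & 0x1f;
  Argb out;
  out.b = static_cast<uint8_t>((b5 << 3) | (b5 >> 2));
  out.g = static_cast<uint8_t>((g6 << 2) | (g6 >> 4));
  out.r = static_cast<uint8_t>((r5 << 3) | (r5 >> 2));
  out.a = 255;
  return out;
}

// Step 2: weighted sum of the channels, parameterized by the coefficients.
uint8_t ArgbToY(const Argb& p, const YCoefficients& c) {
  return static_cast<uint8_t>((c.r * p.r + c.g * p.g + c.b * p.b + c.bias) >> 8);
}

// The two steps chained, as a caller would see them.
uint8_t Rgb565ToY(uint16_t p, const YCoefficients& c) {
  return ArgbToY(Rgb565ToArgb(p), c);
}
```

Passing the coefficients as a parameter is what lets one row function serve several color spaces; only the constants change.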
Frank Barchard
957f295ea9 [libyuv] Fix potential UV coalescing overflow in NV12ToI420
Adds a safety check to prevent signed integer overflow in the UV
plane coalescing logic within NV12ToI420. This ensures that
halfwidth * halfheight does not overflow INT_MAX, matching the Y
plane coalescing check and preventing potential undefined behavior
(signed integer overflow) which could lead to negative widths being
passed to SIMD functions.

Test: libyuv_unittest --gtest_filter=*NV12Crop*
Bug: None

CONV=6401df25-4d5d-4595-a231-f72c2c8e78df
TAG=agy
R=wtc@google.com

Change-Id: I15a51609a1e000a82f4b6958b4ada444efb1f2f4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886824
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2026-05-29 22:53:29 -07:00
Wan-Teh Chang
d2c6dd5e6a Fix integer overflow in two convert functions
Fix integer overflow in buffer allocation size calculations in the
align_buffer_64() macro and the I422ToNV21() and
Android420ToARGBMatrix() functions.

Based on a CL autogenerated by MendIt (go/androidmendit):
https://googleplex-android-review.googlesource.com/c/platform/external/libyuv/+/39981732

Bug: 511821134
Change-Id: Ie1728c3ad337d460d9b85979489a817cc97e3bf3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-05-29 19:26:14 -07:00
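A sketch of an overflow-checked buffer size calculation of the kind commit d2c6dd5e6a refers to. `CheckedBufferSize` is a hypothetical helper; the actual fix in align_buffer_64() and the two convert functions may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: computes width * height * bytes_per_pixel in size_t
// and reports failure instead of wrapping around.
bool CheckedBufferSize(int width, int height, std::size_t bytes_per_pixel,
                       std::size_t* out) {
  assert(out != nullptr);
  if (width <= 0 || height <= 0 || bytes_per_pixel == 0) return false;
  const std::size_t w = static_cast<std::size_t>(width);
  const std::size_t h = static_cast<std::size_t>(height);
  if (w > SIZE_MAX / h) return false;  // w * h would overflow
  const std::size_t area = w * h;
  if (area > SIZE_MAX / bytes_per_pixel) return false;
  *out = area * bytes_per_pixel;
  return true;
}
```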
Wan-Teh Chang
b7389e99be Fix integer overflows in ConvertToI420()
Validate the input parameters crop_x, crop_y, crop_width, crop_height.

Ensure all calculations of buffer sizes and offsets are performed using
the size_t or ptrdiff_t type.

Bug: 511820801
Change-Id: I43f82133c4049e2874c87d2ada147a7c3022f3c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886366
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-05-29 19:22:01 -07:00
Frank Barchard
ef08f21f6d [libyuv] Fix security vulnerabilities in ScalePlane and ARGBAffineRow_C
This CL addresses two security findings related to integer overflows:

1. Input validation in ScalePlane, ScalePlane_16, and ScalePlane_12:
   Added checks to reject invalid dimensions (e.g. width <= 0, height
   == 0) and dimensions larger than 32768 (or smaller than -32768 for
   height). This prevents FixedDiv signed integer overflows that can
   lead to division by zero/overflow crashes (SIGFPE on x86) or
   incorrect step calculations.

2. Stride overflow in ARGBAffineRow_C:
   Cast pointer arithmetic operands to ptrdiff_t before multiplication
   (y * stride and x * 4) to ensure 64-bit calculations, preventing
   signed 32-bit integer overflow when calculating source pixel offsets.

Added unit tests to verify the input validation in ScalePlane functions.

Test: libyuv_unittest --gtest_filter=*InvalidInputs*
Test: libyuv_unittest --gtest_filter=*Scale*
Test: libyuv_unittest --gtest_filter=*TestAffine*
Bug: None

TAG=agy
CONV=0e990960-611b-4f38-94ec-24e79b66242e
R=wtc@google.com

Change-Id: I252af47a98e45dff8bb5f06308c3739c6eead741
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886217
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-29 18:30:38 -07:00
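The dimension rules listed in commit ef08f21f6d can be sketched as a single predicate. The function name is hypothetical, and two points are assumptions: that a negative source height requests a vertical flip, and that the destination height must be positive.

```cpp
#include <cassert>

// Limit taken from the commit message.
constexpr int kMaxDimension = 32768;

// Hypothetical validation run before any fixed-point step is computed.
bool ScaleDimensionsAreValid(int src_width, int src_height,
                             int dst_width, int dst_height) {
  if (src_width <= 0 || src_width > kMaxDimension) return false;
  if (dst_width <= 0 || dst_width > kMaxDimension) return false;
  // Assumption: negative source height means "flip", zero is invalid.
  if (src_height == 0 || src_height > kMaxDimension ||
      src_height < -kMaxDimension) {
    return false;
  }
  // Assumption: destination height must be strictly positive.
  if (dst_height <= 0 || dst_height > kMaxDimension) return false;
  return true;
}
```

Because every accepted height has magnitude at most 32768, INT_MIN is rejected without a separate check.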
Wan-Teh Chang
c98edcc8dc Don't coalesce rows if width*height would overflow
Audit all occurrences of "width *= height;" in the libyuv source code.
Make sure height > 0 and (ptrdiff_t)width * height <= INT_MAX before
executing width *= height.

Bug: chromium:517339758
Change-Id: I143a41c66492a6e4c48b6aa2a1c4a2ae974ceeb1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7883816
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-05-29 11:57:47 -07:00
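A sketch of the guarded row coalescing described in commit c98edcc8dc. `TryCoalesceRows` is a hypothetical helper. The commit message states the condition as (ptrdiff_t)width * height <= INT_MAX; the sketch expresses the same bound as a division so that the check cannot itself overflow on platforms where ptrdiff_t is 32 bits.

```cpp
#include <cassert>
#include <climits>

// Hypothetical helper: if the rows are contiguous (stride == width) and the
// combined length fits in int, treat the plane as one long row.
// Leaves all arguments untouched when coalescing is not safe.
bool TryCoalesceRows(int* width, int* height, int* stride) {
  assert(width != nullptr && height != nullptr && stride != nullptr);
  if (*width < 0 || *height <= 0) return false;
  if (*stride != *width) return false;            // rows are not contiguous
  if (*width > INT_MAX / *height) return false;   // width * height > INT_MAX
  *width *= *height;
  *height = 1;
  *stride = 0;
  return true;
}
```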
Frank Barchard
e449eb2172 J400ToARGB switch from SSE2 to AVX2
- port for row_win
- remove unused HAS_ macros

Was C/SSE2
MSVC  J400ToARGB_Opt (1967 ms)
Clang J400ToARGB_Opt (568 ms)

Now AVX2
MSVC  J400ToARGB_Opt (411 ms)
Clang J400ToARGB_Opt (418 ms)

Test: libyuv_unittest --gtest_filter=*J400ToARGB*
Bug: libyuv:508639302

Change-Id: Ifdfb026832b708b61f55477250cc5ee52449f421
TAG=agy
CONV=186608fc-966a-4ea7-bf57-9fe07cc1383c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7877368
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
2026-05-28 21:24:32 -07:00
Wan-Teh Chang
904f562d86 Remove redundant #include <stddef.h>
"libyuv/basic_types.h" includes <stddef.h>. So it is not necessary to
include both <stddef.h> and "libyuv/basic_types.h".

Change-Id: I5a461258a3c6820d1007ac635838f910237f367f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7884381
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-05-28 17:10:22 -07:00
Wan-Teh Chang
ebe6fef903 Fix integer overflow in multiplications of stride
Audit all occurrences of "stride *" in the libyuv source tree. Ensure
that these multiplications are performed in the ptrdiff_t type.

For functions not declared in a public header (such as static
functions), prefer to declare the stride parameters (typically named
src_stride and dst_stride) and related stride local variables as
ptrdiff_t. If this is not possible, add ptrdiff_t casts to the stride
parameters in multiplications. If intptr_t or int64_t casts were used,
change them to ptrdiff_t casts.

Bug: chromium:516986556
Change-Id: I6cd8a8eb00cbb5380db828bf83e4d89ff95891f3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7882967
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-05-28 14:12:37 -07:00
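The cast that commit ebe6fef903 describes can be sketched with a small helper. `RowPointer` is a hypothetical name; the point is only that one operand is widened to ptrdiff_t before the multiplication, so the product is not computed in int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: address of row y in a plane with the given stride.
// The stride may be negative when the image is being read bottom-up.
const uint8_t* RowPointer(const uint8_t* base, int y, int stride) {
  assert(base != nullptr);
  // y * stride in int could overflow for large planes; widen first.
  return base + static_cast<std::ptrdiff_t>(y) * stride;
}
```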
Frank Barchard
9d98aaefe7 InterpolateRow for Visual C
- remove InterpolateRow_SSSE3
- optimize ARGBToUV444MatrixRow_AVX2 to use unsigned pixels

5.7x faster on AMD Zen4

Was C
TestInterpolatePlane (144 ms)
TestInterpolatePlane_16 (142 ms)

Now AVX2
TestInterpolatePlane (25 ms)
TestInterpolatePlane_16 (48 ms)

Was signed
ARGBToJ444_Opt (157 ms)
Now unsigned
ARGBToJ444_Opt (155 ms)

Bug: None
Change-Id: I903109668ff9cfedaddad1ad75411393b3226f41
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7856498
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-18 17:28:46 -07:00
Frank Barchard
9f751100d2 InterpolateRow_16_AVX2 for row_gcc
On AMD Zen4
Was C
TestInterpolatePlane_16 (143 ms)
Now AVX2
TestInterpolatePlane_16 (48 ms)

Was
I210ToI420_Opt (87 ms)
 35.60% InterpolateRow_16To8_AVX2
 31.03% Convert16To8Row_AVX512BW
 21.35% Convert16To8Row_AVX2

Now
I210ToI420_Opt (69 ms)
 37.57% Convert16To8Row_AVX512BW
 32.69% InterpolateRow_16_AVX2
  7.18% Convert16To8Row_AVX2
  5.23% InterpolateRow_16To8_AVX2

Bug: None
Change-Id: Ica9b9c5dbd847068ae076b682c487e1753d3c812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7855648
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-18 14:29:36 -07:00
Frank Barchard
cda55fcf53 Mirror AVX2 functions for Visual C
Bug: libyuv:42280902
Change-Id: Iabbec9af3a4f4dd89294e60145823c7fc4dd6ec6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7843378
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-15 15:05:31 -07:00
Sergey Silkin
0f320a03f7 Fix linear interpolation
The C interpolator applied to the chroma plane when scaling NV12 on
Mac/ARM used (0x7f ^ f), which is (127-f), instead of (128-f). This
resulted in changes like 128 -> 127 when scaling flat colors and caused
a visually noticeable difference.

Bug: b/465721312
Change-Id: Iecf5d2ca2a85602de4146cba7e0f64ecb4b2c1fe
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7830198
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2026-05-13 05:33:33 -07:00
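The effect described in commit 0f320a03f7 can be reproduced with two small blend functions. This is a sketch under assumptions: f is a 7-bit fraction in [0, 127], the result is a truncating shift by 7, and the function names are hypothetical rather than libyuv's.

```cpp
#include <cassert>
#include <cstdint>

// Weights (128 - f) and f sum to 128, so a flat color is preserved.
uint8_t BlendFixed(uint8_t a, uint8_t b, int f) {
  assert(f >= 0 && f < 128);
  return static_cast<uint8_t>((a * (128 - f) + b * f) >> 7);
}

// (0x7f ^ f) equals (127 - f) for 7-bit f, so the weights sum to only 127
// and a flat input loses one level.
uint8_t BlendOffByOne(uint8_t a, uint8_t b, int f) {
  assert(f >= 0 && f < 128);
  return static_cast<uint8_t>((a * (0x7f ^ f) + b * f) >> 7);
}
```

With a = b = 128 the first function returns 128 for every f, while the second returns 127, which matches the 128 -> 127 change reported for flat colors.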
Frank Barchard
c6c8689c74 Fix I444 and J444 parameter names/order
Bug: libyuv:42280902
Change-Id: Ia2c45f2d996d071534b08381f61adf8cb8ef35b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7841767
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-12 15:30:35 -07:00
Frank Barchard
dd8b46630a ARGBToUV444MatrixRow_AVX2 intrinsics for Visual C
Was C
LibYUVConvertTest.ARGBToI444_Opt (1027 ms)

Now AVX2
LibYUVConvertTest.ARGBToI444_Opt (310 ms)

Bug: libyuv:508639302
Change-Id: I0bc7f5c5b72160d24226a98d5fddb184a004ed00
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7841655
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-12 14:19:58 -07:00
Frank Barchard
cb061d0378 Unittests use ASSERT instead of EXPECT
Bug: libyuv:508639302
Change-Id: I22c35e08f3b6db1a656192877c1fb1bf4e96d6f5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7838659
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-11 19:10:47 -07:00
Frank Barchard
e23282704f ARGBToYRow_AVX512BW preserve XMM6-XMM15 due to Windows stack alignment
Bug: 505124541
Change-Id: Id5ae539f57b314980182bec76a788e33273b2392
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7835639
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-11 13:12:22 -07:00
Frank Barchard
4b4e68b372 ABGRToJ420 call ARGBToI420Matrix
- Standardize libyuv ARGB-family (ARGB, ABGR, RGBA, BGRA) to YUV conversion by utilizing the generic MatrixRow architecture and explicit ArgbConstants.
- Consolidated ARGBToI420, ABGRToI420, BGRAToI420, and RGBAToI420 as wrappers for ARGBToI420Matrix.
- Refactored ABGRToJ420, ABGRToJ422, and ABGRToI422 to use generic matrix functions.
- Added matrix-based versions for NV21, I400, YUY2, and UYVY.
- Updated RAW and RGB24 to I420/I422/I444 dispatchers to use MatrixRow logic and explicit constants.
- Fixed parameter swap bugs in ARGBToI422, ARGBToJ422, and ABGRToJ422.
- Fixed a bug in the generic C implementation of matrix row functions ensuring all 4 channels are processed correctly for all ARGB-family formats.
- Moved kShuffleAARRGGBB in row_gcc.cc to the top of the libyuv namespace for visibility.
- Cleaned up redundant format-specific row implementations.

Bug: libyuv:42280902
Change-Id: I67ffa4c476abc0d2dcc4650510d7bda91b65988e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7830291
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-08 15:23:30 -07:00
Frank Barchard
4aacbbdfb4 Refactored RGB/RAW to YUV color conversion functions to use generic Matrix-based functions parameterized by ArgbConstants.
This consolidation standardizes conversion logic, improves code
maintainability, and provides flexible support for various color spaces
(e.g., BT.601, JPEG full range).

Key Modifications:
 - Function Consolidation: Refactored several high-level conversion functions into lightweight wrappers around generic Matrix variants:
     - ARGBToI420 → ARGBToI420Matrix
     - ARGBToI444 → ARGBToI444Matrix
     - ARGBToI422 → ARGBToI422Matrix
     - ARGBToNV12 → ARGBToNV12Matrix
     - RAWToJ400, RGB24ToJ400 → RGBToI400Matrix
     - RAWToI444, RAWToJ444 → RGBToI444Matrix
 - 2-Pass Conversions: Updated RGB565ToI420, ARGB1555ToI420, and ARGB4444ToI420 to utilize 2-pass conversions via RGBToI420Matrix.
 - Standardization: Refactored ARGBToNV21, ARGBToYUY2, and ARGBToUYVY to use parameterized matrix row functions (ARGBToYMatrixRow,
   ARGBToUVMatrixRow).
 - Legacy Cleanup: Replaced legacy calls to ARGBToYJRow with the parameterized ARGBToYMatrixRow in the ARGBSobelize helper.
 - Internal Integration: Included libyuv/convert_from_argb.h in planar_functions.cc and ensured all new matrix symbols are properly
   declared/exported (LIBYUV_API).

Bug: libyuv:42280902
Change-Id: Ied5fd9899767427e3a03cdcfbeaff3e9d502374a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7822033
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-06 20:02:47 -07:00
Mirko Bonadei
8773064a72 Revert "Fix rounding in scaling routines."
This reverts commit 37a848135b2b5df6177a6fd42e1d60eb0fa9539d.

Reason for revert: The CL was not ready to land.

Original change's description:
> Fix rounding in scaling routines.
>
> * before: https://screenshot.googleplex.com/3ujxU7drx8J9aVv
> * after: https://screenshot.googleplex.com/5twPmxuBUKjvVD9
> * source: https://screenshot.googleplex.com/9yevVP7URe3XfSm
>
> Bug: b/465721312
> Change-Id: I2aede005db252b2912ceef23379463f176675205
> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7813417
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: richard winterton <rrwinterton@gmail.com>

Bug: b/465721312
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Change-Id: Ic3d0de4d4475942bc91fbee17d012bc1b656589f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7816864
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Bot-Commit: rubber-stamper@appspot.gserviceaccount.com <rubber-stamper@appspot.gserviceaccount.com>
2026-05-06 03:30:24 -07:00
Sergey Silkin
37a848135b Fix rounding in scaling routines.
* before: https://screenshot.googleplex.com/3ujxU7drx8J9aVv
* after: https://screenshot.googleplex.com/5twPmxuBUKjvVD9
* source: https://screenshot.googleplex.com/9yevVP7URe3XfSm

Bug: b/465721312
Change-Id: I2aede005db252b2912ceef23379463f176675205
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7813417
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-05 19:05:43 -07:00
Frank Barchard
125f151316 ARGBToNV12 use Matrix
Refactored Matrix functions (ARGBToI420Matrix, ARGBToI422Matrix, ARGBToI444Matrix and ARGBToNV12Matrix)
  and updated their CPU dispatch logic.

ARGBToNV12 clang
 68.05% ARGBToUVMatrixRow_AVX512BW
 21.04% ARGBToYMatrixRow_AVX512BW
  2.88% MergeUVRow_AVX512BW

ARGBToNV12 rowwin
 61.26% ARGBToUVMatrixRow_AVX2
 25.43% ARGBToYMatrixRow_AVX2
  3.09% MergeUVRow_AVX2

ARM on One Plus 15
 42.98% libyuv::ARGBToUVMatrixRow_SVE_SC()
 38.95% ARGBToYMatrixRow_NEON_DotProd
  2.96% MergeUVRow_NEON
  0.18% ARGBToUVMatrixRow_SVE2

ARGBToI420
 72.28% ARGBToUVMatrixRow_AVX512BW
 19.04% ARGBToYMatrixRow_AVX512BW

ARGBToI422
 77.46% ARGBToUVMatrixRow_AVX512BW
 15.55% ARGBToYMatrixRow_AVX512BW

ARGBToI444
 67.03% ARGBToYMatrixRow_AVX512BW
 24.80% ARGBToUV444MatrixRow_AVX512BW

Bug: libyuv:42280902
Change-Id: I463ebcdb70cb669a1ce1a81102b8fd2fb3943bd3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7819051
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-05 18:17:19 -07:00
Frank Barchard
561a9780e2 YUV to RGB avoid avx assist
Here are the functions flagged for mixing both SSE and AVX (or AVX-512)
instructions, which can trigger an AVX transition/assist performance
penalty:

Libyuv Functions addressed in this CL
   * I422ToARGBRow_AVX512BW
   * HalfFloatRow_SSE2

Not addressed:
   * ScaleFilterCols_SSSE3

Bug: libyuv:509681367
Change-Id: I8ced6065dfe0c516d05857086393782c8590062a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7814945
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-05 12:57:55 -07:00
Frank Barchard
5a17753597 libyuv: Optimize Convert8To8Row_NEON for 32-bit ARM
Benchmark (Convert8To8Plane 1280x720, 1000 repeats):

32-bit: 106 ms -> 44 ms
64-bit: 52 ms (unchanged)
Bug: libyuv:42280902
Change-Id: I389a482f93404984759ef6223d7d191579d3578d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7812450
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-04 13:00:21 -07:00
Frank Barchard
2143edfa7a ARGBToUVMatrixRow_NEON arm32 reimplemented for GCC
Bug: libyuv:508639302
Change-Id: Ib120373d799c66926a64c980873034be262d8848
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7810481
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
2026-05-04 11:38:45 -07:00
Frank Barchard
f2ac6db694 RAWToNV21 using SME, SVE, I8MM or Neon
Pixel 9 Now SVE2 2 pass LibYUVConvertTest.RAWToNV21_Opt (364 ms)
 31.76% libyuv::ARGBToUVMatrixRow_SVE_SC()
 30.38% RAWToARGBRow_SVE2
 26.81% ARGBToYMatrixRow_NEON_DotProd
  3.26% MergeUVRow_NEON

Was NEON 1 pass LibYUVConvertTest.RAWToJNV21_Opt (295 ms)
 44.14% RAWToYJRow_NEON
 41.91% RAWToUVJRow_NEON
  5.11% MergeUVRow_NEON

Clang on Intel Skylake
clang [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (301 ms)
visual c (row_win) [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (2056 ms)

clang [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (275 ms)
visual c [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (365 ms)

Bug: libyuv:42280902
Change-Id: Iaba558ebe96ce6b9881ee9335ba72b8aac390cde
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7802432
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2026-04-29 13:11:04 -07:00
Wan-Teh Chang
b438739c8b Use ptrdiff_t for buffer offsets
Use ptrdiff_t instead of intptr_t for buffer offsets, such as stride,
width_temp, and src_step*.

Change-Id: I64e6701fa71ab59c94325a6dad8762d040035208
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7800070
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-04-28 18:21:42 -07:00
Wan-Teh Chang
a7849e8a5e Fix yi * src_stride overflow in ScalePlaneVertical
Fix int overflow of yi * src_stride in ScalePlaneVertical(),
ScalePlaneVertical_16(), and ScalePlaneVertical_16To8() by casting the
operand src_stride to ptrdiff_t.

Adapted from the patches by Victor Miura <vmiura@google.com>.

Bug: 505814332
Change-Id: I4a4751041a213f7208b01eb18c43c9e196a36261
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796558
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2026-04-28 12:34:12 -07:00
Wan-Teh Chang
54d40344ca No need to cast ptrdiff_t src_stride to intptr_t
ptrdiff_t is the appropriate type for a buffer offset. intptr_t is
intended for a different purpose.

Change-Id: I475c548338b61f573fb11766c24cde6d31fbbed8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796559
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2026-04-28 09:50:46 -07:00
Frank Barchard
4afb965416 RAWToARGB use AVX512BW
Bug: libyuv:42280902
Change-Id: I7a80fd64d97b6d411316819df0fd917d609a173b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7787163
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-22 16:56:46 -07:00
Frank Barchard
bd2c4c76ec RAWToARGB AVX512VBMI
Bug: libyuv:42280902
Change-Id: I1c7f432f004079357a00515785bc524c459ed4b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7787160
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-22 14:48:29 -07:00
Frank Barchard
d445250d8b Replace RAWToY/RGB24ToY with RGBToYMatrix
Bug: libyuv:42280902
Change-Id: I6ddebd492036c416550fc045eb39493dea73246b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7784094
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-21 17:11:14 -07:00
Frank Barchard
81f698829b Add RGBToNV21Matrix function
- implement wrappers with RAW, RGB24, NV21 and JNV21 to call it.

Zen5
Was [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
Reason: the new code uses 1 pass for RAWToY but 2 passes (RAWToARGB, then ARGBToUV) for UV. It needs a 1-pass RGBToUV.

Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-20 18:03:34 -07:00
Frank Barchard
9f13b2814d add RGBToYMatrixRow_AVX2
Adds RGBToYMatrixRow_AVX2 which reads 24 bit RGB values by reading 3 vectors instead of 4 and permutes them into 4 ARGB vectors before conversion.
Also adds RGBToYMatrixRow_Opt and RGBToYMatrixRow_2Step_Opt to convert_argb_test.cc to benchmark and compare the direct AVX2 conversion vs a 2-step approach.

./libyuv_test '--gunit_filter=*RAWToJ400_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1

AMD Zen 5
Was LibYUVConvertTest.RAWToJ400_Opt (757 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (699 ms)

Intel Skylake
Was LibYUVConvertTest.RAWToJ400_Opt (1705 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (1426 ms)

Bug: 477295731
Change-Id: I29866baf4ad5fe7a3725e4a01f2fe24649510a7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7777325
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-20 12:52:44 -07:00
Frank Barchard
ddc6764d13 ARGBToUVMatrixRow_RVV replace vlseg8 with vlseg4,
implementing horizontal paired adds and accumulation to improve
performance on SiFive x280, and fixes the remainder logic to use valid
vlseg4 loads. Adds TestARGBToUVRow_Any to test odd-width remainder
handling.

Also fixes a build break for non-RVV compilations by ensuring all RVV
functions and their closing cplusplus braces are correctly wrapped in
#if !defined(LIBYUV_DISABLE_RVV).

Also adds NV12ToNV21 as a macro alias for NV21ToNV12 in
planar_functions.h, as the conversion is bidirectional (swapping byte
pairs in the interleaved chroma plane). (Patch from
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7762904)

Bug: libyuv:42280902
Change-Id: If2d6cbb3e232d63d43e32aba33fa9b2eee8190e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7772164
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-17 15:04:45 -07:00
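Commit ddc6764d13 notes that NV12ToNV21 and NV21ToNV12 are the same operation because both swap the byte pairs of the interleaved chroma plane. A scalar sketch of that swap follows; `SwapUVRow` is a hypothetical name, not the libyuv row function.

```cpp
#include <cassert>
#include <cstdint>

// Swaps each byte pair of an interleaved chroma row: UVUV... <-> VUVU...
// Both bytes are read before either is written, so src and dst may alias.
void SwapUVRow(const uint8_t* src, uint8_t* dst, int pairs) {
  assert(pairs >= 0);
  for (int i = 0; i < pairs; ++i) {
    const uint8_t first = src[2 * i];
    const uint8_t second = src[2 * i + 1];
    dst[2 * i] = second;
    dst[2 * i + 1] = first;
  }
}
```

Applying the swap twice restores the input, which is why a single function can serve both directions.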
Frank Barchard
ace7c4573c Add ARGBToUV444MatrixRow_RVV, ARGBToUVMatrixRow_RVV, and wrappers
This change implements ARGBToUV444MatrixRow_RVV, ARGBToUVMatrixRow_RVV,
and their wrappers (ARGBToUVRow_RVV, ARGBToUVJRow_RVV, etc.) using RVV
intrinsics, mirroring the NEON/AVX2 designs. It wires them into the
build and dispatch systems.

LIBYUV_RVV_HAS_TUPLE_TYPE is always true on new compilers. This macro
has been removed, assuming it is true everywhere, reducing the amount of
code in row_rvv.cc, scale_rvv.cc, and row.h.

Tested via: ~/bin/doyuv3v && ~/bin/runyuv3v TestARGBToI444Matrix
~/bin/doyuv3av

Bug: libyuv:42280902
Change-Id: I36d305386b297d69023c068aa9c62ab6b2ad039c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7769956
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-16 20:52:43 -07:00
Chema Gonzalez
dec8272138 Fix typo
Change-Id: I4dea1bcacc7d10dd2db74f4b221db42e2deade83
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7762903
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-16 14:27:40 -07:00
Frank Barchard
94644361b4 row_win.cc rewrite into intrinsics
- remove inline asm which was only for 32 bit
- add ARGBToYMatrixRow_AVX2
- add gn flag libyuv_enable_rowwin=true

Example of building with GN and Ninja:

Without the new flag:
  gn gen out/Release "--args=is_debug=false"
  ninja -C out/Release

With the new flag:
 gn gen out/Release "--args=is_debug=false libyuv_enable_rowwin=true"
 ninja -C out/Release

Bug: libyuv:42280806, 477295731, libyuv:42280902, libyuv:439628764
R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: I451bf814622fba690005c02fbf5816819c6a08c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7765790
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-15 19:53:16 -07:00
Frank Barchard
e034c41661 Port ARGBToUVMatrixRow from AVX2 to AVX512BW
Benchmark on Icelake Xeon
Now AVX512BW:
[       OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2:
[       OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)

- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.

Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com

Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-14 16:15:31 -07:00
Frank Barchard
cbc64c353c Port ARGBToYRow_AVX2 usages to dynamically use ARGBToYRow_AVX512BW
Port the usages of ARGBToYRow_AVX2 to dynamically detect and use ARGBToYRow_AVX512BW when available.

Summary of the changes:
 1. Source modifications: in source/convert.cc and source/convert_from_argb.cc, located every place where ARGBToYRow_AVX2
    (which operates on 32 pixels) is conditionally used.
 2. AVX512BW detection: immediately after each of those blocks, added a check for kCpuHasAVX512BW. If the CPU flag is present, the code
    uses ARGBToYRow_Any_AVX512BW by default and switches to ARGBToYRow_AVX512BW when the width is aligned to 64.
 3. Profiling: after building the tests (doyuv3x), validated the change with perfyuv3 ARGBToNV12_Opt | cat. The profile shows
    ARGBToYRow_AVX512BW taking about 18% of CPU cycles, replacing the previous AVX2 Y row conversion.

With HAS_ARGBTOYROW_AVX512BW defined, the conversion paths that used the AVX2 row function can use AVX512BW when the CPU flags
allow it.

R=richard, rrwinterton@gmail.com

Change-Id: Iad811e12d301f5621e6f6d039105420861ade43e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7760779
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-14 11:42:59 -07:00
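The dispatch order described in commit cbc64c353c can be sketched with function pointers. Everything here is a stand-in: the flag values, the stub row functions, and `SelectYRow` are hypothetical, and the alignment test assumes the 32 and 64 figures are counted in pixels. The stubs only record a tag so the selection can be observed.

```cpp
#include <cassert>
#include <cstdint>

using YRowFn = void (*)(const uint8_t* src_argb, uint8_t* dst_y, int width);

// Hypothetical CPU feature bits.
enum : unsigned { kFlagAVX2 = 1u << 0, kFlagAVX512BW = 1u << 1 };

// Stub row functions: each writes a distinct tag instead of converting pixels.
void YRow_C(const uint8_t*, uint8_t* dst_y, int) { dst_y[0] = 0; }
void YRow_Any_AVX2(const uint8_t*, uint8_t* dst_y, int) { dst_y[0] = 1; }
void YRow_AVX2(const uint8_t*, uint8_t* dst_y, int) { dst_y[0] = 2; }
void YRow_Any_AVX512BW(const uint8_t*, uint8_t* dst_y, int) { dst_y[0] = 3; }
void YRow_AVX512BW(const uint8_t*, uint8_t* dst_y, int) { dst_y[0] = 4; }

// Later checks override earlier ones, so the widest supported SIMD wins.
// The "Any" variant handles widths that are not a multiple of the vector size.
YRowFn SelectYRow(unsigned cpu_flags, int width) {
  YRowFn fn = YRow_C;
  if (cpu_flags & kFlagAVX2) {
    fn = (width % 32 == 0) ? YRow_AVX2 : YRow_Any_AVX2;
  }
  if (cpu_flags & kFlagAVX512BW) {
    fn = (width % 64 == 0) ? YRow_AVX512BW : YRow_Any_AVX512BW;
  }
  return fn;
}
```

Placing the AVX512BW check after the AVX2 block means existing AVX2 dispatch code stays untouched and the new path is purely additive.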