387 Commits

Author SHA1 Message Date
Frank Barchard
03d8b0990b I420ToRAW and I420ToRGB24 1 pass AVX2
Replaced the 2-pass conversion (I420 -> ARGB -> RGB24/RAW) with a
    highly optimized 1-pass AVX2 implementation. This avoids intermediate
    stack buffering and significantly reduces memory bandwidth.

    Implemented `I422ToRGB24Row_AVX2` in:
    - `row_gcc.cc`: Inline assembly for GCC/Clang.
    - `row_win.cc`: C++ intrinsics for MSVC (also verified with Clang).

    Optimized the width alignment requirement: changed from 32-pixel to
    16-pixel alignment in `convert_argb.cc` and `row_any.cc`. This allows
    the optimized AVX2 path to be used for more common video resolutions.
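A minimal sketch of why the relaxed alignment matters, assuming the fast path is gated by a power-of-two width check. `IsAlignedTo` is a hypothetical helper written for this illustration, not a libyuv function.

```cpp
// Hypothetical helper (not libyuv's API): true when `width` is a whole
// multiple of `alignment`, where `alignment` must be a power of two.
constexpr bool IsAlignedTo(int width, int alignment) {
  return (width & (alignment - 1)) == 0;
}

// A 720-pixel-wide frame (e.g. 720x480) qualifies at 16 but not at 32.
static_assert(IsAlignedTo(720, 16), "720 is a multiple of 16");
static_assert(!IsAlignedTo(720, 32), "720 is not a multiple of 32");
// A 1920-pixel-wide frame qualifies under either requirement.
static_assert(IsAlignedTo(1920, 32), "1920 is a multiple of 32");
```

Widths that fail the check would presumably fall back to a slower remainder-handling path, which is why a looser requirement helps.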

    Performance results (1080p, 100 iterations):
    - C Reference: ~18.5 ms
    - AVX2 2-Pass (Baseline): ~412 us (~45x speedup)
    - AVX2 1-Pass (GCC Assembly): ~411 us (~45x speedup)
    - AVX2 1-Pass (Intrinsics): ~365 us (~50x speedup, 11% faster than asm)
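The speedup figures above can be rechecked from the listed timings. This is only arithmetic on the numbers quoted in the message; `Speedup` and the constant names are made up for the sketch.

```cpp
// Ratio of a baseline time to an optimized time, both in microseconds.
constexpr double Speedup(double baseline_us, double optimized_us) {
  return baseline_us / optimized_us;
}

constexpr double kCReferenceUs = 18500.0;  // ~18.5 ms C reference
constexpr double kTwoPassUs = 412.0;       // AVX2 2-pass baseline
constexpr double kAsmUs = 411.0;           // AVX2 1-pass, GCC assembly
constexpr double kIntrinsicsUs = 365.0;    // AVX2 1-pass, intrinsics

// Fraction of the assembly time saved by the intrinsics version.
constexpr double kIntrinsicsGain = (kAsmUs - kIntrinsicsUs) / kAsmUs;

static_assert(Speedup(kCReferenceUs, kTwoPassUs) > 44.5, "about 45x");
static_assert(Speedup(kCReferenceUs, kIntrinsicsUs) > 50.0, "about 50x");
static_assert(kIntrinsicsGain > 0.11 && kIntrinsicsGain < 0.12, "about 11%");
```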

    Test: libyuv_unittest --gunit_filter=*I420ToRGB24*
    Test: libyuv_unittest --gunit_filter=*I420ToRAW*

Bug: 42280902
Change-Id: I07c0505c95410ea16a6218c858844791a11ef073
2026-06-08 19:33:58 -07:00
Frank Barchard
3bdb3b94ca I420ToRAW use 2 step AVX512
On Icelake
Was AVX2
I420ToRAW_Opt (283 ms)
  67.55%  I422ToARGBRow_AVX2
  26.46%  ARGBToRGB24Row_AVX2

Now AVX512VBMI
I420ToRAW_Opt (238 ms)
  73.08%  I422ToARGBRow_AVX512BW
  21.59%  ARGBToRGB24Row_AVX512VBMI

Bug: 42280902
Change-Id: I9d4d21faed30c529a5e593819f103be115709f37
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7909924
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-08 14:32:13 -07:00
Frank Barchard
4be798d7c5 BGRAToI420 use BgraConstants for a direct conversion using AVX512BW
row win (msvc)
Was C/SSSE3
BGRAToARGB_Opt (594 ms)
BGRAToARGB_Endswap_Opt (609 ms)
BGRAToI420_Opt (122 ms)

Now AVX2
BGRAToARGB_Opt (100 ms)
BGRAToARGB_Endswap_Opt (99 ms)
BGRAToI420_Opt (115 ms)

Clang/GCC AVX512BW
BGRAToARGB_Opt (86 ms)
BGRAToARGB_Endswap_Opt (91 ms)
BGRAToI420_Opt (110 ms)


Bug: 42280902
Change-Id: I52cb2b0cacea8f2f0b138ec3cc521185dbef8595
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7905821
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-08 12:21:47 -07:00
Frank Barchard
e14b0e2c60 RGB565ToARGB use AVX2 instead of SSE2
Now AVX2/AVX512
ARGB4444ToI420_Opt (204 ms)
RGB565ToI420_Opt (211 ms)
ARGB1555ToI420_Opt (231 ms)
RAWToI420_Opt (197 ms)
RGB24ToI420_Opt (197 ms)

Was SSE2/AVX2
ARGB4444ToI420_Opt (276 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (332 ms)
RAWToI420_Opt (237 ms)
RGB24ToI420_Opt (232 ms)

Bug: libyuv:508639302
Change-Id: I2005189d1b6af15cb5ebef1f6d66b426fa9df8eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891416
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-06-02 18:28:02 -07:00
Frank Barchard
3c5fa6ef27 [libyuv] Replace hardcoded RGB to YUV functions with Matrix variants
Removes non-matrix implementations for RGB24, RAW, RGB565, ARGB1555,
and ARGB4444 conversions. Introduces RGBToYMatrixRow, RGBToUVMatrixRow,
and equivalent functions for 16-bit and 24-bit formats. These functions
utilize a 2-step conversion internally (to ARGB, then to YUV) inside
row_common.cc for C, AVX2, and NEON, allowing the high-level
convert.cc logic to execute in a single pass using ArgbConstants.

Benchmark on Zen4
Test: libyuv_unittest --gtest_filter=*RGB*ToI420*

Was BT.601-only
ARGBToI420_Opt (115 ms)
ARGB4444ToI420_Opt (190 ms)
RGB565ToI420_Opt (194 ms)
ARGB1555ToI420_Opt (207 ms)
RGB24ToI420_Opt (143 ms)
RGBAToI420_Opt (167 ms)
28.07% ARGBToUVMatrixRow_AVX512BW
19.65% ARGBToYMatrixRow_AVX512BW
11.32% RGBAToUVRow_SSSE3
10.24% ARGB1555ToARGBRow_SSE2
 8.56% ARGB4444ToARGBRow_SSE2
 8.47% RGB565ToARGBRow_SSE2
 4.17% RGBAToYRow_AVX512BW
 4.04% RGB24ToARGBRow_AVX512BW

Now Matrix
ARGBToI420_Opt (124 ms)
ARGB4444ToI420_Opt (287 ms)
RGB565ToI420_Opt (292 ms)
ARGB1555ToI420_Opt (324 ms)
RGB24ToI420_Opt (236 ms)
RGBAToI420_Opt (126 ms)
29.74% ARGBToUVMatrixRow_AVX2
14.58% ARGB1555ToARGBRow_SSE2
12.59% RGB565ToARGBRow_SSE2
11.32% ARGB4444ToARGBRow_SSE2
 9.35% ARGBToYMatrixRow_AVX2
 8.45% RGB24ToARGBRow_SSSE3
 5.56% ARGBToYMatrixRow_AVX512BW
 1.37% ARGBToUVMatrixRow_Any_AVX2
 0.74% ARGBToYMatrixRow_Any_AVX2
 0.49% ARGB4444ToARGBRow_Any_SSE2
 0.46% RGB565ToARGBRow_Any_SSE2
 0.39% ARGB1555ToARGBRow_Any_SSE2
 0.28% RGB24ToARGBRow_Any_SSSE3
 0.11% ARGB4444ToYMatrixRow_AVX2
 0.09% RGB565ToUVMatrixRow_AVX2
 0.09% RGB565ToYMatrixRow_AVX2
 0.07% RGBToYMatrixRow_AVX2
 0.05% ARGB1555ToUVMatrixRow_AVX2
 0.04% ARGB1555ToYMatrixRow_AVX2
 0.03% RGBToUVMatrixRow_AVX2
 0.02% ARGB4444ToUVMatrixRow_AVX2

Bug: libyuv:508639302
Change-Id: I362c0cfe4c86ee1f3ffb569fa4f784b84148f11a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7891045
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-06-01 14:04:07 -07:00
Frank Barchard
ef08f21f6d [libyuv] Fix security vulnerabilities in ScalePlane and ARGBAffineRow_C
This CL addresses two security findings related to integer overflows:

1. Input validation in ScalePlane, ScalePlane_16, and ScalePlane_12:
   Added checks to reject invalid dimensions (e.g. width <= 0, height
   == 0) and dimensions larger than 32768 (or smaller than -32768 for
   height). This prevents FixedDiv signed integer overflows that can
   lead to division by zero/overflow crashes (SIGFPE on x86) or
   incorrect step calculations.

2. Stride overflow in ARGBAffineRow_C:
   Cast pointer arithmetic operands to ptrdiff_t before multiplication
   (y * stride and x * 4) to ensure 64-bit calculations, preventing
   signed 32-bit integer overflow when calculating source pixel offsets.
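The stride fix in item 2 can be sketched as follows. `ArgbPixelOffset` is a hypothetical helper, not the actual code in ARGBAffineRow_C; it only shows the widening cast happening before each multiplication. On targets where ptrdiff_t is 32 bits the cast would not add range.

```cpp
#include <cstddef>

// Hypothetical helper: byte offset of the ARGB pixel at (x, y), with 4 bytes
// per pixel. Each operand is widened to ptrdiff_t BEFORE multiplying, so the
// product is computed in 64 bits on 64-bit targets instead of in int.
constexpr std::ptrdiff_t ArgbPixelOffset(int x, int y, int stride) {
  return static_cast<std::ptrdiff_t>(y) * stride +
         static_cast<std::ptrdiff_t>(x) * 4;
}

// Small inputs behave exactly as the plain int expression would.
static_assert(ArgbPixelOffset(2, 3, 100) == 308, "3 * 100 + 2 * 4");
```

With `y = 50000` and `stride = 50000`, the plain `int` product would exceed INT_MAX, while the widened version yields 2500000000 on a 64-bit target.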

Added unit tests to verify the input validation in ScalePlane functions.
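A rough sketch of the kind of dimension check described in item 1. The function name, the constant name, and the exact set of parameters allowed to be negative are assumptions for illustration; only the 32768 limit and the rejected examples come from the message. A negative source height is treated as valid here because libyuv conventionally uses it to request a vertical flip.

```cpp
// Limit quoted in the commit message; the constant name is hypothetical.
constexpr int kMaxScaleDimension = 32768;

// Hypothetical validator: returns true when the dimensions are safe to pass
// on to fixed-point step calculations.
constexpr bool IsValidScaleInput(int src_width, int src_height,
                                 int dst_width, int dst_height) {
  return src_width > 0 && src_width <= kMaxScaleDimension &&
         src_height != 0 &&
         src_height <= kMaxScaleDimension &&
         src_height >= -kMaxScaleDimension &&
         dst_width > 0 && dst_width <= kMaxScaleDimension &&
         dst_height > 0 && dst_height <= kMaxScaleDimension;
}

static_assert(IsValidScaleInput(1280, 720, 640, 360), "ordinary downscale");
static_assert(!IsValidScaleInput(0, 720, 640, 360), "zero width rejected");
static_assert(!IsValidScaleInput(1280, 0, 640, 360), "zero height rejected");
```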

Test: libyuv_unittest --gtest_filter=*InvalidInputs*
Test: libyuv_unittest --gtest_filter=*Scale*
Test: libyuv_unittest --gtest_filter=*TestAffine*
Bug: None

TAG=agy
CONV=0e990960-611b-4f38-94ec-24e79b66242e
R=wtc@google.com

Change-Id: I252af47a98e45dff8bb5f06308c3739c6eead741
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7886217
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-29 18:30:38 -07:00
Frank Barchard
9f751100d2 InterpolateRow_16_AVX2 for row_gcc
On AMD Zen4
Was C
TestInterpolatePlane_16 (143 ms)
Now AVX2
TestInterpolatePlane_16 (48 ms)

Was
I210ToI420_Opt (87 ms)
 35.60% InterpolateRow_16To8_AVX2
 31.03% Convert16To8Row_AVX512BW
 21.35% Convert16To8Row_AVX2

Now
I210ToI420_Opt (69 ms)
 37.57% Convert16To8Row_AVX512BW
 32.69% InterpolateRow_16_AVX2
  7.18% Convert16To8Row_AVX2
  5.23% InterpolateRow_16To8_AVX2

Bug: None
Change-Id: Ica9b9c5dbd847068ae076b682c487e1753d3c812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7855648
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-18 14:29:36 -07:00
Frank Barchard
cda55fcf53 Mirror AVX2 functions for Visual C
Bug: libyuv:42280902
Change-Id: Iabbec9af3a4f4dd89294e60145823c7fc4dd6ec6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7843378
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-05-15 15:05:31 -07:00
Frank Barchard
4b4e68b372 ABGRToJ420 call ARGBToI420Matrix
- Standardize libyuv ARGB-family (ARGB, ABGR, RGBA, BGRA) to YUV conversion by utilizing the generic MatrixRow architecture and explicit ArgbConstants.
- Consolidated ARGBToI420, ABGRToI420, BGRAToI420, and RGBAToI420 as wrappers for ARGBToI420Matrix.
- Refactored ABGRToJ420, ABGRToJ422, and ABGRToI422 to use generic matrix functions.
- Added matrix-based versions for NV21, I400, YUY2, and UYVY.
- Updated RAW and RGB24 to I420/I422/I444 dispatchers to use MatrixRow logic and explicit constants.
- Fixed parameter swap bugs in ARGBToI422, ARGBToJ422, and ABGRToJ422.
- Fixed a bug in the generic C implementation of matrix row functions ensuring all 4 channels are processed correctly for all ARGB-family formats.
- Moved kShuffleAARRGGBB in row_gcc.cc to the top of the libyuv namespace for visibility.
- Cleaned up redundant format-specific row implementations.

Bug: libyuv:42280902
Change-Id: I67ffa4c476abc0d2dcc4650510d7bda91b65988e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7830291
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-05-08 15:23:30 -07:00
Frank Barchard
d445250d8b Replace RAWToY/RGB24ToY with RGBToYMatrix
Bug: libyuv:42280902
Change-Id: I6ddebd492036c416550fc045eb39493dea73246b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7784094
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2026-04-21 17:11:14 -07:00
Frank Barchard
81f698829b Add RGBToNV21Matrix function
- Implement wrappers for RAW and RGB24 to NV21 and JNV21 that call it.

Zen5
Was [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [       OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
Reason: the new code uses 1 pass for Y (RAWToY) but 2 passes for UV (RAWToARGB, then ARGBToUV). It needs a 1-pass RGBToUV.

Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2026-04-20 18:03:34 -07:00
Frank Barchard
9f13b2814d add RGBToYMatrixRow_AVX2
Adds RGBToYMatrixRow_AVX2 which reads 24 bit RGB values by reading 3 vectors instead of 4 and permutes them into 4 ARGB vectors before conversion.
Also adds RGBToYMatrixRow_Opt and RGBToYMatrixRow_2Step_Opt to convert_argb_test.cc to benchmark and compare the direct AVX2 conversion vs a 2-step approach.
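The "3 vectors instead of 4" remark follows from byte counts: 32 pixels occupy 96 bytes at 3 bytes per pixel and 128 bytes at 4. The sketch below checks that arithmetic and gives a scalar model of widening one pixel. All names are hypothetical, and filling alpha with 0xFF is an assumption, since alpha does not affect the Y result.

```cpp
#include <cstdint>

constexpr int kVectorBytes = 32;  // one 256-bit AVX2 register

// Number of whole vectors needed to hold `pixels` pixels.
constexpr int VectorsFor(int pixels, int bytes_per_pixel) {
  return pixels * bytes_per_pixel / kVectorBytes;
}

static_assert(VectorsFor(32, 3) == 3, "32 RGB pixels fit in 3 vectors");
static_assert(VectorsFor(32, 4) == 4, "32 ARGB pixels need 4 vectors");

// Scalar model of the permute for a single pixel: three color bytes are
// widened to one 32-bit 0xAARRGGBB word with an opaque alpha (assumed).
constexpr uint32_t ExpandToArgb(uint8_t r, uint8_t g, uint8_t b) {
  return 0xFF000000u | (static_cast<uint32_t>(r) << 16) |
         (static_cast<uint32_t>(g) << 8) | b;
}
```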

./libyuv_test '--gunit_filter=*RAWToJ400_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1

AMD Zen 5
Was LibYUVConvertTest.RAWToJ400_Opt (757 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (699 ms)

Intel Skylake
Was LibYUVConvertTest.RAWToJ400_Opt (1705 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (1426 ms)

Bug: 477295731
Change-Id: I29866baf4ad5fe7a3725e4a01f2fe24649510a7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7777325
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2026-04-20 12:52:44 -07:00
Dale Curtis
1170363ce5 Add Gemini implementation for NEON32 RGB to YUV matrix operations
These are about 25% faster than the C versions.

Bug: libyuv:42280902

Change-Id: I8b298670ee5f3ed5db35527fc41d6d9a51b020a1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7573682
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
2026-03-23 16:30:44 -07:00
Dale Curtis
b1cacfb38f Unify X86/X64 versions of ARGBToI4xxMatrix functions
Change-Id: Iead13414414543e5f10ba9ba47a6ceaeb3113dee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7562443
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2026-03-18 16:27:07 -07:00
Dale Curtis
2c21d57319 Add ABGR versions of the ArgbConstants structures
This allows for ABGR conversion using the same methods

Bug: libyuv:42280902
Change-Id: I5566e3150b30573a2326a900ce31ab095f8935f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564316
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2026-03-17 17:28:51 -07:00
Dale Curtis
30809ff64a Add ARGBToI4xxMatrix variants
This was implemented by Gemini followed by manual review and some
tweaking for style. The 601 and JPEG constants are fully verified
against the existing non-matrix implementations. On x86 the C-only
versions appear to be about 25% slower than the optimized ones.

Bug: libyuv:42280902
Change-Id: Ia5b7cb499bad5c76faec53f36086ebb18f2b530f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512030
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
2026-03-04 10:55:06 -08:00
Frank Barchard
6f729fbe65 ARGBToUV SSE use average of 4 pixels
- Was using avgb twice for non-exact and C for exact.
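To illustrate why two chained byte averages are "non-exact", here is a scalar sketch. `Avg2` models the rounding average of two bytes, `(a + b + 1) >> 1`, and the function names are made up for this example. The single 4-pixel rounding average is assumed to be `(a + b + c + d + 2) >> 2`.

```cpp
#include <cstdint>

// Rounding average of two bytes, as a byte-average instruction computes it.
constexpr uint8_t Avg2(uint8_t a, uint8_t b) {
  return static_cast<uint8_t>((a + b + 1) >> 1);
}

// Two-level average: rounds at each level, so the result can be biased upward.
constexpr uint8_t Avg4Nested(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return Avg2(Avg2(a, b), Avg2(c, d));
}

// Single rounding step over the sum of all four pixels.
constexpr uint8_t Avg4Exact(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return static_cast<uint8_t>((a + b + c + d + 2) >> 2);
}

// The two methods disagree on some inputs: the true mean here is 0.25.
static_assert(Avg4Nested(0, 1, 0, 0) == 1, "nested rounding rounds up twice");
static_assert(Avg4Exact(0, 1, 0, 0) == 0, "single rounding gives 0");
```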

On Skylake Xeon:

Now SSE3
ARGBToJ420_Opt (326 ms)

Was
Exact C
ARGBToJ420_Opt (871 ms)
Not exact AVX2
ARGBToJ420_Opt (237 ms)
Not exact SSSE3
ARGBToJ420_Opt (312 ms)

Bug: 381138208
Change-Id: I6d1081bb52e36f06736c0c6575fa82bb2268629b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6629605
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ben Weiss <bweiss@google.com>
2025-06-17 11:55:27 -07:00
Frank Barchard
c060118bea ARGBToJ444 use 256 for fixed point scale UV
- use negative coefficients for UV to allow -128
- change shift to truncate instead of round for UV
- adapt all row_gcc RGB to UV into matrix functions
- add -DLIBYUV_ENABLE_ROWWIN to allow clang on Windows to use row_win.cc

Bug: 381138208
Change-Id: I6016062c859faf147a8a2cdea6c09976cbf2963c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6277710
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-02-27 13:04:15 -08:00
Frank Barchard
5257ba4db0 Apply clang format
Bug: None
Change-Id: Ibd694d0351966a2b5812445de74bbced9c881a79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6302317
Reviewed-by: James Zern <jzern@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-02-25 11:39:19 -08:00
Frank Barchard
61354d2671 ARGBToUV Matrix for AVX2 and SSSE3
- Round before shifting to 8 bit to match NEON
  - RAWToARGB use unaligned loads and port to AVX2

Was C/SSSE3/AVX2
ARGBToI444_Opt (343 ms)
ARGBToJ444_Opt (677 ms)
RAWToI444_Opt (405 ms)
RAWToJ444_Opt (803 ms)

Now AVX2
ARGBToI444_Opt (283 ms)
ARGBToJ444_Opt (284 ms)
RAWToI444_Opt (316 ms)
RAWToJ444_Opt (339 ms)

Profile Now AVX2
  38.31%  ARGBToUVJ444Row_AVX2
  32.31%  RAWToARGBRow_AVX2
  23.99%  ARGBToYJRow_AVX2

Profile Was C/SSSE3/AVX2
    73.15%  ARGBToUVJ444Row_C
    15.74%  RAWToARGBRow_SSSE3
     8.87%  ARGBToYJRow_AVX2

Bug: 381138208
Change-Id: I696b2d83435bc985aa38df831e01ff1a658da56e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6231592
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Ben Weiss <bweiss@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-02-10 18:36:18 -08:00
Frank Barchard
d32d19ccf2 UV subsample on ARM use rounding average of 4 pixels
Performance on Samsung S22 Exynos (SVE2+I8MM+DOTPROD+Neon)
AArch64
ARGBToI400_Opt (168 ms)
ARGBToJ400_Opt (103 ms)
ABGRToJ400_Opt (81 ms)
RGBAToJ400_Opt (82 ms)
RGB24ToJ400_Opt (176 ms)
RAWToJ400_Opt (176 ms)
ABGRToI420_Opt (258 ms)
ARGBToI420_Opt (259 ms)
ARGBToI422_Opt (403 ms)
ARGBToI444_Opt (213 ms)
ARGBToJ420_Opt (257 ms)
ARGBToJ422_Opt (403 ms)
ARGBToJ444_Opt (214 ms)
ABGRToJ420_Opt (255 ms)
ABGRToJ422_Opt (399 ms)
ARGB4444ToI420_Opt (285 ms)
RGB565ToI420_Opt (316 ms)
ARGB1555ToI420_Opt (324 ms)
BGRAToI420_Opt (260 ms)
RAWToI420_Opt (303 ms)
RAWToI444_Opt (303 ms)
RAWToJ420_Opt (335 ms)
RAWToJ444_Opt (308 ms)
RGB24ToI420_Opt (372 ms)
RGB24ToJ420_Opt (365 ms)
RGBAToI420_Opt (259 ms)

AArch32 (Neon)
ARGBToI400_Opt (496 ms)
ARGBToJ400_Opt (478 ms)
ABGRToJ400_Opt (483 ms)
RGBAToJ400_Opt (493 ms)
RGB24ToJ400_Opt (343 ms)
RAWToJ400_Opt (341 ms)
ABGRToI420_Opt (993 ms)
ARGBToI420_Opt (992 ms)
ARGBToI422_Opt (1503 ms)
ARGBToI444_Opt (1257 ms)
ARGBToJ420_Opt (1006 ms)
ARGBToJ422_Opt (1521 ms)
ARGBToJ444_Opt (1267 ms)
ABGRToJ420_Opt (1002 ms)
ABGRToJ422_Opt (1504 ms)
ARGB4444ToI420_Opt (1180 ms)
RGB565ToI420_Opt (1112 ms)
ARGB1555ToI420_Opt (1115 ms)
BGRAToI420_Opt (993 ms)
RAWToI420_Opt (703 ms)
RAWToI444_Opt (1717 ms)
RAWToJ420_Opt (704 ms)
RAWToJ444_Opt (1739 ms)
RGB24ToI420_Opt (703 ms)
RGB24ToJ420_Opt (703 ms)
RGBAToI420_Opt (993 ms)

Bug: 381138208
Change-Id: I33728d5237f357362b0bfc509a9ebe6fe46f45d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6228987
Reviewed-by: Ben Weiss <bweiss@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-02-04 15:19:19 -08:00
Frank Barchard
c1bac9e6a5 RAWToJ444 and ARGBToJ444
- ARGBToJ444 implements ARGBToUVJ444Row_C
- RAWToJ444 implemented as 2 steps - RAWToARGB and ARGBToJ444

libyuv_test '--gunit_filter=*R*To?444_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
(with bit exact off)

Samsung S23
RAWToJ444_Opt (437 ms)
ARGBToJ444_Opt (337 ms)
ARGBToI444_Opt (196 ms)

Skylake Xeon
RAWToJ444_Opt (1699 ms)
ARGBToJ444_Opt (1559 ms)
ARGBToI444_Opt (346 ms)

Bug: 390247964
Change-Id: Id1b1b45a5e4512ab50830aadf62f780fbe631575
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6207845
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-01-29 15:18:38 -08:00
Frank Barchard
26277baf96 J420ToI420 using planar 8 bit scaling
- Add Convert8To8Plane, which scales and adds a bias to 8 bit values, allowing
  full range YUV to be converted to limited range YUV
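A per-sample sketch of a scale-and-add conversion of this kind. The function name and the constants (scale 220 in 8.8 fixed point, bias 16) are assumptions chosen so that full range luma 0..255 lands on limited range 16..235; they are not taken from the commit.

```cpp
#include <cstdint>

// Hypothetical per-sample model: multiply by an 8.8 fixed-point scale,
// drop the fraction, then add a bias.
constexpr uint8_t ScaleAndBias8(uint8_t v, int scale, int bias) {
  return static_cast<uint8_t>(((v * scale) >> 8) + bias);
}

// Illustrative constants (assumed): 220 / 256 is roughly 219 / 255.
constexpr int kLumaScale = 220;
constexpr int kLumaBias = 16;

static_assert(ScaleAndBias8(0, kLumaScale, kLumaBias) == 16, "black");
static_assert(ScaleAndBias8(255, kLumaScale, kLumaBias) == 235, "white");
```

With these constants the result never exceeds 235, so no clamping is needed in the sketch. A general-purpose version would need to clamp.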

libyuv_test '--gunit_filter=*J420ToI420*' --gunit_also_run_disabled_tests --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1

Samsung S23
J420ToI420_Opt (45 ms)
I420ToI420_Opt (37 ms)

Skylake
J420ToI420_Opt (596 ms)
I420ToI420_Opt (99 ms)

Bug: 381327032
Change-Id: I380c3fa783491f2e3727af28b0ea9ce16d2bb8a4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6182631
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-01-22 02:50:24 -08:00
George Steed
02c6e8baca Change ARGBMultiplyRow_C to match Neon
The existing behaviour does not round correctly in all cases, so adjust
it to match the existing Neon implementation.

Update the tests to require bit-exactness and disable other
implementations that do not round correctly.

Change-Id: Ie790fb4b4805b555d74d689d83802e1dd4f33df5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5869115
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-09-23 21:48:33 +00:00
Wan-Teh Chang
e462de319c Fix -Wundef warnings
Change-Id: I803b70f66ca938665ba39b961bdb31625c6bc503
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5758156
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-08-02 17:39:59 +00:00
Bruce Lai
ec2e9ca000 [RVV] Support AR64ToAB64 and RGBA-family color conversions
Add scalar code for AR64ToAB64, ARGBToRGBA, ARGBToBGRA, ARGBToABGR, RGBAToARGB, BGRAToARGB, and ABGRToARGB.
They were originally implemented via ARGBShuffle.
This CL implements them independently and enables them only for RISC-V for now.
This CL also adds RVV implementations for `RGBA-family <-> RGBA-family` color conversions.

* Run on SiFive internal FPGA(VLEN=128):

Test Case	Speedup
AR64ToAB64_Opt  x4.6
ARGBToRGBA_Opt  x6
ARGBToBGRA_Opt  x6
ARGBToABGR_Opt  x6
RGBAToARGB_Opt  x6

Change-Id: Ie0630901046084aa259699fcdeccc64170d7103f
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4797451
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-09-05 22:44:48 +00:00
Wan-Teh Chang
a8a37a25c9 Eliminate a common subexpression in YPixel()
Save the value of a common subexpression in a local variable.

Change-Id: I5724fcf341900cb2a65eb37b505194b8d3c3da9a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4735651
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2023-07-31 20:53:54 +00:00
Frank Barchard
a366ad714a ARGBAttenuate use (a * b + 255) >> 8
- Makes ARM and Intel match and fixes some off by 1 cases
- Add ARGBToUV444MatrixRow_NEON
- Add ConvertFP16ToFP32Column_NEON
- scale_rvv fix intrinsic build error
- disable row_win version of ARGBAttenuate/Unattenuate
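A scalar sketch of the attenuate formula, reading the expression in the title as a color channel multiplied by alpha. The function name is hypothetical. Adding 255 before the shift means full alpha leaves the color unchanged, because `f * 255 + 255` equals `f * 256`.

```cpp
#include <cstdint>

// Premultiply one color channel `f` by alpha `a` using (f * a + 255) >> 8.
constexpr uint8_t Attenuate(uint8_t f, uint8_t a) {
  return static_cast<uint8_t>((f * a + 255) >> 8);
}

static_assert(Attenuate(255, 255) == 255, "opaque white stays white");
static_assert(Attenuate(1, 255) == 1, "opaque keeps small values");
static_assert(Attenuate(255, 0) == 0, "transparent becomes 0");
static_assert(Attenuate(128, 128) == 64, "half of half");
```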

Bug: libyuv:936, libyuv:956
Change-Id: Ied99aaad3a11a8eb69212b628c58f86ec0723c38
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4617013
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-06-16 21:37:53 +00:00
Frank Barchard
157b153b60 Fix tidy warning that uint32_t dither4 should not be const
- Remove const from uint32_t dither4 parameter to fix clang-tidy warning
- Apply clang format
- Bump version
- Remove unused MMI source; superseded by MSA

Bug: None
Change-Id: Id49991db25bca4e99590b415312542d917471c62
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4581882
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-06-02 00:42:02 +00:00
Darren Hsieh
964d963afb Enable I422To{ARGB,RGBA,RGB24}Row_RVV
Run on SiFive internal FPGA:

I422ToARGB_Opt (~10x vs scalar)
I422ToRGBA_Opt (~10x vs scalar)
I420ToRGB24_Opt (~8x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

This CL sets the rounding mode manually,
since we use the fixed-point vector narrowing clip.
The spec does not define a default value for the fixed-point rounding mode:
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#38-vector-fixed-point-rounding-mode-register-vxrm
The behavior could differ across platforms. To avoid unexpected behavior, we set the rounding mode manually.
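To show why the rounding mode matters for a narrowing shift, here is a scalar model of two of the modes the RVV spec defines for vxrm: round-to-nearest-up (rnu) and round-down, i.e. truncate (rdn). The function names are hypothetical, inputs are assumed non-negative with `shift >= 1`, and the message does not say which mode the CL selects.

```cpp
// rnu: add half of the discarded range before shifting.
constexpr int ShiftRoundNearestUp(int v, int shift) {
  return (v + (1 << (shift - 1))) >> shift;
}

// rdn: discard the low bits without rounding.
constexpr int ShiftTruncate(int v, int shift) {
  return v >> shift;
}

// The same input narrows to different values under the two modes.
static_assert(ShiftRoundNearestUp(1000, 6) == 16, "15.625 rounds to 16");
static_assert(ShiftTruncate(1000, 6) == 15, "15.625 truncates to 15");
```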

Change-Id: I90f0dcb90c37f7da7caab8eb1df6c9c7a3c874a8
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4512373
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-10 00:29:20 +00:00
Frank Barchard
cf21b5ea5c Rename variables to match layout of ABGR
Bug: None
Change-Id: Ia1d596b6e108307fe042a03c34162b25152293d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4461967
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-26 16:57:33 +00:00
Frank Barchard
3f219a3501 GCC warning fix for MT2T
- Fix redundant assignment compile warning in GCC
- Apply clang-format
- Bump version to 1863

Bug: libyuv:955
Change-Id: If2b6588cd5a7f068a1745fe7763e90caa7277101
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4344729
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2023-03-16 06:57:20 +00:00
Justin Green
76468711d5 MT2T Unpack fixes
Fix the algorithm for unpacking the lower 2 bits of MT2T pixels.

Bug: b:258474032
Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978
Commit-Queue: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-03-14 14:59:26 +00:00
Sergio Garcia Murillo
f8626a7224 Add 10 bit rotate methods.
This initial implementation is based on current unoptimized code in webrtc using just plain for loops.

Bug: libyuv:949
Change-Id: Ic87ee49c3a0b62edbaaa4255c263c1f7be4ea02b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110782
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 21:10:01 +00:00
Frank Barchard
3abd6f36b6 Casting for scale functions
- MT2T support for source strides added, but only works for positive values.
- Reduced casting in row_common - one cast per assignment.
- scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t

Bug: libyuv:948, b/257266635, b/262468594
Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ernest Hua <ernesthua@google.com>
2022-12-15 22:34:22 +00:00
Frank Barchard
610e0cdead MT2T Warning fixes for fuchsia
Bug: b/258474032, b/257266635
Change-Id: Ic5cbbc60e2e1463361e359a2fe3e97976c1ea929
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4081348
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-12-06 19:54:40 +00:00
Frank Barchard
2d2cee418a Add Detile_16 planar function for 10 bit MT2T format
- Neon and SSE2
- Any for odd widths

Pixel 2 little core AArch32 build
C
TestDetilePlane_16 (1275 ms)
TestDetilePlane (1203 ms)
Neon
TestDetilePlane_16 (693 ms)
TestDetilePlane (660 ms)

Bug: b/258474032
Change-Id: Idbd09c5e9324e4deef5f1d54090d4b63cc7db812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4031848
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-17 02:47:57 +00:00
Frank Barchard
00950840d1 YUY2ToNV12 using YUY2ToY and YUY2ToNVUV
- Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps
  - Was SplitUV, memcpy Y, InterpolateUV
  - Now YUY2ToY, YUY2ToNVUV
- rollback LIBYUV_UNLIMITED_DATA

3840x2160 1000 iterations:

Pixel 2 Cortex A73
Was YUY2ToNV12_Opt (6515 ms)
Now YUY2ToNV12_Opt (3350 ms)

AB7 Mediatek P35 Cortex A53
Was YUY2ToNV12_Opt (6435 ms)
Now YUY2ToNV12_Opt (3301 ms)

Skylake AVX2 x64
Was YUY2ToNV12_Opt (1872 ms)
Now YUY2ToNV12_Opt (1657 ms)

SSE2 x64
Was YUY2ToNV12_Opt (2008 ms)
Now YUY2ToNV12_Opt (1691 ms)

Windows Skylake AVX2 32 bit x86
Was YUY2ToNV12_Opt (2161 ms)
Now YUY2ToNV12_Opt (1628 ms)

Bug: libyuv:943
Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-30 22:41:21 +00:00
Frank Barchard
b9adaef113 Enable unlimited data for YUV to RGB
- Provide LIBYUV_LIMITED_DATA macro for backwards compatibility

Bug: b/474156256
Change-Id: I5d5d7fb640d51ae3c5ad363f2a28c8bfbd3048a5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3912081
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-23 12:51:37 +00:00
Frank Barchard
f9fda6e7d8 Fix shift amount for SSSE3 assembly for I012 format conversions
Bug: libyuv:938, libyuv:942
Change-Id: I6fb6e7e17fa941785e398bc630f465baf72fcabd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906091
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 23:07:53 +00:00
Frank Barchard
8fc02134c8 10/12 bit YUV replicate upper bits to low bits before converting to RGB
- shift high bits of 10 and 12 bit into lower bits
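One common way to replicate upper bits into the low bits is sketched below, assuming the samples are expanded to 16 bits. The function names and the 16-bit target width are assumptions. The point of the technique is that the maximum input maps to the maximum output instead of leaving the low bits zero.

```cpp
#include <cstdint>

// 10-bit sample -> 16 bits: shift up by 6, then fill the 6 vacated low bits
// with the top 6 bits of the sample.
constexpr uint16_t Expand10To16(uint16_t v) {
  return static_cast<uint16_t>((v << 6) | (v >> 4));
}

// 12-bit sample -> 16 bits: shift up by 4, then fill the 4 vacated low bits
// with the top 4 bits of the sample.
constexpr uint16_t Expand12To16(uint16_t v) {
  return static_cast<uint16_t>((v << 4) | (v >> 8));
}

static_assert(Expand10To16(1023) == 65535, "10-bit max maps to 16-bit max");
static_assert(Expand12To16(4095) == 65535, "12-bit max maps to 16-bit max");
static_assert(Expand10To16(0) == 0, "zero stays zero");
```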

Bug: libyuv:941, libyuv:942
Change-Id: I14381dbf226ef27dcce06893ea88860835639baa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906085
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 20:56:43 +00:00
Frank Barchard
248172e2ba I422ToRGB24, I422ToRAW, I422ToRGB24MatrixFilter conversion functions added.
- YUV to RGB use linear for first and last row.
- add assert(yuvconstants)
- rename pointers to match row functions.
- use macros that match row functions.
- use 12 bit upsampler for conversions of 10 and 12 bits

Cortex A53 AArch32
I420ToRGB24_Opt (3627 ms)
I422ToRGB24_Opt (4099 ms)
I444ToRGB24_Opt (4186 ms)
I420ToRGB24Filter_Opt (5451 ms)
I422ToRGB24Filter_Opt (5430 ms)

AVX2
Was I420ToRGB24Filter_Opt (583 ms)
Now I420ToRGB24Filter_Opt (560 ms)

Neon Cortex A7
Was I420ToRGB24Filter_Opt (5447 ms)
Now I420ToRGB24Filter_Opt (5439 ms)

Bug: libyuv:938


Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 02:00:52 +00:00
Frank Barchard
f71c83552d I420ToRGB24MatrixFilter function added
- Implemented as 3 steps: Upsample UV to 4:4:4, I444ToARGB, ARGBToRGB24
- Fix some build warnings for missing prototypes.

Pixel 4
I420ToRGB24_Opt (743 ms)
I420ToRGB24Filter_Opt (1331 ms)

Windows with Skylake Xeon:
x86 32 bit
I420ToRGB24_Opt (387 ms)
I420ToRGB24Filter_Opt (571 ms)
x64 64 bit
I420ToRGB24_Opt (384 ms)
I420ToRGB24Filter_Opt (582 ms)


Bug: libyuv:938, libyuv:830
Change-Id: Ie27f70816ec084437014f8a1c630ae011ee2348c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3900298
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-16 19:46:47 +00:00
Frank Barchard
65e7c9d570 MM21ToYUY2 and ABGRToJ420 conversion
MM21 to YUY2 use zip1 for performance

Cortex A510
Was MM21ToYUY2 (612 ms)
Now MM21ToYUY2 (573 ms)

Prefetches help Cortex A53
Was MM21ToYUY2 (4998 ms)
Now MM21ToYUY2 (1900 ms)

Pixel 4 Cortex A76
Was MM21ToYUY2 (215 ms)
Now MM21ToYUY2 (173 ms)

ABGRToJ420
- NEON, SSSE3 and AVX2 row functions
- J400, J420 and J422 formats.
- Added AVX2 for UV on ARGBToJ420.  Was SSSE3

Same code/performance as ARGBToJ420 but with constants re-ordered.
Pixel 4
ABGRToJ420_Opt (623 ms)
ABGRToJ422_Opt (702 ms)
ABGRToJ400_Opt (238 ms)

Skylake Xeon
With LIBYUV_BIT_EXACT which uses C for UV
ABGRToJ420_Opt (988 ms)
ABGRToJ422_Opt (1872 ms)
ABGRToJ400_Opt (186 ms)
Skylake Xeon using AVX2
ABGRToJ420_Opt (251 ms)
ABGRToJ422_Opt (245 ms)
ABGRToJ400_Opt (184 ms)
Skylake Xeon using SSSE3
ABGRToJ420_Opt (328 ms)
ABGRToJ422_Opt (362 ms)
ABGRToJ400_Opt (185 ms)

Bug: b/238137982
Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-16 22:07:38 +00:00
Frank Barchard
1c5a8bb17a AB64ToARGB fix for inplace conversion
- add tests for all single plane formats that reduce or stay same in size

Bug: b/242233673
Change-Id: Ic25d808114f11995ac56ea9c31b99f66ba36d345
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3828485
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-12 01:28:13 +00:00
Wan-Teh Chang
9892d70c96 Fix MSVC warnings by adding casts
Fix the following MSVC warnings:
src\source\row_win.cc(117): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(136): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(155): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(174): warning C4309: 'argument': truncation of
constant value
src\source\row_common.cc(1712): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1731): warning C4244: 'initializing':
conversion from 'int16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1786): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1805): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data

Bug: libyuv:939
Change-Id: Ie87ba6e716732d1ff1ae5c236dfd9cfdac13439d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3807105
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-03 21:24:21 +00:00
Frank Barchard
b028453ba6 Disable bilinear 16 bit scale up for SSE2
- Undefine HAS_SCALEROWUP2_BILINEAR_16_SSE2
- Save XMM7 in ScaleRowUp2_Bilinear_16_SSE2().
- Rename HAS_SCALEROWUP2LINEAR_xxx to HAS_SCALEROWUP2_LINEAR_xxx
- DetileSplitUVRow_C() is implemented using SplitUVRow_C().
- Changes to unit_test/planar_test.cc.

Bug: libyuv:882
Change-Id: I0a8e8e5fb43bdf58ded87244e802343eacb789f2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3795063
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-01 22:54:48 +00:00
Frank Barchard
6900494d90 Merge/SplitRGB fix -mcmodel=large x86 and InterpolateRow_16To8_NEON
MergeRGB and SplitRGB use a register to point to 9 shuffle tables.

- fixes an out of registers error with -mcmodel=large

InterpolateRow_16To8_NEON improves performance for I210ToI420:

On Pixel 4 for 720p x1000 images
Was I210ToI420_Opt (608 ms)
Now I210ToI420_Opt (336 ms)

On Skylake Xeon
Was I210ToI420_Opt (259 ms)
Now I210ToI420_Opt (209 ms)


Bug: libyuv:931, libyuv:930
Change-Id: I20f8244803f06da511299bf1a2ffc7945eb35221
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717054
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-06-29 00:00:46 +00:00
Frank Barchard
fe4a50df8e Bilinear scale up msan fix
- Avoid stepping to height + 1 for bilinear filter 2nd row for last row of source
- Box filter ubsan fix for 3/4 and 3/8 scaling for 16 bit planar
- Height 1 asan fixes
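The first fix above concerns the second source row that a bilinear filter reads. A minimal sketch of the idea, with a hypothetical helper name: when the first row is already the last row of the source, the second row index is clamped so the filter never reads past the image.

```cpp
// Index of the second row blended with row `y`, clamped to the last valid
// row of a source that is `height` rows tall (height >= 1 assumed).
constexpr int SecondRowIndex(int y, int height) {
  return (y + 1 < height) ? y + 1 : height - 1;
}

static_assert(SecondRowIndex(0, 4) == 1, "interior rows use the next row");
static_assert(SecondRowIndex(3, 4) == 3, "last row blends with itself");
static_assert(SecondRowIndex(0, 1) == 0, "height 1 never reads row 1");
```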

Bug: libyuv:935, b/206716399
Change-Id: I56088520f2a884a37b987ee5265def175047673e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717263
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-06-22 00:11:49 +00:00
Frank Barchard
30f9b28048 Add I210ToI420
Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: Ib135d0b4ff17665f6a4ab60edb782a7b314219a4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3696042
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2022-06-09 08:07:50 +00:00