Was C
LibYUVConvertTest.ARGBToI444_Opt (1027 ms)
Now AVX2
LibYUVConvertTest.ARGBToI444_Opt (310 ms)
Bug: libyuv:508639302
Change-Id: I0bc7f5c5b72160d24226a98d5fddb184a004ed00
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7841655
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- Standardize libyuv ARGB-family (ARGB, ABGR, RGBA, BGRA) to YUV conversion by utilizing the generic MatrixRow architecture and explicit ArgbConstants.
- Consolidated ARGBToI420, ABGRToI420, BGRAToI420, and RGBAToI420 as wrappers for ARGBToI420Matrix.
- Refactored ABGRToJ420, ABGRToJ422, and ABGRToI422 to use generic matrix functions.
- Added matrix-based versions for NV21, I400, YUY2, and UYVY.
- Updated RAW and RGB24 to I420/I422/I444 dispatchers to use MatrixRow logic and explicit constants.
- Fixed parameter swap bugs in ARGBToI422, ARGBToJ422, and ABGRToJ422.
- Fixed a bug in the generic C implementation of matrix row functions ensuring all 4 channels are processed correctly for all ARGB-family formats.
- Moved kShuffleAARRGGBB in row_gcc.cc to the top of the libyuv namespace for visibility.
- Cleaned up redundant format-specific row implementations.
Bug: libyuv:42280902
Change-Id: I67ffa4c476abc0d2dcc4650510d7bda91b65988e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7830291
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
This consolidation standardizes conversion logic, improves code
maintainability, and provides flexible support for various color spaces
(e.g., BT.601, JPEG full
range).
Key Modifications:
- Function Consolidation: Refactored several high-level conversion functions into lightweight wrappers around generic Matrix variants:
- ARGBToI420 → ARGBToI420Matrix
- ARGBToI444 → ARGBToI444Matrix
- ARGBToI422 → ARGBToI422Matrix
- ARGBToNV12 → ARGBToNV12Matrix
- RAWToJ400, RGB24ToJ400 → RGBToI400Matrix
- RAWToI444, RAWToJ444 → RGBToI444Matrix
- 2-Pass Conversions: Updated RGB565ToI420, ARGB1555ToI420, and ARGB4444ToI420 to utilize 2-pass conversions via RGBToI420Matrix.
- Standardization: Refactored ARGBToNV21, ARGBToYUY2, and ARGBToUYVY to use parameterized matrix row functions (ARGBToYMatrixRow,
ARGBToUVMatrixRow).
- Legacy Cleanup: Replaced legacy calls to ARGBToYJRow with the parameterized ARGBToYMatrixRow in the ARGBSobelize helper.
- Internal Integration: Included libyuv/convert_from_argb.h in planar_functions.cc and ensured all new matrix symbols are properly
declared/exported (LIBYUV_API).
Bug: libyuv:42280902
Change-Id: Ied5fd9899767427e3a03cdcfbeaff3e9d502374a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7822033
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Here are the functions flagged for mixing both SSE and AVX (or AVX-512)
instructions, which can trigger an AVX transition/assist performance
penalty:
Libyuv Functions addressed in this CL
* I422ToARGBRow_AVX512BW
* HalfFloatRow_SSE2
Not addressed:
* ScaleFilterCols_SSSE3
Bug: libyuv:509681367
Change-Id: I8ced6065dfe0c516d05857086393782c8590062a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7814945
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Fix int overflow of yi * src_stride overflow in ScalePlaneVertical(),
ScalePlaneVertical_16(), and ScalePlaneVertical_16To8() by casting the
operand src_stride to ptrdiff_t.
Adapted from the patches by Victor Miura <vmiura@google.com>.
Bug: 505814332
Change-Id: I4a4751041a213f7208b01eb18c43c9e196a36261
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796558
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
- implement wrappers with RAW, RGB24, NV21 and JNV21 to call it.
Zen5
Was [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
reason - the new code uses 1 pass for RAWToY but 2 pass for RAWToARGB,ARGBToUV. needs 1 RGBToUV
Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
implementing horizontal paired adds and accumulation to improve
performance on SiFive x280, and fixes the remainder logic to use valid
vlseg4 loads. Adds TestARGBToUVRow_Any to test odd-width remainder
handling.
Also fixes a build break for non-RVV compilations by ensuring all RVV
functions and their closing cplusplus braces are correctly wrapped in
#if !defined(LIBYUV_DISABLE_RVV).
Also adds NV12ToNV21 as a macro alias for NV21ToNV12 in
planar_functions.h, as the conversion is bidirectional (swapping byte
pairs in the interleaved chroma plane). (Patch from
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7762904)
Bug: libyuv:42280902
Change-Id: If2d6cbb3e232d63d43e32aba33fa9b2eee8190e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7772164
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- remove inline asm which was only for 32 bit
- add ARGBToYMatrixRow_AVX2
- add gn flag libyuv_enable_rowwin=true
Example of building with GN and Ninja:
Without the new flag:
gn gen out/Release "--args=is_debug=false"
ninja -C out/Release
With the new flag:
gn gen out/Release "--args=is_debug=false libyuv_enable_rowwin=true"
ninja -C out/Release
Bug: libyuv:42280806, 477295731, libyuv:42280902, libyuv:439628764
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: I451bf814622fba690005c02fbf5816819c6a08c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7765790
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Benchmark on Icelake Xeon
Now AVX512BW:
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2:
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)
- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.
Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- add ARGBToYMatrixRow_AVX512BW
- refactor SSE and AVX to use Matrix functions, making old functions
call the new ones.
Zen5 1280x720
Was AVX2 LibYUVConvertTest.ARGBToI444_Opt (1125 ms)
Now AVX512 LibYUVConvertTest.ARGBToI444_Opt (641 ms)
Details by Gemini:
1. Created 3 new Matrix functions:
Added ARGBToYMatrixRow_SSSE3, ARGBToYMatrixRow_AVX2, and
ARGBToYMatrixRow_AVX512BW to source/row_gcc.cc. These take the
const struct ArgbConstants* c parameter similarly to
ARGBToUV444MatrixRow_*. The x86 vector instructions dynamically
calculate the needed values using the properties of the constants
struct, including using vpmaddwd inside the AVX512 code to offset
the lack of a native vphaddw.
2. Replaced Old Functions with Wrappers:
Modified the existing implementations of ARGBToYRow_SSSE3,
ARGBToYJRow_SSSE3, ABGRToYRow_SSSE3, ABGRToYJRow_SSSE3,
RGBAToYRow_SSSE3, RGBAToYJRow_SSSE3, BGRAToYRow_SSSE3 (and their
_AVX2 equivalents) in source/row_gcc.cc to act as inline wrappers
calling the new ARGBToYMatrixRow_* functions, passing the right
matrix parameters (e.g. &kArgbI601Constants, &kArgbJPEGConstants,
&kAbgrI601Constants).
3. Added row_any.cc Handlers:
Added ANY11MC definitions to source/row_any.cc to autogenerate
ARGBToYMatrixRow_Any_SSSE3, ARGBToYMatrixRow_Any_AVX2, and
ARGBToYMatrixRow_Any_AVX512BW which safely handles non-aligned
tails.
4. Updated include/libyuv/row.h:
Updated the headers with the proper void declarations for all newly
generated Matrix and Any_ variants. Also defined
HAS_ARGBTOYROW_AVX512BW in the CPU macros.
5. Tested the Implementations:
Compiled and tested on Linux x86, which resulted in all tests passing
cleanly. Also successfully completed all Windows 32-bit build checks
ensuring 32-bit regression prevention without issues.
Bug: 477295731
Change-Id: I4f5eec9a961e24a9d760d0a1c0810fb5e29a0bd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7759494
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- Add ifdef for LIBYUV_UNLIMITED_DATA
Fixed by Gemini just telling it how to build and run the test and to fix it.
Bug: libyuv:353545922
Change-Id: I117a25b75b9616ee2ce6122aa163c2085ed4dc7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7742120
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This CL sets the Update Mechanism to Manual in README files.
Bug: 445311061
Change-Id: I4df6c5815b85c04b047b39b4352ba43789702d26
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512992
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Owners-Override: Jordan Brown <rop@google.com>
planar_test.cc was
Error: selected processor does not support `vmrs r3,fpscr' in ARM mode
Error: selected processor does not support `vmsr fpscr,r3' in ARM mode
Bug: None
Change-Id: I2ee0e7191c372277901c94e29d9ed91bbac71af2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7063737
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Skylake
Was ARGBToJ420_Opt (312 ms)
Now ARGBToJ420_Opt (242 ms)
Icelake
Was ARGBToJ420_Opt (302 ms)
Now ARGBToJ420_Opt (220 ms)
AMD Zen3 on Windows
Was ARGBToJ420_Opt (305 ms)
Now ARGBToJ420_Opt (216 ms)
32 bit x86 uses SSE
Now ARGBToJ420_Opt (326 ms)
MCA analysis of new AVX, SSE and old AVX
https://godbolt.org/z/37bdazWYr
Bug: None
Change-Id: I72f5504407751e164c3558aebe836dd15223d65f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6957477
Reviewed-by: Justin Green <greenjustin@google.com>
- detect lack of dot product instruction to infer the cpu is low end
- only run the test on higher end arm
Bug: 416842099
Change-Id: Idd2dd16a624bbba280cf531644440024b12f7ecf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6804632
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
- Was using avgb twice for non-exact and C for exact.
On Skylake Xeon:
Now SSE3
ARGBToJ420_Opt (326 ms)
Was
Exact C
ARGBToJ420_Opt (871 ms)
Not exact AVX2
ARGBToJ420_Opt (237 ms)
Not exact SSSE3
ARGBToJ420_Opt (312 ms)
Bug: 381138208
Change-Id: I6d1081bb52e36f06736c0c6575fa82bb2268629b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6629605
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ben Weiss <bweiss@google.com>
- Add +i8mm build option for sve ARGBToUV which uses usdot
- util/cpuid Get cpu count (windows, macos, linux)
- For each x86 cpu, detect hybrid (e-core)
- Includes a comment fix for ubsan unittest
- Bump version
- Apply clang format to util/*.c as well as all *.cc/*.h
Bug: 424637372
Change-Id: I08310e18051fff62c9e4e4a10d1e4361871119ac
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6635640
Reviewed-by: Wan-Teh Chang <wtc@google.com>
- make width loop count on stack
- set YMM constants in its own asm block
- make struct for shuffle and add constants
- disable clang format on row_neon.cc function
Bug: 413781394
Change-Id: I263f6862cb7589dc31ac65d118f7ebeb65dbb24a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6495259
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- use negative coefficients for UV to allow -128
- change shift to truncate instead of round for UV
- adapt all row_gcc RGB to UV into matrix functions
- add -DLIBYUV_ENABLE_ROWWIN to allow clang on Windows to use row_win.cc
Bug: 381138208
Change-Id: I6016062c859faf147a8a2cdea6c09976cbf2963c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6277710
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>