Removed all SSE functions, macros, dispatching logic, and related
unit tests across the repository to reduce code size and complexity.
Left cpuid detection intact. Supported architectures like AVX2, NEON,
SVE, etc. are unaffected.
R=rrwinterton@gmail.com
Bug: None
Test: Build and run libyuv_unittest
Change-Id: Id19608dba35b79c4c8fc31f920a6a968883d300f
Use ptrdiff_t instead of intptr_t for buffer offsets, such as stride,
width_temp, and src_step*.
Change-Id: I64e6701fa71ab59c94325a6dad8762d040035208
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7800070
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Fix int overflow of yi * src_stride in ScalePlaneVertical(),
ScalePlaneVertical_16(), and ScalePlaneVertical_16To8() by casting the
operand src_stride to ptrdiff_t.
Adapted from the patches by Victor Miura <vmiura@google.com>.
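A minimal sketch of the overflow and the fix, assuming a 64-bit target where ptrdiff_t is wider than int. RowOffset and RowPtr are hypothetical helpers for illustration, not libyuv functions. The point is only that casting one operand makes the multiplication happen in ptrdiff_t.

```cpp
#include <cassert>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not part of libyuv): byte offset of row `yi`.
inline ptrdiff_t RowOffset(int yi, int src_stride) {
  assert(yi >= 0);
  // Casting one operand converts the other to ptrdiff_t as well, so the
  // product is computed in ptrdiff_t rather than in int.
  return yi * static_cast<ptrdiff_t>(src_stride);
}

// Hypothetical helper: pointer to the start of row `yi` in a plane.
inline const uint8_t* RowPtr(const uint8_t* src, int yi, int src_stride) {
  return src + RowOffset(yi, src_stride);
}
```

With yi = 70000 and src_stride = 70000 the product exceeds INT_MAX, so the uncast expression yi * src_stride would overflow int before being added to the pointer.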
Bug: 505814332
Change-Id: I4a4751041a213f7208b01eb18c43c9e196a36261
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796558
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
ptrdiff_t is the appropriate type for a buffer offset. intptr_t is
intended for a different purpose.
Change-Id: I475c548338b61f573fb11766c24cde6d31fbbed8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7796559
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
- implement wrappers with RAW, RGB24, NV21 and JNV21 to call it.
Zen5
Was [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
Reason: the new code uses 1 pass for RAWToY but 2 passes for UV (RAWToARGB, then ARGBToUV). It needs a 1-pass RGBToUV.
Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Adds RGBToYMatrixRow_AVX2 which reads 24 bit RGB values by reading 3 vectors instead of 4 and permutes them into 4 ARGB vectors before conversion.
Also adds RGBToYMatrixRow_Opt and RGBToYMatrixRow_2Step_Opt to convert_argb_test.cc to benchmark and compare the direct AVX2 conversion vs a 2-step approach.
./libyuv_test '--gunit_filter=*RAWToJ400_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1
AMD Zen 5
Was LibYUVConvertTest.RAWToJ400_Opt (757 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (699 ms)
Intel Skylake
Was LibYUVConvertTest.RAWToJ400_Opt (1705 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (1426 ms)
Bug: 477295731
Change-Id: I29866baf4ad5fe7a3725e4a01f2fe24649510a7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7777325
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Implements horizontal paired adds and accumulation to improve
performance on SiFive x280, and fixes the remainder logic to use valid
vlseg4 loads. Adds TestARGBToUVRow_Any to test odd-width remainder
handling.
Also fixes a build break for non-RVV compilations by ensuring all RVV
functions and their closing cplusplus braces are correctly wrapped in
#if !defined(LIBYUV_DISABLE_RVV).
Also adds NV12ToNV21 as a macro alias for NV21ToNV12 in
planar_functions.h, as the conversion is bidirectional (swapping byte
pairs in the interleaved chroma plane). (Patch from
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7762904)
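Why one function can serve both directions can be sketched in scalar code. SwapChromaPairs is a hypothetical helper, not the libyuv implementation: it swaps each byte pair of an interleaved chroma row, and applying it twice restores the input.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical scalar helper (not the libyuv implementation): swaps each
// byte pair of an interleaved chroma row, turning UV order into VU or back.
void SwapChromaPairs(const uint8_t* src, uint8_t* dst, int pairs) {
  assert(pairs >= 0);
  for (int i = 0; i < pairs; ++i) {
    // Read both bytes before writing so src == dst also works.
    const uint8_t first = src[2 * i];
    const uint8_t second = src[2 * i + 1];
    dst[2 * i] = second;
    dst[2 * i + 1] = first;
  }
}
```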
Bug: libyuv:42280902
Change-Id: If2d6cbb3e232d63d43e32aba33fa9b2eee8190e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7772164
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
This change implements ARGBToUV444MatrixRow_RVV, ARGBToUVMatrixRow_RVV,
and their wrappers (ARGBToUVRow_RVV, ARGBToUVJRow_RVV, etc.) using RVV
intrinsics, mirroring the NEON/AVX2 designs. It wires them into the
build and dispatch systems.
LIBYUV_RVV_HAS_TUPLE_TYPE is always true on new compilers. This macro
has been removed, assuming it is true everywhere, reducing the amount of
code in row_rvv.cc, scale_rvv.cc, and row.h.
Tested via: ~/bin/doyuv3v && ~/bin/runyuv3v TestARGBToI444Matrix
~/bin/doyuv3av
Bug: libyuv:42280902
Change-Id: I36d305386b297d69023c068aa9c62ab6b2ad039c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7769956
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- remove inline asm which was only for 32 bit
- add ARGBToYMatrixRow_AVX2
- add gn flag libyuv_enable_rowwin=true
Example of building with GN and Ninja:
Without the new flag:
gn gen out/Release "--args=is_debug=false"
ninja -C out/Release
With the new flag:
gn gen out/Release "--args=is_debug=false libyuv_enable_rowwin=true"
ninja -C out/Release
Bug: libyuv:42280806, 477295731, libyuv:42280902, libyuv:439628764
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: I451bf814622fba690005c02fbf5816819c6a08c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7765790
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Benchmark on Icelake Xeon
Now AVX512BW:
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2:
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)
- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.
Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
I have successfully ported the usage of ARGBToYRow_AVX2 to dynamically detect and utilize ARGBToYRow_AVX512BW when available.
Here's a summary of the changes:
1. Source Modifications: In both source/convert.cc and source/convert_from_argb.cc, I searched for all references where ARGBToYRow_AVX2 was
being conditionally used (which operates on 32 pixels).
2. AVX512BW Detection: Immediately following those blocks, I added a new check for kCpuHasAVX512BW. If the CPU flag is present, the logic
now uses ARGBToYRow_Any_AVX512BW by default and switches to the fully aligned ARGBToYRow_AVX512BW when the width is aligned to 64
bytes.
3. Profiling: After building the tests (doyuv3x), I validated the change using perfyuv3 ARGBToNV12_Opt | cat. The test
passed and the profile showed ARGBToYRow_AVX512BW running (about 18% of CPU cycles),
replacing the AVX2 path for the Y row extraction.
The HAS_ARGBTOYROW_AVX512BW macro implementation now fully supports all AVX2 conversion paths to utilize AVX512BW when the system processor
flags allow it!
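The selection order described above can be sketched as follows. Every name here is a stand-in for illustration, not a libyuv symbol: the row functions only tag their output so the chosen path is observable, and the CPU flags are a hypothetical bitmask. The step sizes 32 and 64 follow the summary above.

```cpp
#include <cassert>
#include <stdint.h>

// Signature shared by all row functions in this sketch.
using RowFn = void (*)(const uint8_t* src_argb, uint8_t* dst_y, int width);

// Stand-in row function: fills the output with a tag instead of converting.
template <uint8_t kTag>
void TagRow(const uint8_t*, uint8_t* dst_y, int width) {
  for (int x = 0; x < width; ++x) dst_y[x] = kTag;
}

constexpr RowFn Row_C = TagRow<0>;
constexpr RowFn Row_Any_AVX2 = TagRow<1>;
constexpr RowFn Row_AVX2 = TagRow<2>;
constexpr RowFn Row_Any_AVX512BW = TagRow<3>;
constexpr RowFn Row_AVX512BW = TagRow<4>;

// Hypothetical CPU feature bits.
enum CpuFlag : int { kHasAVX2 = 1 << 0, kHasAVX512BW = 1 << 1 };

// Later checks override earlier ones, so the widest supported path wins.
RowFn PickRow(int cpu_flags, int width) {
  assert(width > 0);
  RowFn fn = Row_C;
  if (cpu_flags & kHasAVX2) {
    fn = (width % 32 == 0) ? Row_AVX2 : Row_Any_AVX2;
  }
  if (cpu_flags & kHasAVX512BW) {
    fn = (width % 64 == 0) ? Row_AVX512BW : Row_Any_AVX512BW;
  }
  return fn;
}
```

Placing the AVX512BW check after the AVX2 check means no caller needs to know which paths exist; it just calls the returned pointer.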
R=richard, rrwinterton@gmail.com
Change-Id: Iad811e12d301f5621e6f6d039105420861ade43e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7760779
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
In all functions that start with ARGB, BGRA, RGBA or ABGR in the include/libyuv/ headers, make sure the parameter variable name has the same 4 letters, but lower case, and the comment before the function should have the same matching name. Then make sure the implementation in source/ folder has the same variable names.
Change-Id: Idadbbbb993156eea16e318719f4888cb3bed5f6a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7760057
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- add ARGBToYMatrixRow_AVX512BW
- refactor SSE and AVX to use Matrix functions, making old functions
call the new ones.
Zen5 1280x720
Was AVX2 LibYUVConvertTest.ARGBToI444_Opt (1125 ms)
Now AVX512 LibYUVConvertTest.ARGBToI444_Opt (641 ms)
Details by Gemini:
1. Created 3 new Matrix functions:
Added ARGBToYMatrixRow_SSSE3, ARGBToYMatrixRow_AVX2, and
ARGBToYMatrixRow_AVX512BW to source/row_gcc.cc. These take the
const struct ArgbConstants* c parameter similarly to
ARGBToUV444MatrixRow_*. The x86 vector instructions dynamically
calculate the needed values using the properties of the constants
struct, including using vpmaddwd inside the AVX512 code to offset
the lack of a native vphaddw.
2. Replaced Old Functions with Wrappers:
Modified the existing implementations of ARGBToYRow_SSSE3,
ARGBToYJRow_SSSE3, ABGRToYRow_SSSE3, ABGRToYJRow_SSSE3,
RGBAToYRow_SSSE3, RGBAToYJRow_SSSE3, BGRAToYRow_SSSE3 (and their
_AVX2 equivalents) in source/row_gcc.cc to act as inline wrappers
calling the new ARGBToYMatrixRow_* functions, passing the right
matrix parameters (e.g. &kArgbI601Constants, &kArgbJPEGConstants,
&kAbgrI601Constants).
3. Added row_any.cc Handlers:
Added ANY11MC definitions to source/row_any.cc to autogenerate
ARGBToYMatrixRow_Any_SSSE3, ARGBToYMatrixRow_Any_AVX2, and
ARGBToYMatrixRow_Any_AVX512BW which safely handle non-aligned
tails.
4. Updated include/libyuv/row.h:
Updated the headers with the proper void declarations for all newly
generated Matrix and Any_ variants. Also defined
HAS_ARGBTOYROW_AVX512BW in the CPU macros.
5. Tested the Implementations:
Compiled and tested on Linux x86, which resulted in all tests passing
cleanly. Also successfully completed all Windows 32-bit build checks
ensuring 32-bit regression prevention without issues.
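The wrapper pattern from step 2 can be sketched in scalar C++. This is an illustration under assumptions, not the libyuv code: YCoeffs is a hypothetical struct (the real ArgbConstants layout may differ), the function names are made up, and the weights are the common BT.601 limited-range luma coefficients.

```cpp
#include <cassert>
#include <stdint.h>

// Hypothetical constants struct; the real ArgbConstants layout may differ.
struct YCoeffs {
  uint8_t k[4];   // weights for the 4 bytes of a pixel, in memory order
  uint16_t bias;  // rounding plus offset, added before the shift
};

// ARGB is stored as B, G, R, A in memory: weights 25 (B), 129 (G), 66 (R).
static const YCoeffs kArgbI601Y = {{25, 129, 66, 0}, 0x1080};
// ABGR is stored as R, G, B, A, so the R and B weights trade places.
static const YCoeffs kAbgrI601Y = {{66, 129, 25, 0}, 0x1080};

// Scalar stand-in for a matrix row function: one Y per 4-byte pixel.
void ToYMatrixRow_C(const uint8_t* src, uint8_t* dst_y, int width,
                    const YCoeffs* c) {
  assert(width >= 0 && c != nullptr);
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src + 4 * x;
    const int sum = p[0] * c->k[0] + p[1] * c->k[1] + p[2] * c->k[2] +
                    p[3] * c->k[3] + c->bias;
    dst_y[x] = static_cast<uint8_t>(sum >> 8);
  }
}

// Format-specific entry points become thin wrappers over the matrix form.
inline void ARGBToYRow_Sketch(const uint8_t* src, uint8_t* dst_y, int width) {
  ToYMatrixRow_C(src, dst_y, width, &kArgbI601Y);
}
inline void ABGRToYRow_Sketch(const uint8_t* src, uint8_t* dst_y, int width) {
  ToYMatrixRow_C(src, dst_y, width, &kAbgrI601Y);
}
```

Only the constants differ between the two wrappers, which is what lets a single vectorized kernel serve every channel order.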
Bug: 477295731
Change-Id: I4f5eec9a961e24a9d760d0a1c0810fb5e29a0bd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7759494
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Increases buffer sizes from 128 to 256 in ANY11, ANY11C, ANY11MC, ANY12,
and ANY12M macros to safely accommodate AVX512BW processing which can
write up to 256 bytes per operation.
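Why the scratch buffers must cover a full kernel step can be seen in a simplified model of an "Any" wrapper. This is a sketch, not the ANY11 macro itself: InvertRow_Block is a scalar stand-in for a SIMD kernel that always processes whole blocks, and kStep = 64 is an assumed block size.

```cpp
#include <cassert>
#include <stdint.h>
#include <string.h>

constexpr int kStep = 64;  // bytes per kernel block (assumed for the sketch)

// Stand-in kernel: only accepts widths that are a multiple of kStep.
void InvertRow_Block(const uint8_t* src, uint8_t* dst, int width) {
  assert(width % kStep == 0);
  for (int x = 0; x < width; ++x) {
    dst[x] = static_cast<uint8_t>(255 - src[x]);
  }
}

// Handles any width: full blocks go straight to the kernel, the remainder
// is staged through scratch buffers sized for one whole block.
void InvertRow_Any(const uint8_t* src, uint8_t* dst, int width) {
  assert(width >= 0);
  uint8_t vin[kStep] = {0};  // must hold everything one kernel call reads
  uint8_t vout[kStep];       // must hold everything one kernel call writes
  const int r = width % kStep;
  const int n = width - r;
  if (n > 0) {
    InvertRow_Block(src, dst, n);
  }
  if (r > 0) {
    memcpy(vin, src + n, r);
    InvertRow_Block(vin, vout, kStep);  // writes kStep bytes, not r
    memcpy(dst + n, vout, r);           // only r bytes reach the caller
  }
}
```

If vout were smaller than what the kernel writes in one call, the kernel would overrun it, which is the failure mode a wider instruction set can introduce.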
Bug: libyuv:42280902, libyuv:502250231, 501882928
Change-Id: Icfba1982dc5fb6545255464f7decb2baec7be90f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7758060
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This fixes a build failure on bare-metal toolchains like
riscv64-unknown-elf-clang++ where strtok_r may be undeclared.
Bug: 477295731
Change-Id: If4edd6c6d2e975ae34278f479700ef9b996c0a3e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7744872
Reviewed-by: James Zern <jzern@google.com>
Adds a check for the AVX512F feature bit (cpu_info7[1] & 0x00010000)
before enabling AVX512 features. Alder Lake CPUs can report OS support
for YMM/ZMM but not actually support AVX512F, leading to incorrect
capability detection and crashes.
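A minimal sketch of the combined check. CanUseAVX512 is a hypothetical function, and it takes the raw register values as parameters so it stays independent of how CPUID and XGETBV are read. The 0x00010000 bit test comes from the description above; the XCR0 mask 0xE6 (SSE, AVX, opmask and both ZMM state components) is my assumption about what the OS-support check looks like.

```cpp
#include <stdint.h>

// Hypothetical helper: `cpuid7_ebx` is EBX from CPUID leaf 7, subleaf 0,
// and `xcr0` is the value returned by XGETBV(0).
constexpr bool CanUseAVX512(uint32_t cpuid7_ebx, uint64_t xcr0) {
  // The OS must save and restore SSE, AVX, opmask and ZMM register state.
  const bool os_saves_zmm = (xcr0 & 0xE6) == 0xE6;
  // The CPU itself must report AVX512F (bit 16 of EBX).
  const bool has_avx512f = (cpuid7_ebx & 0x00010000u) != 0;
  return os_saves_zmm && has_avx512f;
}
```

Both conditions are required: OS state support alone says nothing about whether the instructions exist on the running core.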
Bug: libyuv:500318522
Change-Id: I84167ee3fcfc7a2572afba148bbb275bd3ccb1e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7746229
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
- Add ifdef for LIBYUV_UNLIMITED_DATA
Fixed by Gemini after just telling it how to build and run the test and asking it to fix it.
Bug: libyuv:353545922
Change-Id: I117a25b75b9616ee2ce6122aa163c2085ed4dc7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7742120
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
The libyuv into Chromium roller is currently broken, see bug 500795092.
This change adds a forward declaration for struct ArgbConstants in
include/libyuv/convert.h. This resolves a -Wvisibility error where the
struct was being declared within a function prototype, making it
invisible outside that scope and breaking automated binding generation
(e.g., for crabbyavif).
Verified building crabbyavif_libyuv_bindings locally and this patch
fixed it.
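The shape of the fix can be sketched as below. SketchConvertMatrix and the placeholder field are hypothetical, not the real libyuv signature or layout. As I understand it, the warning matters when the header is parsed as C, where a struct tag first seen inside a parameter list is scoped to that prototype only.

```cpp
#include <cassert>
#include <stdint.h>

// Forward declaration at file scope: the tag is now visible to every
// prototype that follows, instead of being scoped to one parameter list.
struct ArgbConstants;

// Hypothetical prototype for illustration; not a real libyuv signature.
int SketchConvertMatrix(const uint8_t* src_argb, int width,
                        const struct ArgbConstants* constants);

// The complete definition can live elsewhere, for example in another
// header. The single field here is a placeholder, not the real layout.
struct ArgbConstants {
  int placeholder;
};

int SketchConvertMatrix(const uint8_t* src_argb, int width,
                        const struct ArgbConstants* constants) {
  assert(width >= 0);
  return (src_argb != nullptr && constants != nullptr) ? 0 : -1;
}
```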
Bug: 500795092
Change-Id: Ie0126650ab346940f4610bd4d2e8a5b3ef9ce103
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7739974
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
The latest Android NDK marks strtok as deprecated and suggests using
strtok_r instead.
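For reference, a typical strtok_r loop looks like the sketch below. SplitTokens is a hypothetical helper, not code from this change. strtok_r is a POSIX function, so this assumes a POSIX libc; it keeps its position in a caller-owned pointer rather than in hidden static state.

```cpp
#include <cassert>
#include <string.h>
#include <string>
#include <vector>

// Hypothetical helper: splits `text` on any character in `delims`.
std::vector<std::string> SplitTokens(const std::string& text,
                                     const char* delims) {
  assert(delims != nullptr);
  // strtok_r writes NUL bytes into its input, so work on a private copy.
  std::vector<char> buf(text.begin(), text.end());
  buf.push_back('\0');
  std::vector<std::string> out;
  char* save = nullptr;  // per-call state; strtok keeps this in a static
  for (char* tok = strtok_r(buf.data(), delims, &save); tok != nullptr;
       tok = strtok_r(nullptr, delims, &save)) {
    out.emplace_back(tok);
  }
  return out;
}
```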
Bug: 477295731
Change-Id: I2b20a2ae0a9e19ec93e31669ec380802e6902090
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7739107
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
These are about 25% faster than the C versions.
Bug: libyuv:42280902
Change-Id: I8b298670ee5f3ed5db35527fc41d6d9a51b020a1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7573682
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
This one reuses the SIMD implementations for MergeUVRow_ from the
existing ARGBToNV12 functions.
Bug: libyuv:42280902
Change-Id: If0a4be133d657ed0262f29fdd568dac90b49636c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564317
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
This allows for ABGR conversion using the same methods.
Bug: libyuv:42280902
Change-Id: I5566e3150b30573a2326a900ce31ab095f8935f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564316
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
This was implemented by Gemini followed by manual review and some
tweaking for style. The 601 and JPEG constants are fully verified
against the existing non-matrix implementations. On x86 the C-only
versions appear to be about 25% slower than the optimized ones.
Bug: libyuv:42280902
Change-Id: Ia5b7cb499bad5c76faec53f36086ebb18f2b530f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512030
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
This CL sets the Update Mechanism to Manual in README files.
Bug: 445311061
Change-Id: I4df6c5815b85c04b047b39b4352ba43789702d26
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512992
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Owners-Override: Jordan Brown <rop@google.com>
GCC now supports vector segment loads and stores. Their absence was
the reason this was previously disabled.
Change-Id: I923fd8a15476de8dcc2103bb8335d4fcc3ca96a9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7241606
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Following crrev.com/c/7171485, libyuv should be able to rely on
the default xcode version of the bots.
Bug: 461757070
Change-Id: Iecc34bb0b0476b61be1d9dfd51904396913c85f4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7177782
Commit-Queue: Victor Vianna <victorvianna@google.com>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Detect whether the Arm CPU supports the FMMLA instruction.
Bug: None
Change-Id: Ia7b83bf2735ddeeb8a85da44177e708c34e4b1fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7085486
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
planar_test.cc previously failed to build with:
Error: selected processor does not support `vmrs r3,fpscr' in ARM mode
Error: selected processor does not support `vmsr fpscr,r3' in ARM mode
Bug: None
Change-Id: I2ee0e7191c372277901c94e29d9ed91bbac71af2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7063737
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
According to the README of rvv-intrinsic-doc,
Clang 19 and GCC 14 support the v1.0 version.
But __riscv_v_intrinsic is 12000 on Clang 19,
so Clang >= 20 is needed to test this patch.
I tested it with Clang 21.
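The version gate can be expressed as a compile-time check. A minimal sketch, assuming the rvv-intrinsic-doc encoding of __riscv_v_intrinsic as major * 1000000 + minor * 1000 + patch, so that 12000 means v0.12 and 1000000 means v1.0. HasRvvIntrinsicsV1 and kRvvIntrinsicVersion are hypothetical names, not libyuv symbols.

```cpp
// Hypothetical helper: true when the reported intrinsics version is v1.0+.
constexpr bool HasRvvIntrinsicsV1(long version) {
  return version >= 1000000;
}

#if defined(__riscv_v_intrinsic)
constexpr long kRvvIntrinsicVersion = __riscv_v_intrinsic;
#else
constexpr long kRvvIntrinsicVersion = 0;  // not an RVV-enabled build
#endif

// Whether this particular build would pass the gate.
constexpr bool kBuildHasRvvV1 = HasRvvIntrinsicsV1(kRvvIntrinsicVersion);

// 12000 is the value quoted above for Clang 19, so it fails the gate.
static_assert(!HasRvvIntrinsicsV1(12000), "v0.12 is below v1.0");
static_assert(HasRvvIntrinsicsV1(1000000), "v1.0 passes the gate");
```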
Change-Id: I0e75efcdab3e7bc0ce1acd19eca3568b47c84cbf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6995438
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The .cfg file in https://crrev.com/c/7043382 wasn't up-to-date for
some reason. I verified this CL indeed updates xcode in the iOS
bot. mac_asan is still broken for now.
- led get-builder "libyuv/try/ios_arm64_rel" > config.json
- Edit config.json to include the snippet below in its properties.
- cat config.json | led launch
https://ci.chromium.org/ui/p/libyuv/builders/ci.shadow/iOS%20ARM64%20Release/1/overview
```
"$depot_tools/osx_sdk": {
"sdk_version": "17a324"
},
```
Bug: 448679376
Change-Id: Ie15e6164246611a5a1c06357307be512da0ff902
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7046681
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>