Removed all SSE functions, macros, dispatching logic, and related
unit tests across the repository to reduce code size and complexity.
Left cpuid detection intact. Supported architectures like AVX2, NEON,
SVE, etc. are unaffected.
R=rrwinterton@gmail.com
Bug: None
Test: Build and run libyuv_unittest
Change-Id: Id19608dba35b79c4c8fc31f920a6a968883d300f
- Replace ScalePlane with CopyPlane for Y channel
- Vertical mirroring is supported, but not horizontal mirroring.
- Check src_y is not null when dst_y is not null for all libyuv functions that allow a null dst_y.
- Apply clang-format
- Bump version to 1899
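The null check in the list above can be sketched roughly as follows. `CheckYPlaneArgs` is a hypothetical helper, not a libyuv function, and the -1 return for invalid arguments is an assumption about the calling convention.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: returns 0 if the Y-plane arguments are usable and
// -1 otherwise. A NULL dst_y means "skip the Y plane", so src_y may be NULL
// in that case; a non-NULL dst_y requires a non-NULL src_y.
static int CheckYPlaneArgs(const uint8_t* src_y, const uint8_t* dst_y,
                           int width, int height) {
  if (width <= 0 || height == 0) {
    return -1;  // A negative height is allowed here (vertical mirroring).
  }
  if (dst_y != NULL && src_y == NULL) {
    return -1;
  }
  return 0;
}
```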
Bug: None
Change-Id: Id1805b52b8024ba95a7f1b098dabf45af48670eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6128599
Reviewed-by: Wan-Teh Chang <wtc@google.com>
InterpolateRow_SME and InterpolateRow_16_SME need special cases to
handle if source_y_fraction is 256 since this would overflow a byte and
can just be a call to memcpy instead.
InterpolateRow_16To8_SME is never called with a source_y_fraction value
of 256 so there is no need for a special case here.
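A scalar model of the special case, for illustration only. This is not the SME kernel; `InterpolateRowSketch` is a hypothetical name, and it assumes the fraction is the weight (out of 256) given to the second row, so that a fraction of 256 yields exactly the second row.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Illustrative scalar model: blends two rows, giving the second row a
// weight of `fraction` out of 256.
static void InterpolateRowSketch(uint8_t* dst, const uint8_t* row0,
                                 const uint8_t* row1, size_t width,
                                 int fraction) {
  if (fraction == 0) {
    memcpy(dst, row0, width);
    return;
  }
  if (fraction == 256) {
    // 256 does not fit in a byte-sized weight; the result is just row1.
    memcpy(dst, row1, width);
    return;
  }
  uint8_t w1 = (uint8_t)fraction;  // 1..255 fits in a byte.
  int w0 = 256 - w1;
  for (size_t i = 0; i < width; ++i) {
    dst[i] = (uint8_t)((row0[i] * w0 + row1[i] * w1 + 128) >> 8);
  }
}
```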
Change-Id: I67805b5db2c411acb93ada626cf414b35620f467
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6074375
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Some projects require scaling support for the NV24 format, but libyuv currently lacks this functionality. This commit adds a scaling function for NV24, enabling its use in projects that require NV24 format processing.
Change-Id: I6e6b2bea342e1df7f387056ab3bc5003da983bb7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6068715
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Mostly just straightforward copies of the Neon code ported to
Streaming-SVE, these follow the same pattern as the prior ScaleRowDown2
SME kernels, but operating on 16-bit data rather than 8-bit.
There is no benefit from this kernel when the SVE vector length is only
128 bits, so skip writing a non-streaming SVE implementation.
Change-Id: I7bad0719d24cdb1760d1039c63c0e77726b28a54
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6070784
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
The auto-vectorized implementation unrolls to process 32 elements per
iteration, so unroll the new Neon implementation to match and avoid a
performance regression on little cores.
Performance relative to the auto-vectorized C implementation compiled
with LLVM 19:
Cortex-A55: -35.8%
Cortex-A510: -20.4%
Cortex-A520: -22.1%
Cortex-A76: -54.8%
Cortex-A710: -44.5%
Cortex-A715: -31.1%
Cortex-A720: -31.4%
Cortex-X1: -48.5%
Cortex-X2: -47.8%
Cortex-X3: -47.6%
Cortex-X4: -51.1%
Cortex-X925: -14.6%
Bug: b/42280942
Change-Id: Ib4e89ba230d554f2717052e934ca0e8a109ccc42
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6040153
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The #ifdef surrounding the use of this kernel is never defined and
ScaleRowDown2_16_NEON does not exist, so add the missing #define and
remove the use of ScaleRowDown2_16_NEON for now. Additionally since
there is no implementation of this kernel for 32-bit Arm, restrict the
define to only be present on AArch64.
Bug: b/42280942
Change-Id: Icc35c145c1bad1c0df2933a2d8bc7dcf7fe63cb7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6040152
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
There is no benefit from an SVE version of this kernel for devices with
an SVE vector length of 128 bits, so skip directly to SME instead. We
do not use the ZA tile here, so this is a purely streaming-SVE (SSVE)
implementation.
Change-Id: I5021aeda30f4c5f1aa4cc6326c8d7886851d2c09
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913885
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
There is no benefit from an SVE version of this kernel for devices with
an SVE vector length of 128 bits, so skip directly to SME instead. We
do not use the ZA tile here, so this is a purely streaming-SVE (SSVE)
implementation.
Change-Id: Ie6b91bd4407130ba2653838088e81e72e4460f68
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913884
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Including associated changes for adding a new scale_sme.cc file.
There is no benefit from an SVE version of this kernel for devices with
an SVE vector length of 128 bits, so skip directly to SME instead. We
do not use the ZA tile here, so this is a purely streaming-SVE (SSVE)
implementation.
Change-Id: I47d149613fbabd8c203605a809811f1a668e8fb7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913883
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
We can use wider load/store instructions here which is mostly an
improvement across the board.
Reduction in runtimes observed compared to the existing Neon
implementation:
Cortex-A55: +4.9% (!)
Cortex-A510: -46.3%
Cortex-A520: -49.0%
Cortex-A76: -12.2%
Cortex-A715: -15.5%
Cortex-A720: -15.0%
Cortex-X1: -12.4%
Cortex-X2: -12.5%
Cortex-X3: -12.3%
Cortex-X4: +0.3%
Bug: b/42280945
Change-Id: Id8af6499c63919924c2a954dfe7765b703ce4820
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5785970
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- Scaling 48 pixels at a time, but calling code checked for 24 pixels
- Added test for scaling to 1080x1920
libyuv_test --gunit_filter=LibYUVScaleTest.I420ScaleTo1080x1920_Box* --libyuv_width=1440 --libyuv_height=2560
Was
libyuv_test --gunit_filter=LibYUVScaleTest.I420ScaleTo1080x1920_Box* --libyuv_width=1440 --libyuv_height=2560
[ RUN ] LibYUVScaleTest.I420ScaleTo1080x1920_Box
Segmentation fault
Traceback (most recent call last):
Now
[ RUN ] LibYUVScaleTest.I420ScaleTo1080x1920_Box
filter 3 - 6741 us C - 3566 us OPT
[ OK ] LibYUVScaleTest.I420ScaleTo1080x1920_Box (43 ms)
Bug: b/366045177
Change-Id: I0ea6c2d6a32b2e7ca44cd030abc9f248115be44a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5857554
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Change ScalePlane(), ScalePlane_16(), and ScalePlane_12() to return int
so that they can report memory allocation failures (by returning 1).
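The error-reporting pattern might look like the sketch below. `ScaleWithTempRow` is a hypothetical stand-in, not the real ScalePlane(); only the convention of returning 1 on allocation failure comes from the description above.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical stand-in for a scaling routine that needs a temporary row.
// Returns 0 on success and 1 if the temporary buffer cannot be allocated.
static int ScaleWithTempRow(const uint8_t* src, uint8_t* dst, size_t width) {
  if (width == 0) {
    return 0;  // Nothing to do; avoid depending on malloc(0) behaviour.
  }
  uint8_t* row = (uint8_t*)malloc(width);
  if (row == NULL) {
    return 1;  // Report the failure instead of silently doing nothing.
  }
  memcpy(row, src, width);  // Placeholder for the real per-row work.
  memcpy(dst, row, width);
  free(row);
  return 0;
}
```

Callers that previously ignored a void return can now check the result and propagate it.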
BUG=libyuv:968
Change-Id: Ie5c183ee42e3d595302671f9ecb7b3472dc8fdb5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5005031
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
src_width parameter is used for assertions and unused with NDEBUG.
Fix the warning treated as an error when -Wall -Wextra -Werror is used
to build that part of the code.
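One common way to silence this kind of warning is shown below. `HalveRow` is a hypothetical function, and the `(void)` cast is an assumption about the approach, not necessarily what this change does.

```c
#include <assert.h>
#include <stdint.h>

// Hypothetical row function: src_width is only read by assert(), which
// expands to nothing when NDEBUG is defined.
static void HalveRow(const uint8_t* src, uint8_t* dst, int src_width,
                     int dst_width) {
  (void)src_width;  // Avoids an unused-parameter warning in NDEBUG builds.
  assert(dst_width * 2 <= src_width);
  for (int x = 0; x < dst_width; ++x) {
    dst[x] = (uint8_t)((src[2 * x] + src[2 * x + 1] + 1) >> 1);
  }
}
```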
BUG=libyuv:967
Change-Id: I4c02ab013e8e2684b3bed5ce9693e1493d7751b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4905033
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
ScaleUVRowUp2_(Bi)linear_RVV is equivalent to other platforms' ScaleRowUp2_(Bi)linear_Any_XXX:
it processes the entire row.
Other platforms implement only the non-edge part of the image in SIMD and process the edges with scalar code.
ScaleRowUp2_(Bi)linear_Any_XXX combines ScaleRowUp2_(Bi)linear_XXX (non-edge) with ScaleRowUp2_(Bi)linear_C (edge) via SBUH2LANY/SU2BLANY.
* Run on SiFive internal FPGA:
Test case                      RVV function                Speedup
I444ScaleFrom640x360_Bilinear  ScaleRowUp2_Bilinear_RVV    8.21
I444ScaleFrom640x360_Linear    ScaleRowUp2_Linear_RVV      8.08
UVScaleFrom640x360_Bilinear    ScaleUVRowUp2_Bilinear_RVV  7.80
UVScaleFrom640x360_Linear      ScaleUVRowUp2_Linear_RVV    7.03
Change-Id: I539245ce51858f077506a78f0e7e82377ac6a95d
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4666062
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Add static to internal scale and rotate functions
- Remove unittest that tested an internal scale function
- Remove unused private functions
- Include missing scale_argb.h header
- Bump version and apply clang format
Bug: libyuv:830
Change-Id: I45bab0423b86334f9707f935aedd0c6efc442dd4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4658956
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Run on SiFive internal FPGA:
Test case                     RVV function                Speedup
I444ScaleDownBy3by4_None      ScaleRowDown34_RVV          5.8
I444ScaleDownBy3by4_Linear    ScaleRowDown34_0/1_Box_RVV  6.5
I444ScaleDownBy3by4_Bilinear  ScaleRowDown34_0/1_Box_RVV  6.3
Bug: libyuv:956
Change-Id: I8ef221ab14d631e14f1ba1aaa25d2b30d4e710db
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4607777
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
* Run on SiFive internal FPGA:
MergeUVPlane_Opt(~6x vs scalar)
SplitUVPlane_Opt(~6x vs scalar)
TestCopyPlane(~8x vs scalar)
ARGBInterpolate0_Opt(~10x vs scalar)
ARGBInterpolate64_Opt(~9x vs scalar)
ARGBInterpolate168_Opt(~9x vs scalar)
ARGBInterpolate192_Opt(~8.5x vs scalar)
ARGBInterpolate255_Opt(~8x vs scalar)
Bug: libyuv:956
Change-Id: I8372341865f75f42e30371ef943d5c2e4be7b79a
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4574186
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Avoid repetitions of the expression boxwidth - minboxwidth.
Change-Id: Ib53fb6b06a926b80ff9a64cc5d499aeef0894c99
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408062
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The Clang static analyzer (scan-build) in LLVM 14 warns about
array index out of bounds in scaletbl[boxwidth - minboxwidth] in
ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two
elements. It is not obvious that the index boxwidth - minboxwidth is
always either 0 or 1.
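One way to make the bound explicit is to compute the index once and assert its range. `PickScale` is a hypothetical helper modelled on the description above, not the code in ScaleAddCols2_C().

```c
#include <assert.h>

// Hypothetical helper: selects one of two precomputed scale factors.
// The index is computed once and asserted to be 0 or 1, which documents the
// bound for readers and makes it checkable in debug builds.
static int PickScale(const int scaletbl[2], int boxwidth, int minboxwidth) {
  int scaletbl_index = boxwidth - minboxwidth;
  assert(scaletbl_index == 0 || scaletbl_index == 1);
  return scaletbl[scaletbl_index];
}
```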
Change-Id: I072476e86950154beffe6b1a89915755118b3cbd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
- Minor variable name changes first/last to top/bottom
- Comments explaining rotate temporary buffers usage
- Add asserts for scale parameter
- Use NULL and stddef.h instead of 0
- Use void * for allocation in row.h
- Add () around size parameter in macros
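The macro item matters because an unparenthesized parameter changes meaning when the argument is an expression. The macros below are hypothetical illustrations, not the ones in row.h.

```c
// Hypothetical macros for illustration only.
// Without parentheses, ROW_BYTES_BAD(w + 1) expands to w + 1 * 4.
#define ROW_BYTES_BAD(width) width * 4
// With parentheses, ROW_BYTES(w + 1) expands to ((w + 1) * 4).
#define ROW_BYTES(width) ((width) * 4)
```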
Bug: libyuv:926, libyuv:949
Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- MT2T support for source strides added, but only works for positive values.
- Reduced casting in row_common - one cast per assignment.
- scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t
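A minimal sketch of the widening described in the last item. `RowPtr` is a hypothetical helper, not a libyuv function; it assumes a target where intptr_t is wider than int.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: address of row y in a plane. Widening to intptr_t
// before multiplying keeps y * stride from overflowing a 32-bit int on
// 64-bit targets; the product is then applied as a ptrdiff_t offset.
static const uint8_t* RowPtr(const uint8_t* base, int y, int stride) {
  intptr_t offset = (intptr_t)y * (intptr_t)stride;
  return base + (ptrdiff_t)offset;
}
```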
Bug: libyuv:948, b/257266635, b/262468594
Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ernest Hua <ernesthua@google.com>
- Define HAS_SCALEROWUP2_BILINEAR_16_SSE2: it's now fixed.
- Correct function name to ScaleRowUp2_Bilinear_16_Any_SSE2:
this row function uses only SSE2 instructions.
Bug: libyuv:882
Change-Id: Ib1c7ac5b09997cb5b32bc54109d8c566af762433
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3800842
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- Avoid stepping to height + 1 for bilinear filter 2nd row for last row of source
- Box filter ubsan fix for 3/4 and 3/8 scaling for 16 bit planar
- Height 1 asan fixes
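The first item can be illustrated with a clamp on the second row used by the vertical blend. `SecondRowIndex` is a hypothetical helper; the actual fix may adjust the step or stride instead.

```c
// Hypothetical helper: index of the second source row used by a vertical
// bilinear blend. For the last source row it repeats that row instead of
// stepping past the end of the plane.
static int SecondRowIndex(int y, int src_height) {
  return (y + 1 < src_height) ? y + 1 : src_height - 1;
}
```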
Bug: libyuv:935, b/206716399
Change-Id: I56088520f2a884a37b987ee5265def175047673e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717263
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit
- Add NEON InterpolateRow_16 for fast 10 bit scaling
- When scaling up, set step to interpolate toward height - 1 to avoid buffer overread
- When scaling down, center the 2 rows used for source to achieve filtering.
- CopyPlane check for 0 size and return
Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Optimize 20 functions in source/scale_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Optimize 44 functions in source/row_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C.
Although current Intel, ARM, MIPS and PPC can do unaligned load/store, it's not guaranteed
and could crash a CPU that doesn't support it.
- unaligned tests use offset of 2 or 4, which ubsan accepts.
- unittest fills in random buffer with 2 bytes at a time instead of a short.
- row common functions for int16 types use 2 shorts instead of 1 int.
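For comparison, a common portable way to store or load a 16-bit value at a possibly misaligned address is memcpy. This is an alternative technique for illustration, not what this change does; the helper names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

// Stores a 16-bit value at an address that may not be 2-byte aligned.
// memcpy has no alignment requirement, unlike dereferencing a misaligned
// uint16_t pointer, which is what ubsan reports.
static void StoreU16Unaligned(uint8_t* dst, uint16_t value) {
  memcpy(dst, &value, sizeof(value));
}

// Loads a 16-bit value from an address that may not be 2-byte aligned.
static uint16_t LoadU16Unaligned(const uint8_t* src) {
  uint16_t value;
  memcpy(&value, src, sizeof(value));
  return value;
}
```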
Bug: libyuv:908, b/203243873
Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnattenuate, which mimics the AVX code.
Apply clang format to cleanup code.
Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>