Removed all SSE functions, macros, dispatching logic, and related
unit tests across the repository to reduce code size and complexity.
Left cpuid detection intact. Supported architectures like AVX2, NEON,
SVE, etc. are unaffected.
R=rrwinterton@gmail.com
Bug: None
Test: Build and run libyuv_unittest
Change-Id: Id19608dba35b79c4c8fc31f920a6a968883d300f
- implement wrappers with RAW, RGB24, NV21 and JNV21 to call it.
Zen5
Was [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (1146 ms)
Now [ OK ] LibYUVConvertTest.RAWToJNV21_Opt (1446 ms)
reason - the new code uses 1 pass for RAWToY but 2 pass for RAWToARGB,ARGBToUV. needs 1 RGBToUV
Bug: libyuv:42280902
Change-Id: Ife6fbed0829484045409e6d42b85cec1d1fd6052
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7780026
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Adds RGBToYMatrixRow_AVX2 which reads 24 bit RGB values by reading 3 vectors instead of 4 and permutes them into 4 ARGB vectors before conversion.
Also adds RGBToYMatrixRow_Opt and RGBToYMatrixRow_2Step_Opt to convert_argb_test.cc to benchmark and compare the direct AVX2 conversion vs a 2-step approach.
./libyuv_test '--gunit_filter=*RAWToJ400_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=10000 --libyuv_flags=-1 --libyuv_cpu_info=-1
AMD Zen 5
Was LibYUVConvertTest.RAWToJ400_Opt (757 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (699 ms)
Intel Skylake
Was LibYUVConvertTest.RAWToJ400_Opt (1705 ms)
Now LibYUVConvertTest.RAWToJ400_Opt (1426 ms)
Bug: 477295731
Change-Id: I29866baf4ad5fe7a3725e4a01f2fe24649510a7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7777325
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This change implements ARGBToUV444MatrixRow_RVV, ARGBToUVMatrixRow_RVV,
and their wrappers (ARGBToUVRow_RVV, ARGBToUVJRow_RVV, etc.) using RVV
intrinsics, mirroring the NEON/AVX2 designs. It wires them into the
build and dispatch systems.
LIBYUV_RVV_HAS_TUPLE_TYPE is always true on new compilers. This macro
has been removed, assuming it is true everywhere, reducing the amount of
code in row_rvv.cc, scale_rvv.cc, and row.h.
Tested via: ~/bin/doyuv3v && ~/bin/runyuv3v TestARGBToI444Matrix
~/bin/doyuv3av
Bug: libyuv:42280902
Change-Id: I36d305386b297d69023c068aa9c62ab6b2ad039c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7769956
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Benchmark on Icelake Xeon
Now AVX512BW:
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (1723 ms)
Was AVX2:
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (2144 ms)
- Added `ARGBToUVMatrixRow_AVX512BW` implementation in `source/row_gcc.cc`.
- Added corresponding `ARGBToUVRow_AVX512BW` and `ABGRToUVRow_AVX512BW` functions.
- Added unaligned wrappers `ARGBToUVRow_Any_AVX512BW` and `ABGRToUVRow_Any_AVX512BW` in `source/row_any.cc`.
- Updated `source/row_any.cc` to correctly size `vin` and `vout` buffers for AVX512BW width and adjusted the `ANY12MS` and `ANY12S` macros to handle `MASK=63`.
- Updated `include/libyuv/row.h` with the required AVX512BW headers and definitions, scoped appropriately.
- Wired all callers of `ARGBToUVRow_AVX2` and related functions in `source/convert.cc` and `source/convert_from_argb.cc` to dynamically use the `AVX512BW` implementations if the CPU flag indicates AVX-512BW support.
- Optimized AVX-512 code to generate the `-1` multiplier in a single instruction (`vpternlogd`) and reused it across word (`vpmaddwd`) dot products. Handled the resulting negation by replacing a subtraction with `vpaddw` offset adjustment.
Bug: 477295731
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: Ida5fb27e59ae4c1c3824737f009b80549cd20a06
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7763257
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
I have successfully ported the usage of ARGBToYRow_AVX2 to dynamically detect and utilize ARGBToYRow_AVX512BW when available.
Here's a summary of the changes:
1. Source Modifications: In both source/convert.cc and source/convert_from_argb.cc, I searched for all references where ARGBToYRow_AVX2 was
being conditionally used (which operates on 32 pixels).
2. AVX512BW Detection: Immediately following those blocks, I injected a new check for kCpuHasAVX512BW. If the CPU flag is present, the logic
now utilizes ARGBToYRow_Any_AVX512BW by default, falling back to the fully aligned ARGBToYRow_AVX512BW when the width is aligned to 64
bytes.
3. Profiling: After building and compiling the tests (doyuv3x), I validated the change using perfyuv3 ARGBToNV12_Opt | cat. The test
successfully executed and the performance profile indicated that ARGBToYRow_AVX512BW successfully executed (taking up ~18% of CPU cycles,
replacing the previous AVX2 specific instruction overhead for the Y row extraction).
The HAS_ARGBTOYROW_AVX512BW macro implementation now fully supports all AVX2 conversion paths to utilize AVX512BW when the system processor
flags allow it!
R=richard, rrwinterton@gmail.com
Change-Id: Iad811e12d301f5621e6f6d039105420861ade43e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7760779
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- add ARGBToYMatrixRow_AVX512BW
- refactor SSE and AVX to use Matrix functions, making old functions
call the new ones.
Zen5 1280x720
Was AVX2 LibYUVConvertTest.ARGBToI444_Opt (1125 ms)
Now AVX512 LibYUVConvertTest.ARGBToI444_Opt (641 ms)
Details by Gemini:
1. Created 3 new Matrix functions:
Added ARGBToYMatrixRow_SSSE3, ARGBToYMatrixRow_AVX2, and
ARGBToYMatrixRow_AVX512BW to source/row_gcc.cc. These take the
const struct ArgbConstants* c parameter similarly to
ARGBToUV444MatrixRow_*. The x86 vector instructions dynamically
calculate the needed values using the properties of the constants
struct, including using vpmaddwd inside the AVX512 code to offset
the lack of a native vphaddw.
2. Replaced Old Functions with Wrappers:
Modified the existing implementations of ARGBToYRow_SSSE3,
ARGBToYJRow_SSSE3, ABGRToYRow_SSSE3, ABGRToYJRow_SSSE3,
RGBAToYRow_SSSE3, RGBAToYJRow_SSSE3, BGRAToYRow_SSSE3 (and their
_AVX2 equivalents) in source/row_gcc.cc to act as inline wrappers
calling the new ARGBToYMatrixRow_* functions, passing the right
matrix parameters (e.g. &kArgbI601Constants, &kArgbJPEGConstants,
&kAbgrI601Constants).
3. Added row_any.cc Handlers:
Added ANY11MC definitions to source/row_any.cc to autogenerate
ARGBToYMatrixRow_Any_SSSE3, ARGBToYMatrixRow_Any_AVX2, and
ARGBToYMatrixRow_Any_AVX512BW which safely handles non-aligned
tails.
4. Updated include/libyuv/row.h:
Updated the headers with the proper void declarations for all newly
generated Matrix and Any_ variants. Also defined
HAS_ARGBTOYROW_AVX512BW in the CPU macros.
5. Tested the Implementations:
Compiled and tested on Linux x86, which resulted in all tests passing
cleanly. Also successfully completed all Windows 32-bit build checks
ensuring 32-bit regression prevention without issues.
Bug: 477295731
Change-Id: I4f5eec9a961e24a9d760d0a1c0810fb5e29a0bd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7759494
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
These are about 25% faster than the C versions.
Bug: libyuv:42280902
Change-Id: I8b298670ee5f3ed5db35527fc41d6d9a51b020a1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7573682
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
This allows for ABGR conversion using the same methods
Bug: libyuv:42280902
Change-Id: I5566e3150b30573a2326a900ce31ab095f8935f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564316
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
This was implemented by Gemini followed by manual review and some
tweaking for style. The 601 and JPEG constants are fully verified
against the existing non-matrix implementations. On x86 the C-only
versions appear to be about 25% slower than the optimized ones.
Bug: libyuv:42280902
Change-Id: Ia5b7cb499bad5c76faec53f36086ebb18f2b530f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512030
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Mostly just a straightforward copy of the existing SVE2 code ported to
Streaming-SVE. Introduce new "any" kernels for non-multiple of two
cases, matching what we already do for SVE2.
The existing SVE2 code makes use of the Neon MOVI instruction that is
not supported in Streaming-SVE, so adjust the code to use FMOV instead
which has the same performance characteristics.
Change-Id: I74b7ea1fe8e6af75dfaf92826a4de775a1559f77
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6663806
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The maximum coefficient is 128, so store constants negated to take
advantage of -128 being representable in 8-bit integers. This allows us
to use the I8MM USDOT instructions.
Reduction in time taken observed compared to the existing Neon
implementation, as a geomean of all ARGBToUV variants:
Cortex-A510: -7.1%
Cortex-A520: -2.1%
Cortex-A710: -8.4%
Cortex-A715: -0.3%
Cortex-A720: -0.3%
Cortex-X2: -40.0%
Cortex-X3: -43.3%
Cortex-X4: -11.3%
Cortex-X925: -2.5%
Change-Id: Id06dc17d101b66975b84b93e5abe91c0032921dd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6535686
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- use negative coefficients for UV to allow -128
- change shift to truncate instead of round for UV
- adapt all row_gcc RGB to UV into matrix functions
- add -DLIBYUV_ENABLE_ROWWIN to allow clang on Windows to use row_win.cc
Bug: 381138208
Change-Id: I6016062c859faf147a8a2cdea6c09976cbf2963c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6277710
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Replace ScalePlane with CopyPlane for Y channel
- Vertical mirroring is supported, but not horizontal mirroring.
- Check src_y is not null when dst_y is not null for all libyuv functions that allow a null dst_y.
- Apply clang-format
- Bump version to 1899
Bug: None
Change-Id: Id1805b52b8024ba95a7f1b098dabf45af48670eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6128599
Reviewed-by: Wan-Teh Chang <wtc@google.com>
InterpolateRow_SME and InterpolateRow_16_SME need special cases to
handle if source_y_fraction is 256 since this would overflow a byte and
can just be a call to memcpy instead.
InterpolateRow_16To8_SME is never called with a source_y_fraction value
of 256 so there is no need for a special case here.
Change-Id: I67805b5db2c411acb93ada626cf414b35620f467
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6074375
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Mostly just a straightforward copy of the Neon code ported to
Streaming-SVE, we can use predication to avoid needing an `Any` kernel.
SVE has a "widening multiply get high half" instruction in UMULH,
however using the same technique as the Neon code to avoid the need for
a widening multiply at all is more performant here.
These is no benefit from this kernel when the SVE vector length is only
128 bits, so skip writing a non-streaming SVE implementation.
Change-Id: Ib12699c5b8b168d004ebc74c0281ea3772ca8d32
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6070786
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Mostly just a straightforward copy of the Neon code ported to
Streaming-SVE, we can use predication to avoid needing an `Any` kernel
and use ST2 to avoid needing a separate ZIP instruction.
These is no benefit from this kernel when the SVE vector length is only
128 bits, so skip writing a non-streaming SVE implementation.
Change-Id: I5ae36afe699b88f119dc545e49c59c5d85e98742
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6070785
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The existing code only makes use of half of the vector lanes in the
RGB565TOARGB macro. In the RGB565To{ARGB,Y} kernels we can load more
data to allow using full vectors, adjusting the "any" kernel macros to
match. For the RGB565ToUVRow kernel we already have plenty of data but
currently call the macro twice as much as needed, so refactor the code
to only call it once but operating with full vectors instead.
Reduction in runtimes observed for selected micro-architectures:
| RGB565ToARGBRow | RGB565ToUVRow | RGB565ToYRow
Cortex-A53 | -35.2% | -28.8% | -31.1%
Cortex-A55 | -32.5% | -34.4% | -42.9%
Cortex-A510 | -21.6% | -27.7% | -47.2%
Cortex-A76 | -0.9% | -42.0% | -21.4%
Cortex-A720 | -28.6% | -37.2% | -26.1%
Cortex-X1 | -3.2% | -42.3% | -23.4%
Bug: b/42280945
Change-Id: Ib1f68e5b87cc05a1485bbe96cfef87e6ac119fc3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5790974
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- avx2 is pack/perm is mutating order
- cvt method maintains channel order on avx512
Sapphire Rapids
Benchmark of 640x360 on Sapphire Rapids
AVX512BW
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (3547 ms)
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (3186 ms)
AVX2
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (4000 ms)
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (3190 ms)
SSE2
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (5433 ms)
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (4840 ms)
Skylake Xeon
Now vpmovuswb
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (7946 ms)
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (7071 ms)
Was vpackuswb
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (7684 ms)
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (7059 ms)
Switch from vpunpcklwd to vpbroadcastw for scale value parameter
Was
vpunpcklwd %%xmm2,%%xmm2,%%xmm2
vbroadcastss %%xmm2,%%ymm2
Now
vpbroadcastw %%xmm2,%%ymm2
Bug: 357439226, 357721018
Change-Id: Ifc9c82ab70dba58af6efa0f57f5f7a344014652e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5787040
Reviewed-by: Wan-Teh Chang <wtc@google.com>
- convert full Y plane with row coalescing if possible
- convert rows of UV from 10 bit to 8 bit then call MergeUV
libyuv_test '--gunit_filter=*010ToNV12_Opt' --libyuv_width=3840 --libyuv_height=2160 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
Note: Google Test filter = *010ToNV12_Opt
Skylake Xeon Was 2 pass planes
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (4512 ms)
Now 2 pass rows
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (2400 ms)
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (2265 ms)
On Samsung S23
libyuv_test --gunit_filter=*.????ToNV12_Opt --libyuv_width=3840 --libyuv_height=2160 --libyuv_repeat=1000'
Was
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (3563 ms)
Now
[ OK ] LibYUVConvertTest.AYUVToNV12_Opt (3068 ms
[ OK ] LibYUVConvertTest.ARGBToNV12_Opt (2990 ms
[ OK ] LibYUVConvertTest.ABGRToNV12_Opt (2904 ms
[ OK ] LibYUVConvertTest.P010ToNV12_Opt (1177 ms
[ OK ] LibYUVConvertTest.I010ToNV12_Opt (1150 ms <- now
[ OK ] LibYUVConvertTest.I444ToNV12_Opt (1118 ms
[ OK ] LibYUVConvertTest.MM21ToNV12_Opt (1008 ms
[ OK ] LibYUVConvertTest.UYVYToNV12_Opt (1007 ms
[ OK ] LibYUVConvertTest.YUY2ToNV12_Opt (938 ms)
[ OK ] LibYUVConvertTest.NV21ToNV12_Opt (496 ms)
[ OK ] LibYUVConvertTest.I420ToNV12_Opt (466 ms)
Bug: b/357439226, b/357721018
Change-Id: I48405929ae835b171e7d556a16794eac22c50ae9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5782404
Reviewed-by: Wan-Teh Chang <wtc@google.com>
I010, also known as YUV420P10, is 10 bit YUV pixel format with 3 planes.
Both I010 and NV12 are 4:2:0 subsampling. NV12 has a Y plane, and an
interleaved UV plane.
Bug: 357721018
Change-Id: If215529b9eda8e0fb32aed666ca179c90244aaff
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5764823
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- P010 and NV12 have the same layout: Full size Y plane and half size UV plane.
P010 and NV12 are 4:2:0 subsampling
- P010 uses upper 10 bits of 16 bit elements
- NV12 uses 8 bit elements
- The Convert16To8 used internally will discard the low 2 bits.
- UV order is the same - U first in memory, followed by V, interleaved
- UV plane is be rounded up in size to allow odd size Y to have UV values
- Similar code could be used to convert P210ToNV16, P410ToNV24, with the size
of the UV plane affected by subsampling 4:2:2 and 4:4:4 variants.
Bug: b/357439226
Change-Id: I5d6ec84d97d0e0cc4008eeb18a929ea28570d6d9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5761958
Reviewed-by: Wan-Teh Chang <wtc@google.com>
These kernels are mostly identical to each other except for the order of
the results, so we can use a single macro to parameterize the pairwise
addition and use the same macro for both implementations, just with the
register order flipped.
Similar to other 2x2 kernels the implementation here differs slightly
for the last element if the problem size is odd, so use an "any" kernel
to avoid needing to handle this in the common code path.
Observed reduction in runtime compared to the existing Neon code:
| AYUVToUVRow | AYUVToVURow
Cortex-A510 | -33.1% | -33.0%
Cortex-A720 | -25.1% | -25.1%
Cortex-X2 | -59.5% | -53.9%
Cortex-X4 | -39.2% | -39.4%
Bug: libyuv:973
Change-Id: I957db9ea31c8830535c243175790db0ff2a3ccae
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5522316
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
By maintaining the interleaved format of the data we can use a common
kernel for all input channel orderings and simply pass a different
vector of constants instead.
A similar approach is possible with only Neon by making use of
multiplies and repeated application of ADDP to combine channels, however
this is slower on older cores like Cortex-A53 so is not pursued further.
For odd problem sizes we need a slightly different implementation for
the final element, so introduce an "any" kernel to address that rather
than bloating the code for the common case.
Observed affect on runtimes compared to the existing Neon kernels:
| Cortex-A510 | Cortex-A720 | Cortex-X2
ABGRToUVJRow | -15.5% | +5.4% | -33.1%
ABGRToUVRow | -15.6% | +5.3% | -35.9%
ARGBToUVJRow | -10.1% | +5.4% | -32.7%
ARGBToUVRow | -10.1% | +5.4% | -29.3%
BGRAToUVRow | -15.5% | +4.6% | -32.8%
RGBAToUVRow | -10.1% | +4.2% | -36.0%
Bug: libyuv:973
Change-Id: I041ca44db0ae8a2adffcdf24e822eebe962baf33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5505537
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Add detect linux kernel version number in util/cpuid
adbrun -- blaze-bin/third_party/libyuv/cpuid
Kernel Version 4.14
Cpu Flags 0x7
Has ARM 0x2
Bug: libyuv:970
Change-Id: I655ed598db3655ca8448be08f1d71fbc328ced66
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5207990
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Change ScalePlane(), ScalePlane_16(), and ScalePlane_12() to return int
so that they can report memory allocation failures (by returning 1).
BUG=libyuv:968
Change-Id: Ie5c183ee42e3d595302671f9ecb7b3472dc8fdb5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5005031
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
* Run on SiFive internal FPGA:
TestARGBExtractAlpha(~3.2x vs scalar)
TestARGBCopyYToAlpha(~1.6x vs scalar)
Change-Id: I36525c67e8ac3f71ea9d1a58c7dc15a4009d9da1
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4617955
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
They re-use the same method as I410/I210 to I420 with a depth
value of 12 instead of 10.
Bug: b/268505204
Change-Id: I299862b4556461d8c95f0fc1dcd5260e1c1f25cd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4581867
Commit-Queue: Vignesh Venkatasubramanian <vigneshv@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
* Run on SiFive internal FPGA:
MergeUVPlane_Opt(~6x vs scalar)
SplitUVPlane_Opt(~6x vs scalar)
TestCopyPlane(~8x vs scalar)
ARGBInterpolate0_Opt(~10x vs scalar)
ARGBInterpolate64_Opt(~9x vs scalar)
ARGBInterpolate168_Opt(~9x vs scalar)
ARGBInterpolate192_Opt(~8.5x vs scalar)
ARGBInterpolate255_Opt(~8x vs scalar)
Bug: libyuv:956
Change-Id: I8372341865f75f42e30371ef943d5c2e4be7b79a
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4574186
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Run on SiFive internal FPGA:
ARGBToJ400_Opt (~6x vs scalar)
RGBAToJ400_Opt (~6x vs scalar)
RGB24ToJ400_Opt (~5.5x vs scalar)
LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10
Change-Id: Ia3ce8cea7962fbd8618cc23e850a7913c9cabf4f
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4521783
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>