- remove inline asm which was only for 32 bit
- add ARGBToYMatrixRow_AVX2
- add gn flag libyuv_enable_rowwin=true
Example of building with GN and Ninja:
Without the new flag:
gn gen out/Release "--args=is_debug=false"
ninja -C out/Release
With the new flag:
gn gen out/Release "--args=is_debug=false libyuv_enable_rowwin=true"
ninja -C out/Release
Bug: libyuv:42280806, 477295731, libyuv:42280902, libyuv:439628764
R=dalecurtis@chromium.org, rrwinterton@gmail.com
Change-Id: I451bf814622fba690005c02fbf5816819c6a08c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7765790
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Add ifdef for LIBYUV_UNLIMITED_DATA
Fixed by Gemini after it was told how to build and run the test and asked to fix the failure.
Bug: libyuv:353545922
Change-Id: I117a25b75b9616ee2ce6122aa163c2085ed4dc7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7742120
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This one reuses the SIMD implementations for MergeUVRow_ from the
existing ARGBToNV12 functions.
Bug: libyuv:42280902
Change-Id: If0a4be133d657ed0262f29fdd568dac90b49636c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564317
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
This allows ABGR conversion using the same methods.
Bug: libyuv:42280902
Change-Id: I5566e3150b30573a2326a900ce31ab095f8935f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7564316
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
This was implemented by Gemini followed by manual review and some
tweaking for style. The 601 and JPEG constants are fully verified
against the existing non-matrix implementations. On x86 the C-only
versions appear to be about 25% slower than the optimized ones.
Bug: libyuv:42280902
Change-Id: Ia5b7cb499bad5c76faec53f36086ebb18f2b530f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7512030
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Detect whether the Arm CPU supports the FMMLA instruction
Bug: None
Change-Id: Ia7b83bf2735ddeeb8a85da44177e708c34e4b1fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7085486
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
planar_test.cc was failing to build with:
Error: selected processor does not support `vmrs r3,fpscr' in ARM mode
Error: selected processor does not support `vmsr fpscr,r3' in ARM mode
Bug: None
Change-Id: I2ee0e7191c372277901c94e29d9ed91bbac71af2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7063737
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- detect lack of dot product instruction to infer the cpu is low end
- only run the test on higher end arm
Bug: 416842099
Change-Id: Idd2dd16a624bbba280cf531644440024b12f7ecf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6804632
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
- Was using avgb twice for non-exact and C for exact.
On Skylake Xeon:
Now SSE3
ARGBToJ420_Opt (326 ms)
Was
Exact C
ARGBToJ420_Opt (871 ms)
Not exact AVX2
ARGBToJ420_Opt (237 ms)
Not exact SSSE3
ARGBToJ420_Opt (312 ms)
Bug: 381138208
Change-Id: I6d1081bb52e36f06736c0c6575fa82bb2268629b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6629605
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ben Weiss <bweiss@google.com>
- Add +i8mm build option for sve ARGBToUV which uses usdot
- util/cpuid Get cpu count (windows, macos, linux)
- For each x86 cpu, detect hybrid (e-core)
- Includes a comment fix for ubsan unittest
- Bump version
- Apply clang format to util/*.c as well as all *.cc/*.h
Bug: 424637372
Change-Id: I08310e18051fff62c9e4e4a10d1e4361871119ac
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6635640
Reviewed-by: Wan-Teh Chang <wtc@google.com>
The declarations of ARGBAffineRow_C and ARGBAffineRow_SSE2 and the code
to support those declarations are duplicated in planar_functions.h. They
are already in row.h, so we can simply remove them.
Change-Id: I9b522fdd201ca530f1268bf4200cd2e18b806ba5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6434733
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
The ENABLE_ROW_TESTS macro is not used in convert_test.cc.
Change-Id: Icc50ec465beca81e14a9683a717680e179a541dd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6434620
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
- use negative coefficients for UV to allow -128
- change shift to truncate instead of round for UV
- adapt all row_gcc RGB to UV into matrix functions
- add -DLIBYUV_ENABLE_ROWWIN to allow clang on Windows to use row_win.cc
Bug: 381138208
Change-Id: I6016062c859faf147a8a2cdea6c09976cbf2963c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6277710
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
The existing tests reuse the intermediate buffers between the reference
and optimized implementations. In particular the existing tests appear
to pass even if the optimized implementation is completely empty, so
long as it does not modify the destination buffers since these are
already filled with correct values from the reference code.
To avoid this, allocate separate buffers for optimized and reference
implementations to store intermediate data between function calls.
Additionally remove unused buffers from HalfMergeUVPlane_Opt tests.
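
The false pass described above can be reproduced with a minimal sketch.
Every name here (Double_C, Double_Empty, Finish, TestPasses) is
hypothetical and not part of libyuv or its unit tests. The "optimized"
first stage is deliberately empty, so a comparison that reuses the
reference intermediate buffer cannot detect it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Reference first stage: writes the intermediate buffer.
static void Double_C(const uint8_t* src, uint8_t* tmp, int n) {
  for (int i = 0; i < n; ++i) tmp[i] = (uint8_t)(src[i] * 2);
}

// Deliberately empty "optimized" first stage: writes nothing at all.
static void Double_Empty(const uint8_t* src, uint8_t* tmp, int n) {
  (void)src;
  (void)tmp;
  (void)n;
}

// Second stage shared by both paths: reads the intermediate buffer.
static void Finish(const uint8_t* tmp, uint8_t* out, int n) {
  for (int i = 0; i < n; ++i) out[i] = (uint8_t)(tmp[i] + 1);
}

// Returns 1 if the reference and "optimized" outputs compare equal.
// separate_tmp == 0 reuses the reference intermediate buffer (the flaw);
// separate_tmp == 1 gives the optimized path its own buffer (the fix).
static int TestPasses(int separate_tmp) {
  enum { kN = 8 };
  uint8_t src[kN], tmp_ref[kN], tmp_opt[kN], out_ref[kN], out_opt[kN];
  assert(separate_tmp == 0 || separate_tmp == 1);
  for (int i = 0; i < kN; ++i) src[i] = (uint8_t)(i + 1);
  memset(tmp_ref, 0, kN);
  memset(tmp_opt, 0, kN);
  memset(out_ref, 0, kN);
  memset(out_opt, 0, kN);
  uint8_t* tmp_for_opt = separate_tmp ? tmp_opt : tmp_ref;

  Double_C(src, tmp_ref, kN);
  Finish(tmp_ref, out_ref, kN);

  Double_Empty(src, tmp_for_opt, kN);  // leaves tmp_for_opt untouched
  Finish(tmp_for_opt, out_opt, kN);

  return memcmp(out_ref, out_opt, kN) == 0;
}
```

With the shared buffer the comparison succeeds even though the first
stage is empty; with a separate buffer the mismatch is detected.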
Change-Id: I7e9ea21fc193e7be21cc24e2be0d7a122e068f6e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6074941
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- Remove special case Scale of 1 which used fp16 cvt but requires cpuid
- Port aarch64 to aarch32
- Use C for aarch32 with small (denormal) scale value
Bug: 377693555
Change-Id: I38e207e79ac54907ed6e65118b8109288fddb207
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6043392
Reviewed-by: Wan-Teh Chang <wtc@google.com>
The test case should have the dst width and height, and the src width
and height should be specified by the --libyuv_width and --libyuv_height
options to libyuv_unittest.
Tested:
libyuv_unittest --gtest_filter=LibYUVScaleTest.I420ScaleTo264x216_Box \
--libyuv_width=352 --libyuv_height=288
Bug: b/369963535, b/366045177
Change-Id: I8166a264c9c4840e0d16c0d3c1818c18aebc1b2e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5896466
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The existing behaviour does not round correctly in all cases, so adjust
it to match the existing Neon implementation.
Update the tests to require bit-exactness and disable other
implementations that do not round correctly.
Change-Id: Ie790fb4b4805b555d74d689d83802e1dd4f33df5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5869115
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- Scaling 48 pixels at a time, but calling code checked for 24 pixels
- Added test for scaling to 1080x1920
libyuv_test --gunit_filter=LibYUVScaleTest.I420ScaleTo1080x1920_Box* --libyuv_width=1440 --libyuv_height=2560
Was
libyuv_test --gunit_filter=LibYUVScaleTest.I420ScaleTo1080x1920_Box* --libyuv_width=1440 --libyuv_height=2560
[ RUN ] LibYUVScaleTest.I420ScaleTo1080x1920_Box
Segmentation fault
Traceback (most recent call last):
Now
[ RUN ] LibYUVScaleTest.I420ScaleTo1080x1920_Box
filter 3 - 6741 us C - 3566 us OPT
[ OK ] LibYUVScaleTest.I420ScaleTo1080x1920_Box (43 ms)
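
The mismatch can be illustrated with a small arithmetic sketch. The
predicate below is hypothetical, not a libyuv function, and it assumes
the caller's check is a divisibility test on the destination width. A
width of 1080 is a multiple of 24 but not of 48, so a check against 24
would select a kernel that processes 48 pixels per iteration and runs
past the end of the row.

```c
#include <assert.h>

// Hypothetical predicate: returns 1 when a row kernel that handles
// pixels_per_iteration pixels per loop can cover dst_width exactly.
static int WidthFitsRowKernel(int dst_width, int pixels_per_iteration) {
  assert(pixels_per_iteration > 0);
  return dst_width > 0 && dst_width % pixels_per_iteration == 0;
}
```

WidthFitsRowKernel(1080, 24) is true while WidthFitsRowKernel(1080, 48)
is false, so the check has to use the same step as the kernel.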
Bug: b/366045177
Change-Id: I0ea6c2d6a32b2e7ca44cd030abc9f248115be44a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5857554
Reviewed-by: Wan-Teh Chang <wtc@google.com>
A failed malloc returns a NULL pointer. If an offset is then applied to
that NULL pointer, for example through a reinterpret_cast, the result
is a non-NULL pointer that looks valid but points into an inaccessible
region, so using it causes an access violation.
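
A minimal sketch of the safe pattern follows. AllocAligned64 is a
hypothetical helper, not the libyuv allocation macro, and the 64 byte
alignment is an assumption chosen for illustration. The point is that
the NULL check happens on the raw malloc result before any offset is
applied.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

// Hypothetical helper: returns a 64-byte aligned pointer, or NULL on
// failure. *base receives the raw pointer that must be passed to free().
static uint8_t* AllocAligned64(size_t size, void** base) {
  assert(base != NULL);
  *base = NULL;
  if (size > SIZE_MAX - 63) {
    return NULL;  // size + 63 would overflow
  }
  void* mem = malloc(size + 63);
  if (mem == NULL) {
    return NULL;  // check before doing any pointer arithmetic
  }
  *base = mem;
  // Round up to the next multiple of 64. Applying this to a NULL result
  // would yield a non-NULL pointer, which is why the check comes first.
  return (uint8_t*)(((uintptr_t)mem + 63) & ~(uintptr_t)63);
}
```

Callers test the returned pointer for NULL and release the memory with
free(*base), not with the aligned pointer.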
Bug: 359949838
Change-Id: Ie73bca426671ee85315b96f187a6de8c955cada6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5789885
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Declare functions as static. Declare functions in a header. Include the
header that declares the functions. Delete undeclared and unused
functions ScaleFilterRows_NEON() and ScaleRowUp2_16_NEON(). Delete
unused function ScaleY() in psnr_main.cc.
Change-Id: I182ec30611df83c61ffd01bbab595cd61fb5f1e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5778601
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- P010 and NV12 have the same layout: Full size Y plane and half size UV plane.
P010 and NV12 are 4:2:0 subsampling
- P010 uses upper 10 bits of 16 bit elements
- NV12 uses 8 bit elements
- The Convert16To8 used internally will discard the low 2 bits.
- UV order is the same - U first in memory, followed by V, interleaved
- UV plane is rounded up in size to allow odd size Y to have UV values
- Similar code could be used to convert P210ToNV16, P410ToNV24, with the size
of the UV plane affected by subsampling 4:2:2 and 4:4:4 variants.
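
The points above can be sketched in plain C. Both functions are
hypothetical illustrations, not the libyuv implementation or its
signatures, and strides are assumed to be in elements rather than
bytes. Each 16 bit sample is shifted right by 8, which keeps the top 8
bits of the 10 bit value and discards its low 2 bits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper: narrow one plane of 16-bit samples to 8 bits.
static void Narrow16To8Plane(const uint16_t* src, ptrdiff_t src_stride,
                             uint8_t* dst, ptrdiff_t dst_stride,
                             int width, int height) {
  assert(src != NULL && dst != NULL && width > 0 && height > 0);
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      dst[x] = (uint8_t)(src[x] >> 8);  // keep the upper 8 bits
    }
    src += src_stride;
    dst += dst_stride;
  }
}

// Hypothetical sketch of a P010 to NV12 conversion.
static void SketchP010ToNV12(const uint16_t* src_y, ptrdiff_t src_stride_y,
                             const uint16_t* src_uv, ptrdiff_t src_stride_uv,
                             uint8_t* dst_y, ptrdiff_t dst_stride_y,
                             uint8_t* dst_uv, ptrdiff_t dst_stride_uv,
                             int width, int height) {
  // Round up so an odd-sized Y plane still has UV values for its last
  // column and row.
  int halfwidth = (width + 1) / 2;
  int halfheight = (height + 1) / 2;
  Narrow16To8Plane(src_y, src_stride_y, dst_y, dst_stride_y, width, height);
  // U and V are interleaved, so each UV row holds 2 * halfwidth elements.
  Narrow16To8Plane(src_uv, src_stride_uv, dst_uv, dst_stride_uv,
                   halfwidth * 2, halfheight);
}
```

For a 3x3 image this reads 9 Y samples and a 4x2 block of interleaved
UV samples, which shows the rounded-up UV plane size.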
Bug: b/357439226
Change-Id: I5d6ec84d97d0e0cc4008eeb18a929ea28570d6d9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5761958
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Allow users to set LIBYUV_DISABLE_${FEATURE} environment variables to
disable individual architecture extensions.
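
A minimal sketch of how such a variable could mask a detected feature.
The flag constants and function names are hypothetical, and the rule
that an unset variable or a value starting with '0' leaves the feature
enabled is an assumption, not a statement of how libyuv parses the
value. The variable names follow the LIBYUV_DISABLE_${FEATURE} pattern
with NEON and SVE as example features.

```c
#include <assert.h>
#include <stdlib.h>

// Hypothetical feature bits, for illustration only.
enum { kSketchHasNeon = 1 << 0, kSketchHasSve = 1 << 1 };

// Assumed rule: the variable counts as set unless it is missing or its
// value starts with '0'.
static int EnvValueIsSet(const char* value) {
  return value != NULL && value[0] != '0';
}

// Clears feature_bit from cpu_flags when the environment value is set.
static int MaskFeature(int cpu_flags, int feature_bit, const char* value) {
  assert(feature_bit != 0);
  return EnvValueIsSet(value) ? (cpu_flags & ~feature_bit) : cpu_flags;
}

// Applies the disable variables to a set of detected flags.
static int ApplyDisableEnv(int cpu_flags) {
  cpu_flags = MaskFeature(cpu_flags, kSketchHasNeon,
                          getenv("LIBYUV_DISABLE_NEON"));
  cpu_flags = MaskFeature(cpu_flags, kSketchHasSve,
                          getenv("LIBYUV_DISABLE_SVE"));
  return cpu_flags;
}
```

Running a program with LIBYUV_DISABLE_SVE=1 in the environment would
then clear only the SVE bit and leave the other detected flags alone.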
Change-Id: I555dd64311789bd6d760e48045ac6734177a730b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5712929
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
A semicolon is treated as the start of a comment by some assemblers
causing the vector length to be reported incorrectly, so use a newline
instead.
- Add volatile asm in row_gcc and row_neon64
Bug: b/5631539
Change-Id: I6b0836fcdd9247ef7b9e8ceda01df3150519ecf8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5666060
Reviewed-by: Justin Green <greenjustin@google.com>
- Some configs have int64 elements off by default.
Disable ScaleDownBy4 row function to avoid compile error
Bug: 344954354
Change-Id: Ie0d74daea72375eff6438ab54cb2803d68d67e52
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5598460
Reviewed-by: James Zern <jzern@google.com>
- ARM Planar test uses regular asm volatile syntax
- x86 row functions remove volatile from asm
Bug: 347111119, 347112532
Change-Id: I535b3dfa1a7a19824503bd95584a63b047b0e9a1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5637058
Reviewed-by: Justin Green <greenjustin@google.com>
This commit only adds the kCpuHasSME flag to indicate that the CPU has
the Arm Scalable Matrix Extension enabled. It does not introduce any
code that uses it yet.
Add a test to check that the HWCAP value is interpreted correctly.
Change-Id: I2de7bca26ca44ff3ee278b59108298a299a171b7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5598869
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>