- Add +i8mm build option for sve ARGBToUV which uses usdot
- util/cpuid Get cpu count (windows, macos, linux)
- For each x86 cpu, detect hybrid (e-core)
- Includes a comment fix for ubsan unittest
- Bump version
- Apply clang format to util/*.c as well as all *.cc/*.h
Bug: 424637372
Change-Id: I08310e18051fff62c9e4e4a10d1e4361871119ac
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6635640
Reviewed-by: Wan-Teh Chang <wtc@google.com>
- use negative coefficients for UV to allow -128
- change shift to truncate instead of round for UV
- adapt all row_gcc RGB to UV into matrix functions
- add -DLIBYUV_ENABLE_ROWWIN to allow clang on Windows to use row_win.cc
Bug: 381138208
Change-Id: I6016062c859faf147a8a2cdea6c09976cbf2963c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6277710
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
We can use the Neon dot-product instructions as a slightly faster
widening accumulation. This also has the advantage of widening to 32
bits so avoids the risk of overflow present in the original Neon code.
Reduction in runtimes observed for HammingDistance compared to the
existing Neon code:
Cortex-A55: -4.4%
Cortex-A510: -26.5%
Cortex-A76: -8.1%
Cortex-A720: -15.5%
Cortex-X1: -4.1%
Cortex-X2: -5.1%
Bug: libyuv:977
Change-Id: I9e5e10d228c339d905cb2e668a9811ff0a6af5de
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5490049
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
The kernel is only ever called with count as a multiple of 32 so it is
safe to unroll this and maintain two accumulators.
Reduction in runtime observed compared to the existing
SumSquareError_NEON_DotProd implementation:
Cortex-A55: -28.2%
Cortex-A510: -27.6%
Cortex-A76: -33.0%
Cortex-A720: -35.3%
Cortex-X1: -16.9%
Cortex-X2: -13.3%
Bug: libyuv:977
Change-Id: Iee423106c38e97cc38007d73fa80e8374dd96721
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5490048
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
The Neon dot-product instructions perform two widening steps rather than
one, saving us the need to widen the absolute difference to 16-bits
before accumulating. Additionally, the dot-product instructions tend to
have better performance characteristics than traditional widening
multiply instructions like SMLAL used in the existing
SumSquareError_NEON code.
Observed reduction in runtimes compared to the existing Neon kernel:
Cortex-A55: -9.1%
Cortex-A510: -36.7%
Cortex-A76: -37.6%
Cortex-A720: -48.8%
Cortex-X1: -56.1%
Cortex-X2: -42.6%
Bug: libyuv:977
Change-Id: Ie20c69040cc47a803d8e95620d31e0bf1e1dac12
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463945
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The "memory" clobber needs to be present even if the asm does not store
anything to memory, since otherwise the compiler would be allowed to
reorder earlier stores to the pointers after they would be needed by the
asm.
Also fix up the zero-initialisation of accumulators in
SumSquareError_NEON, since EOR'ing a register by itself is not a
recognised zeroing idiom on most AArch64 micro-architectures.
Bug: libyuv:976
Change-Id: I3175367abf6f59db8371b4478f1156950277d7c5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378705
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
NaCL has been disabled for awhile, so the code
will still build, but only with C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.
BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org
Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction. The outer loop can sum the
values.
32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
R=wangcheng@google.comTBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The 32 bit version of HammingDistance_NEON accumulates
using vertical add and paired adds, which takes 3 instructions
instead of 4.
The instructions are also portable between 32 and 64 bit.
Was BenchmarkHammingDistance_Opt (105 ms)
Now BenchmarkHammingDistance_Opt (90 ms)
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
BenchmarkHammingDistance_Opt (90 ms)
Change-Id: If9e621e0bd2fe2492a1532056f8a1b451ba53d7e
Reviewed-on: https://chromium-review.googlesource.com/526365
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
BUG=libyuv:703
TEST=compile and disassemble. see registers used not stack.
R=wangcheng@google.com
Change-Id: Iaa07ee5d0c35252994491bb2868276e161149efd
Reviewed-on: https://chromium-review.googlesource.com/500427
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
BUG=libyuv:701
TEST=built and disassembled for aarch64
R=kjellander@chromium.org
Change-Id: I7712b1c7934e5dfb55fda1fa7c8405c32d6964ce
Reviewed-on: https://chromium-review.googlesource.com/495327
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
clangcl use compare_win for 32 bit, allowing fallback and enabling avx2 code for clang.
move defines/protos to compare_row.h
fix issue with odd width ARGBCopyAlpha functions by copying destination to temp buffer, then doing alpha copy, then copy back to destination.
R=harryjin@google.comTBR=harryjin@google.com
BUG=libyuv:484
Review URL: https://webrtc-codereview.appspot.com/59379004.