This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8.
Reason for revert: <INSERT REASONING HERE>
Original change's description:
> ComputeHammingDistance reduce SIMD loop to 1 call when possible.
>
> 32 bit x86 has high overhead due to -fpic. So this reduces the
> number of calls by 1.
>
> TBR=kjellander@chromium.org
> Bug: libyuv:701
> Test: BenchmarkHammingDistance
> Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
> Reviewed-on: https://chromium-review.googlesource.com/701755
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: Cheng Wang <wangcheng@google.com>
TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com
Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: libyuv:701
Reviewed-on: https://chromium-review.googlesource.com/707823
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
32 bit x86 has high overhead due to -fpic. So this reduces the
number of calls by 1.
TBR=kjellander@chromium.org
Bug: libyuv:701
Test: BenchmarkHammingDistance
Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
Reviewed-on: https://chromium-review.googlesource.com/701755
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
If length of HammingDistance was not a multiple of 4,
the result was incorrect. The old tests did not catch this
so a new test is done to count 1s.
Bug: libyuv:740
Test: LibYUVCompareTest.TestHammingDistance
Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f
Reviewed-on: https://chromium-review.googlesource.com/699614
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Under cache thrashing circumstances, ldp/stp perform better than
ld1/st1 on QC820/QC821 CPUs. Same performance when hitting cache.
Bug: libyuv:738
Test: LibYUVPlanarTest.TestCopySamples_Opt (445 ms)
Change-Id: Ib6a0a5d5e6a1b7ef667b9bb2edb39d681cf3614c
Reviewed-on: https://chromium-review.googlesource.com/691281
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Full color test is the slowest of the unittests, and not catching any
additional bugs at the moment. Step thru range of 0 to 255 in steps of
5 to speed up the test. 255 is 3 * 5 * 17, so any of those primes would
hit 0 and 255 exactly.
Was LibYUVColorTest.TestFullYUV (896 ms)
Now LibYUVColorTest.TestFullYUV (212 ms)
TBR=kjellander@chromium.org
Bug: libyuv:736
Test: LibYUVColorTest.TestFullYUV
Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804
Reviewed-on: https://chromium-review.googlesource.com/672064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Use ld2 to load even and odd pixels into different registers
and hadd to half add them to each other.
Previously used paired and shift.
TBR=kjellander@chromium.org
BUG=libyuv:723
TEST=ScaleDownBy2_Linear
Change-Id: I3ec72bcf7d4c746837217496c301eb4e4ad963cf
Reviewed-on: https://chromium-review.googlesource.com/644113
Reviewed-by: Cheng Wang <wangcheng@google.com>
urhadd is a rounded average. Linear filter wants to average
horizontally, so use ld2 to separate even and odd pixels.
TBR=jkellander@chromium.org
BUG=None
TEST=LibYUVScaleTest.*ScaleDownBy2*
Change-Id: Id667288a030e72ce8e1c1d6719b69c555c0db063
Reviewed-on: https://chromium-review.googlesource.com/642448
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Roughly. instead of 4 loads and 8 multiples, use 1 load and 2 multiples
4 times over. The original code, as with the C code from clang and gcc,
did all the loads, then all the math, then the store. The new code
does a load, then the math, then the next load, etc.
This schedules better on current arm 64 cpus.
Number of registers also reduced, reusing the same registers.
HiSilicon ARM A73:
Now
TestGaussRow_Opt (890 ms)
TestGaussCol_Opt (571 ms)
Was
TestGaussRow_Opt (1061 ms)
TestGaussCol_Opt (595 ms)
Qualcomm 821 (Pixel):
Now
TestGaussRow_Opt (571 ms)
TestGaussCol_Opt (474 ms)
Was
TestGaussRow_Opt (751 ms)
TestGaussCol_Opt (520 ms)
TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt
Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de
Reviewed-on: https://chromium-review.googlesource.com/634448
Commit-Queue: Frank Barchard <fbarchard@google.com>
Downsample 16x2 to 8x1 with box filtering
[ RUN ] LibYUVScaleTest.TestScaleRowUp2_16
[ OK ] LibYUVScaleTest.TestScaleRowUp2_16 (579 ms)
[ RUN ] LibYUVScaleTest.TestScaleRowDown2Box_16
[ OK ] LibYUVScaleTest.TestScaleRowDown2Box_16 (329 ms)
[----------] 2 tests from LibYUVScaleTest (909 ms total)
TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowUp2_16 and LibYUVScaleTest.TestScaleRowDown2Box_16
Change-Id: I457d44123f2751e5f71bf3935401fff74b8e9db2
Reviewed-on: https://chromium-review.googlesource.com/608876
Reviewed-by: Cheng Wang <wangcheng@google.com>
add ScaleMaxSamples_NEON function with max
done on original values.
TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt
Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e
Reviewed-on: https://chromium-review.googlesource.com/614685
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
This reverts commit 1dda4cb0b7bd564e646d6ec2efee497fcd7146ca.
Reason for revert: build error on jpeg FILE
Original change's description:
> include <new> header for benefit of new clang builds
>
> TBR=kjellander@chromium.org
> BUG=libyuv:712
> TEST=local builds still work
>
> Change-Id: I040e8edc40aafd820d2a29629fe7aec5c049bc6b
> Reviewed-on: https://chromium-review.googlesource.com/576971
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Commit-Queue: Frank Barchard <fbarchard@google.com>
TBR=kjellander@chromium.org,fbarchard@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: libyuv:712
Change-Id: I4cf4e26eadb476017dc95e6c9578092204f088a3
Reviewed-on: https://chromium-review.googlesource.com/601211
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
NaCL has been disabled for awhile, so the code
will still build, but only with C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.
BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org
Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
instead of casting int to int64, pass the int
and use %w modifier to use the word version of the register.
TBR=kjellander@chromium.org
BUG=libyuv:706
TEST=git cl lint
R=wangcheng@google.com
Change-Id: Iee5a70f04d928903ca8efac00066b8821a465e36
Reviewed-on: https://chromium-review.googlesource.com/528381
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction. The outer loop can sum the
values.
32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
R=wangcheng@google.comTBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The 32 bit version of HammingDistance_NEON accumulates
using vertical add and paired adds, which takes 3 instructions
instead of 4.
The instructions are also portable between 32 and 64 bit.
Was BenchmarkHammingDistance_Opt (105 ms)
Now BenchmarkHammingDistance_Opt (90 ms)
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
BenchmarkHammingDistance_Opt (90 ms)
Change-Id: If9e621e0bd2fe2492a1532056f8a1b451ba53d7e
Reviewed-on: https://chromium-review.googlesource.com/526365
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16
new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd
Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The CpuId function is a wrapper for the intrinsic, or
implemented with inline if unavailable. It had been
using uint32, but the intrinsics use int, so it was causing
casting and lint warnings. This change makes the internal
implementation use int.
Casting was also done for xgetbv, and the cast is simply
removed, and is not causing a build error.
MipCpuCaps was doing strlen to check for white space after the
instruction set. Arm also does this but with a hard coded offset.
This was causing a cast from size_t to int, which produced a lint
warning. The change removes the white space detect.
In theory the code could be used to detect SSE vs SSE2, and it would
need to check SSE is followed by a space or end of line. But this
code is only used on Arm and Mips, where there there is one form
of SIMD detected. e.g. MSA for mips. If a new instruction set is
added with a similar name, the write space check could be reintroduced.
But its more likely the code can be rewritten to use a better form
of detection by then. Or remove detection and require the instructions
BUG=libyuv:641
TEST=try bots build on all platforms without error and lint is clean
Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809
Reviewed-on: https://chromium-review.googlesource.com/514524
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Wan-Teh Chang <wtc@chromium.org>
Reduce number of atomic references to cpu_info by making
InitCpuFlags call MaskCpuFlags and return the same value.
BUG=libyuv:641
TEST=libyuv_unittests pass
Change-Id: I5dfff8f7a10671bc8ef3ec0ed6f302791e752faa
Reviewed-on: https://chromium-review.googlesource.com/514145
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Detect the compiler's support of C11 atomics, and use C11 atomics when
available.
Note that libyuv::MaskCpuFlags() is still not thread-safe.
BUG=libyuv:641
TEST= cpu_thread_test.cc adds a pthread based test
R=wangcheng@google.com
Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd
Reviewed-on: https://chromium-review.googlesource.com/510079
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
BUG=libyuv:703
TEST=compile and disassemble. see registers used not stack.
R=wangcheng@google.com
Change-Id: Iaa07ee5d0c35252994491bb2868276e161149efd
Reviewed-on: https://chromium-review.googlesource.com/500427
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
BUG=libyuv:701
TEST=built and disassembled for aarch64
R=kjellander@chromium.org
Change-Id: I7712b1c7934e5dfb55fda1fa7c8405c32d6964ce
Reviewed-on: https://chromium-review.googlesource.com/495327
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
The verion of clang in ndk r14 (3.9) has a built in llvm assembler
that does not have the sgtu pseudo instruction.
sltu is the actual instruction, so switch the 2 operands and use
the instruction instead of the pseudo op.
BUG=libyuv:700
TEST=try bots build mips without error.
Change-Id: I2d5f94f81acbd56cdedea011e7d9308979e19079
Reviewed-on: https://chromium-review.googlesource.com/494026
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Revert the workaround and fix it properly by passing the
additional necessary flag to the compiler.
BUG=libyuv:700
Change-Id: I1c893a8acb5079decbee6963b689424bf2f99f4f
Reviewed-on: https://chromium-review.googlesource.com/487881
Reviewed-by: Frank Barchard <fbarchard@google.com>