Use ld2 to load even and odd pixels into different registers
and hadd to half add them to each other.
Previously used paired and shift.
TBR=kjellander@chromium.org
BUG=libyuv:723
TEST=ScaleDownBy2_Linear
Change-Id: I3ec72bcf7d4c746837217496c301eb4e4ad963cf
Reviewed-on: https://chromium-review.googlesource.com/644113
Reviewed-by: Cheng Wang <wangcheng@google.com>
urhadd is a rounded average. Linear filter wants to average
horizontally, so use ld2 to separate even and odd pixels.
TBR=jkellander@chromium.org
BUG=None
TEST=LibYUVScaleTest.*ScaleDownBy2*
Change-Id: Id667288a030e72ce8e1c1d6719b69c555c0db063
Reviewed-on: https://chromium-review.googlesource.com/642448
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Roughly. instead of 4 loads and 8 multiples, use 1 load and 2 multiples
4 times over. The original code, as with the C code from clang and gcc,
did all the loads, then all the math, then the store. The new code
does a load, then the math, then the next load, etc.
This schedules better on current arm 64 cpus.
Number of registers also reduced, reusing the same registers.
HiSilicon ARM A73:
Now
TestGaussRow_Opt (890 ms)
TestGaussCol_Opt (571 ms)
Was
TestGaussRow_Opt (1061 ms)
TestGaussCol_Opt (595 ms)
Qualcomm 821 (Pixel):
Now
TestGaussRow_Opt (571 ms)
TestGaussCol_Opt (474 ms)
Was
TestGaussRow_Opt (751 ms)
TestGaussCol_Opt (520 ms)
TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt
Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de
Reviewed-on: https://chromium-review.googlesource.com/634448
Commit-Queue: Frank Barchard <fbarchard@google.com>
Downsample 16x2 to 8x1 with box filtering
[ RUN ] LibYUVScaleTest.TestScaleRowUp2_16
[ OK ] LibYUVScaleTest.TestScaleRowUp2_16 (579 ms)
[ RUN ] LibYUVScaleTest.TestScaleRowDown2Box_16
[ OK ] LibYUVScaleTest.TestScaleRowDown2Box_16 (329 ms)
[----------] 2 tests from LibYUVScaleTest (909 ms total)
TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowUp2_16 and LibYUVScaleTest.TestScaleRowDown2Box_16
Change-Id: I457d44123f2751e5f71bf3935401fff74b8e9db2
Reviewed-on: https://chromium-review.googlesource.com/608876
Reviewed-by: Cheng Wang <wangcheng@google.com>
Copy Valgrind scripts that was deleted from Chromium's tools/ to fix the Memcheck bot:
valgrind/chrome_tests.bat
valgrind/chrome_tests.py
valgrind/chrome_tests.sh
valgrind/common.py
valgrind/gdb_helper.py
valgrind/locate_valgrind.sh
valgrind/memcheck_analyze.py
valgrind/valgrind.gni
valgrind/valgrind.sh
valgrind/valgrind_test.py
valgrind_test.py was stripped of its Mac and Dr Memory specific parts, which
we don't use. There's still more cleanup to do, tracked in bugs.webrc.org/7849.
This is similar to changes in https://codereview.webrtc.org/2945753002.
BUG=libyuv:714
NOTRY=True
Change-Id: Ia6ba9bd3d3fca6f2ebe0e4f30e1eb39bb1a66813
Reviewed-on: https://chromium-review.googlesource.com/615162
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Henrik Kjellander <kjellander@chromium.org>
add ScaleMaxSamples_NEON function with max
done on original values.
TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt
Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e
Reviewed-on: https://chromium-review.googlesource.com/614685
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
This reverts commit 1dda4cb0b7bd564e646d6ec2efee497fcd7146ca.
Reason for revert: build error on jpeg FILE
Original change's description:
> include <new> header for benefit of new clang builds
>
> TBR=kjellander@chromium.org
> BUG=libyuv:712
> TEST=local builds still work
>
> Change-Id: I040e8edc40aafd820d2a29629fe7aec5c049bc6b
> Reviewed-on: https://chromium-review.googlesource.com/576971
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Commit-Queue: Frank Barchard <fbarchard@google.com>
TBR=kjellander@chromium.org,fbarchard@google.com
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: libyuv:712
Change-Id: I4cf4e26eadb476017dc95e6c9578092204f088a3
Reviewed-on: https://chromium-review.googlesource.com/601211
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Take change from webrtc into libyuv autoroll
BUG=libyuv:716
TEST=tools_libyuv/autoroller/roll_deps.py
TBR=kjellander@chromium.org
Change-Id: I81b1eed114b982e336f2e209d7d825094e584295
Reviewed-on: https://chromium-review.googlesource.com/596472
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
In cross builds of chrome/win, the host and target toolchains currently
have the same name. To fix this, rename clang_x64 to win_clang_x64.
Because the toolchain name is also referenced in libyuv, this requires
a five-sided change:
1. Introduce variable containing the toolchain name in src.git
2. Change libyuv to refer to the variable
3. Rename toolchain in src.git (including in the newly introduced var)
4* Let libyuv refer to the new name directly
5. Remove variable again
(See also https://codereview.chromium.org/2463143002)
Bug: 748501
Change-Id: I89fdf1503f1a57992a8336026d4c8d767685d53f
Reviewed-on: https://chromium-review.googlesource.com/585306
Commit-Queue: Nico Weber <thakis@chromium.org>
Reviewed-by: Nico Weber <thakis@chromium.org>
In cross builds of chrome/win, the host and target toolchains currently
have the same name. To fix this, rename clang_x64 to win_clang_x64.
Because the toolchain name is also referenced in libyuv, this requires
a five-sided change:
1. Introduce variable containing the toolchain name in src.git
2* Change libyuv to refer to the variable
3. Rename toolchain in src.git (including in the newly introduced var)
4. Let libyuv refer to the new name directly
5. Remove variable again
(See also https://codereview.chromium.org/2463143002)
TBR=fbarchard
Bug: 748501
Change-Id: Id8398ab5c4615c7c33dfa5ec793fdc8c0a717e57
Reviewed-on: https://chromium-review.googlesource.com/585307
Reviewed-by: Nico Weber <thakis@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Currently, libyuv is setting this config via mac_sdk_min_build_override. The old
meechanism is deprecated, but cannot be removed until chromium is updated to no
longer require mac_sdk_min_build_override.
TBR=kjellander@chromium.org
Bug:chromium:740693
Change-Id: I71533c9ef20ac8d7584d50751ac5437da54e2cb5
Reviewed-on: https://chromium-review.googlesource.com/565636
Reviewed-by: Frank Barchard <fbarchard@google.com>
NaCL has been disabled for awhile, so the code
will still build, but only with C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.
BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org
Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
instead of casting int to int64, pass the int
and use %w modifier to use the word version of the register.
TBR=kjellander@chromium.org
BUG=libyuv:706
TEST=git cl lint
R=wangcheng@google.com
Change-Id: Iee5a70f04d928903ca8efac00066b8821a465e36
Reviewed-on: https://chromium-review.googlesource.com/528381
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction. The outer loop can sum the
values.
32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
R=wangcheng@google.comTBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The 32 bit version of HammingDistance_NEON accumulates
using vertical add and paired adds, which takes 3 instructions
instead of 4.
The instructions are also portable between 32 and 64 bit.
Was BenchmarkHammingDistance_Opt (105 ms)
Now BenchmarkHammingDistance_Opt (90 ms)
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
BenchmarkHammingDistance_Opt (90 ms)
Change-Id: If9e621e0bd2fe2492a1532056f8a1b451ba53d7e
Reviewed-on: https://chromium-review.googlesource.com/526365
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16
new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd
Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The CpuId function is a wrapper for the intrinsic, or
implemented with inline if unavailable. It had been
using uint32, but the intrinsics use int, so it was causing
casting and lint warnings. This change makes the internal
implementation use int.
Casting was also done for xgetbv, and the cast is simply
removed, and is not causing a build error.
MipCpuCaps was doing strlen to check for white space after the
instruction set. Arm also does this but with a hard coded offset.
This was causing a cast from size_t to int, which produced a lint
warning. The change removes the white space detect.
In theory the code could be used to detect SSE vs SSE2, and it would
need to check SSE is followed by a space or end of line. But this
code is only used on Arm and Mips, where there there is one form
of SIMD detected. e.g. MSA for mips. If a new instruction set is
added with a similar name, the write space check could be reintroduced.
But its more likely the code can be rewritten to use a better form
of detection by then. Or remove detection and require the instructions
BUG=libyuv:641
TEST=try bots build on all platforms without error and lint is clean
Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809
Reviewed-on: https://chromium-review.googlesource.com/514524
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Wan-Teh Chang <wtc@chromium.org>
In https://chromium-review.googlesource.com/c/509693/ the From
keyword was removed. This update the script to match that (we
also were no longer using it).
BUG=libyuv:704
NOTRY=True
Change-Id: Iccbbfb426a3acd986fbc036672fb51abc2c5d346
Reviewed-on: https://chromium-review.googlesource.com/513908
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Henrik Kjellander <kjellander@chromium.org>
Reduce number of atomic references to cpu_info by making
InitCpuFlags call MaskCpuFlags and return the same value.
BUG=libyuv:641
TEST=libyuv_unittests pass
Change-Id: I5dfff8f7a10671bc8ef3ec0ed6f302791e752faa
Reviewed-on: https://chromium-review.googlesource.com/514145
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Detect the compiler's support of C11 atomics, and use C11 atomics when
available.
Note that libyuv::MaskCpuFlags() is still not thread-safe.
BUG=libyuv:641
TEST= cpu_thread_test.cc adds a pthread based test
R=wangcheng@google.com
Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd
Reviewed-on: https://chromium-review.googlesource.com/510079
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>