ST2 with 64-bit lanes has good performance on all micro-architectures of
interest and saves us 8 TRN instructions, so use that instead.
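
For illustration only, the equivalence this relies on can be sketched with intrinsics: ST2 on 64-bit lanes writes lane 0 of both registers followed by lane 1 of both, which is the memory layout that TRN1/TRN2 on .2d followed by two plain stores would produce. The helper name StoreInterleaved64 is hypothetical and is not the kernel code; the scalar branch only models the same layout on non-AArch64 targets.

```c
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

// Hypothetical helper (not from libyuv): store two 2x64-bit vectors
// interleaved, so memory holds a[0], b[0], a[1], b[1].
static void StoreInterleaved64(uint64_t* dst,
                               const uint64_t a[2],
                               const uint64_t b[2]) {
#if defined(__aarch64__)
  uint64x2x2_t pair;
  pair.val[0] = vld1q_u64(a);
  pair.val[1] = vld1q_u64(b);
  // A single ST2 {v.2d, v.2d} performs the interleave and the store.
  vst2q_u64(dst, pair);
#else
  // Scalar model of the same memory layout.
  dst[0] = a[0];
  dst[1] = b[0];
  dst[2] = a[1];
  dst[3] = b[1];
#endif
}
```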
Reduction in runtimes observed compared to the existing Neon
implementation:
Cortex-A55: -8.6%
Cortex-A510: -4.9%
Cortex-A520: -6.0%
Cortex-A76: -14.4%
Cortex-A720: -5.3%
Cortex-X1: -13.6%
Cortex-X2: -5.8%
Bug: libyuv:976
Change-Id: I08bb5517bbdc54c4784fce42a885b12f91e7a982
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5581597
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
We already have an "any" helper function set up for this kernel, so use
it to match the other existing architecture paths. This change also
affects the 32-bit Arm paths, which will be cleaned up in a later
commit.
With this change the kernel is now only entered with width as a multiple
of eight, so remove the now-unneeded tail loops.
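
A minimal sketch of the "any" pattern, assuming a kernel that requires a width that is a multiple of eight. All names here (HalveRow_Kernel8, HalveRow_C, HalveRow_Any) are hypothetical and the per-byte operation is a placeholder; the real helper may handle the remainder differently, for example by padding into a temporary buffer.

```c
#include <stdint.h>

// Hypothetical C reference: halve each byte.
static void HalveRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) {
    dst[i] = (uint8_t)(src[i] >> 1);
  }
}

// Hypothetical stand-in for a SIMD kernel. It assumes width is a
// multiple of eight, so it needs no tail loop.
static void HalveRow_Kernel8(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; i += 8) {
    for (int j = 0; j < 8; ++j) {
      dst[i + j] = (uint8_t)(src[i + j] >> 1);
    }
  }
}

// "Any" wrapper: the kernel handles the largest multiple of eight and
// the C version handles the remaining 0..7 elements.
static void HalveRow_Any(const uint8_t* src, uint8_t* dst, int width) {
  int n = width & ~7;  // largest multiple of eight <= width
  int r = width & 7;   // leftover elements
  if (n > 0) {
    HalveRow_Kernel8(src, dst, n);
  }
  if (r > 0) {
    HalveRow_C(src + n, dst + n, r);
  }
}
```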
Also remove the volatile specifier from the asm block, as it is unnecessary.
Change-Id: If37428ac2d6035a8c27eec9bd80d014a98ac3eb1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5553717
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
The existing Neon code uses only 64-bit vectors throughout, which
limits the performance on larger cores. To avoid this, swap the Neon
code from a Wx8 implementation to a Wx16 implementation and process
blocks of 16 full vectors at a time.
The original code also handled widths that were not exact multiples of
16, but the "any" kernel should already handle this, so that code is
removed.
Finally, avoid duplicating the TransposeWx16_C fallback kernel
definition in all architectures that need it, and just put it once in
rotate_common.cc instead.
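
For reference, a Wx16 transpose in plain C could look like the sketch below. This is an illustrative version under assumed parameter names, not the definition placed in rotate_common.cc; the function name TransposeWx16_Sketch is hypothetical.

```c
#include <stdint.h>

// Hypothetical sketch: transpose a block of 16 rows by `width` columns.
// Column x of the source becomes row x of the destination, so each
// destination row holds 16 bytes.
static void TransposeWx16_Sketch(const uint8_t* src, int src_stride,
                                 uint8_t* dst, int dst_stride, int width) {
  for (int x = 0; x < width; ++x) {
    for (int y = 0; y < 16; ++y) {
      dst[x * dst_stride + y] = src[y * src_stride + x];
    }
  }
}
```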
Observed speedups for TransposePlane across a range of
micro-architectures:
Cortex-A53: -40.0%
Cortex-A55: -20.7%
Cortex-A57: -43.9%
Cortex-A510: -43.5%
Cortex-A520: -43.9%
Cortex-A720: -31.1%
Cortex-X2: -38.3%
Cortex-X4: -43.6%
Change-Id: Ic7c4d5f24eb27091d743ddc00cd95ef178b6984e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5545459
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
This allows the linker to move the variables from the .data section to
the .rodata section.
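
As a sketch of the effect, assuming the change adds a const qualifier to static lookup tables: a table declared without const is writable and is typically placed in .data, while the const version can be placed in .rodata. The table name and contents below are hypothetical.

```c
#include <stdint.h>

// Hypothetical table. Without "const" this would be a writable object,
// typically placed in .data; with "const" the toolchain is free to
// place it in .rodata.
static const uint8_t kReverseIndex[8] = {7, 6, 5, 4, 3, 2, 1, 0};

// Reads go through the table as before; only the storage class of the
// table changes.
static uint8_t ReverseLookup(const uint8_t* src, int i) {
  return src[kReverseIndex[i]];
}
```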
Bug: libyuv:254
Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0
Reviewed-on: https://chromium-review.googlesource.com/777033
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
NaCl has been disabled for a while, so the code
will still build, but only with the C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.
BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org
Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Instead of casting int to int64, pass the int and use the %w modifier
to select the word (32-bit) version of the register.
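
A minimal sketch of the pattern, assuming GCC or Clang extended inline assembly on AArch64. The function name SubtractEight is hypothetical and the portable branch exists only so the sketch builds on other targets.

```c
// Hypothetical helper: subtract 8 from a 32-bit int operand.
static int SubtractEight(int width) {
#if defined(__aarch64__)
  int result;
  // %w0 and %w1 print the 32-bit W register names, so the int operand
  // can be passed directly instead of being cast to a 64-bit type.
  __asm__("sub %w0, %w1, #8" : "=r"(result) : "r"(width));
  return result;
#else
  // Portable equivalent for targets without AArch64 inline assembly.
  return width - 8;
#endif
}
```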
TBR=kjellander@chromium.org
BUG=libyuv:706
TEST=git cl lint
R=wangcheng@google.com
Change-Id: Iee5a70f04d928903ca8efac00066b8821a465e36
Reviewed-on: https://chromium-review.googlesource.com/528381
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
BUG=None
TEST=try bots and lint test
Change-Id: I1ab462adf2d309117862c5eb4b244a61ae202951
Reviewed-on: https://chromium-review.googlesource.com/450658
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Inline assembly that uses temporary variables currently initializes
them to 0 and passes them in as "+r" outputs.
This CL changes the constraint to "=&r" for most of them, meaning an
output with early write (written before the inputs are consumed). This
allows the initialize-to-zero step to be removed, saving 1 instruction.
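
A minimal sketch of the constraint change, assuming GCC or Clang extended inline assembly on AArch64. The function name AddLoaded is hypothetical and the portable branch exists only so the sketch builds on other targets.

```c
#include <stdint.h>

// Hypothetical helper: load two 32-bit values and add them.
static uint32_t AddLoaded(const uint32_t* a, const uint32_t* b) {
#if defined(__aarch64__)
  uint32_t tmp;  // scratch register; no "= 0" initializer is needed
  uint32_t out;
  __asm__(
      "ldr %w0, [%2]      \n"  // tmp is written before input %3 is read
      "ldr %w1, [%3]      \n"
      "add %w1, %w1, %w0  \n"
      // "=&r" marks a write-only output that is written early, so the
      // compiler must not assign it the same register as any input.
      : "=&r"(tmp), "=&r"(out)
      : "r"(a), "r"(b)
      : "memory");
  return out;
#else
  // Portable equivalent for targets without AArch64 inline assembly.
  return *a + *b;
#endif
}
```

With "+r" the compiler has to materialize an initial value for tmp before the asm block, which is where the extra instruction came from.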
BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1895743008 .