3 Commits

Author SHA1 Message Date
George Steed
4f7fd808b7 [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON
The existing Neon code only makes use of 64-bit vectors throughout which
limits the performance on larger cores. To avoid this, swap the Neon
code from a Wx8 implementation to a Wx16 implementation and process
blocks of 16 full vectors at a time.

The original code also handled widths that were not exact multiples of
16, however this should already be handled by the "any" kernel so it is
removed.

Finally, avoid duplicating the TransposeWx16_C fallback kernel
definition in all architectures that need it, and just put it once in
rotate_common.cc instead.

Observed speedups for TransposePlane across a range of
micro-architectures:

 Cortex-A53: -40.0%
 Cortex-A55: -20.7%
 Cortex-A57: -43.9%
Cortex-A510: -43.5%
Cortex-A520: -43.9%
Cortex-A720: -31.1%
  Cortex-X2: -38.3%
  Cortex-X4: -43.6%

Change-Id: Ic7c4d5f24eb27091d743ddc00cd95ef178b6984e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5545459
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-05-21 07:46:42 +00:00
Frank Barchard
804980bbab DetilePlane and unittest for NEON
Bug: libyuv:915, b/215425056
Change-Id: Iccab1ed3f6d385f02895d44faa94d198ad79d693
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3424820
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-31 20:05:55 +00:00
Hao Chen
f8e2da48ae Add optimization functions in rotate_lsx.cc file.
Optimize two functions in source/rotate_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913
Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00