The existing disabled gtest rotate tests fail because the existing "any"
kernels always assume we are processing height=8 rows at a time. This
was recently changed to 16 on AArch64 which triggered this bug.
To fix this, amend the TANY macro to explicitly specify the fallback
kernel, such that we can use the height=16 kernel to match the SIMD
optimized version where necessary. Also change other architecture
versions to match.
Bug: b/352351302
Change-Id: I8080fa8f44c7c67fa970a78fb426f2f801a9a00e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5703585
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The existing Neon code only makes use of 64-bit vectors throughout which
limits the performance on larger cores. To avoid this, swap the Neon
code from a Wx8 implementation to a Wx16 implementation and process
blocks of 16 full vectors at a time.
The original code also handled widths that were not exact multiples of
16, however this should already be handled by the "any" kernel so it is
removed.
Finally, avoid duplicating the TransposeWx16_C fallback kernel
definition in all architectures that need it, and just put it once in
rotate_common.cc instead.
Observed speedups for TransposePlane across a range of
micro-architectures:
Cortex-A53: -40.0%
Cortex-A55: -20.7%
Cortex-A57: -43.9%
Cortex-A510: -43.5%
Cortex-A520: -43.9%
Cortex-A720: -31.1%
Cortex-X2: -38.3%
Cortex-X4: -43.6%
Change-Id: Ic7c4d5f24eb27091d743ddc00cd95ef178b6984e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5545459
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Optimize two functions in source/rotate_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.
Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).
In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.
BUG=libyuv:804
Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .