George Steed c613c3f102 [AArch64] Add SVE2 implementations for RAWTo{ARGB,RGBA}Row
We can construct particular predicates to load only up to 3/4 of a full
vector, allowing us to use TBL to shuffle elements into the correct
place rather than needing to rely on more expensive LD3 or ST4
instructions.
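
As a rough illustration of the idea above, here is a portable scalar
model in C. It is not the SVE2 code from this change. It assumes a fixed
16-byte vector (real SVE is vector-length agnostic), assumes RAW is
stored as R,G,B and ARGB as B,G,R,A in memory, and uses hypothetical
names (PredicatedLoad, TableLookup, RAWToARGBRow_Model). The predicated
load fills only 12 of 16 bytes (3/4 of the vector), and a TBL-style
lookup expands them to four 4-byte pixels, with out-of-range indices
producing zero so the alpha byte can be ORed in afterwards.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { kVecBytes = 16 };  // Assumed vector length, for this sketch only.

// Models a predicated load: only the first `active` bytes are read from
// memory; inactive lanes are set to zero.
static void PredicatedLoad(const uint8_t* src, size_t active,
                           uint8_t vec[kVecBytes]) {
  memset(vec, 0, kVecBytes);
  memcpy(vec, src, active);
}

// Models a TBL-style lookup: out[i] = vec[idx[i]], or 0 when idx[i] is
// out of range.
static void TableLookup(const uint8_t vec[kVecBytes],
                        const uint8_t idx[kVecBytes],
                        uint8_t out[kVecBytes]) {
  for (int i = 0; i < kVecBytes; ++i) {
    out[i] = idx[i] < kVecBytes ? vec[idx[i]] : 0;
  }
}

// Hypothetical scalar model: converts `width` RAW pixels to ARGB,
// processing up to 4 pixels (12 input bytes) per iteration.
void RAWToARGBRow_Model(const uint8_t* src_raw, uint8_t* dst_argb,
                        int width) {
  // Per pixel: pick B, G, R, then an out-of-range index for alpha.
  static const uint8_t kIdx[kVecBytes] = {2,  1,  0, 255, 5,  4,  3, 255,
                                          8,  7,  6, 255, 11, 10, 9, 255};
  while (width > 0) {
    int pixels = width < 4 ? width : 4;
    uint8_t in[kVecBytes];
    uint8_t out[kVecBytes];
    PredicatedLoad(src_raw, (size_t)pixels * 3, in);
    TableLookup(in, kIdx, out);
    for (int i = 3; i < kVecBytes; i += 4) {
      out[i] |= 0xFF;  // Set the alpha byte of each pixel to opaque.
    }
    // Models a predicated store of only the pixels that are valid.
    memcpy(dst_argb, out, (size_t)pixels * 4);
    src_raw += pixels * 3;
    dst_argb += pixels * 4;
    width -= pixels;
  }
}
```

In this model, an RGBA variant would differ only in the index table and
in which byte of each pixel receives the alpha value.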

Reduction in runtimes observed compared to the existing Neon
implementation:

            | RAWToARGBRow | RAWToRGBARow
Cortex-A510 |       -32.4% |       -31.9%
Cortex-A720 |       -15.7% |       -15.6%
  Cortex-X2 |       -24.6% |       -24.4%

Bug: libyuv:973
Change-Id: I271c625d97bab3b0e08ac1e9d7fcf7d18f3d6894
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631542
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2024-07-06 22:40:15 +00:00
libyuv [AArch64] Add SVE2 implementations for RAWTo{ARGB,RGBA}Row 2024-07-06 22:40:15 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00