George Steed 11ff6067a5 [AArch64] Add SVE2 implementation of RAWToRGB24Row
There is no nice way of forming the TBL permute indices here since we
are operating on sets of three bytes at a time, so instead load the
appropriate indices from a static array. We can make use of SVE
predication to ensure we are operating on a multiple of three bytes for
the load/store instructions rather than needing to make use of more
expensive LD3 or ST3 instructions.

Reduction in runtime observed compared to the existing Neon
implementation:

Cortex-A510: -39.2%
Cortex-A720: -34.5%
  Cortex-X2: -31.0%

Bug: libyuv:973
Change-Id: I68560bde7a529e5cec150b0e9d3ffe4341038fb8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631543
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-07-08 15:55:14 +00:00
..
libyuv [AArch64] Add SVE2 implementation of RAWToRGB24Row 2024-07-08 15:55:14 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00