mirror of
https://chromium.googlesource.com/libyuv/libyuv
synced 2025-12-07 17:26:49 +08:00
We can construct particular predicates to load only up to 3/4 of a full
vector, allowing us to use TBL to shuffle elements into the correct
place rather than needing to rely on more expensive LD3 or ST4
instructions.
Reduction in runtimes observed compared to the existing Neon
implementation:
| RAWToARGBRow | RAWToRGBARow
Cortex-A510 | -32.4% | -31.9%
Cortex-A720 | -15.7% | -15.6%
Cortex-X2 | -24.6% | -24.4%
Bug: libyuv:973
Change-Id: I271c625d97bab3b0e08ac1e9d7fcf7d18f3d6894
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631542
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
|
||
|---|---|---|
| .. | ||
| libyuv | ||
| libyuv.h | ||