Mirror of https://chromium.googlesource.com/libyuv/libyuv (synced 2026-01-01 03:12:16 +08:00)
There is no convenient way to compute the TBL permute indices here, since
we are operating on sets of three bytes at a time, so instead we load the
appropriate indices from a static array. We can use SVE predication to
ensure that the load/store instructions operate on a multiple of three
bytes, rather than needing the more expensive LD4 or ST3 instructions.
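
As a rough illustration of the approach described above, the following portable C sketch models the idea in scalar code. It is not the SVE2 code from this change: the vector is fixed at 16 bytes (real SVE vector lengths vary), and the names `kArgbToRgb24Idx`, `kArgbToRawIdx`, `TblModel` and `ArgbPermuteRowModel` are hypothetical. It also assumes the usual libyuv memory layouts, with ARGB stored as B,G,R,A, RGB24 as B,G,R, and RAW as R,G,B.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Hypothetical index tables for one 16-byte vector holding four ARGB pixels.
// Assumed layouts: ARGB is B,G,R,A in memory; RGB24 is B,G,R; RAW is R,G,B.
static const uint8_t kArgbToRgb24Idx[12] = {0, 1, 2,  4, 5, 6,
                                            8, 9, 10, 12, 13, 14};
static const uint8_t kArgbToRawIdx[12] = {2,  1, 0, 6,  5,  4,
                                          10, 9, 8, 14, 13, 12};

// Scalar model of a TBL permute: each output byte is selected from `table`
// by the matching index; out-of-range indices produce zero.
static void TblModel(const uint8_t* table, size_t table_len,
                     const uint8_t* idx, size_t n, uint8_t* out) {
  for (size_t i = 0; i < n; ++i) {
    out[i] = idx[i] < table_len ? table[idx[i]] : 0;
  }
}

// Converts `width` ARGB pixels to a 3-byte-per-pixel format chosen by `idx`.
// The partial memcpy calls stand in for predicated loads and stores: only
// whole pixels are read, and only a multiple of three bytes is written.
static void ArgbPermuteRowModel(const uint8_t* src_argb, uint8_t* dst,
                                int width, const uint8_t idx[12]) {
  assert(src_argb != NULL && dst != NULL && idx != NULL);
  while (width > 0) {
    int pixels = width < 4 ? width : 4;  // active pixels in this iteration
    uint8_t in[16] = {0};                // inactive lanes read as zero
    uint8_t out[12];
    memcpy(in, src_argb, (size_t)pixels * 4);  // models the predicated load
    TblModel(in, sizeof(in), idx, sizeof(out), out);
    memcpy(dst, out, (size_t)pixels * 3);      // models the predicated store
    src_argb += pixels * 4;
    dst += pixels * 3;
    width -= pixels;
  }
}
```

Calling `ArgbPermuteRowModel(src, dst, width, kArgbToRawIdx)` would produce RAW output, and passing `kArgbToRgb24Idx` would produce RGB24. The only difference between the two conversions in this model is the index table, which is why loading indices from a static array is a natural fit.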
Reduction in runtime observed compared to the existing Neon
implementations:

| CPU | ARGBToRAWRow | ARGBToRGB24Row |
|---|---|---|
| Cortex-A510 | -50.8% | -19.9% |
| Cortex-A720 | -39.8% | -39.1% |
| Cortex-X2 | -66.5% | -51.9% |

Bug: libyuv:973
Change-Id: Iaead678715a3d70d54cf823391272a6196836769
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5631544
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>

| Name |
|---|
| basic_types.h |
| compare_row.h |
| compare.h |
| convert_argb.h |
| convert_from_argb.h |
| convert_from.h |
| convert.h |
| cpu_id.h |
| loongson_intrinsics.h |
| macros_msa.h |
| mjpeg_decoder.h |
| planar_functions.h |
| rotate_argb.h |
| rotate_row.h |
| rotate.h |
| row.h |
| scale_argb.h |
| scale_rgb.h |
| scale_row.h |
| scale_uv.h |
| scale.h |
| version.h |
| video_common.h |