Author SHA1 Message Date
George Steed
e52007eff9 [AArch64] Add SVE2 implementation for I444ToARGBRow
Using SVE2 for these kernels gives a number of performance wins over the
existing Neon code:

* For the Y component calculation we can use a single UMULH instead of
  the UMULL x2 + UZP2 sequence required in Neon.

* For the RGBTORGBA8 calculation we are able to take advantage of
  interleaving narrowing instructions, allowing us to use ST2 rather
  than ST4 for the store. This is a big performance win on some
  micro-architectures where ST4 is costly.

* The use of predication means we do not need to add "any" kernels; we
  can simply rerun the calculation with a partial predicate for the
  final iteration.
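The first two points can be illustrated with a scalar C model. This is a sketch under stated assumptions, not the code from this commit. The function names are hypothetical, the 16-element buffers stand in for vector registers, and the saturating narrow with a shift of 6 is an assumed stand-in for whatever narrowing the real kernel performs. model_umull_uzp2 shows why UZP2 after a widening multiply yields the same high halves that UMULH produces directly. model_narrow_st2 shows how bottom/top narrowing leaves B,G and R,A interleaved, so that a two-register store of 16-bit elements writes libyuv's B,G,R,A byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of SVE2 UMULH on one 16-bit lane: the high half of the product. */
static uint16_t model_umulh(uint16_t a, uint16_t b) {
  return (uint16_t)(((uint32_t)a * b) >> 16);
}

/* Model of the Neon sequence: UMULL/UMULL2 widen each product to a 32-bit
 * lane, then UZP2 over the 16-bit view keeps the odd elements, which on
 * little-endian AArch64 are the high halves of each product. */
static void model_umull_uzp2(const uint16_t *a, const uint16_t *b,
                             uint16_t *out, size_t n) {
  uint16_t halves[2 * 16]; /* 16-bit view of the widened products */
  assert(n <= 16);
  for (size_t i = 0; i < n; ++i) {
    uint32_t p = (uint32_t)a[i] * b[i];
    halves[2 * i] = (uint16_t)p;             /* low half: even element */
    halves[2 * i + 1] = (uint16_t)(p >> 16); /* high half: odd element */
  }
  for (size_t i = 0; i < n; ++i) out[i] = halves[2 * i + 1]; /* UZP2 */
}

/* Assumed narrowing: shift right, then saturate to the unsigned 8-bit range. */
static uint8_t sat_shift_narrow(int16_t v, int shift) {
  if (v < 0) return 0;
  int32_t s = v >> shift;
  return (uint8_t)(s > 255 ? 255 : s);
}

/* Narrow-bottom writes even byte lanes and narrow-top writes odd byte lanes,
 * so bg holds B0 G0 B1 G1 ... and ra holds R0 A0 R1 A1 ... . Storing the
 * pair as 16-bit elements with ST2 then emits B G R A for each pixel. */
static void model_narrow_st2(const int16_t *b, const int16_t *g,
                             const int16_t *r, const int16_t *a,
                             uint8_t *dst, size_t n, int shift) {
  uint8_t bg[2 * 16], ra[2 * 16];
  assert(n <= 16);
  for (size_t i = 0; i < n; ++i) {
    bg[2 * i] = sat_shift_narrow(b[i], shift);     /* bottom: even lanes */
    bg[2 * i + 1] = sat_shift_narrow(g[i], shift); /* top: odd lanes */
    ra[2 * i] = sat_shift_narrow(r[i], shift);
    ra[2 * i + 1] = sat_shift_narrow(a[i], shift);
  }
  /* ST2 of 16-bit elements: alternate one halfword from each register. */
  for (size_t i = 0; i < n; ++i) {
    dst[4 * i + 0] = bg[2 * i];
    dst[4 * i + 1] = bg[2 * i + 1];
    dst[4 * i + 2] = ra[2 * i];
    dst[4 * i + 3] = ra[2 * i + 1];
  }
}
```

In the model, the Neon path needs an explicit de-interleave step (UZP2) after the multiply, while UMULH yields the high halves in place. Likewise, with bottom/top narrowing two registers already hold the pixel bytes in store order, which is why ST2 suffices where ST4 would otherwise be used.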

To avoid the overhead of generating a predicate register on every
iteration, we duplicate the loop body and generate a predicate only for
the final iteration of the loop. This costs a little extra on the final
iteration, but should still be significantly quicker than the function
call overhead incurred by the "any" cases. Duplicating the loop body to
reduce the use of the WHILELT instruction improves performance on little
cores by ~12% on its own, but has negligible impact on other
micro-architectures.
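The loop structure described above can be modelled in scalar C. This is a hypothetical sketch: MODEL_VL, kernel_lane and the other names are illustrative, a bool array stands in for an SVE predicate register, and real SVE code would use PTRUE/WHILELT with predicated loads and stores rather than C loops.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lanes per vector; the real SVE vector length is set by hardware. */
#define MODEL_VL 8

/* Hypothetical per-lane work standing in for the YUV to ARGB arithmetic. */
static uint8_t kernel_lane(uint8_t x) { return (uint8_t)(x ^ 0x5A); }

/* Model of WHILELT: lane i is active while base + i < n. */
static void model_whilelt(size_t base, size_t n, bool pred[MODEL_VL]) {
  for (size_t i = 0; i < MODEL_VL; ++i) pred[i] = base + i < n;
}

/* One vector iteration under a predicate: inactive lanes are neither
 * loaded nor stored. */
static void vector_step(const uint8_t *src, uint8_t *dst, size_t base,
                        const bool pred[MODEL_VL]) {
  for (size_t i = 0; i < MODEL_VL; ++i)
    if (pred[i]) dst[base + i] = kernel_lane(src[base + i]);
}

/* The main loop reuses an all-true predicate, so it never evaluates
 * WHILELT; a duplicated copy of the body runs once with a partial
 * predicate for the tail. Returns how many times WHILELT was modelled. */
static int convert_row(const uint8_t *src, uint8_t *dst, size_t n) {
  bool all_true[MODEL_VL], tail[MODEL_VL];
  int whilelt_count = 0;
  size_t base = 0;
  for (size_t i = 0; i < MODEL_VL; ++i) all_true[i] = true;
  for (; base + MODEL_VL <= n; base += MODEL_VL)
    vector_step(src, dst, base, all_true);
  if (base < n) { /* duplicated loop body for the final, partial vector */
    model_whilelt(base, n, tail);
    ++whilelt_count;
    vector_step(src, dst, base, tail);
  }
  return whilelt_count;
}
```

With n = 19 and MODEL_VL = 8, the main loop handles two full vectors and the tail copy runs once with 3 active lanes, so WHILELT is evaluated once instead of once per iteration (three times). Because the tail uses the same body under a predicate, no separate "any" wrapper or scalar fallback is needed.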

Reduction in runtime for the new SVE2 implementation compared to the
existing Neon implementation on selected micro-architectures:

Cortex-A510: -36.5%
Cortex-A720: -17.3%
  Cortex-X2: -11.3%
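Expressed as throughput, a runtime change of p percent corresponds to a speedup of 1 / (1 + p/100). That works out to about 1.57x on Cortex-A510, 1.21x on Cortex-A720 and 1.13x on Cortex-X2. A small helper for that conversion is below; its name is hypothetical.

```c
#include <assert.h>

/* Convert a runtime change given in percent (negative means faster) into
 * the equivalent throughput ratio old_time / new_time. */
static double speedup_from_runtime_change(double pct) {
  assert(pct > -100.0); /* a -100% runtime change would mean zero time */
  return 1.0 / (1.0 + pct / 100.0);
}
```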

Bug: libyuv:973
Change-Id: I2a485f0dfa077a56f96b80a667ad38bbea47b4b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424739
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-04-09 03:11:01 +00:00
George Steed
9a8be20def [AArch64] Add :libyuv_sve library in preparation for SVE kernels
This commit only adds the bare minimum needed to get the new library
building through GN; row_sve.cc remains empty until we start porting
some kernels across.

Bug: libyuv:973
Change-Id: Ibdf4fc258761f3e507d700f27a405099c667ac75
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424738
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-04-09 03:10:01 +00:00