George Steed f27b983f38 [AArch64] Add SVE2 implementation of DivideRow_16
SVE contains the UMULH instruction which allows us to multiply and take
the high half of the result in a single instruction rather than needing
separate widening multiply and then narrowing shift steps.

Observed reduction in runtime compared to the existing Neon code:

Cortex-A510: -21.2%
Cortex-A520: -20.9%
Cortex-A715: -47.9%
Cortex-A720: -47.6%
  Cortex-X2:  -5.2%
  Cortex-X3:  -2.6%
  Cortex-X4: -32.4%
Cortex-X925:  -1.5%

Bug: b/42280942
Change-Id: I25154699b17772db1fb5cb84c049919181d86f4b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5975318
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-11-07 18:46:02 +00:00
..
libyuv [AArch64] Add SVE2 implementation of DivideRow_16 2024-11-07 18:46:02 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00