libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-06-16 00:46:04 +08:00

libyuv	[AArch64] Add SVE2 implementation of DivideRow_16	2024-11-07 18:46:02 +00:00
libyuv.h	NV12 Copy, include scale_uv.h	2020-12-08 18:54:16 +00:00

George Steed f27b983f38 [AArch64] Add SVE2 implementation of DivideRow_16

SVE contains the UMULH instruction which allows us to multiply and take
the high half of the result in a single instruction rather than needing
separate widening multiply and then narrowing shift steps.
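
As a rough scalar illustration of the arithmetic described above (not the SVE2 or Neon code from this change), the sketch below computes the high 16 bits of a 16x16-bit product by widening to 32 bits, multiplying, and shifting right by 16. This is the two-step shape that a multiply-high instruction such as UMULH collapses into one operation per element. The names mul_high_two_step and divide_row_16_sketch are hypothetical, and the assumption that each output element is (src * scale) >> 16 is inferred from the commit description rather than taken from the libyuv source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: widen both operands to 32 bits, multiply, then
 * shift right by 16 to keep only the high half of the product.
 * A multiply-high instruction yields this value directly. */
static uint16_t mul_high_two_step(uint16_t a, uint16_t b) {
  uint32_t wide = (uint32_t)a * (uint32_t)b; /* widening multiply */
  return (uint16_t)(wide >> 16);             /* narrowing shift */
}

/* Hypothetical scalar row function, assuming each output element is
 * (src * scale) >> 16. The signature is illustrative only. */
static void divide_row_16_sketch(const uint16_t* src,
                                 uint16_t* dst,
                                 uint16_t scale,
                                 int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst[x] = mul_high_two_step(src[x], scale);
  }
}
```

With a scale of 16384 (that is, 65536 / 4), this sketch divides each 16-bit sample by 4 and truncates toward zero.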

Observed reduction in runtime compared to the existing Neon code:

Cortex-A510: -21.2%
Cortex-A520: -20.9%
Cortex-A715: -47.9%
Cortex-A720: -47.6%
  Cortex-X2:  -5.2%
  Cortex-X3:  -2.6%
  Cortex-X4: -32.4%
Cortex-X925:  -1.5%

Bug: b/42280942
Change-Id: I25154699b17772db1fb5cb84c049919181d86f4b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5975318
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>

2024-11-07 18:46:02 +00:00