libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-07-30 16:26:19 +08:00

libyuv	[AArch64] Add Neon dot-product implementation for ARGBSepiaRow	2024-09-16 04:31:35 +00:00
libyuv.h	NV12 Copy, include scale_uv.h	2020-12-08 18:54:16 +00:00

George Steed 432d186116 [AArch64] Add Neon dot-product implementation for ARGBSepiaRow

We can use the dot product instructions to apply the coefficients
directly without the need for LD4 de-interleaving load instructions,
since these are known to be slow on some micro-architectures.

ST4 is also known to be slow on more modern micro-architectures; however,
avoiding it is left for a future SVE implementation, where we can make
use of interleaving-narrowing instructions.

Reduction in cycle counts observed compared to existing Neon code:

 Cortex-A55:  -5.8%
Cortex-A510: -18.9%
 Cortex-A76: -21.8%
Cortex-A720: -30.2%
  Cortex-X1: -28.6%
  Cortex-X2: -23.4%
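Assume each figure is the relative change in cycle count, r = (C_new - C_old) / C_old. On that assumption, the equivalent throughput speedup is worked out below for the smallest and largest reductions:

```latex
\begin{aligned}
S &= \frac{C_{\text{old}}}{C_{\text{new}}} = \frac{1}{1 + r} \\
\text{Cortex-A720: } r = -0.302 &\;\Rightarrow\; S = \frac{1}{0.698} \approx 1.43 \\
\text{Cortex-A55: } r = -0.058 &\;\Rightarrow\; S = \frac{1}{0.942} \approx 1.06
\end{aligned}
```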

Bug: b/42280946
Change-Id: I5887559649cc805a810d867b652c85d48285657d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5790970
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>

2024-09-16 04:31:35 +00:00
