George Steed 53b65220da [AArch64] Add Neon dot-product implementation of SumSquareError
The Neon dot-product instructions perform two widening steps rather than
one, saving us the need to widen the absolute difference to 16-bits
before accumulating. Additionally, the dot-product instructions tend to
have better performance characteristics than traditional widening
multiply instructions like SMLAL used in the existing
SumSquareError_NEON code.

Observed reduction in runtimes compared to the existing Neon kernel:

 Cortex-A55:  -9.1%
Cortex-A510: -36.7%
 Cortex-A76: -37.6%
Cortex-A720: -48.8%
  Cortex-X1: -56.1%
  Cortex-X2: -42.6%

Bug: libyuv:977
Change-Id: Ie20c69040cc47a803d8e95620d31e0bf1e1dac12
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463945
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-04-25 20:54:48 +00:00
..
libyuv [AArch64] Add Neon dot-product implementation of SumSquareError 2024-04-25 20:54:48 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00