George Steed ef9833fc70 Add Neon implementation of Convert8To16Row
Add a Neon implementation of the Convert8To16Row kernel. Compared to the
C implementation we can take advantage of knowing that the "scale"
parameter is always an unsigned power of two and fits in 16-bits,
allowing us to combine this with the shift and avoid needing to widen
the input data.

Reduction in run times observed compared to the existing C
implementation:

 Cortex-A55: -44.5%
Cortex-A510: -26.1%
Cortex-A520: -30.6%
 Cortex-A76: -61.6%
Cortex-A710: -57.6%
  Cortex-X1: -46.5%
  Cortex-X2: -54.4%
  Cortex-X3: -57.1%
  Cortex-X4: -55.0%
Cortex-X925: -49.3%

Change-Id: I34b858605ece47e46588c0680a1d2afa7a90d7a0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6516186
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-05-29 13:37:48 -07:00
..
libyuv Add Neon implementation of Convert8To16Row 2025-05-29 13:37:48 -07:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00