George Steed 5c12e0b2de [AArch64] Add SVE2 implementations of HalfFloat{,1}Row
For HalfFloat1Row, SVE has direct 16-bit integer to half-float
conversion instructions so there is no need to widen to 32-bits.

For HalfFloatRow, SVE zero-extending loads avoid the need for seperate
UXTL(2) instructions.

Observed reductions in runtime compared to the existing Neon code:

            | HalfFloat1Row | HalfFloatRow
Cortex-A510 |        -38.3% |       -17.3%
Cortex-A520 |        -37.6% |       -18.8%
Cortex-A720 |        -50.1% |        -7.8%
  Cortex-X2 |        -50.2% |        -0.4%
  Cortex-X4 |        -51.5% |       -12.5%

Bug: b/42280942
Change-Id: I445071ccd453113144ce42d465ba03c9ee89ec9e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5975319
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-11-07 18:53:00 +00:00
..
libyuv [AArch64] Add SVE2 implementations of HalfFloat{,1}Row 2024-11-07 18:53:00 +00:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00