George Steed c4a0c8d34a [AArch64] Add SVE2 and SME implementations for Convert8To8Row
SVE can make use of the UMULH instruction to avoid needing separate
widening multiply and narrowing steps for the scale application.
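
As a rough illustration of the point above, here is a scalar C sketch of why
a multiply-high can stand in for a widening multiply followed by a narrowing
step. The helper names, the 8-bit scale, and the plain truncating shift are
assumptions made for illustration; this is not the libyuv kernel, which may
use different element widths, rounding, and a bias term.

```c
#include <stdint.h>

/* Hypothetical scalar model for illustration only; not libyuv code. */

/* Two-step form: widen to 16 bits, multiply, then narrow by keeping
 * bits 15..8 of the product. */
static uint8_t scale_widen_narrow(uint8_t x, uint8_t scale) {
  uint16_t wide = (uint16_t)((uint16_t)x * (uint16_t)scale);
  return (uint8_t)(wide >> 8);
}

/* One-step form: models what a per-lane unsigned multiply-high (such as
 * UMULH on 8-bit elements) produces, i.e. the high half of the product. */
static uint8_t scale_mul_high(uint8_t x, uint8_t scale) {
  return (uint8_t)(((uint32_t)x * (uint32_t)scale) >> 8);
}

/* Returns 1 if both forms agree for every 8-bit input and scale pair. */
static int scale_forms_agree(void) {
  for (unsigned x = 0; x < 256; ++x) {
    for (unsigned s = 0; s < 256; ++s) {
      if (scale_widen_narrow((uint8_t)x, (uint8_t)s) !=
          scale_mul_high((uint8_t)x, (uint8_t)s)) {
        return 0;
      }
    }
  }
  return 1;
}
```

In scalar C the two forms compile to much the same thing; the saving is
specific to vector code, where the two-step form needs separate widening
and narrowing instructions and the multiply-high form keeps the result in
lanes of the original width.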

Observed reduction in runtime for Convert8To8Row_SVE2 compared to the
existing Neon implementation:

        Cortex-A510: -13.2%
        Cortex-A520: -16.4%
        Cortex-A710: -37.1%
        Cortex-A715: -38.5%
        Cortex-A720: -38.4%
          Cortex-X2: -33.2%
          Cortex-X3: -31.8%
          Cortex-X4: -31.8%
        Cortex-X925: -13.9%

Change-Id: I17c0cb81661c5fbce786b47cdf481549cfdcbfc7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6207692
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-01-28 15:53:26 -08:00
libyuv [AArch64] Add SVE2 and SME implementations for Convert8To8Row 2025-01-28 15:53:26 -08:00
libyuv.h NV12 Copy, include scale_uv.h 2020-12-08 18:54:16 +00:00