InterpolateRow_SME and InterpolateRow_16_SME need special cases to
handle if source_y_fraction is 256 since this would overflow a byte and
can just be a call to memcpy instead.
InterpolateRow_16To8_SME is never called with a source_y_fraction value
of 256 so there is no need for a special case here.
Change-Id: I67805b5db2c411acb93ada626cf414b35620f467
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6074375
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Add a streaming-SVE implementation of CopyRow using normal vector
load/store instructions.
Change-Id: Ia551413f9740a96473fa2e8a0958953be2f4b04e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6074374
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Mostly just a straightforward copy of the Neon code ported to
Streaming-SVE, we can use predication to avoid needing an `Any` kernel.
SVE has a "widening multiply get high half" instruction in UMULH,
however using the same technique as the Neon code to avoid the need for
a widening multiply at all is more performant here.
These is no benefit from this kernel when the SVE vector length is only
128 bits, so skip writing a non-streaming SVE implementation.
Change-Id: Ib12699c5b8b168d004ebc74c0281ea3772ca8d32
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6070786
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Mostly just a straightforward copy of the Neon code ported to
Streaming-SVE, we can use predication to avoid needing an `Any` kernel
and use ST2 to avoid needing a separate ZIP instruction.
These is no benefit from this kernel when the SVE vector length is only
128 bits, so skip writing a non-streaming SVE implementation.
Change-Id: I5ae36afe699b88f119dc545e49c59c5d85e98742
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6070785
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
This is based on an unrolled version of the existing SVE2 code. The
implementation in this case is a pure streaming-SVE (SSVE)
implementation based on the existing SVE2 implementation, we do not use
the ZA tile.
Change-Id: I83d8e58aafd814125b3446fb1c9ec4a5fb56fe3e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913882
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Including addition of a new row_sme.cc file and associated
infrastructure.
The actual implementation in this case is a pure streaming-SVE (SSVE)
implementation based on the existing SVE2 implementation, we do not use
the ZA tile.
Change-Id: Ibc132c55de8d41a107e563b95f842323fef94444
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913881
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>