Mirror of https://chromium.googlesource.com/libyuv/libyuv (synced 2025-12-06 16:56:55 +08:00)
[docs] Add documentation on AArch64 SME for feature detection
Give a brief explanation of the Scalable Matrix Extension and where we
believe it will be beneficial, in line with the existing documentation
for Neon and SVE.

Change-Id: I477b7f293c00740ce8346a96a9a0ad133f4ef1c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5587508
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
parent 214b4a25c7
commit cc823114a1
@@ -18,7 +18,7 @@ Neon is available and mandatory in AArch64 from the base Armv8.0-A
 architecture. Neon can be used even if later extensions like the Scalable
 Vector Extension (SVE) are also present. The exception to this is if the CPU is
 currently operating in streaming mode as introduced by the Scalable Matrix
-Extension, which is not currently used in libyuv.
+Extension, described later.
 
 There are also a couple of architecture extensions present for Neon that we can
 take advantage of in libyuv:
@@ -64,6 +64,27 @@ Armv8.6-A or Armv9.1-A, however there is no micro-architecture at time of
 writing where SVE2 is implemented without all previously-mentioned features
 also being implemented.
 
+### The Scalable Matrix Extension (SME)
+
+The Scalable Matrix Extension (SME) is an optional feature introduced from
+Armv9.2-A. SME exists alongside SVE and introduces new execution modes for
+applications performing extended periods of data processing. In particular SME
+introduces a few new components of interest:
+
+* Access to a scalable two-dimensional ZA tile register and new instructions to
+  interact with rows and columns of the ZA tiles. This can be useful for data
+  transformations like transposes.
+
+* A streaming SVE (SSVE) mode, during which the SVE vector length matches the
+  ZA tile register width. In typical systems where the ZA tile register width
+  is longer than the core SVE vector length, SSVE processing allows for faster
+  data processing, even if the ZA tile register is unused. While the CPU is
+  executing in streaming mode, Neon instructions are unavailable.
+
+* When both SSVE and the ZA tile registers are enabled there are additional
+  outer-product instructions accumulating into a whole ZA tile, suitable for
+  accelerating matrix arithmetic. This is likely less useful in libyuv.
+
 ## Linux and Android
 
 On AArch64 running under Linux and Android, features are detected by inspecting
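For readers following the SME section added above, here is a minimal sketch of how SME presence might be queried at runtime on Linux or Android. It is an illustration only, not libyuv's actual detection code. It assumes the kernel reports SME through the `AT_HWCAP2` auxiliary vector entry, read with `getauxval`, and that the SME bit is `1 << 23`, mirroring `HWCAP2_SME` in the Linux arm64 `<asm/hwcap.h>` header. The names `SKETCH_HWCAP2_SME`, `hwcap2_has_sme` and `cpu_has_sme` are hypothetical.

```c
#include <stdbool.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP2
#endif

// Assumption: SME is bit 23 of AT_HWCAP2, mirroring HWCAP2_SME from the
// Linux arm64 <asm/hwcap.h>. Defined locally under a hypothetical name so
// the sketch also builds against older kernel headers.
#define SKETCH_HWCAP2_SME (1UL << 23)

// Pure decoder: true if the given AT_HWCAP2 value advertises SME.
static bool hwcap2_has_sme(unsigned long hwcap2) {
  return (hwcap2 & SKETCH_HWCAP2_SME) != 0;
}

// Runtime query. Reports false on anything other than AArch64
// Linux/Android, where this hwcap bit has no meaning.
static bool cpu_has_sme(void) {
#if defined(__aarch64__) && defined(__linux__)
  return hwcap2_has_sme(getauxval(AT_HWCAP2));
#else
  return false;
#endif
}
```

Separating the bit decoding from the `getauxval` call keeps the decoder testable on any host. A caller would check `cpu_has_sme()` once at startup before selecting any streaming-mode code path, bearing in mind that Neon instructions are unavailable while the CPU is in streaming mode.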