From c6632d43ae84af46b5c5df0fc5d09ffce514bf0a Mon Sep 17 00:00:00 2001
From: George Steed
Date: Mon, 22 Apr 2024 10:05:26 +0100
Subject: [PATCH] [AArch64] Impose feature dependencies in detection code

The strict architectural requirements between features are reasonably
relaxed and difficult to map out fully. In particular:

* FEAT_DotProd is architecturally available from Armv8.1-A and becomes
  mandatory from Armv8.4-A.

* FEAT_I8MM is architecturally available from Armv8.1-A and becomes
  mandatory from Armv8.6-A. It does not strictly depend on FEAT_DotProd
  being implemented; however, I am not aware of a micro-architecture
  where FEAT_I8MM is implemented without FEAT_DotProd also being
  implemented.

* FEAT_SVE is architecturally available from Armv8.2-A. It does not
  strictly depend on either FEAT_DotProd or FEAT_I8MM being implemented.
  The only micro-architecture I am aware of where FEAT_SVE is implemented
  but FEAT_DotProd and FEAT_I8MM are not both implemented is the Fujitsu
  A64FX.

* FEAT_SVE2 is architecturally available from Armv9.0-A. If FEAT_SVE2 is
  implemented, then FEAT_SVE must also be implemented. Since Armv9.0-A is
  based on Armv8.5-A, this implies that FEAT_DotProd is also implemented.
  Interestingly, this means that FEAT_I8MM is not mandatory, since it
  only becomes mandatory from Armv8.6-A (Armv9.1-A); however, I am not
  aware of a micro-architecture where FEAT_SVE2 is implemented without
  all three of the above features also being implemented.

Additionally, when testing under emulation there are sometimes bugs
where even mandatory architectural relationships are broken. For
example, there is one known case where SVE2 may be reported as available
even when SVE is explicitly disabled.

To simplify these dependencies, don't try to enable later extensions
unless earlier extensions are reported as implemented. This notably
penalises code running on a Fujitsu A64FX; however, that is not a likely
target for libyuv deployment.
Change-Id: Ifa32f7a43043641f99afb120e591945e136c9fd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5546385
Reviewed-by: Frank Barchard
---
 source/cpu_id.cc | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/source/cpu_id.cc b/source/cpu_id.cc
index 6b6e8745f..db6a29796 100644
--- a/source/cpu_id.cc
+++ b/source/cpu_id.cc
@@ -193,17 +193,23 @@ LIBYUV_API SAFEBUFFERS int AArch64CpuCaps(unsigned long hwcap,
   // Neon is mandatory on AArch64, so enable regardless of hwcaps.
   int features = kCpuHasNEON;
 
+  // Don't try to enable later extensions unless earlier extensions are also
+  // reported available. Some of these constraints aren't strictly required by
+  // the architecture, but are satisfied by all micro-architectures of
+  // interest. This also avoids an issue on some emulators where true
+  // architectural constraints are not satisfied, e.g. SVE2 may be reported as
+  // available while SVE is not.
   if (hwcap & YUV_AARCH64_HWCAP_ASIMDDP) {
     features |= kCpuHasNeonDotProd;
-  }
-  if (hwcap2 & YUV_AARCH64_HWCAP2_I8MM) {
-    features |= kCpuHasNeonI8MM;
-  }
-  if (hwcap & YUV_AARCH64_HWCAP_SVE) {
-    features |= kCpuHasSVE;
-  }
-  if (hwcap2 & YUV_AARCH64_HWCAP2_SVE2) {
-    features |= kCpuHasSVE2;
+    if (hwcap2 & YUV_AARCH64_HWCAP2_I8MM) {
+      features |= kCpuHasNeonI8MM;
+      if (hwcap & YUV_AARCH64_HWCAP_SVE) {
+        features |= kCpuHasSVE;
+        if (hwcap2 & YUV_AARCH64_HWCAP2_SVE2) {
+          features |= kCpuHasSVE2;
+        }
+      }
+    }
   }
   return features;
 }
@@ -244,9 +250,9 @@ LIBYUV_API SAFEBUFFERS int AArch64CpuCaps() {
 
   if (have_feature("hw.optional.arm.FEAT_DotProd")) {
     features |= kCpuHasNeonDotProd;
-  }
-  if (have_feature("hw.optional.arm.FEAT_I8MM")) {
-    features |= kCpuHasNeonI8MM;
+    if (have_feature("hw.optional.arm.FEAT_I8MM")) {
+      features |= kCpuHasNeonI8MM;
+    }
   }
   // No SVE feature detection available here at time of writing.
   return features;