Frank Barchard
fa16ddbb9f
cpuid show vector length on ARM and RISCV
...
- Additional asm volatile changes from GitHub
- rotate mips: remove C function (moved to common)
Run on Samsung S22:
[ RUN ] LibYUVBaseTest.TestCpuHas
Kernel Version 5.10
Has Arm 0x2
Has Neon 0x4
Has Neon DotProd 0x10
Has Neon I8MM 0x20
Has SVE 0x40
Has SVE2 0x80
Has SME 0x0
SVE vector length: 16 bytes
[ OK ] LibYUVBaseTest.TestCpuHas (0 ms)
[ RUN ] LibYUVBaseTest.TestCompilerMacros
__ATOMIC_RELAXED 0
__cplusplus 201703
__clang_major__ 17
__clang_minor__ 0
__GNUC__ 4
__GNUC_MINOR__ 2
__aarch64__ 1
__clang__ 1
__llvm__ 1
__pic__ 2
INT_TYPES_DEFINED
__has_feature
Run on RISC-V QEMU emulating SiFive X280:
[ RUN ] LibYUVBaseTest.TestCpuHas
Kernel Version 6.6
Has RISCV 0x10000000
Has RVV 0x20000000
RVV vector length: 64 bytes
[ OK ] LibYUVBaseTest.TestCpuHas (4 ms)
[ RUN ] LibYUVBaseTest.TestCompilerMacros
__ATOMIC_RELAXED 0
__cplusplus 202002
__clang_major__ 9999
__clang_minor__ 0
__GNUC__ 4
__GNUC_MINOR__ 2
__riscv 1
__riscv_vector 1
__riscv_v_intrinsic 12000
__riscv_zve64x 1000000
__clang__ 1
__llvm__ 1
__pic__ 2
INT_TYPES_DEFINED
__has_feature
Bug: b/42280943
Change-Id: I53cf0450be4965a28942e113e4c77295ace70999
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5672088
Reviewed-by: David Gao <davidgao@google.com>
2024-07-02 18:10:56 +00:00
Frank Barchard
616bee5420
Add volatile for gcc inline to avoid being removed
...
Bug: b/42280943
Change-Id: I4439077a92ffa6dff91d2d10accd5251b76f7544
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5671187
Reviewed-by: David Gao <davidgao@google.com>
2024-07-02 01:25:24 +00:00
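Illustration for the volatile change above: a minimal sketch, assuming GCC or Clang extended asm, of why the qualifier matters. Without volatile, the compiler is allowed to delete an asm statement whose outputs it considers unused. KeepAlive and SumBytes are hypothetical names, not libyuv functions, and the empty asm template only stands in for a real kernel body.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper, not part of libyuv. */
static inline uint32_t KeepAlive(uint32_t value) {
  /* The template is empty, so no instruction is emitted. "+r" declares
   * value as both read and written in a register. The volatile qualifier
   * tells the compiler it must not delete this statement, even if it
   * decides the output is never used. */
  __asm__ volatile("" : "+r"(value));
  return value;
}

/* Hypothetical caller: sums n bytes, passing each partial sum through the
 * asm statement. The result is the plain sum because the asm changes
 * nothing. */
uint32_t SumBytes(const uint8_t* src, int n) {
  uint32_t sum = 0;
  int i;
  assert(n >= 0);
  for (i = 0; i < n; ++i) {
    sum = KeepAlive(sum + src[i]);
  }
  return sum;
}
```

The sketch spells the keyword as __asm__ so that it also compiles in strict ISO C modes, where plain asm is not available.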
George Steed
4f7fd808b7
[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON
...
The existing Neon code uses only 64-bit vectors throughout, which limits
performance on larger cores. To avoid this, switch the Neon code from a
Wx8 implementation to a Wx16 implementation and process blocks of 16
full (128-bit) vectors at a time.
The original code also handled widths that were not exact multiples of
16. This should already be handled by the "any" kernel, so that handling
is removed.
Finally, avoid duplicating the TransposeWx16_C fallback kernel
definition in all architectures that need it, and just put it once in
rotate_common.cc instead.
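The split described above, between a fixed-width kernel, a shared C fallback, and an "any" wrapper, can be sketched in portable C. This is a sketch under stated assumptions, not the libyuv source: all three function names are hypothetical, the "SIMD" kernel is a stand-in that simply calls the C loop, and the wrapper shows only the general idea of handing whole multiples of 16 to the fast kernel and the remainder to the fallback.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical C fallback: reads a block that is `width` columns wide and
 * 16 rows tall, and writes it as `width` rows of 16 bytes each. */
static void SketchTransposeWx16_C(const uint8_t* src, int src_stride,
                                  uint8_t* dst, int dst_stride, int width) {
  int x, y;
  assert(width >= 0);
  for (x = 0; x < width; ++x) {
    for (y = 0; y < 16; ++y) {
      dst[x * dst_stride + y] = src[y * src_stride + x];
    }
  }
}

/* Stand-in for a vector kernel that only accepts widths that are exact
 * multiples of 16. A real kernel would use Neon; this one reuses the C
 * loop so the sketch stays portable. */
static void SketchTransposeWx16_SIMD(const uint8_t* src, int src_stride,
                                     uint8_t* dst, int dst_stride,
                                     int width) {
  assert((width & 15) == 0);
  SketchTransposeWx16_C(src, src_stride, dst, dst_stride, width);
}

/* Hypothetical "any" wrapper: accepts any width. */
void SketchTransposeWx16_Any(const uint8_t* src, int src_stride,
                             uint8_t* dst, int dst_stride, int width) {
  int r = width & 15; /* leftover columns */
  int n = width - r;  /* columns covered by whole blocks of 16 */
  if (n > 0) {
    SketchTransposeWx16_SIMD(src, src_stride, dst, dst_stride, n);
  }
  if (r > 0) {
    /* Column n of the source becomes row n of the destination. */
    SketchTransposeWx16_C(src + n, src_stride, dst + n * dst_stride,
                          dst_stride, r);
  }
}
```

With this arrangement the vector kernel needs no residual handling of its own, and the fallback is defined once for every architecture that wants it.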
Observed changes in TransposePlane runtime across a range of
micro-architectures (negative values are faster):
Cortex-A53: -40.0%
Cortex-A55: -20.7%
Cortex-A57: -43.9%
Cortex-A510: -43.5%
Cortex-A520: -43.9%
Cortex-A720: -31.1%
Cortex-X2: -38.3%
Cortex-X4: -43.6%
Change-Id: Ic7c4d5f24eb27091d743ddc00cd95ef178b6984e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5545459
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-05-21 07:46:42 +00:00
Frank Barchard
7e389884a1
Switch to C99 types
...
Append _t to all sized types.
uint64 becomes uint64_t, etc.
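A small sketch of what the rename looks like in practice, assuming the older names were project-local typedefs (the commit message does not say how they were defined). SumSquareErrorSketch is a hypothetical function written for this example, not a libyuv API.

```c
#include <assert.h>
#include <stdint.h>

/* Before the rename, a declaration might have read (assumed form):
 *   uint64 SumSquareErrorSketch(const uint8* a, const uint8* b, int count);
 * After the rename, the same declaration uses the C99 <stdint.h> names. */
uint64_t SumSquareErrorSketch(const uint8_t* a, const uint8_t* b,
                              int count) {
  uint64_t sse = 0;
  int i;
  assert(count >= 0);
  for (i = 0; i < count; ++i) {
    /* Each difference is in [-255, 255], so its square fits in int32_t. */
    int32_t diff = (int32_t)a[i] - (int32_t)b[i];
    sse += (uint64_t)(diff * diff);
  }
  return sse;
}
```

Because the _t names come from a standard header, callers no longer need the project's own typedefs to be in scope.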
Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Manojkumar Bhosale
73a6f100a9
Add MSA optimized rotate functions (used 16x16 transpose)
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
TransposeWx16_MSA - ~6.0x
TransposeWx16_Any_MSA - ~4.7x
TransposeUVWx16_MSA - ~6.3x
TransposeUVWx16_Any_MSA - ~5.4x
Performance Gain (vs C non-vectorized)
TransposeWx16_MSA - ~6.0x
TransposeWx16_Any_MSA - ~4.8x
TransposeUVWx16_MSA - ~6.3x
TransposeUVWx16_Any_MSA - ~5.4x
Review-Url: https://codereview.chromium.org/2617703002 .
2017-01-13 15:50:02 +05:30
Manojkumar Bhosale
6fa5e4eb78
Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
TransposeWx8_MSA - ~2.7x
TransposeWx8_Any_MSA - ~2.1x
TransposeUVWx8_MSA - ~2.5x
TransposeUVWx8_Any_MSA - ~2.7x
Performance Gain (vs C non-vectorized)
TransposeWx8_MSA - ~4.6x
TransposeWx8_Any_MSA - ~2.9x
TransposeUVWx8_MSA - ~4.4x
TransposeUVWx8_Any_MSA - ~3.7x
Review URL: https://codereview.chromium.org/2553403002 .
2016-12-15 10:06:01 +05:30