Mirror of https://chromium.googlesource.com/libyuv/libyuv (synced 2025-12-07 17:26:49 +08:00)
The existing code uses a pair of lane-indexed load instructions to fill
the two halves of the input vector. Because a lane-indexed load leaves
the other lanes untouched, this introduces an unnecessary dependency on
the value of the vector from the previous loop iteration.
This does not appear to affect little-core performance, since those
cores never execute enough work concurrently to hit the bottleneck.
On mid and big cores, however, we can improve performance considerably
by using LDR instead of LD1 to load the low lane, which zeroes the upper
portion of the vector rather than keeping the previous value.
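
The dependency described above can be modelled in plain C. This is only a sketch of the register semantics, not the kernels' actual assembly: `Vec64`, `load_lane`, `load_low_zeroed` and the two wrapper functions are hypothetical names invented for illustration, and the source pointers stand in for whatever two inputs fill the halves of the vector.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a 64-bit vector register with two 32-bit lanes. */
typedef struct {
  uint32_t lane[2];
} Vec64;

/* Models a lane-indexed load (LD1 to a single lane): only the selected lane
 * is written, so the result is a function of the register's previous value. */
static Vec64 load_lane(Vec64 prev, int lane, const uint8_t* src) {
  memcpy(&prev.lane[lane], src, 4);
  return prev;
}

/* Models a scalar LDR of the low 32 bits: the whole register is written and
 * the upper bits are zeroed, so no previous value is needed as an input. */
static Vec64 load_low_zeroed(const uint8_t* src) {
  Vec64 v = {{0, 0}};
  memcpy(&v.lane[0], src, 4);
  return v;
}

/* Old pattern: two lane loads, the first of which reads the stale register. */
static Vec64 fill_with_two_lane_loads(Vec64 stale, const uint8_t* lo,
                                      const uint8_t* hi) {
  Vec64 r = load_lane(stale, 0, lo);
  return load_lane(r, 1, hi);
}

/* New pattern: a zeroing load for the low lane, then one lane load. */
static Vec64 fill_with_ldr_then_lane_load(const uint8_t* lo,
                                          const uint8_t* hi) {
  Vec64 r = load_low_zeroed(lo);
  return load_lane(r, 1, hi);
}
```

Both functions return the same vector for any stale contents, but only the first takes the stale register as an input. In this model, that extra input is what ties each loop iteration to the one before it.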
Reduction in runtime for select kernels (no observed performance delta
on Cortex-A55):
| Kernel | Cortex-A76 | Cortex-X2 |
|---|---|---|
| I422ToARGB4444Row_NEON | -23.1% | -49.3% |
| I422ToARGBRow_NEON | -1.2% | -2.5% |
| I422ToRGB24Row_NEON | -11.7% | -7.0% |
| I422ToRGBARow_NEON | -4.7% | -3.4% |
| I444AlphaToARGBRow_NEON | -1.1% | -2.4% |
| I444ToARGBRow_NEON | -1.6% | -3.2% |
| I444ToRGB24Row_NEON | -9.6% | -6.8% |
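
For readers who prefer speedup factors, a runtime reduction can be converted with a one-line formula. The helper below is a hypothetical illustration, not part of libyuv: it assumes the percentages above are relative changes in runtime, so a delta of -49.3% corresponds to roughly a 1.97x speedup and -23.1% to roughly 1.30x.

```c
/* Converts a relative runtime change in percent (e.g. -49.3) into a speedup
 * factor: new_runtime = old_runtime * (1 + delta / 100), so
 * speedup = old_runtime / new_runtime = 1 / (1 + delta / 100).
 * Hypothetical helper for illustration; delta_percent must be above -100. */
static double speedup_from_runtime_delta(double delta_percent) {
  return 1.0 / (1.0 + delta_percent / 100.0);
}
```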
Bug: libyuv:976
Change-Id: I8c9413e0e6ed97b8f060ce42b6e8abdfb77914b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5365868
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
| Name | | |
|---|---|---|
| build_overrides | ||
| docs | ||
| include | ||
| infra/config | ||
| riscv_script | ||
| source | ||
| tools_libyuv | ||
| unit_test | ||
| util | ||
| .clang-format | ||
| .gitignore | ||
| .gn | ||
| .vpython | ||
| .vpython3 | ||
| Android.bp | ||
| Android.mk | ||
| AUTHORS | ||
| BUILD.gn | ||
| cleanup_links.py | ||
| CM_linux_packages.cmake | ||
| CMakeLists.txt | ||
| codereview.settings | ||
| DEPS | ||
| DIR_METADATA | ||
| download_vs_toolchain.py | ||
| libyuv.gni | ||
| libyuv.gyp | ||
| libyuv.gypi | ||
| LICENSE | ||
| linux.mk | ||
| OWNERS | ||
| PATENTS | ||
| PRESUBMIT.py | ||
| public.mk | ||
| pylintrc | ||
| README.chromium | ||
| README.md | ||
| winarm.mk | ||
libyuv is an open source project that includes YUV scaling and conversion functionality.
- Scale YUV to prepare content for compression, with point, bilinear or box filter.
- Convert to YUV from webcam formats for compression.
- Convert to RGB formats for rendering/effects.
- Rotate by 90/180/270 degrees to adjust for mobile devices in portrait mode.
- Optimized for SSSE3/AVX2 on x86/x64.
- Optimized for Neon on Arm.
- Optimized for MSA on MIPS.
- Optimized for RVV on RISC-V.
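
To make the "convert to RGB" item concrete, here is a minimal scalar sketch of a single-pixel YUV to RGB conversion. It assumes BT.601 limited-range input and uses the widely published 8-bit fixed-point coefficients. It is not libyuv's implementation or API: the function names are hypothetical, and the library's optimized row kernels may use different constants and rounding.

```c
#include <stdint.h>

/* Clamps an intermediate result to the 0..255 range of an 8-bit channel. */
static uint8_t clamp_u8(int x) {
  return (uint8_t)(x < 0 ? 0 : (x > 255 ? 255 : x));
}

/* Converts one BT.601 limited-range YUV pixel to 8-bit RGB using 8.8
 * fixed-point coefficients (298/256 ~= 1.164, 409/256 ~= 1.596, ...).
 * Hypothetical helper for illustration only. */
static void yuv_to_rgb_bt601(uint8_t y, uint8_t u, uint8_t v,
                             uint8_t* r, uint8_t* g, uint8_t* b) {
  int c = (int)y - 16;   /* luma with the 16 offset removed */
  int d = (int)u - 128;  /* blue-difference chroma, centred on zero */
  int e = (int)v - 128;  /* red-difference chroma, centred on zero */
  /* Division truncates toward zero; negative results clamp to 0 anyway. */
  *r = clamp_u8((298 * c + 409 * e + 128) / 256);
  *g = clamp_u8((298 * c - 100 * d - 208 * e + 128) / 256);
  *b = clamp_u8((298 * c + 516 * d + 128) / 256);
}
```

Calling `yuv_to_rgb_bt601(16, 128, 128, &r, &g, &b)` yields black and `(235, 128, 128)` yields white. A production path would process whole rows with SIMD, as the optimization items above suggest.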
Development
See Getting started (docs/getting_started.md) for instructions on how to get started developing.
You can also browse the docs directory for more documentation.