George Steed a4ccf9940e [AArch64] Add I8MM implementation of ARGBToUV444Row
We cannot use the standard dot-product instructions since the
coefficients multiplication results are both added and subtracted, but
I8MM supports mixed-sign dot products which work well here.  We need to
add an additional variant of the coefficient structs since we need
negative constants for the elements that were previously subtracted.

Reduction in runtimes observed compared to the previous Neon
implementation:

Cortex-A510: -37.3%
Cortex-A520: -31.1%
Cortex-A715: -37.1%
Cortex-A720: -37.0%
  Cortex-X2: -62.1%
  Cortex-X3: -62.2%
  Cortex-X4: -40.4%

Bug: libyuv:977
Change-Id: Idc3d9a6408c30e1bce3816a1ed926ecd76792236
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5712928
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2024-07-16 17:32:52 +00:00
..
basic_types.h Disable old int types by default. 2018-07-09 21:16:47 +00:00
compare_row.h [AArch64] Add Neon implementation of HashDjb2 2024-05-01 19:37:31 +00:00
compare.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_argb.h YUY2ToARGBMatrix and UYVYToARGBMatrix added to allow any color matrix 2024-01-19 21:21:37 +00:00
convert_from_argb.h MM21ToYUY2 and ABGRToJ420 conversion 2022-08-16 22:07:38 +00:00
convert_from.h Add 10/12 bit YUV To YUV functions 2021-02-25 23:16:54 +00:00
convert.h Fix tidy warning that uint32_t dither4 should not be const 2023-06-02 00:42:02 +00:00
cpu_id.h [AArch64] Add SME feature detection on Linux 2024-06-08 23:34:22 +00:00
loongson_intrinsics.h RAWToJ400 faster version for ARM 2022-03-18 07:22:36 +00:00
macros_msa.h Add volatile for gcc inline to avoid being removed 2024-07-02 01:25:24 +00:00
mjpeg_decoder.h add YUV24 and AYUV formats 2019-03-05 02:53:56 +00:00
planar_functions.h Disable NEON if memory sanitizer is enabled 2023-08-31 18:07:42 +00:00
rotate_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_row.h [AArch64] Use full vectors in TransposeWx{8 => 16}_NEON 2024-05-21 07:46:42 +00:00
rotate.h Add 10 bit rotate methods. 2023-01-04 21:10:01 +00:00
row.h [AArch64] Add I8MM implementation of ARGBToUV444Row 2024-07-16 17:32:52 +00:00
scale_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_rgb.h RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 2022-03-19 01:44:06 +00:00
scale_row.h Disable RVV ScaleDownBy4 if compiler option is not enabled 2024-06-18 01:52:40 +00:00
scale_uv.h add yuvconvstants util 2021-02-12 19:45:16 +00:00
scale.h Change ScalePlane,ScalePlane_16,... to return int 2023-11-03 23:53:24 +00:00
version.h [AArch64] Fix SVE/SME vector length printing in cpuid 2024-07-02 19:44:41 +00:00
video_common.h Add support for AR64 format 2021-03-13 20:55:21 +00:00