2865 Commits

Author SHA1 Message Date
Frank Barchard
d71cda1bb0 Rollback util cpuid hybrid detect due to android build errors
Bug: 438241552
Change-Id: Ie56aa7296e796e44e63d0dd913120b897b12cc9b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6843504
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-08-12 14:13:24 -07:00
George Steed
b7d97d5f3f [AArch64] Fix compilation due to incorrect register constraint
The y0_fraction and y1_fraction variables in InterpolateRow_NEON were
marked as modified by the inline-asm block. However,
5eea7812826c551559fdcd4a6988fcf1fbe341f6 marked these variables as
`const`, which caused both LLVM and GCC to emit errors about modification
of const variables.

There is no need for these variables to be modified in the loop since
they are read-only, so simply update the inline asm block constraints to
match.

Change-Id: I94ca3696c4163ede6ad27d645f0f445fcfb0a1c3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6818289
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-08-05 11:23:20 -07:00
Frank Barchard
5eea781282 Fix util/cpuid hybrid detect
- clang-format applied

Bug: None
Change-Id: If8aec0bbb3d3461886f176a77e029833f5dc197d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6805445
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-08-04 16:29:34 -07:00
Frank Barchard
48943bb378 Convert8To16 use VPSRLW instead of VPMULHUW for better lunarlake performance
- MCA says old version was 4 cycles and new version is 2.5 cycles/loop
- lunarlake is the only known cpu
mca -mcpu=lunarlake 100 iterations

Was vpmulhuw
  Iterations:        100
  Instructions:      1200
  Total Cycles:      426
  Total uOps:        1200

  Dispatch Width:    8
  uOps Per Cycle:    2.82
  IPC:               2.82
  Block RThroughput: 4.0

Now vpsrlw
  Iterations:        100
  Instructions:      1200
  Total Cycles:      279
  Total uOps:        1400

  Dispatch Width:    8
  uOps Per Cycle:    5.02
  IPC:               4.30
  Block RThroughput: 2.5

Bug: None
Change-Id: I5a49e1cf1ed3dfb59fe9861a871df9862417c6a6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6697745
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-08-04 12:42:50 -07:00
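
Note on 48943bb378: a minimal sketch in Python (not libyuv code) of why a multiply-high can be replaced by a right shift. It assumes the kernel computes `(v * 0x0101 * scale) >> 16` and that `scale` is a power of two that fits in 16 bits. The power-of-two property is stated elsewhere in this log for the Neon kernel; the formula itself is an assumption, and the function names are hypothetical.

```python
def convert_mulhi(v: int, scale: int) -> int:
    """Model of the multiply-high form: replicate the byte, keep the high 16 bits."""
    replicated = v * 0x0101            # 0xAB -> 0xABAB
    return (replicated * scale) >> 16  # high half of a 16x16-bit unsigned product


def convert_shift(v: int, scale: int) -> int:
    """Model of the shift form; only valid when scale is a power of two."""
    assert 0 < scale < 65536 and (scale & (scale - 1)) == 0, "power of two expected"
    replicated = v * 0x0101
    log2_scale = scale.bit_length() - 1
    return replicated >> (16 - log2_scale)
```

Under these assumptions the two forms agree for every 8-bit input, so which one to use comes down to instruction cost on the target core.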
Frank Barchard
cdd3bae848 TestI400LargeSize fix for warning message build error
- change %ld to %zd for size_t printf warnings
- disable TestI400LargeSize when disabling SLOW_TESTS
- disable cpuid tests that read proc/cpuinfo test data files
- add ifdef around timers to allow hexagon build
- remove faulty hybrid detect
- remove old mips LIBYUV_DISABLE_DSPR2 reference in gyp build
- apply clang-format

Bug: 434382656
Change-Id: Id74812e6ef29d4a8d0ff967a9189d249b80816d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6812825
Reviewed-by: Jeremy Leconte <jleconte@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-08-01 12:03:11 -07:00
Frank Barchard
3ff31b2a5f Make LibYUVConvertTest.TestI400LargeSize skip test on low end arm cpu
- detect lack of dot product instruction to infer the cpu is low end
- only run the test on higher end arm

Bug: 416842099
Change-Id: Idd2dd16a624bbba280cf531644440024b12f7ecf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6804632
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2025-07-31 02:41:17 -07:00
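
Note on 3ff31b2a5f: the commit does not say how the test checks for the dot-product instruction. As a stand-in, the sketch below assumes Linux on AArch64, where `/proc/cpuinfo` lists dot-product support as the `asimddp` feature flag. The function name is hypothetical and the code is Python, not the test's own C++.

```python
def has_dot_product(cpuinfo_text: str) -> bool:
    """Return True if any 'Features' line lists the asimddp flag."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Features" and "asimddp" in value.split():
            return True
    return False
```

A caller could pass the contents of `/proc/cpuinfo` and skip the large-size test when the function returns False.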
Xi Ruoyao
dd9ced1c6d loong64: Use HWCAP instead of CPUCFG to detect LSX/LASX
Per the Software Development and Build Convention for LoongArch™
Architectures manual, on Linux we should use HWCAP instead of CPUCFG to
detect if LSX/LASX is available.  The reason is the kernel may be
configured to disable them, and CPUCFG cannot provide info about the
kernel support.

Change-Id: I3f1b23e6d4c91c7da81311fbbe294e36ff178121
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6772567
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-07-24 23:43:54 -07:00
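
Note on dd9ced1c6d: a minimal sketch of HWCAP-based detection, written in Python for illustration. It assumes a 64-bit Linux auxiliary vector made of (key, value) pairs, `AT_HWCAP` equal to 16, and the LSX/LASX bit positions from the Linux uapi `hwcap.h` for LoongArch. The helper names are hypothetical and this is not the code the commit added.

```python
import struct

AT_HWCAP = 16                    # auxv key for the hardware-capability bitmask
HWCAP_LOONGARCH_LSX = 1 << 4     # bit positions as defined for LoongArch Linux
HWCAP_LOONGARCH_LASX = 1 << 5


def hwcap_from_auxv(auxv: bytes) -> int:
    """Scan 64-bit (key, value) pairs and return the AT_HWCAP value, or 0."""
    for key, value in struct.iter_unpack("=QQ", auxv):
        if key == AT_HWCAP:
            return value
    return 0


def loongarch_simd(hwcap: int) -> dict:
    """Report which SIMD extensions the kernel advertises as usable."""
    return {"lsx": bool(hwcap & HWCAP_LOONGARCH_LSX),
            "lasx": bool(hwcap & HWCAP_LOONGARCH_LASX)}
```

On a LoongArch Linux machine the input would be the contents of `/proc/self/auxv`; C code would normally call `getauxval(AT_HWCAP)` instead. Because the kernel fills in this bitmask, it reflects a kernel configured without LSX/LASX, which CPUCFG cannot.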
Takuto Ikuta
96134e95a7 BUILD.gn: Disable libc++ modules for NEON and SVE
The -march arguments used for NEON and SVE builds in libyuv are
incompatible with libc++ modules. This change disables libc++ modules
for these build configurations to fix the build.

Bug: 425535758
Change-Id: I578a0d9929c10177903c567bc268407470b45034
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6695664
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2025-07-01 10:34:37 -07:00
George Steed
007b920232 [AArch64] Add SME implementation of ARGBToUVRow and similar
Mostly just a straightforward copy of the existing SVE2 code ported to
Streaming-SVE. Introduce new "any" kernels for non-multiple of two
cases, matching what we already do for SVE2.

The existing SVE2 code makes use of the Neon MOVI instruction that is
not supported in Streaming-SVE, so adjust the code to use FMOV instead
which has the same performance characteristics.

Change-Id: I74b7ea1fe8e6af75dfaf92826a4de775a1559f77
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6663806
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-06-30 09:20:23 -07:00
Junji Watanabe
9519b7df0e Set use_siso=true by default in .gn
This CL enables use_siso=true by default.
Developer builds will switch to Siso, or get a suggestion to run
`gn clean` to switch.

No-Try: true
Bug: chromium:412968361
Change-Id: I1913d6735d835c614dca863ca7781f9154a4e42a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6651381
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2025-06-23 23:50:32 -07:00
George Steed
88798bcd63 [AArch64] Add SME implementation of Convert8To16Row_SME
Mostly just a straightforward copy of the Neon code ported to
Streaming-SVE. There is no benefit from this kernel when the SVE vector
length is only 128 bits, so skip writing a non-streaming SVE
implementation.

Change-Id: Ide34dbb7125b5f2a1edda6ef7111a1a49aad324f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6651565
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-06-23 11:32:56 -07:00
George Steed
1724c4be72 [AArch64] Add missing "+i8mm" feature when building SME
FEAT_I8MM is not unconditionally enabled with -march=armv9-a since it
only becomes mandatory from Armv9.1-A, so explicitly specify it in both
BUILD.gn and CMakeLists.txt.

Also flip the order of +sve2+i8mm => +i8mm+sve2 to match occurrences
elsewhere.

Change-Id: I8c37580d3718f380b772cdb726d8c30bcd5b9e2c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6656718
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-06-19 12:10:23 -07:00
Frank Barchard
6f729fbe65 ARGBToUV SSE use average of 4 pixels
- Was using avgb twice for non-exact and C for exact.

On Skylake Xeon:

Now SSE3
ARGBToJ420_Opt (326 ms)

Was
Exact C
ARGBToJ420_Opt (871 ms)
Not exact AVX2
ARGBToJ420_Opt (237 ms)
Not exact SSSE3
ARGBToJ420_Opt (312 ms)

Bug: 381138208
Change-Id: I6d1081bb52e36f06736c0c6575fa82bb2268629b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6629605
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ben Weiss <bweiss@google.com>
2025-06-17 11:55:27 -07:00
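
Note on 6f729fbe65: a small Python model of why applying a rounded byte average twice is "non-exact". It assumes `avgb` rounds up like the x86 PAVGB instruction, and that the exact result is the rounded mean `(a + b + c + d + 2) >> 2`. The latter is an assumption about what "exact" means here, and the function names are hypothetical.

```python
def avgb(a: int, b: int) -> int:
    """Rounded byte average, as PAVGB computes it."""
    return (a + b + 1) >> 1


def average4_cascaded(a: int, b: int, c: int, d: int) -> int:
    """Two levels of rounded averaging; each level may round up."""
    return avgb(avgb(a, b), avgb(c, d))


def average4_exact(a: int, b: int, c: int, d: int) -> int:
    """Single rounded mean of the four samples."""
    return (a + b + c + d + 2) >> 2
```

For the samples (0, 1, 0, 0) the cascaded form gives 1 while the single rounded mean gives 0, which is the kind of off-by-one bias an average of 4 pixels avoids.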
Frank Barchard
889613683a Add hybrid detect for Intel laptop cpus
- Add +i8mm build option for sve ARGBToUV which uses usdot
- util/cpuid Get cpu count (windows, macos, linux)
- For each x86 cpu, detect hybrid (e-core)
- Includes a comment fix for ubsan unittest
- Bump version
- Apply clang format to util/*.c as well as all *.cc/*.h

Bug: 424637372
Change-Id: I08310e18051fff62c9e4e4a10d1e4361871119ac
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6635640
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-06-13 13:22:54 -07:00
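
Note on 889613683a: the commit does not describe how hybrid parts are detected. The Python sketch below decodes the CPUID fields Intel documents for this purpose: the hybrid flag in leaf 07H (EDX bit 15) and the core type in leaf 1AH (EAX bits 31..24). Whether util/cpuid reads exactly these fields is an assumption, and the function names are hypothetical. The register values would have to come from a native CPUID call made on each logical CPU.

```python
def is_hybrid(cpuid7_edx: int) -> bool:
    """CPUID.(EAX=07H, ECX=0):EDX bit 15 flags a hybrid part."""
    return bool((cpuid7_edx >> 15) & 1)


def core_type(cpuid1a_eax: int) -> str:
    """CPUID.1AH:EAX bits 31..24 give the core type of the current logical CPU."""
    kind = (cpuid1a_eax >> 24) & 0xFF
    return {0x20: "atom (e-core)", 0x40: "core (p-core)"}.get(kind, "unknown")
```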
George Steed
3d66e94fb5 [AArch64] Improve ARGBToUVRow_SVE2 and related kernels
This commit reworks the implementation of ARGBToUVMatrixRow_SVE2, using
an approach similar to that recently used in
61bdaee13a701d2b52c6dc943ccc5c888077a591.

In particular we can rework these SVE2 implementations to use 8-bit
dot-product instructions instead of 16-bit, allowing us to process more
data in a single vector.

To ensure that the input values fit in 8 bits, negate the UV constant
arrays passed to the kernel and undo the now-unnecessary flipping of the
middle two component values.

This commit mostly reverses the performance inversion where the Neon
I8MM implementation was previously faster than the SVE2 implementation.
The reduction in runtime observed compared to the existing Neon I8MM
implementation is now:

Cortex-A510:  +5.6% (!)
Cortex-A520:  -3.0%
Cortex-A710: -12.6%
Cortex-A715: -10.9%
Cortex-A720: -10.8%
  Cortex-X2:  -3.8%
  Cortex-X3: -10.3%
  Cortex-X4:  -9.5%
Cortex-X925:  -6.7%

Change-Id: I30253976dc8e3651cfb5fd39b63a6763975d41e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6640990
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2025-06-12 14:10:44 -07:00
George Steed
1b2f6cdbe8 [AArch64] Unroll I210ToAR30Row_{SVE2,SME}
Now that we have a STOREAR30_SVE_2X implementation, we can use this to
unroll other kernels. The predication on I210ToAR30Row needs adjusting
to allow loading two vectors of Y compared to one vector of U/V, and
additionally UZP is needed to ensure the data arrangement in vector
lanes matches the U/V layout. LD2H could also be used, however this
provides no performance improvement on most cores and would necessitate
the addition of an "any" kernel to handle the case where width % 2 != 0.

Reduction in run times of I210ToAR30Row_SVE2 observed compared to the
previous SVE2 implementation: (note that even in the observed slowdowns,
the SVE2 implementation still outperforms the existing Neon code)

Cortex-A510: -37.1%
Cortex-A520: -39.1%
Cortex-A710: +1.6% (!)
Cortex-A715: +6.5% (!)
Cortex-A720: +6.5% (!)
  Cortex-X2: -2.9%
  Cortex-X3: -2.2%
  Cortex-X4: -8.8%
Cortex-X925: -3.5%

Change-Id: I2ff285b48105883526eceb8be1fcbe0e033a553b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6640989
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2025-06-12 14:10:21 -07:00
George Steed
867bdc51ed [AArch64] Unroll I422ToAR30Row_{SVE2,SME}
The existing STOREAR30_SVE macro works fine for out of order cores,
however for in-order cores the number of dependent vector instructions
laid out consecutively impacts performance.

We can improve this by unrolling the loop to process two sets of vectors
at a time, allowing little cores to process two independent streams of
vector instructions at the same time to improve performance. Using one
set of ZIP instructions at the end allows us to (a) avoid ST4 which we
know is slow on some micro-architectures, and (b) enable the use of
predication and avoid the need for separate "any" kernels.

Reduction in run times of I422ToAR30Row_SVE2 observed compared to the
previous SVE2 implementation:

Cortex-A510: -37.7%
Cortex-A520: -38.8%
Cortex-A710: -14.8%
Cortex-A715: -17.1%
Cortex-A720: -16.9%
  Cortex-X2: -10.3%
  Cortex-X3:  -6.7%
  Cortex-X4:  -9.4%
Cortex-X925:  -7.1%

Change-Id: I160fb41300d2d08fce2e6eb92181324fd723a02d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6632916
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2025-06-12 14:09:49 -07:00
Frank Barchard
843cda7e7b TestI400LargeSize test __x86_64__, _M_X64, or __aarch64__
- apply clang-format to row_neon64.cc

Bug: 416842099
Change-Id: Ic21f08d8b65bb86cf72eba82d45591f6558170ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6634515
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-06-10 15:53:02 -07:00
Frank Barchard
4ac0a3ae3d ubsan compliant '_any' functions using ptrdiff_t for pointer math
Bug: 416842099
Change-Id: I1e3c7bc1b363c11baeb3b529ee78e5ac8878c359
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6634217
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-06-10 15:01:52 -07:00
George Steed
cd0ae0a222 row_sve.h: Add missing z21 clobber
The z21 register is used in the I444TORGB_SVE_2X macro and other places,
so add it to the clobber list macro that is used throughout this file.

Change-Id: If4277c1ffcac0fa68cc44263acc6f41a9e82ec8b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6619508
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-06-08 19:41:44 -07:00
George Steed
998bec7ca9 Sort row.h #define *_NEON lists
Sort the Arm Neon and Neon DotProd #define lists to match the
alphabetical ordering used for the SVE2 and SME lists.

Change-Id: Ibeb380f477d5476d0018d20a754557a5f93f2190
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6613686
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-06-08 19:38:30 -07:00
Junji Watanabe
20b1d84ec8 infra: Remove reclient properties from infra config
This completes the Siso migration for libyuv.

No-Try: true
Bug: 412968361
Change-Id: I0b823d7a0b6895bfd26dcdd0cdae9eb665f02f11
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6606659
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Christoffer Dewerin <jansson@chromium.org>
Commit-Queue: Junji Watanabe <jwata@google.com>
2025-06-02 00:54:25 -07:00
Junji Watanabe
f7392e01c1 infra: Add $build/siso properties to libyuv builders
This CL switches libyuv builders from Ninja to Siso. Reclient will still
be used.

https://crrev.com/c/6605972 is the corresponding recipe change.

No-Try: true
Bug: chromium:412968361
Change-Id: I6ba063d0aa954185284a44d0b353278d71953e4b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6589372
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Junji Watanabe <jwata@google.com>
Reviewed-by: Christoffer Dewerin <jansson@chromium.org>
2025-06-02 00:07:24 -07:00
WANG Xuerui
6ecfe106c3 Enable explicit control over LoongArch LSX & LASX for GYP builds
And enable LASX by default for LoongArch builds, because LASX is
widely supported among LoongArch desktops and servers, and performance
is better than with LSX alone.

Because the LoongArch SIMD code is written to only compile if the
respective codegen option is enabled, but the defaults and availability
differ between compiler versions and target `-march` setting, the
codegen flags are explicitly added to CFLAGS for wider compatibility.

Bug: None
Change-Id: I735ceac0f6b46eea2155e58ecf3630383ef5b728
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6241804
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2025-05-30 10:17:27 -07:00
George Steed
ef9833fc70 Add Neon implementation of Convert8To16Row
Add a Neon implementation of the Convert8To16Row kernel. Compared to the
C implementation we can take advantage of knowing that the "scale"
parameter is always an unsigned power of two and fits in 16 bits,
allowing us to combine this with the shift and avoid needing to widen
the input data.

Reduction in run times observed compared to the existing C
implementation:

 Cortex-A55: -44.5%
Cortex-A510: -26.1%
Cortex-A520: -30.6%
 Cortex-A76: -61.6%
Cortex-A710: -57.6%
  Cortex-X1: -46.5%
  Cortex-X2: -54.4%
  Cortex-X3: -57.1%
  Cortex-X4: -55.0%
Cortex-X925: -49.3%

Change-Id: I34b858605ece47e46588c0680a1d2afa7a90d7a0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6516186
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-05-29 13:37:48 -07:00
George Steed
7e5863ae5a Add SVE2 and SME implementations of I422ToAR30Row
This can make use of the existing load/convert/store macros that are
already present for other kernels, so add I422ToAR30Row_SVE2 and
I422ToAR30Row_SME to match the existing kernels.

Reduction in time taken observed for the new SVE2 implementation,
compared to the existing Neon implementation:

Cortex-A510: -9.1%
Cortex-A520: +6.8% (!)
Cortex-A710: -4.0%
Cortex-A715: -1.1%
Cortex-A720: -1.1%
  Cortex-X2: -5.7%
  Cortex-X3: -5.9%
  Cortex-X4: -2.8%
Cortex-X925: -4.0%

Change-Id: Ibf8bfaaeaba51f426649ded621cb0c8948dd9ee1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6592332
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-05-27 11:39:00 -07:00
Junji Watanabe
3489272e28 Support Siso builds
This CL adds Siso support to libyuv:
- Install Siso CIPD package.
- Add a DEPS hook to generate .sisoenv file.
- Generate gn_logs.txt to propagate GN variables to Siso.

No-Try: True
Bug: chromium:412968361
Change-Id: I32fa1f34b4db257e34ca7445577ceb619c50c097
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6589371
Reviewed-by: Christoffer Dewerin <jansson@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2025-05-27 01:54:30 -07:00
Mirko Bonadei
42404a6efc Roll chromium_revision 3d4d5701ea..9dbf00e283 (1445131:1465343)
Change log: 3d4d5701ea..9dbf00e283
Full diff: 3d4d5701ea..9dbf00e283

Changed dependencies
* fuchsia_vesion: version:27.20250409.6.1..version:28.20250522.3.1
* gn_vesion: git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63..git_revision:ebc8f16ca7b0d36a3e532ee90896f9eb48e5423b
* reclient_vesion: re_client_version:0.177.1.e58c0145-gomaip..re_client_version:0.178.0.5ee9d3e8-gomaip
* src/build: a86a8f1eef..7907108fc6
* src/buildtools: 085537daf1..813bee86ee
* src/buildtools/linux64: git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63..git_revision:ebc8f16ca7b0d36a3e532ee90896f9eb48e5423b
* src/buildtools/mac: git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63..git_revision:ebc8f16ca7b0d36a3e532ee90896f9eb48e5423b
* src/buildtools/reclient: re_client_version:0.177.1.e58c0145-gomaip..re_client_version:0.178.0.5ee9d3e8-gomaip
* src/buildtools/win: git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63..git_revision:ebc8f16ca7b0d36a3e532ee90896f9eb48e5423b
* src/ios: cb94f0a680..c61efe1d6d
* src/testing: 83917cb85c..739fbc1a64
* src/third_party: 0aa3190ee4..a0168b392b
* src/third_party/android_toolchain/ndk: Idl-vYnWGnM8K3XJhM3h6zjYVDXlnljVz3FE00V9IM8C..KXOia11cm9lVdUdPlbGLu8sCz6Y4ey_HV2s8_8qeqhgC
* src/third_party/androidx/cipd: jYbS8zrbmfTcWph9ZY_BcX8HUPFnwt3fGMfhQcNURSQC..IKju-kxPcx53mOt9VCPN7dmPmZeWguJU1JS6WmN67kQC
* src/third_party/catapult: https://chromium.googlesource.com/catapult.git/+log/4f90fb2788..938fc9953b
* src/third_party/depot_tools: f40ddcd8d5..e0ece52cfb
* src/third_party/googletest/src: 52204f78f9..09ffd00153
* src/third_party/icu: c9fb4b3a6f..b929596bae
* src/third_party/kotlin_stdlib/cipd: dpAaSR0n15OMLmQJlIc-ZQ14UqzGBr2LaBEw_rukkl8C..GUpKElqF0PYGB-SP4D5w6p_MuMYQSBrRkGqFGjPhsIYC
* src/third_party/kotlinc/current: Wood5j4J3uPDtbP0fk868sOS0Y0umzF5X7w6U6QWupgC..XmaM7JA4hB75AuMdzCegF-XYzXtoHKOA1anrWqAJL3QC
* src/third_party/libc++/src: 024b5251a7..a01c02c9d4
* src/third_party/libc++abi/src: 78140a7276..9810fb23f6
* src/third_party/libunwind/src: e2e6f2a67e..8575f4ae4f
* src/third_party/llvm-libc/src: 54db6cfdef..9c3ae3120f
* src/third_party/nasm: 767a169c88..9f916e90e6
* src/third_party/r8/cipd: S1YW2OlP8ThsNUXDptm52Ouvnwp9t9xpTy5LECvEOw4C..QhYGRVpYYKZmt3f_Zb2HoJ9LIBEnWaeeLXRNei47Z30C
* src/third_party/r8/d8/cipd: wvbyt_Mr06Bl4Rcv4zoX-sTk_keiEYxfspOMUufh5nIC..QhYGRVpYYKZmt3f_Zb2HoJ9LIBEnWaeeLXRNei47Z30C
* src/third_party/turbine/cipd: scfGptWnO9bwzbg-jr0mcnVO3NG5KQJvlAQd_JSD5QUC..VGtOG2ivl1SJR7Lai5FQddIu15mWCYDnp47QtozMQeoC
* src/tools: f06b4755aa..ae54c8a35f
Added dependency
* src/third_party/android_deps/autorolled/cipd
DEPS diff: 3d4d5701ea..9dbf00e283/DEPS

Clang version changed llvmorg-21-init-6681-g5b36835d:llvmorg-21-init-11777-gfd3fecfc
Details: 3d4d5701ea..9dbf00e283/tools/clang/scripts/update.py

BUG=None
No-Try: True
Change-Id: I66a5c541493d688571a3ec324a67498334ac307e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6586277
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Jeremy Leconte <jleconte@google.com>
2025-05-26 06:56:54 -07:00
George Steed
949cb623bf Add SVE2 and SME implementations of I444ToRGB24Row
Move the READYUV444_SVE_2X and I444TORGB_SVE_2X macros to row_sve.h so
they are usable in both SVE2 and SME implementations, and use them to
add new I444ToRGB24Row implementations for SVE2 and SME. We need to use
the unrolled versions here to use the ST3B interleaving store
instructions, since there is no partial vector version of this store
instruction.

Reduction in time taken observed for the new SVE2 implementation,
compared to the existing Neon implementation:

Cortex-A510: -57.6%
Cortex-A520: -38.1%
Cortex-A710: -15.5%
Cortex-A715:  -9.2%
Cortex-A720:  -9.2%
  Cortex-X2: -25.8%
  Cortex-X3: -26.2%
  Cortex-X4: -23.2%
Cortex-X925: -17.8%

Change-Id: I6acd0b798a35e5352d4fad664769f12d3d938ed7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6530646
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-05-22 13:33:06 -07:00
Frank Barchard
951e43439c Use pragma comment to disable warning for ASSERT_NE when including gtest.h
- // IWYU pragma: export

Bug: None
Change-Id: Ic438b9712ca9bccb819358cb94fbee9a63389748
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6553193
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-05-15 15:58:15 -07:00
Etienne Pierre-doray
84890943b3 [tracing] Remove enable_base_tracing
libyuv doesn't depend on chromium/base/ anymore.

Change-Id: Idb89d1e8cc6ebe1cd14f012299cdc9680c2da5cc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6545812
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-05-15 01:22:38 -07:00
Frank Barchard
0853c9353f ARGBToUV 64 bit use ymm8 for shuffler
Bug: 381138208
Change-Id: I5e69bc1610bd6269bf9a4113e729cf307dd36f60
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6536833
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2025-05-12 15:09:40 -07:00
George Steed
61bdaee13a Add Neon I8MM implementations of ARGB to UV and variants
The maximum coefficient is 128, so store constants negated to take
advantage of -128 being representable in 8-bit integers. This allows us
to use the I8MM USDOT instructions.

Reduction in time taken observed compared to the existing Neon
implementation, as a geomean of all ARGBToUV variants:

Cortex-A510:  -7.1%
Cortex-A520:  -2.1%
Cortex-A710:  -8.4%
Cortex-A715:  -0.3%
Cortex-A720:  -0.3%
  Cortex-X2: -40.0%
  Cortex-X3: -43.3%
  Cortex-X4: -11.3%
Cortex-X925:  -2.5%

Change-Id: Id06dc17d101b66975b84b93e5abe91c0032921dd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6535686
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-05-12 11:14:00 -07:00
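
Note on 61bdaee13a: a minimal Python model of the negated-constant trick. A coefficient of 128 does not fit in a signed 8-bit lane, but -128 does, so the table is stored negated and the dot product is subtracted rather than added. The coefficient values, the bias, and the final shift below are illustrative assumptions, not the tables or rounding libyuv uses. `usdot` is a hypothetical model of one unsigned-by-signed dot-product lane.

```python
INT8_MIN, INT8_MAX = -128, 127

# Illustrative coefficients only; not the tables libyuv ships.
COEFFS = (128, -85, -43, 0)


def fits_int8(values) -> bool:
    return all(INT8_MIN <= v <= INT8_MAX for v in values)


def usdot(unsigned_bytes, signed_bytes) -> int:
    """Model of one USDOT lane: sum of u8 * s8 products."""
    assert all(0 <= u <= 255 for u in unsigned_bytes)
    assert fits_int8(signed_bytes)
    return sum(u * s for u, s in zip(unsigned_bytes, signed_bytes))


def chroma_direct(pixel, bias=0x8000) -> int:
    # Reference: plain integer dot product with the original coefficients.
    return (bias + sum(p * c for p, c in zip(pixel, COEFFS))) >> 8


def chroma_negated(pixel, bias=0x8000) -> int:
    # Negate the table so it fits in int8, then subtract instead of add.
    negated = tuple(-c for c in COEFFS)
    return (bias - usdot(pixel, negated)) >> 8
```

Both functions return the same value for any 4-byte pixel, but only the negated table passes the int8 range check that an 8-bit dot-product instruction requires.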
Gavin Mak
4db2af62da Remove --no_auth from download_from_google_storage hooks
The flag was deprecated by https://crrev.com/c/6414748 and
has no effect besides telling the user that it has no effect.

Bug: 414826937
Change-Id: Idd0ee2e7a3cab0f49c4f87da0f3901713f9ebf00
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6509300
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2025-05-02 14:27:21 -07:00
Frank Barchard
9f9b5cf660 ARGBToUV allow 32 bit x86 build
- make width loop count on stack
- set YMM constants in its own asm block
- make struct for shuffle and add constants
- disable clang format on row_neon.cc function

Bug: 413781394
Change-Id: I263f6862cb7589dc31ac65d118f7ebeb65dbb24a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6495259
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-04-28 12:11:00 -07:00
WANG Xuerui
1e40e34573 Add missing files for loong64 GYP build
There are a few added source files since the (re-)addition of GYP build
support, for better SIMD optimization support (AArch64 SME & SVE,
LoongArch LSX & LASX, RISC-V RVV). This CL covers the LoongArch part in
preparation of fixing GYP builds for this architecture.

The files' arch-specific contents are all gated behind preprocessor
macro checks, so it is safe to have everything included in the build
unconditionally.

Bug: None
Change-Id: I2da37c1db79c2d8316ae42079e79efed2a2030a9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6241803
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-04-15 14:03:27 -07:00
Mark Zhuang
fb7b9a4df4 Fix typo, remove mips as title already contains mips
Change-Id: I884f2f3ba937ec71fa070373e5c32977d35e7e75
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6267779
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2025-04-15 14:02:28 -07:00
Wan-Teh Chang
ce488afb7b Call cmake_minimum_required(VERSION 3.16) first
CMake version >= 3.16 comes from Google's Foundational C++ Support
matrix:
https://github.com/google/oss-policies-info/blob/main/foundational-cxx-support-matrix.md

Call cmake_minimum_required() first, followed by project().

These changes fix two warnings from cmake versions 3.31.5 and 4.0.1.

Change-Id: I42d51f2764d95e23a45a709986011dc0aafb3cf8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6451084
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: James Zern <jzern@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-04-13 10:08:52 -07:00
Mirko Bonadei
bf0f29fdf9 Roll chromium_revision 908f3898af..3d4d5701ea (1403569:1445131)
Change log: 908f3898af..3d4d5701ea
Full diff: 908f3898af..3d4d5701ea

Changed dependencies
* fuchsia_vesion: version:26.20250103.4.1..version:27.20250409.6.1
* gn_vesion: git_revision:c97a86a72105f3328a540f5a5ab17d11989ab7dd..git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63
* reclient_vesion: re_client_version:0.172.0.3cf60ba5-gomaip..re_client_version:0.177.1.e58c0145-gomaip
* src/build: f3e95cc9a0..a86a8f1eef
* src/buildtools: dc74188326..085537daf1
* src/buildtools/linux64: git_revision:c97a86a72105f3328a540f5a5ab17d11989ab7dd..git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63
* src/buildtools/mac: git_revision:c97a86a72105f3328a540f5a5ab17d11989ab7dd..git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63
* src/buildtools/reclient: re_client_version:0.172.0.3cf60ba5-gomaip..re_client_version:0.177.1.e58c0145-gomaip
* src/buildtools/win: git_revision:c97a86a72105f3328a540f5a5ab17d11989ab7dd..git_revision:6d326e97fe0242bf56c3de1a93f887446e80ec63
* src/ios: 6e4e345fbb..cb94f0a680
* src/testing: 4341e4d7a2..83917cb85c
* src/third_party: f25a92da84..0aa3190ee4
* src/third_party/android_deps/cipd/libs/com_google_android_datatransport_transport_api: version:2@2.2.1.cr1..version:2@4.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_auth: version:2@21.1.1.cr1..version:2@21.3.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_auth_api_phone: version:2@18.0.2.cr1..version:2@18.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_auth_base: version:2@18.0.10.cr1..version:2@18.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_basement: version:2@18.4.0.cr1..version:2@18.5.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_cast: version:2@17.0.0.cr1..version:2@22.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_cast_framework: version:2@17.0.0.cr1..version:2@22.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_cloud_messaging: version:2@16.0.0.cr1..version:2@17.2.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_flags: version:2@17.0.0.cr1..version:2@18.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_identity_credentials: version:2@16.0.0-alpha02.cr1..version:2@16.0.0-alpha05.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_instantapps: version:2@18.0.1.cr1..version:2@18.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_location: version:2@21.0.1.cr1..version:2@21.3.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_gms_play_services_stats: version:2@17.0.0.cr1..version:2@17.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_play_core_common: version:2@2.0.2.cr1..version:2@2.0.3.cr1
* src/third_party/android_deps/cipd/libs/com_google_android_play_feature_delivery: version:2@2.0.1.cr1..version:2@2.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_code_gson_gson: version:2@2.9.0.cr1..version:2@2.8.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_annotations: version:2@16.0.0.cr1..version:2@16.2.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_common: version:2@19.5.0.cr1..version:2@21.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_components: version:2@16.1.0.cr1..version:2@18.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_encoders: version:2@16.1.0.cr1..version:2@17.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_encoders_json: version:2@17.1.0.cr1..version:2@18.0.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_iid: version:2@21.0.1.cr1..version:2@21.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_iid_interop: version:2@17.0.0.cr1..version:2@17.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_installations: version:2@16.3.5.cr1..version:2@17.2.0.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_installations_interop: version:2@16.0.1.cr1..version:2@17.1.1.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_measurement_connector: version:2@18.0.0.cr1..version:2@20.0.1.cr1
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_messaging: version:2@21.0.1.cr1..version:2@24.1.0.cr1
* src/third_party/android_deps/cipd/libs/com_squareup_wire_wire_runtime_jvm: version:2@5.0.0.cr1..version:2@5.2.1.cr1
* src/third_party/android_deps/cipd/libs/org_codehaus_mojo_animal_sniffer_annotations: version:2@1.21.cr1..version:2@1.17.cr1
* src/third_party/android_deps/cipd/libs/org_jetbrains_kotlinx_kotlinx_coroutines_core_jvm: version:2@1.8.1.cr1..version:2@1.10.1.cr1
* src/third_party/androidx/cipd: gUjEawxv5mQO8yfbuC8W-rx4V3zYE-4LTWggXpZHI4sC..jYbS8zrbmfTcWph9ZY_BcX8HUPFnwt3fGMfhQcNURSQC
* src/third_party/catapult: https://chromium.googlesource.com/catapult.git/+log/8491e07230..4f90fb2788
* src/third_party/depot_tools: 423f1e1914..f40ddcd8d5
* src/third_party/googletest/src: 7d76a231b0..52204f78f9
* src/third_party/harfbuzz-ng/src: 1c249be96e..9f83bbbe64
* src/third_party/icu: bbccc2f6ef..c9fb4b3a6f
* src/third_party/instrumented_libs: 3cc43119a2..69015643b3
* src/third_party/kotlin_stdlib/cipd: uguVAY3NvbfV4KgHrjjwvtTioMwPwSijfAgBPpbaYk0C..dpAaSR0n15OMLmQJlIc-ZQ14UqzGBr2LaBEw_rukkl8C
* src/third_party/kotlinc/current: YrBSUjA4zjPf3DhU2SYlqamxAAQiM2WIeZftsDSjqTAC..Wood5j4J3uPDtbP0fk868sOS0Y0umzF5X7w6U6QWupgC
* src/third_party/libc++/src: 74dd760826..024b5251a7
* src/third_party/libc++abi/src: 7681005c62..78140a7276
* src/third_party/libjpeg_turbo: 927aabfcd2..e14cbfaa85
* src/third_party/libunwind/src: d1e95b102f..e2e6f2a67e
* src/third_party/libunwindstack: 215bddfd8e..0d758dd57f
* src/third_party/llvm-libc/src: 2019a9e40b..54db6cfdef
* src/third_party/lss: https://chromium.googlesource.com/linux-syscall-support.git/+log/ce877209e1..ed31caa60f
* src/third_party/nasm: f477acb104..767a169c88
* src/third_party/r8/cipd: TQJgBofMEzGILWhAM0LXeob_ZpAiDc8w8SBzU0d8o8YC..S1YW2OlP8ThsNUXDptm52Ouvnwp9t9xpTy5LECvEOw4C
* src/third_party/r8/d8/cipd: U3Jf_ewWOZyxa6vyO3wjNIgm8XIz1yFk-4k3-wqDL44C..wvbyt_Mr06Bl4Rcv4zoX-sTk_keiEYxfspOMUufh5nIC
* src/third_party/re2/src: 6dcd83d60f..c84a140c93
* src/third_party/turbine/cipd: dz8pRLjwNlToJ0tS14T-TDQJNikmFXEDByMo-OzBbl0C..scfGptWnO9bwzbg-jr0mcnVO3NG5KQJvlAQd_JSD5QUC
* src/tools: 09973d22d8..f06b4755aa
Added dependencies
* src/third_party/android_deps/autorolled/cipd
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_encoders_proto
* src/third_party/android_deps/cipd/libs/com_google_ar_impress
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_datatransport
* src/third_party/android_deps/cipd/libs/com_google_android_datatransport_transport_runtime
* src/third_party/android_deps/cipd/libs/org_jetbrains_kotlinx_kotlinx_coroutines_play_services
* src/third_party/android_deps/cipd/libs/com_google_android_datatransport_transport_backend_cct
* src/third_party/android_deps/cipd/libs/com_google_firebase_firebase_common_ktx
Removed dependencies
* src/third_party/android_deps/cipd/libs/com_google_android_annotations
* src/third_party/android_deps/cipd/libs/com_google_dagger_hilt_core
* src/third_party/android_deps/cipd/libs/com_squareup_javawriter
* src/third_party/android_deps/cipd/libs/io_grpc_grpc_api
* src/third_party/android_deps/cipd/libs/io_grpc_grpc_binder
* src/third_party/android_deps/cipd/libs/io_grpc_grpc_context
* src/third_party/android_deps/cipd/libs/io_grpc_grpc_core
* src/third_party/android_deps/cipd/libs/io_grpc_grpc_protobuf_lite
* src/third_party/android_deps/cipd/libs/io_grpc_grpc_stub
* src/third_party/android_deps/cipd/libs/io_perfmark_perfmark_api
* src/third_party/android_deps/cipd/libs/javax_annotation_jsr250_api
* src/third_party/android_deps/cipd/libs/org_hamcrest_hamcrest
DEPS diff: 908f3898af..3d4d5701ea/DEPS

Clang version changed llvmorg-20-init-16062-g091448e3:llvmorg-21-init-6681-g5b36835d
Details: 908f3898af..3d4d5701ea/tools/clang/scripts/update.py

BUG=None

Change-Id: I165d415f9dd1e0d318bdc7eb4ab9fb34d1e81050
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6441069
Reviewed-by: Jeremy Leconte <jleconte@google.com>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2025-04-10 04:15:34 -07:00
Wan-Teh Chang
8c48036d15 Remove duplicate code in planar_functions.h
The declarations of ARGBAffineRow_C and ARGBAffineRow_SSE2 and the code
to support those declarations are duplicated in planar_functions.h. They
are already in row.h, so we can simply remove them.

Change-Id: I9b522fdd201ca530f1268bf4200cd2e18b806ba5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6434733
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2025-04-04 15:48:23 -07:00
Wan-Teh Chang
6cc603a8cf convert_test.cc: Remove unused ENABLE_ROW_TESTS
The ENABLE_ROW_TESTS macro is not used in convert_test.cc.

Change-Id: Icc50ec465beca81e14a9683a717680e179a541dd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6434620
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2025-04-04 15:43:59 -07:00
Wan-Teh Chang
b7a857659f Disable Arm SME and SVE assembly code under MSan
The code that disables Arm and Intel assembly code under MSan is
duplicated in cpu_support.h and planar_functions.h. This CL does not
address the code duplication.

Bug: b:407277484, b:407278016, b:407278132
Change-Id: If70fd8d3382916041d75efabcc84010ea3f1e60e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6430806
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-04-03 11:27:31 -07:00
Wan-Teh Chang
a4f653b389 Fix CMakeLists.txt for compatibility with gcc 10
Based on the libavif pull request
https://github.com/AOMediaCodec/libavif/pull/2660
by Frankie Dintino <fdintino@gmail.com>.

Bug: 399856238
Change-Id: I9b21a0cf1fd26b71d86090f41841eefa4d6bb194
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6405834
Reviewed-by: George Steed <george.steed@arm.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2025-04-02 16:46:28 -07:00
WANG Xuerui
55a708e226 Fix unified sources build for LoongArch LASX
Several consumers of libyuv use unified-source builds, where many source
files are #include'd together to make compilation units larger and give
the compiler more optimization opportunities. For LoongArch there is a
wrinkle: unlike the other currently supported architectures, the LASX and
LSX code paths are implemented in separate files, and some definitions
are duplicated, e.g. struct RgbConstants.

Since the duplicated content is identical across the two files, short of
some bigger refactoring, we can simply place #ifdef guards around the
definitions to fix unified-source builds for LoongArch.
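
The guard approach described above can be sketched in a single translation unit, which is roughly what a unified-source build produces once both files are #include'd together. This is a minimal illustration only: the guard macro EXAMPLE_RGB_CONSTANTS_DEFINED, the struct fields, and the helper ExampleLuma are hypothetical and are not taken from the libyuv sources.

```c
#include <assert.h>
#include <stdint.h>

/* --- Stand-in for the first file (for example, the LSX row code). --- */
#ifndef EXAMPLE_RGB_CONSTANTS_DEFINED /* hypothetical guard name */
#define EXAMPLE_RGB_CONSTANTS_DEFINED
struct RgbConstants { /* illustrative fields, not the real layout */
  uint8_t kRGBToY[4];
  uint16_t kAddY;
};
#endif

/* --- Stand-in for the second file (for example, the LASX row code). --- */
/* In a unified build the guard is already defined, so this identical     */
/* definition is skipped instead of triggering a redefinition error.      */
#ifndef EXAMPLE_RGB_CONSTANTS_DEFINED
#define EXAMPLE_RGB_CONSTANTS_DEFINED
struct RgbConstants {
  uint8_t kRGBToY[4];
  uint16_t kAddY;
};
#endif

/* Hypothetical helper showing that the single surviving definition is usable. */
static int ExampleLuma(const struct RgbConstants* c,
                       uint8_t b, uint8_t g, uint8_t r) {
  assert(c != 0);
  return (c->kRGBToY[0] * b + c->kRGBToY[1] * g + c->kRGBToY[2] * r +
          c->kAddY) >> 8;
}
```

When each file is compiled on its own, the guard is undefined at the point of inclusion and the definition is emitted as before, so separate compilation is unaffected.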

Change-Id: I952e8e0210221ec8bcc113f75fa1b9ba515ec323
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6272801
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2025-04-01 09:48:19 -07:00
Frank Barchard
23d416d6f3 Detect SME without SVE dependency
Bug: None
Change-Id: Ibe29488e893a493699ea3fae1a1a54a4fff5969c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6418571
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-03-31 17:27:40 -07:00
Wan-Teh Chang
dc47c71b3e Bump cmake_minimum_required version to 3.5
We started to get the following error in libavif's GitHub CI workflows:

  CMake Error at CMakeLists.txt:8 (cmake_minimum_required):
    Compatibility with CMake < 3.5 has been removed from CMake.

Change-Id: If2490208cc3e7da22ff67557c5cdd4bd9f2499ad
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6416369
Commit-Queue: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-03-31 13:14:45 -07:00
Hang Nguyen
a43f62aa40 Enable CFI assembly support
Adds the sanitizer to the libyuv static library to enable CFI assembly
support.

Bug: 400789169
Change-Id: I9be82d90d60535fdf59e4e729778a455e946e4cc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6414818
Reviewed-by: James Zern <jzern@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2025-03-31 10:47:47 -07:00
Frank Barchard
f145aa26da Add SME2 detect
Bug: None
Change-Id: I36e576de1cf468049faaf3923b6c21fc9ad14271
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6401373
Reviewed-by: George Steed <george.steed@arm.com>
2025-03-27 11:08:08 -07:00
George Steed
64ac2d8f0f Avoid odd width stores in I422ToRGB565Row_{SVE2,SME}
The existing code for creating RGB565 data in SVE2 and SME produces two
vectors of interleaved 16-bit elements due to the nature of how SVE
widening instructions operate. This means that the 16-bit elements
appear in the two result vectors in the following order:

    z18.b: [elem0 byte0, elem0 byte1, elem2 byte0, elem2 byte1, ...]
    z19.b: [elem1 byte0, elem1 byte1, elem3 byte0, elem3 byte1, ...]

This is problematic for the final (predicated) iteration of the
conversion since the p1 predicate input to the ST2H instruction controls
storing the four bytes corresponding to the first two elements, in the
first two bytes of z18 and z19. This means that in the case that the
width is an odd number there is no way of storing just elem0 in z18
individually.

This patch addresses this by permuting the z18/z19 data such that the
two bytes from each element are split evenly across the two vectors:

    z20.b: [elem0 byte0, elem1 byte0, elem2 byte0, elem3 byte0, ...]
    z21.b: [elem0 byte1, elem1 byte1, elem2 byte1, elem3 byte1, ...]

Since we would now always store the same lanes from both vectors we can
continue to use the same predicate without further changes.

The existing (non-tail) loop body utilizes an all-true predicate so we
can avoid the extra permutes in this case, avoiding any performance
degradation.
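
The byte permutation described above can be modeled in scalar C. This is a sketch under stated assumptions, not the actual SVE2/SME assembly: the 8-byte vector length EXAMPLE_VL and all function names are hypothetical, and the predicated two-vector interleaving store is simulated with a plain loop over the active lanes.

```c
#include <assert.h>
#include <stdint.h>

#define EXAMPLE_VL 8 /* hypothetical vector length in bytes */

/* Rearrange z18/z19 (whole elements interleaved across the two vectors) */
/* into z20 (byte 0 of every element) and z21 (byte 1 of every element). */
static void ExampleSplitBytes(const uint8_t* z18, const uint8_t* z19,
                              uint8_t* z20, uint8_t* z21) {
  for (int i = 0; i < EXAMPLE_VL / 2; ++i) {
    z20[2 * i] = z18[2 * i];         /* elem 2i,   byte 0 */
    z20[2 * i + 1] = z19[2 * i];     /* elem 2i+1, byte 0 */
    z21[2 * i] = z18[2 * i + 1];     /* elem 2i,   byte 1 */
    z21[2 * i + 1] = z19[2 * i + 1]; /* elem 2i+1, byte 1 */
  }
}

/* Model of a predicated interleaving store: lane k is active when        */
/* k < width, and an active lane writes one byte from each vector, which  */
/* is exactly one 16-bit element. Odd widths therefore need no special    */
/* handling.                                                              */
static void ExampleStoreTail(const uint16_t* elems, int width, uint8_t* dst) {
  uint8_t z18[EXAMPLE_VL], z19[EXAMPLE_VL], z20[EXAMPLE_VL], z21[EXAMPLE_VL];
  assert(width >= 0 && width <= EXAMPLE_VL);
  /* Build the layout that the widening arithmetic produces. */
  for (int i = 0; i < EXAMPLE_VL / 2; ++i) {
    z18[2 * i] = (uint8_t)(elems[2 * i] & 0xff);
    z18[2 * i + 1] = (uint8_t)(elems[2 * i] >> 8);
    z19[2 * i] = (uint8_t)(elems[2 * i + 1] & 0xff);
    z19[2 * i + 1] = (uint8_t)(elems[2 * i + 1] >> 8);
  }
  ExampleSplitBytes(z18, z19, z20, z21);
  for (int k = 0; k < width; ++k) {
    dst[2 * k] = z20[k];
    dst[2 * k + 1] = z21[k];
  }
}
```

With the original z18/z19 layout, one active lane of a two-vector store would cover elem0 and elem1 together. After the split, lane k of z20 and z21 holds only elem k, so the same predicate works for any width.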

Change-Id: I7d2be27c84cd9eb02cebac54a14c3498911f21d3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6395137
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2025-03-26 04:08:46 -07:00
Frank Barchard
5f284054cb RVV disable 64 bit elements and vcombine_v
Bug: 405451074
Change-Id: I8e4437be92934b3c367c94d867d7967c32747260
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6385788
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2025-03-25 12:51:25 -07:00