libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-01-01 03:12:16 +08:00

Author	SHA1	Message	Date
George Steed	4f7fd808b7	[AArch64] Use full vectors in TransposeWx{8 => 16}_NEON The existing Neon code only makes use of 64-bit vectors throughout which limits the performance on larger cores. To avoid this, swap the Neon code from a Wx8 implementation to a Wx16 implementation and process blocks of 16 full vectors at a time. The original code also handled widths that were not exact multiples of 16, however this should already be handled by the "any" kernel so it is removed. Finally, avoid duplicating the TransposeWx16_C fallback kernel definition in all architectures that need it, and just put it once in rotate_common.cc instead. Observed speedups for TransposePlane across a range of micro-architectures: Cortex-A53: -40.0% Cortex-A55: -20.7% Cortex-A57: -43.9% Cortex-A510: -43.5% Cortex-A520: -43.9% Cortex-A720: -31.1% Cortex-X2: -38.3% Cortex-X4: -43.6% Change-Id: Ic7c4d5f24eb27091d743ddc00cd95ef178b6984e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5545459 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-05-21 07:46:42 +00:00
George Steed	9fac9a4a82	[AArch64] Add Neon implementations for {ARGB,ABGR}ToAR30Row There are existing x86 implementations for these kernels but not for AArch64, so add them. Reduction in runtimes, compared to the existing C code compiled with LLVM 17: \| ABGRToAR30Row \| ARGBToAR30Row Cortex-A55 \| -55.1% \| -55.1% Cortex-A510 \| -39.3% \| -40.1% Cortex-A76 \| -62.3% \| -63.6% Co-authored-by: Cosmina Dunca <cosmina.dunca@arm.com> Bug: libyuv:976 Change-Id: I307f03bddcbe5429c2d3ab2f42aa023a3539ddd0 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465592 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-05-21 07:35:07 +00:00
George Steed	83c48c782a	[AArch64] Improve ARGB4444TOARGB using SRI instructions Also avoid constructing the alpha component when it isn't needed by introducing a new ARGB4444TORGB macro. Reduction in runtime for selected kernels: \| Cortex-A55 \| Cortex-A510 \| Cortex-A76 ARGB4444ToARGBRow_NEON \| -27.5% \| -27.9% \| -29.1% ARGB4444ToUVRow_NEON \| -20.2% \| -25.2% \| -21.7% ARGB4444ToYRow_NEON \| -16.0% \| -20.2% \| -21.3% Bug: libyuv:976 Change-Id: Ida061e1c49ba228b02c2f691a067b58edad073a8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5509196 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-05-21 07:29:11 +00:00
George Steed	5618a5c762	[AArch64] Use REV16 rather than TBL in SwapUVRow_NEON We don't need a general-purpose permute here, REV16 does exactly what we want and saves us needing to load the permute indices array. Bug: libyuv:976 Change-Id: Ib3bc2e4d21b00d53aeda6a11c6e6f1016ca6029e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5509201 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-05-21 07:26:54 +00:00
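A scalar illustration of why a byte reverse within each 16-bit element is enough to swap interleaved U and V bytes. This is a sketch only: `Rev16` and `SwapUVRowModel` are hypothetical names, not libyuv functions, and `width` is assumed to count UV pairs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar model of REV16 on one 16-bit element: swap its two bytes.
inline uint16_t Rev16(uint16_t x) {
  return static_cast<uint16_t>((x << 8) | (x >> 8));
}

// Hypothetical reference: turn each interleaved (U, V) byte pair into (V, U).
// width is the number of UV pairs, so src must hold at least 2 * width bytes.
inline std::vector<uint8_t> SwapUVRowModel(const std::vector<uint8_t>& src,
                                           size_t width) {
  assert(src.size() >= 2 * width);
  std::vector<uint8_t> dst(2 * width);
  for (size_t i = 0; i < width; ++i) {
    // Assemble the pair as a little-endian 16-bit lane, as a vector load would.
    const uint16_t lane =
        static_cast<uint16_t>(src[2 * i] | (src[2 * i + 1] << 8));
    const uint16_t swapped = Rev16(lane);
    dst[2 * i] = static_cast<uint8_t>(swapped & 0xFF);
    dst[2 * i + 1] = static_cast<uint8_t>(swapped >> 8);
  }
  return dst;
}
```

Because the swap is a fixed pattern within each 16-bit lane, no index table is needed, which is the saving the commit message describes.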
George Steed	c6632d43ae	[AArch64] Impose feature dependencies in detection code The strict architectural requirements between features are reasonably relaxed and difficult to map out fully, in particular: * FEAT_DotProd is architecturally available from Armv8.1-A and becomes mandatory from Armv8.4-A. * FEAT_I8MM is architecturally available from Armv8.1-A and becomes mandatory from Armv8.6-A. It does not strictly depend on FEAT_DotProd being implemented however I am not aware of a micro-architecture where FEAT_I8MM is implemented without FEAT_DotProd also being implemented. * FEAT_SVE is architecturally available from Armv8.2-A. It does not strictly depend on either of FEAT_DotProd or FEAT_I8MM being implemented. The only micro-architecture I am aware of where FEAT_SVE is implemented without FEAT_DotProd and FEAT_I8MM both also being implemented is the Fujitsu A64FX. * FEAT_SVE2 is architecturally available from Armv9.0-A. If FEAT_SVE2 is implemented then FEAT_SVE must also be implemented. Since Armv9.0-A is based on Armv8.5-A this implies that FEAT_DotProd is also implemented. Interestingly this means that FEAT_I8MM is not mandatory since it only becomes mandatory from Armv8.6-A (Armv9.1-A), however I am not aware of a micro-architecture where FEAT_SVE2 is implemented without all three of the above features also being implemented. Additionally, when testing under emulation there are sometimes bugs where even mandatory architecture relationships are broken. For example there is one known case where SVE2 may be reported as available even when SVE is explicitly disabled. To simplify these dependencies, don't try to enable later extensions unless earlier extensions are reported implemented. This notably penalises code if it were to run on a Fujitsu A64FX, however this is not a likely target for libyuv deployment. Change-Id: Ifa32f7a43043641f99afb120e591945e136c9fd1 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5546385 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-05-21 07:21:49 +00:00
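The rule described in c6632d43ae (enable a later extension only if the earlier ones are reported) could be sketched roughly as below. The flag values, the function name, and the exact chain order DotProd, I8MM, SVE, SVE2 are assumptions for illustration; libyuv's real `kCpuHas*` constants and detection code may differ.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag values for illustration only.
constexpr uint32_t kNeon = 1u << 0;
constexpr uint32_t kDotProd = 1u << 1;
constexpr uint32_t kI8MM = 1u << 2;
constexpr uint32_t kSVE = 1u << 3;
constexpr uint32_t kSVE2 = 1u << 4;
constexpr uint32_t kAllKnown = kNeon | kDotProd | kI8MM | kSVE | kSVE2;

// Neon is treated as the AArch64 baseline. Each later extension is kept only
// if every earlier extension in the assumed chain was also reported.
inline uint32_t ImposeFeatureDependencies(uint32_t reported) {
  assert((reported & ~kAllKnown) == 0);  // only the flags modelled here
  const uint32_t chain[] = {kDotProd, kI8MM, kSVE, kSVE2};
  uint32_t enabled = kNeon;
  for (uint32_t feature : chain) {
    if ((reported & feature) == 0) break;  // stop at the first missing link
    enabled |= feature;
  }
  return enabled;
}
```

Under this model, an A64FX-like report (SVE without DotProd or I8MM) collapses to Neon only, and an emulator that reports SVE2 with SVE disabled keeps only the features up to I8MM.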
Wan-Teh Chang	ec6f15079f	Remove unneeded #ifdef HAVE_JPEG code Change-Id: Ic7e1393b48bec735625197243b3d436ea01cfb07 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5529467 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-09 23:02:18 +00:00
George Steed	ee830a5f77	[AArch64] Enable feature detection on Windows and Apple Silicon Using the platform-specific functions IsProcessorFeaturePresent and sysctlbyname to check individual features. Bug: libyuv:980 Change-Id: I7971238ca72e5df862c30c2e65331c46dc634074 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465591 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-03 18:42:51 +00:00
George Steed	a114f85e50	[AArch64] Fix naming in ARGBToUVMatrixRow_SVE2 etc constants Avoid abbreviations and capitalize ARGB and UV naming, as suggested here: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5505537 Bug: libyuv:973 Change-Id: I0d0143154594c03e6aca7c859b874e39634ca54f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5513544 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-03 17:25:14 +00:00
George Steed	6f1d8b1e11	[AArch64] Add SVE2 implementations for ARGBToUVRow and similar By maintaining the interleaved format of the data we can use a common kernel for all input channel orderings and simply pass a different vector of constants instead. A similar approach is possible with only Neon by making use of multiplies and repeated application of ADDP to combine channels, however this is slower on older cores like Cortex-A53 so is not pursued further. For odd problem sizes we need a slightly different implementation for the final element, so introduce an "any" kernel to address that rather than bloating the code for the common case. Observed effect on runtimes compared to the existing Neon kernels: \| Cortex-A510 \| Cortex-A720 \| Cortex-X2 ABGRToUVJRow \| -15.5% \| +5.4% \| -33.1% ABGRToUVRow \| -15.6% \| +5.3% \| -35.9% ARGBToUVJRow \| -10.1% \| +5.4% \| -32.7% ARGBToUVRow \| -10.1% \| +5.4% \| -29.3% BGRAToUVRow \| -15.5% \| +4.6% \| -32.8% RGBAToUVRow \| -10.1% \| +4.2% \| -36.0% Bug: libyuv:973 Change-Id: I041ca44db0ae8a2adffcdf24e822eebe962baf33 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5505537 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-01 19:46:43 +00:00
George Steed	67e5e79dbe	[AArch64] Add Neon implementation of HashDjb2 Reduction in runtime observed compared to the existing C code compiled with LLVM 18: Cortex-A55: -46.2% Cortex-A510: -60.4% Cortex-A76: -82.9% Cortex-A720: -87.4% Cortex-X1: -90.0% Cortex-X2: -91.7% Change-Id: I39a4479f78299508043a864e64fb40578c66ce19 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5494094 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-05-01 19:37:31 +00:00
George Steed	1eae2efbc7	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBShadeRow_NEON The use of LD4 and ST4 to de-interleave ARGB color channels is unnecessary here since we can just adjust the scale multiplicand to match the interleaved layout. LD4 and ST4 are known to perform poorly on some micro-architectures so using LD1 and ST1 here should be preferred. Reduction in runtime for ARGBShadeRow_NEON: Cortex-A55: -19.9% Cortex-A510: -50.8% Cortex-A76: -36.0% Cortex-X2: -46.4% Bug: libyuv:976 Change-Id: I10a0e6a0a62242826d39b1e963063770f084226a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5494093 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-30 00:48:35 +00:00
George Steed	ce32eb773f	[AArch64] Avoid extraneous CMP in I{444,422}ToARGBRow_SVE2 impl We can use subs to set condition flags as part of the subtract, no need for a separate compare instruction. No performance difference observed from this change, but it now matches the other SVE2 kernels. Also remove unnecessary volatile from asm blocks. Bug: libyuv:973 Change-Id: I9bb4f5f1101086602f7d5223feaeae0fb63b385c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463951 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-29 18:56:22 +00:00
George Steed	f483007b9a	[AArch64] Add SVE implementation for I422AlphaToARGBRow This is mostly identical to the existing I422ToARGBRow_SVE implementation, we just need to make sure to load the alpha component rather than hard-coding it to 255. Reduction in runtimes observed compared to the existing Neon code: Cortex-A510: -32.1% Cortex-A720: -5.1% Cortex-X2: -10.1% Bug: libyuv:973 Change-Id: I6f800f3ef59f1dc82b409233017b3cb108da0257 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5444426 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-29 18:54:07 +00:00
George Steed	b53b27d6bf	[AArch64] Add SVE implementation for I444AlphaToARGBRow This is mostly identical to the existing I444ToARGBRow_SVE implementation, we just need to make sure to load the alpha component rather than hard-coding it to 255. Reduction in runtimes observed compared to the existing Neon code: Cortex-A510: -34.2% Cortex-A720: -17.6% Cortex-X2: -9.6% Bug: libyuv:973 Change-Id: Ief63965f6f1048ea24baf8f4037aabdd184e2925 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5444425 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-29 18:54:02 +00:00
George Steed	6ac90403a1	[AArch64] Add SVE implementation for I422ToARGBRow We need a new macro for reading I422 data, but is otherwise mostly identical to the existing I444ToARGBRow_SVE implementation. Reduction in runtimes observed compared to the existing Neon code: Cortex-A510: -25.0% Cortex-A720: -5.0% Cortex-X2: -10.8% Change-Id: I27ddb604a46a53e61c9bde21f76dbc7bd91e0cef Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5444424 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-27 18:26:11 +00:00
George Steed	95eed2b75f	[AArch64] Add Neon dot-product implementation of HammingDistance We can use the Neon dot-product instructions as a slightly faster widening accumulation. This also has the advantage of widening to 32 bits so avoids the risk of overflow present in the original Neon code. Reduction in runtimes observed for HammingDistance compared to the existing Neon code: Cortex-A55: -4.4% Cortex-A510: -26.5% Cortex-A76: -8.1% Cortex-A720: -15.5% Cortex-X1: -4.1% Cortex-X2: -5.1% Bug: libyuv:977 Change-Id: I9e5e10d228c339d905cb2e668a9811ff0a6af5de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5490049 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-26 18:39:00 +00:00
George Steed	6433029df7	[AArch64] Unroll SumSquareError_NEON_DotProd The kernel is only ever called with count as a multiple of 32 so it is safe to unroll this and maintain two accumulators. Reduction in runtime observed compared to the existing SumSquareError_NEON_DotProd implementation: Cortex-A55: -28.2% Cortex-A510: -27.6% Cortex-A76: -33.0% Cortex-A720: -35.3% Cortex-X1: -16.9% Cortex-X2: -13.3% Bug: libyuv:977 Change-Id: Iee423106c38e97cc38007d73fa80e8374dd96721 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5490048 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-26 16:22:01 +00:00
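The unrolling idea in 6433029df7 can be illustrated in scalar form: with the count guaranteed to be a multiple of 32, each iteration handles two 16-byte halves that feed independent accumulators. This is a sketch under that assumption; `SumSquareErrorUnrolled` is a hypothetical name and the real kernel uses Neon dot-product instructions rather than this loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch: sum of squared byte differences with two accumulators, so
// the two halves of each 32-byte block do not depend on each other.
inline uint32_t SumSquareErrorUnrolled(const std::vector<uint8_t>& a,
                                       const std::vector<uint8_t>& b) {
  assert(a.size() == b.size());
  assert(a.size() % 32 == 0);  // precondition stated in the commit message
  uint32_t acc0 = 0;
  uint32_t acc1 = 0;
  for (size_t i = 0; i < a.size(); i += 32) {
    for (size_t j = 0; j < 16; ++j) {
      const int d0 = a[i + j] - b[i + j];            // first 16-byte half
      const int d1 = a[i + 16 + j] - b[i + 16 + j];  // second 16-byte half
      acc0 += static_cast<uint32_t>(d0 * d0);
      acc1 += static_cast<uint32_t>(d1 * d1);
    }
  }
  return acc0 + acc1;  // combine the accumulators once, after the loop
}
```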
George Steed	f5882ed1c5	[AArch64] getauxval(AT_HWCAP{,2}) feature detection, attempt #2 This re-lands commit ba0bba5b2b7e38c9365a5d152b4efa0458863213. Now with additional #ifdef __linux__ guards to avoid compiling Linux-specific code on non-Linux platforms. Non-linux feature detection will be added in a separate patch. Using getauxval(AT_HWCAP{,2}) has the advantage of also working under emulation where faking /proc/cpuinfo is not supported. For the Chromium sandbox, getauxval is supported since API version 18. The minimum supported API version at time of writing is 21 so we should be able to use getauxval unconditionally. On the off-chance the call fails it will return 0 and we will correctly fall-back to using only Neon. If we want to read the current CPU implementer or part number we could do this by checking HWCAP_CPUID and then reading MIDR_EL1. This will cause a kernel trap to emulate the EL1 read but should still be a lot faster than reading the whole of /proc/cpuinfo. Bug: libyuv:980 Change-Id: I8ae103ea7e32ef44db72f3c9896417bfe97ff5c5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465590 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 21:26:31 +00:00
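A minimal sketch of `getauxval`-based detection with the Linux guards that f5882ed1c5 describes. The struct and function names are hypothetical, and the HWCAP bit positions are written out locally (matching the AArch64 Linux `asm/hwcap.h` values as far as I know) so the sketch also compiles on other platforms; this is not libyuv's actual `cpu_id.cc` code.

```cpp
#include <cassert>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval(), AT_HWCAP, AT_HWCAP2
#endif

// AArch64 Linux hwcap bits, spelled out locally for portability of the sketch.
constexpr unsigned long kHwcapAsimdDp = 1UL << 20;  // HWCAP_ASIMDDP
constexpr unsigned long kHwcapSve = 1UL << 22;      // HWCAP_SVE
constexpr unsigned long kHwcap2Sve2 = 1UL << 1;     // HWCAP2_SVE2
constexpr unsigned long kHwcap2I8mm = 1UL << 13;    // HWCAP2_I8MM

struct Aarch64Features {  // hypothetical container for the decoded bits
  bool dotprod;
  bool i8mm;
  bool sve;
  bool sve2;
};

// Pure decode step, kept separate so it can be exercised on any platform.
inline Aarch64Features DecodeHwcaps(unsigned long hwcap, unsigned long hwcap2) {
  Aarch64Features f;
  f.dotprod = (hwcap & kHwcapAsimdDp) != 0;
  f.i8mm = (hwcap2 & kHwcap2I8mm) != 0;
  f.sve = (hwcap & kHwcapSve) != 0;
  f.sve2 = (hwcap2 & kHwcap2Sve2) != 0;
  return f;
}

inline Aarch64Features DetectFeatures() {
#if defined(__linux__) && defined(__aarch64__)
  // getauxval returns 0 on failure, which decodes to "Neon only".
  return DecodeHwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
  return DecodeHwcaps(0, 0);  // other platforms need their own detection
#endif
}
```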
George Steed	356232b687	[AArch64] Replace UQXTN{,2} with UZP2 in Convert16To8Row_NEON The existing code makes use of a pair of shifts to put the bits we want in the low part of each vector lane and then a pair of UQXTN and UQXTN2 instructions to perform a saturating cast down from 16-bit elements to 8-bit elements. We can instead achieve the same thing by adding eight to the first shift amount so that the bits we want appear in the high half of the lane, doing the saturation at the same time, and then simply use UZP2 to pull out the high halves of each lane in a single instruction. Reduction in runtime for Convert16To8Row_NEON: Cortex-A55: -19.7% Cortex-A510: -23.5% Cortex-A76: -35.4% Cortex-X2: -34.1% Bug: libyuv:976 Change-Id: I9a80c0f4f2c6b5203f23e422c0970d3167052f91 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463950 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 21:23:55 +00:00
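The arithmetic identity behind 356232b687 can be checked with a scalar model. The sketch assumes a fixed right shift of 0 to 8 bits instead of libyuv's `scale` parameter, and the function names are hypothetical. One function shifts the wanted bits into the low byte and then saturates to 8 bits. The other performs a saturating shift that is eight positions further left and then takes the high byte, as UZP2 would.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Earlier shape: move the wanted bits to the low byte, then saturate to 8 bits.
inline uint8_t NarrowLowThenSaturate(uint16_t x, unsigned right_shift) {
  assert(right_shift <= 8);
  return static_cast<uint8_t>(std::min<uint32_t>(x >> right_shift, 255u));
}

// Newer shape: saturating 16-bit shift that leaves the wanted bits in the
// high byte, followed by extracting that high byte.
inline uint8_t SaturateThenTakeHigh(uint16_t x, unsigned right_shift) {
  assert(right_shift <= 8);
  const uint32_t shifted = static_cast<uint32_t>(x) << (8 - right_shift);
  const uint16_t saturated =
      static_cast<uint16_t>(std::min<uint32_t>(shifted, 0xFFFFu));
  return static_cast<uint8_t>(saturated >> 8);
}

// Exhaustively compare the two shapes over every 16-bit input.
inline bool ShapesAgree(unsigned right_shift) {
  for (uint32_t x = 0; x <= 0xFFFFu; ++x) {
    const uint16_t v = static_cast<uint16_t>(x);
    if (NarrowLowThenSaturate(v, right_shift) !=
        SaturateThenTakeHigh(v, right_shift)) {
      return false;
    }
  }
  return true;
}
```

The two agree because the 16-bit saturation triggers exactly when the shifted-down value would exceed 255, so the saturation and the narrowing fold into one step.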
George Steed	4f52235a67	[AArch64] Replace SHRN{,2} pair by UZP2 in DivideRow_16_NEON Shift instructions have worse throughput than other permute instructions on some micro-architectures, and we can avoid the need for two separate narrowing instructions by taking the high halves of each lane directly through use of the UZP2 instruction. Reduction in runtime for DivideRow_16_NEON: Cortex-A55: -6.2% Cortex-A510: -30.0% Cortex-A76: -11.9% Cortex-X2: -46.8% Bug: libyuv:976 Change-Id: I4aa06eab06ab6134bb80bc3af5328a1a83b3d249 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463949 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 21:21:52 +00:00
George Steed	53b65220da	[AArch64] Add Neon dot-product implementation of SumSquareError The Neon dot-product instructions perform two widening steps rather than one, saving us the need to widen the absolute difference to 16-bits before accumulating. Additionally, the dot-product instructions tend to have better performance characteristics than traditional widening multiply instructions like SMLAL used in the existing SumSquareError_NEON code. Observed reduction in runtimes compared to the existing Neon kernel: Cortex-A55: -9.1% Cortex-A510: -36.7% Cortex-A76: -37.6% Cortex-A720: -48.8% Cortex-X1: -56.1% Cortex-X2: -42.6% Bug: libyuv:977 Change-Id: Ie20c69040cc47a803d8e95620d31e0bf1e1dac12 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463945 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 20:54:48 +00:00
George Steed	9e223c3fc0	[AArch64] Replace instances of ORR with MOV where possible The MOV instruction is an alias of ORR where both registers are the same and should be preferred. Both ORR and MOV are not zero-cost instructions on all micro-architectures so there may be better ways to express these kernels, but this is left for a later commit. Bug: libyuv:975 Change-Id: I29b7f182a57a61855cb7f8a867691080f153b10b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5332385 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 20:48:16 +00:00
Frank Barchard	fe51553f5f	Revert "[AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection" This reverts commit ba0bba5b2b7e38c9365a5d152b4efa0458863213. Reason for revert: breaks builds on windows and mac Step _compile_ failed. Error logs are shown below: [1/104] CXX obj/libyuv_internal/cpu_id.o FAILED: obj/libyuv_internal/cpu_id.o ../../buildtools/reclient/rewrapper -cfg=../../buildtools/reclient_cfgs/chromium-browser-clang/rewra...(too long) ../../source/cpu_id.cc:25:10: fatal error: 'sys/auxv.h' file not found 25 \| #include <sys/auxv.h> // For getauxval() \| ^~~~~~~~~~~~ 1 error generated. More information in raw_io.output_text[failure_summary] Original change's description: > [AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection > > This has the advantage of also working under emulation where > faking /proc/cpuinfo is not supported. > > For the Chromium sandbox, getauxval is supported since API version 18. > The minimum supported API version at time of writing is 21 so we should > be able to use getauxval unconditionally. On the off-chance the call > fails it will return 0 and we will correctly fall-back to using only > Neon. > > Change-Id: Ibbaa9caec1915ac0725c42d6cd2abc7ce19786c7 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5453620 > Reviewed-by: Frank Barchard <fbarchard@chromium.org> Change-Id: Ic0f764217af7b4d998f19a8f78fc04ca85a45a3b No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463918 Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:52:22 +00:00
George Steed	73f6e82b1a	[AArch64] Add missing clobber, fix zero-init for compare kernels The "memory" clobber needs to be present even if the asm does not store anything to memory, since otherwise the compiler would be allowed to reorder earlier stores to the pointers after they would be needed by the asm. Also fix up the zero-initialisation of accumulators in SumSquareError_NEON, since EOR'ing a register by itself is not a recognised zeroing idiom on most AArch64 micro-architectures. Bug: libyuv:976 Change-Id: I3175367abf6f59db8371b4478f1156950277d7c5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378705 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:38:06 +00:00
George Steed	ba0bba5b2b	[AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection This has the advantage of also working under emulation where faking /proc/cpuinfo is not supported. For the Chromium sandbox, getauxval is supported since API version 18. The minimum supported API version at time of writing is 21 so we should be able to use getauxval unconditionally. On the off-chance the call fails it will return 0 and we will correctly fall-back to using only Neon. Change-Id: Ibbaa9caec1915ac0725c42d6cd2abc7ce19786c7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5453620 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:37:04 +00:00
George Steed	4838e7a194	[AArch64] Load full vectors in ARGB{Add,Subtract}Row Using full vectors for Add and Subtract is a win across the board. Using full vectors for the multiply is less obviously a win, especially for smaller cores like Cortex-A53 or Cortex-A57, so is not considered for this change. Observed changes in performance with this change compared to the existing Neon code: \| ARGBAddRow_NEON \| ARGBSubtractRow_NEON Cortex-A55 \| -5.1% \| -5.1% Cortex-A510 \| -18.4% \| -18.4% Cortex-A76 \| -28.9% \| -28.7% Cortex-A720 \| -36.1% \| -36.2% Cortex-X1 \| -14.2% \| -14.4% Cortex-X2 \| -12.5% \| -12.5% Bug: libyuv:976 Change-Id: I85316d4399c93b53baa62d0d43b2fa453517f5b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5457433 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 19:02:43 +00:00
George Steed	90070986ae	[AArch64] Improve RGB565TOARGB using SRI instructions The existing code performs a lot of shifts and combines the R and B components into a single vector unnecessarily. We can express this much more cleanly by making use of the SRI instruction to insert and replace shifted bits into the original data, performing the 5/6-bit to 8-bit expansion in a single instruction if the source bits are already in the high bits of the byte. We still need a single separate XTN instruction to narrow the B component before the left shift since Neon does not have a narrowing left shift instruction. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 \| Cortex-X2 RGB565ToYRow_NEON \| -22.1% \| -23.4% \| -25.1% RGB565ToUVRow_NEON \| -26.8% \| -20.5% \| -18.8% RGB565ToARGBRow_NEON \| -38.9% \| -32.0% \| -23.5% Bug: libyuv:976 Change-Id: I77b8d58287b70dbb9549451fc15ed3dd0d2a4dda Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5374286 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-04-18 19:01:26 +00:00
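The single-instruction 5/6-bit to 8-bit expansion mentioned in 90070986ae can be modelled in scalar code. `Sri8` below is a hypothetical per-byte model of SRI (shift right and insert, preserving the top `shift` bits of the destination), and `Expand5To8` and `Expand6To8` are illustrative helpers rather than libyuv macros.

```cpp
#include <cassert>
#include <cstdint>

// Scalar model of SRI on an 8-bit lane: shift src right by `shift` and insert
// it into dst, preserving the top `shift` bits of dst.
inline uint8_t Sri8(uint8_t dst, uint8_t src, unsigned shift) {
  assert(shift >= 1 && shift <= 8);
  const uint8_t keep = static_cast<uint8_t>(0xFF00u >> shift);  // top bits set
  return static_cast<uint8_t>((dst & keep) | (src >> shift));
}

// Expand a 5-bit channel: with the bits already in the high part of the byte,
// SRI by 5 replicates the top bits into the low 3 bits.
inline uint8_t Expand5To8(uint8_t v5) {
  assert(v5 < 32);
  const uint8_t high = static_cast<uint8_t>(v5 << 3);
  return Sri8(high, high, 5);  // same as (v5 << 3) | (v5 >> 2)
}

// Expand a 6-bit channel in the same way, using SRI by 6.
inline uint8_t Expand6To8(uint8_t v6) {
  assert(v6 < 64);
  const uint8_t high = static_cast<uint8_t>(v6 << 2);
  return Sri8(high, high, 6);  // same as (v6 << 2) | (v6 >> 4)
}
```

Replicating the top bits into the vacated low bits maps the maximum 5-bit or 6-bit value to 255 and zero to zero.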
George Steed	1ca7c4e1cc	[AArch64] Avoid lane-indexed loads for UV when loading I444/I422 Most micro-architectures seem to prefer an additional ZIP1 instruction in READYUV422 to needing a lane-indexed LD1 load instruction. We introduce a new macro to handle the YUV to RGB conversion where the U and V components are in separate vectors. This avoids causing a slowdown for the UV-interleaved input format kernels (NV12 and NV21) where we do not want to separate them. Reduction in runtime for selected kernels on Cortex cores (no performance difference observed on Cortex-A55): A510 A76 A720 X1 X2 I422AlphaToARGBRow_NEON -4.3% -7.3% -10.1% -4.0% -4.4% I422ToARGB1555Row_NEON -4.5% +0.4% -7.9% -4.8% -3.9% I422ToARGB4444Row_NEON -7.7% -2.6% -4.1% -1.9% -1.3% I422ToARGBRow_NEON -3.7% -2.9% -10.2% -3.8% -4.4% I422ToRGB24Row_NEON -5.9% +5.4% -3.2% -4.3% -4.3% I422ToRGB565Row_NEON -4.8% -2.8% -8.5% -3.8% -4.6% I422ToRGBARow_NEON -3.7% +4.6% -10.5% -3.0% -4.5% I444AlphaToARGBRow_NEON -3.5% +2.7% -3.7% -5.0% -8.2% I444ToARGBRow_NEON -1.8% -15.1% -3.5% -6.5% -8.1% I444ToRGB24Row_NEON -2.0% -6.8% +0.1% -4.7% +1.2% There are a few cases which are slower on Cortex-A76, but significant speedups elsewhere. Bug: libyuv:976 Change-Id: Ib3b4ef81f7bfc1d7ff9c4c24aef9ad86741410ff Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465580 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:46:59 +00:00
George Steed	bfedc8bc11	[AArch64] Improve ARGB{,1}555TOARGB using SRI instructions The existing transformations can be more cleanly expressed by using SRI instructions to perform a shift and simultaneously merge in to an existing value. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 \| Cortex-X2 ARGB1555ToYRow_NEON \| -26.2% \| -14.9% \| -28.2% ARGB1555ToUVRow_NEON \| -25.2% \| -18.4% \| -20.9% ARGB1555ToARGBRow_NEON \| -43.6% \| -32.8% \| -19.7% Bug: libyuv:976 Change-Id: Id07ac6f2cd3eb9bb70f9e29fc1f4b29fe26156ec Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5383444 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:46:10 +00:00
George Steed	95b0a3326c	[AArch64] Improve ARGBTOARGB4444 using SRI instructions The existing sequence to convert from 8-bit ARGB to 4-bit ARGB4444 makes use of a lot of shifts and bit-clears before ORR'ing the pairs together. This is unnecessary since we can do the same with the SRI instruction, so use that instead. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 ARGBToARGB4444Row_NEON \| -15.3% \| -16.6% I422ToARGB4444Row_NEON \| -2.7% \| -11.9% Bug: libyuv:976 Change-Id: I86cd86c7adf1105558787a679272179821f31a9d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5383443 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:26:27 +00:00
George Steed	b265c311b7	[AArch64] Avoid unnecessary work in READYUV400 The value of UV components in the vector are known and the vectors are never overwritten, so we can hoist the UV-specific parts of the calculation out of the loop. Reduction in runtimes for I400ToARGBRow_NEON: Cortex-A55: -10.0% Cortex-A510: -3.7% Cortex-A76: -19.3% Cortex-X2: -14.4% Bug: libyuv:976 Change-Id: I17d6de4e1790f71407e12ff84548568cc3ebbe1a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5457434 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-17 16:47:58 +00:00
George Steed	ea56460300	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBMultiplyRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBMultiplyRow_NEON: Cortex-A55: -22.3% Cortex-A510: -56.6% Cortex-A76: -45.5% Cortex-X2: -54.6% Change-Id: I9103111a109a4d87d358e06eb513746314aaf66a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454832 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:28:56 +00:00
George Steed	7266cda79c	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBSubtractRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBSubtractRow_NEON: Cortex-A55: -15.0% Cortex-A510: -59.8% Cortex-A76: -54.4% Cortex-X2: -70.4% Change-Id: Ifbfce9e6a45159932c09d9b0229215a36fa22f43 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454833 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:04:43 +00:00
George Steed	e646991347	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBAddRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBAddRow_NEON: Cortex-A55: -15.0% Cortex-A510: -59.8% Cortex-A76: -54.4% Cortex-X2: -70.4% Change-Id: Id04e5259d8e5e7511dad5df85cdf9759b392cb99 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454831 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:03:44 +00:00
Cosmina Dunca	9d200b704f	[AArch64] Optimize ScaleARGBRowDown2Box_NEON Use a pair of LD2s to load data interleaved and perform a couple of additions on the registers in order to avoid needing LD4 and ST4 instructions, since these are costly on some micro-architectures. Reduction in run times: Cortex-A55: -20.5% Cortex-A510: -28.3% Cortex-A76: -21.5% Bug: libyuv:976 Change-Id: If66e1e148b031c2cd288ff412f351d7a0b9b91e7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5371774 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-10 20:07:22 +00:00
Cosmina Dunca	9441ddd883	[AArch64] Optimize ScaleARGBRowDownEven_NEON Replace indexed LD1 instructions with LDRs to avoid loop-carried dependencies on unused lanes between consecutive iterations of the loop. Reduction in run times: Cortex-A55: -10.9% Cortex-A510: -70.7% Cortex-A76: -56.8% Bug: libyuv:976 Change-Id: Ia767e76002c7823177e80163ebf034e023e9a6cc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5371771 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-04-10 20:03:39 +00:00
George Steed	e52007eff9	[AArch64] Add SVE2 implementation for I444ToARGBRow Being able to use SVE2 functionality for these kernels has a number of performance wins compared to the existing Neon code: * For the Y component calculation we are able to use UMULH, versus the existing UMULL x2 + UZP2 sequence in Neon. * For the RGBTORGBA8 calculation we are able to take advantage of interleaving narrowing instructions, allowing us to use ST2 rather than ST4 for the store. This is a big performance win on some micro-architectures where ST4 is costly. * The use of predication means we do not need to add "any" kernels, we can simply rerun the calculation with a not-full predicate for the final iteration. To avoid the overhead of generating a predicate register on every iteration we duplicate the loop body and only generate a predicate on the final iteration of the loop. This costs a small amount on the final iteration but should still be significantly quicker than the overhead of a function call needed by the "any" cases. Duplicating the loop body to reduce the use of the WHILELT instruction improves little core performance by ~12% by itself but has negligible impact on other micro-architectures. Reduction in runtime for the new SVE2 implementation compared to the existing Neon implementation on selected micro-architectures: Cortex-A510: -36.5% Cortex-A720: -17.3% Cortex-X2: -11.3% Bug: libyuv:973 Change-Id: I2a485f0dfa077a56f96b80a667ad38bbea47b4b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424739 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:11:01 +00:00
George Steed	9a8be20def	[AArch64] Add :libyuv_sve library in preparation for SVE kernels This commit only adds the bare minimum to get the new library building through GN, the actual content of row_sve.cc is empty for now until we start porting some kernels across. Bug: libyuv:973 Change-Id: Ibdf4fc258761f3e507d700f27a405099c667ac75 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424738 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:10:01 +00:00
George Steed	f2e78e1304	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow Using the dot-product instructions here allows us to avoid needing LD4 for loading individual colour channels, which gives a big benefit on some micro-architectures where such instructions perform significantly worse than LD1. In addition the dot-product instructions have higher throughput compared to the Neon Observed reduction in runtimes for selected kernels moving from _NEON to _NEON_DotProd: Kernel \| Cortex-A55 \| Cortex-A510 \| Cortex-A76 \| Cortex-X2 ABGRToYJRow \| -6.5% \| -22.5% \| -43.5% \| -71.2% ABGRToYRow \| -6.5% \| -22.5% \| -43.5% \| -68.3% ARGBToYJRow \| -6.5% \| -22.5% \| -43.5% \| -68.1% ARGBToYRow \| -6.5% \| -22.5% \| -43.5% \| -68.1% BGRAToYRow \| -6.5% \| -22.5% \| -42.3% \| -68.4% RGBAToYJRow \| -6.5% \| -22.5% \| -42.2% \| -73.7% RGBAToYRow \| -6.5% \| -22.5% \| -42.3% \| -64.9% Bug: libyuv:977 Change-Id: If244190a7bdacf7e6e6b16af7e6853ee13ff6585 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424737 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:09:36 +00:00
George Steed	a038cda7b8	[AArch64] Enable detection of additional architecture features In particular there are a few extensions that are interesting for us: * FEAT_DotProd adds 4-way dot-product instructions which are useful in e.g. ARGBToY. * FEAT_I8MM adds additional mixed-sign dot-product instructions which could be useful in e.g. ARGBToUV. * FEAT_SVE and FEAT_SVE2 add support for the Scalable Vector Extension, which adds an array of new instructions including new widening loads and narrowing stores for dealing with mixed-width integer arithmetic efficiently and predication for avoiding the need for "any" cleanup loops. This commit simply adds support for detecting the presence of these features by extending the existing /proc/cpuinfo parsing, splitting it into separate Arm and AArch64 functions for simplicity. Since we have no space left in the bitset entries between Arm and X86 entries, we reuse some of the X86 entries for new AArch64 extensions. This doesn't seem obviously problematic as long as we avoid setting kCpuHasX86. Bug: libyuv:973 Bug: libyuv:977 Change-Id: I8e256225fe12a4ba5da24460f54061e16eab6c57 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378150 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-05 17:48:22 +00:00
George Steed	ba796a32e7	[AArch64] Remove out of date TODO around ARGBMultiplyRow_NEON The comment refers to the code needing to be re-enabled but as far as I can tell it is already enabled, so simply remove the comment. Change-Id: Id014e8b7f5cd43c8211e1d38758299de2fad49de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5387650 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-25 22:44:45 +00:00
George Steed	5d694bec38	[AArch64] Replace UQSHRN{,2} pair by UZP2 in YUVTORGB The existing Neon code makes use of a pair of UQSHRN and UQSHRN2 instructions to extract the top half of a widened multiply result. These instructions would ordinarily saturate, however saturation can never happen in this case since we are shifting by 16 to get the top half of each element, the top bits remain as-is. We could move this to using a slightly simpler non-saturating shift, however in this case it is simpler and faster to just use UZP2 to extract the top half of each 32-bit lane directly. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 \| Cortex-X2 I400ToARGBRow_NEON \| -9.4% \| -14.9% \| -13.9% I422AlphaToARGBRow_NEON \| -7.9% \| -11.4% \| -11.5% I422ToARGB1555Row_NEON \| -7.3% \| -17.2% \| -14.7% I422ToARGB4444Row_NEON \| -7.6% \| -17.9% \| -13.7% I422ToARGBRow_NEON \| -8.2% \| -9.8% \| -11.9% I422ToRGB24Row_NEON \| -8.0% \| -13.3% \| -12.8% I422ToRGB565Row_NEON \| -7.5% \| -15.1% \| -14.6% I422ToRGBARow_NEON \| -8.3% \| -13.1% \| -12.2% I444AlphaToARGBRow_NEON \| -8.3% \| -7.6% \| -12.7% I444ToARGBRow_NEON \| -8.6% \| -3.5% \| -13.5% I444ToRGB24Row_NEON \| -8.5% \| -7.8% \| -13.4% NV12ToARGBRow_NEON \| -8.8% \| -1.4% \| -12.0% NV12ToRGB24Row_NEON \| -8.5% \| -11.5% \| -12.3% NV12ToRGB565Row_NEON \| -7.9% \| -15.0% \| -15.7% NV21ToARGBRow_NEON \| -8.7% \| -1.6% \| -12.3% NV21ToRGB24Row_NEON \| -8.4% \| -11.5% \| -12.0% UYVYToARGBRow_NEON \| -8.8% \| -8.9% \| -11.9% YUY2ToARGBRow_NEON \| -8.7% \| -10.8% \| -13.3% Bug: libyuv:976 Change-Id: I6c505fe722e5f91f93718b85fe881ad056d8602d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366653 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-14 20:04:46 +00:00
George Steed	8d0d885c2f	[AArch64] Avoid LD2 in YUY2ToARGBRow_NEON In this case we have an LD2 instruction followed by a pair of permutes (ZIP1 and TBL). On some micro-architectures LD2 involves use of the vector pipelines, so in these cases it is preferable to do an LD1 and then a different pair of permutes (TRN + TBL) instead to avoid the extra vector pipeline usage. Reduction in runtime on selected kernels (no observed performance delta on Cortex-A55): Kernel \| Cortex-A76 \| Cortex-X2 UYVYToARGBRow_NEON \| -2.6% \| -8.8% YUY2ToARGBRow_NEON \| -6.2% \| -4.9% Bug: libyuv:976 Change-Id: I7ca45e0c7bf7cb50cc5ab37c6a01215d9689039a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366652 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-14 19:51:05 +00:00
George Steed	188e4e3afb	[AArch64] Avoid unnecessary lane-indexed loads in READYUV The existing code makes use of a pair of lane-indexed load instructions to fill the two halves of the input vector, however this has the effect of introducing an unnecessary dependency on the value of the vector from the previous loop iteration. This doesn't really seem to affect little core performance since these cores never execute enough work concurrently to hit the bottleneck, however we can improve performance on mid and big cores quite a bit by using LDR instead of LD1 to load the low lane, zeroing the upper portion of the vector rather than keeping the previous value. Reduction in runtime for select kernels (no observed performance delta on Cortex-A55): Kernel \| Cortex-A76 \| Cortex-X2 I422ToARGB4444Row_NEON \| -23.1% \| -49.3% I422ToARGBRow_NEON \| -1.2% \| -2.5% I422ToRGB24Row_NEON \| -11.7% \| -7.0% I422ToRGBARow_NEON \| -4.7% \| -3.4% I444AlphaToARGBRow_NEON \| -1.1% \| -2.4% I444ToARGBRow_NEON \| -1.6% \| -3.2% I444ToRGB24Row_NEON \| -9.6% \| -6.8% Bug: libyuv:976 Change-Id: I8c9413e0e6ed97b8f060ce42b6e8abdfb77914b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5365868 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-03-13 18:35:31 +00:00
George Steed	772bddaed7	Add missing memory/cc clobbers to AArch64 Neon kernels There are a few functions in source/scale_neon64.cc which write memory and set condition flags despite not declaring this in the asm clobber list, so add the missing clobbers. Also move a couple of memory/cc clobbers to the start of the clobber list to match other kernels. Bug: libyuv:974 Change-Id: I85f5ff5718e78a4481f7bc53cedaeceb14438895 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5309254 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-03-04 10:22:51 +00:00
Frank Barchard	b66c42d4a8	Revert "AMX detect OS support for linux kernel" This reverts commit 8c8a33762d64b916ae8469cc3fc602a64080a23a. Reason for revert: breaks sandbox Original change's description: > AMX detect OS support for linux kernel > > Bug: b/327013106 > Change-Id: Ie1784249f3a121c52e6504ff502bdc3eb245d858 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5329907 > Commit-Queue: Frank Barchard <fbarchard@chromium.org> > Reviewed-by: richard winterton <rrwinterton@gmail.com> Bug: b/327013106 Change-Id: If54bb84bc1167177c1869763f6ccfdf1f92fbe09 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5332617 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-02-29 00:33:29 +00:00
Frank Barchard	8c8a33762d	AMX detect OS support for linux kernel Bug: b/327013106 Change-Id: Ie1784249f3a121c52e6504ff502bdc3eb245d858 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5329907 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2024-02-28 03:13:44 +00:00
Frank Barchard	a6a2ec654b	Add AMXINT8 cpu detect sde -spr -- libyuv_test -- --gunit_filter=Cpu Note: Google Test filter = Cpu [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 3 tests from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x57fff9 Has X86 0x8 Has SSE2 0x10 Has SSSE3 0x20 Has SSE41 0x40 Has SSE42 0x80 Has AVX 0x100 Has AVX2 0x200 Has ERMS 0x400 Has FMA3 0x800 Has F16C 0x1000 Has AVX512BW 0x2000 Has AVX512VL 0x4000 Has AVX512VNNI 0x8000 Has AVX512VBMI 0x10000 Has AVX512VBMI2 0x20000 Has AVX512VBITALG 0x40000 Has AVX10 0x0 HAS AVXVNNI 0x100000 Has AVXVNNIINT8 0x0 Has AMXINT8 0x400000 [ OK ] LibYUVBaseTest.TestCpuHas (34 ms) Bug: b/324356616 Change-Id: I5129b8946363a501bdd570e6dba3936c54aacd6c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5283433 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-02-15 21:44:47 +00:00
Hans Wennborg	2f2c04c157	Drop TARGET_IPHONE_SIMULATOR macro check Recent versions of Clang always define these TARGET_ macros (to 0 or 1 as appropriate) for Apple targets. https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5249072 made the code correctly check the value of the macro rather than whether it was defined or not. However, the code was still broken when actually targeting the iOS simulator (where the macro is now 1). It seems the use of this macro was just incorrect, and the code only worked since it was never defined at all. The original use of the macro in this file was added in `2c8108e6c2` but it's not quite clear to me why. All other uses have subsequently been removed, e.g. in `6a1d01220a`. This removes the last instance, and should fix the iOS simulator builds. Bug: chromium:1519899 Change-Id: Iaf44d2c37086f1153096044df5d9b61797f66a4f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5272224 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-02-06 17:38:45 +00:00
Hans Wennborg	d359a9f922	Correctly check the TARGET_IPHONE_SIMULATOR macro The macro may be defined to 0; the code needs to check the value, not just whether it's defined. Recent Clang versions will define all Apple "target OS" macros by default (see bug). Bug: chromium:1519899 Change-Id: I3d61f1b23de06d7db7db7916182a789f26345bce Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5249072 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-01-31 19:33:56 +00:00
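
The distinction drawn in the two TARGET_IPHONE_SIMULATOR commits above (checking whether a macro is defined versus checking its value) can be illustrated with a minimal C sketch. This is not the libyuv code: `DEMO_TARGET_SIMULATOR` and both helper functions are hypothetical names, and the macro is set to 0 by hand to imitate what the commit messages say recent Clang does for a non-simulator Apple target.

```c
/* Hypothetical stand-in for a target macro that the compiler always
   defines, to 0 or 1. Here it is 0, i.e. "not the simulator". */
#define DEMO_TARGET_SIMULATOR 0

/* Definedness check: takes the "simulator" branch whenever the macro
   exists at all, even if its value is 0. */
static int simulator_by_definedness(void) {
#if defined(DEMO_TARGET_SIMULATOR)
  return 1;
#else
  return 0;
#endif
}

/* Value check: takes the "simulator" branch only if the macro is
   non-zero. An undefined macro also evaluates to 0 inside #if. */
static int simulator_by_value(void) {
#if DEMO_TARGET_SIMULATOR
  return 1;
#else
  return 0;
#endif
}
```

With the macro defined to 0, the definedness check reports "simulator" while the value check does not, which is the mismatch the first commit addresses. The second commit then drops the check entirely rather than relying on either form.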

1803 Commits