libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-02-16 07:09:53 +08:00

Author	SHA1	Message	Date
George Steed	ee830a5f77	[AArch64] Enable feature detection on Windows and Apple Silicon Using the platform-specific functions IsProcessorFeaturePresent and sysctlbyname to check individual features. Bug: libyuv:980 Change-Id: I7971238ca72e5df862c30c2e65331c46dc634074 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465591 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-03 18:42:51 +00:00
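As a rough illustration of the two platform mechanisms named in this commit, the sketch below checks a single feature (FEAT_DotProd). It is not libyuv's cpu_id.cc code: `HasDotProd` is a hypothetical name, and it assumes the macOS sysctl key `hw.optional.arm.FEAT_DotProd` (macOS 12 and later) and the Windows SDK flag `PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE` are available in the SDK you build with. Any other platform simply reports "absent".

```c
#include <stddef.h>

#if defined(__APPLE__)
#include <sys/sysctl.h>
#elif defined(_WIN32)
#include <windows.h>
#endif

/* Returns 1 if the OS reports FEAT_DotProd, 0 otherwise (including on
 * platforms this sketch does not handle). */
static int HasDotProd(void) {
#if defined(__APPLE__)
  int value = 0;
  size_t size = sizeof(value);
  /* Apple exposes per-feature flags under hw.optional.arm.*; a missing key
   * makes sysctlbyname fail, which is treated as "feature absent". */
  if (sysctlbyname("hw.optional.arm.FEAT_DotProd", &value, &size, NULL, 0) != 0) {
    return 0;
  }
  return value != 0;
#elif defined(_WIN32) && defined(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE)
  /* Assumed SDK constant for the Armv8.2 dot-product instructions. */
  return IsProcessorFeaturePresent(PF_ARM_V82_DP_INSTRUCTIONS_AVAILABLE) != 0;
#else
  return 0;
#endif
}
```

Treating any query failure as "not present" mirrors the conservative fallback described for the Linux path: the worst case is running the baseline Neon kernels.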
George Steed	a114f85e50	[AArch64] Fix naming in ARGBToUVMatrixRow_SVE2 etc constants Avoid abbreviations and capitalize ARGB and UV naming, as suggested here: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5505537 Bug: libyuv:973 Change-Id: I0d0143154594c03e6aca7c859b874e39634ca54f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5513544 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-03 17:25:14 +00:00
George Steed	6f1d8b1e11	[AArch64] Add SVE2 implementations for ARGBToUVRow and similar By maintaining the interleaved format of the data we can use a common kernel for all input channel orderings and simply pass a different vector of constants instead. A similar approach is possible with only Neon by making use of multiplies and repeated application of ADDP to combine channels, but this is slower on older cores like Cortex-A53 so is not pursued further. For odd problem sizes we need a slightly different implementation for the final element, so introduce an "any" kernel to address that rather than bloating the code for the common case. Observed effect on runtimes compared to the existing Neon kernels: Kernel | Cortex-A510 | Cortex-A720 | Cortex-X2; ABGRToUVJRow | -15.5% | +5.4% | -33.1%; ABGRToUVRow | -15.6% | +5.3% | -35.9%; ARGBToUVJRow | -10.1% | +5.4% | -32.7%; ARGBToUVRow | -10.1% | +5.4% | -29.3%; BGRAToUVRow | -15.5% | +4.6% | -32.8%; RGBAToUVRow | -10.1% | +4.2% | -36.0% Bug: libyuv:973 Change-Id: I041ca44db0ae8a2adffcdf24e822eebe962baf33 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5505537 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-05-01 19:46:43 +00:00
George Steed	67e5e79dbe	[AArch64] Add Neon implementation of HashDjb2 Reduction in runtime observed compared to the existing C code compiled with LLVM 18: Cortex-A55: -46.2% Cortex-A510: -60.4% Cortex-A76: -82.9% Cortex-A720: -87.4% Cortex-X1: -90.0% Cortex-X2: -91.7% Change-Id: I39a4479f78299508043a864e64fb40578c66ce19 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5494094 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-05-01 19:37:31 +00:00
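For reference, a djb2-style hash applies the recurrence hash = hash * 33 + byte, starting from a seed. The sketch below shows a plain scalar loop and a blocked variant that consumes four bytes per step using precomputed powers of 33. This is one common way to shorten the serial dependency chain that a SIMD kernel has to deal with. Both function names are hypothetical, and the blocked form illustrates the algebra rather than transcribing the Neon kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Scalar djb2 recurrence, all arithmetic modulo 2^32. */
static uint32_t HashDjb2Scalar(const uint8_t* src, size_t count, uint32_t seed) {
  uint32_t hash = seed;
  for (size_t i = 0; i < count; ++i) {
    hash += (hash << 5) + src[i]; /* hash * 33 + src[i] */
  }
  return hash;
}

/* Four bytes per step: unrolling the recurrence gives
 * h' = h*33^4 + b0*33^3 + b1*33^2 + b2*33 + b3, so the per-byte terms
 * are independent of each other and can be computed in parallel. */
static uint32_t HashDjb2Blocked(const uint8_t* src, size_t count, uint32_t seed) {
  const uint32_t p1 = 33u, p2 = 33u * 33u, p3 = p2 * 33u, p4 = p2 * p2;
  uint32_t hash = seed;
  size_t i = 0;
  for (; i + 4 <= count; i += 4) {
    hash = hash * p4 + src[i] * p3 + src[i + 1] * p2 + src[i + 2] * p1 + src[i + 3];
  }
  for (; i < count; ++i) {
    hash = hash * 33u + src[i]; /* leftover tail bytes */
  }
  return hash;
}
```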
George Steed	1eae2efbc7	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBShadeRow_NEON The use of LD4 and ST4 to de-interleave ARGB color channels is unnecessary here since we can just adjust the scale multiplicand to match the interleaved layout. LD4 and ST4 are known to perform poorly on some micro-architectures so using LD1 and ST1 here should be preferred. Reduction in runtime for ARGBShadeRow_NEON: Cortex-A55: -19.9% Cortex-A510: -50.8% Cortex-A76: -36.0% Cortex-X2: -46.4% Bug: libyuv:976 Change-Id: I10a0e6a0a62242826d39b1e963063770f084226a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5494093 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-30 00:48:35 +00:00
George Steed	ce32eb773f	[AArch64] Avoid extraneous CMP in I{444,422}ToARGBRow_SVE2 impl We can use subs to set condition flags as part of the subtract, no need for a separate compare instruction. No performance difference observed from this change, but it now matches the other SVE2 kernels. Also remove unnecessary volatile from asm blocks. Bug: libyuv:973 Change-Id: I9bb4f5f1101086602f7d5223feaeae0fb63b385c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463951 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-29 18:56:22 +00:00
George Steed	f483007b9a	[AArch64] Add SVE implementation for I422AlphaToARGBRow This is mostly identical to the existing I422ToARGBRow_SVE implementation, we just need to make sure to load the alpha component rather than hard-coding it to 255. Reduction in runtimes observed compared to the existing Neon code: Cortex-A510: -32.1% Cortex-A720: -5.1% Cortex-X2: -10.1% Bug: libyuv:973 Change-Id: I6f800f3ef59f1dc82b409233017b3cb108da0257 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5444426 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-29 18:54:07 +00:00
George Steed	b53b27d6bf	[AArch64] Add SVE implementation for I444AlphaToARGBRow This is mostly identical to the existing I444ToARGBRow_SVE implementation, we just need to make sure to load the alpha component rather than hard-coding it to 255. Reduction in runtimes observed compared to the existing Neon code: Cortex-A510: -34.2% Cortex-A720: -17.6% Cortex-X2: -9.6% Bug: libyuv:973 Change-Id: Ief63965f6f1048ea24baf8f4037aabdd184e2925 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5444425 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-29 18:54:02 +00:00
George Steed	6ac90403a1	[AArch64] Add SVE implementation for I422ToARGBRow We need a new macro for reading I422 data, but the kernel is otherwise mostly identical to the existing I444ToARGBRow_SVE implementation. Reduction in runtimes observed compared to the existing Neon code: Cortex-A510: -25.0% Cortex-A720: -5.0% Cortex-X2: -10.8% Change-Id: I27ddb604a46a53e61c9bde21f76dbc7bd91e0cef Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5444424 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-27 18:26:11 +00:00
George Steed	95eed2b75f	[AArch64] Add Neon dot-product implementation of HammingDistance We can use the Neon dot-product instructions as a slightly faster widening accumulation. This also has the advantage of widening to 32 bits so avoids the risk of overflow present in the original Neon code. Reduction in runtimes observed for HammingDistance compared to the existing Neon code: Cortex-A55: -4.4% Cortex-A510: -26.5% Cortex-A76: -8.1% Cortex-A720: -15.5% Cortex-X1: -4.1% Cortex-X2: -5.1% Bug: libyuv:977 Change-Id: I9e5e10d228c339d905cb2e668a9811ff0a6af5de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5490049 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-26 18:39:00 +00:00
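The quantity being computed is the number of differing bits, that is, the popcount of a XOR b summed over the buffer. The scalar sketch below (hypothetical names, not libyuv's code) accumulates directly into 32 bits. In a Neon version, a UDOT against a vector of ones can perform that widening in one step, summing four 8-bit per-byte counts into each 32-bit lane. This is why the dot-product form avoids the overflow risk of a narrow 16-bit accumulator.

```c
#include <stddef.h>
#include <stdint.h>

/* Set bits in one byte, i.e. what the per-lane CNT instruction computes. */
static uint32_t PopCount8(uint8_t x) {
  uint32_t n = 0;
  while (x != 0) {
    x &= (uint8_t)(x - 1); /* clear the lowest set bit */
    ++n;
  }
  return n;
}

/* Hamming distance: total set bits of a XOR b. Each 8-bit count is added
 * straight into a 32-bit accumulator, with no 16-bit intermediate. */
static uint32_t HammingDistanceScalar(const uint8_t* a, const uint8_t* b, size_t count) {
  uint32_t total = 0;
  for (size_t i = 0; i < count; ++i) {
    total += PopCount8((uint8_t)(a[i] ^ b[i]));
  }
  return total;
}
```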
George Steed	6433029df7	[AArch64] Unroll SumSquareError_NEON_DotProd The kernel is only ever called with count as a multiple of 32 so it is safe to unroll this and maintain two accumulators. Reduction in runtime observed compared to the existing SumSquareError_NEON_DotProd implementation: Cortex-A55: -28.2% Cortex-A510: -27.6% Cortex-A76: -33.0% Cortex-A720: -35.3% Cortex-X1: -16.9% Cortex-X2: -13.3% Bug: libyuv:977 Change-Id: Iee423106c38e97cc38007d73fa80e8374dd96721 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5490048 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-26 16:22:01 +00:00
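The unrolling idea can be sketched in scalar C: with count guaranteed to be a multiple of 32, each iteration feeds two independent accumulators. Neither accumulator waits on the other, so consecutive additions can overlap in the pipeline. The function name is hypothetical, and the precondition is taken from the commit message rather than enforced.

```c
#include <stdint.h>

/* Sum of squared differences using two independent accumulators,
 * 32 elements per iteration. Precondition: count % 32 == 0. */
static uint32_t SumSquareErrorUnrolled(const uint8_t* a, const uint8_t* b, int count) {
  uint32_t acc0 = 0, acc1 = 0;
  for (int i = 0; i < count; i += 32) {
    for (int k = 0; k < 16; ++k) {
      int d0 = a[i + k] - b[i + k];           /* first half of the block */
      int d1 = a[i + 16 + k] - b[i + 16 + k]; /* second half of the block */
      acc0 += (uint32_t)(d0 * d0);
      acc1 += (uint32_t)(d1 * d1);
    }
  }
  return acc0 + acc1; /* combine the accumulators once at the end */
}
```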
George Steed	f5882ed1c5	[AArch64] getauxval(AT_HWCAP{,2}) feature detection, attempt #2 This re-lands commit ba0bba5b2b7e38c9365a5d152b4efa0458863213. Now with additional #ifdef __linux__ guards to avoid compiling Linux-specific code on non-Linux platforms. Non-Linux feature detection will be added in a separate patch. Using getauxval(AT_HWCAP{,2}) has the advantage of also working under emulation where faking /proc/cpuinfo is not supported. For the Chromium sandbox, getauxval is supported since API version 18. The minimum supported API version at time of writing is 21 so we should be able to use getauxval unconditionally. On the off-chance the call fails it will return 0 and we will correctly fall-back to using only Neon. If we want to read the current CPU implementer or part number we could do this by checking HWCAP_CPUID and then reading MIDR_EL1. This will cause a kernel trap to emulate the EL1 read but should still be a lot faster than reading the whole of /proc/cpuinfo. Bug: libyuv:980 Change-Id: I8ae103ea7e32ef44db72f3c9896417bfe97ff5c5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465590 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 21:26:31 +00:00
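A minimal sketch of HWCAP-based detection, assuming Linux on AArch64; everywhere else it compiles to "no features". The `kSketchHas*` flags and `DetectArmFeatures` are hypothetical and do not correspond to libyuv's `kCpuHas*` bits. The fallback `#define`s reproduce bit positions from the Linux arm64 uapi `<asm/hwcap.h>` in case the libc headers lack them. If getauxval cannot find an entry it returns 0, so only the baseline Neon flag remains set, matching the fallback behaviour described above.

```c
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
/* Fallbacks in case libc does not define these (Linux arm64 uapi values). */
#ifndef AT_HWCAP2
#define AT_HWCAP2 26
#endif
#ifndef HWCAP_ASIMDDP
#define HWCAP_ASIMDDP (1UL << 20)
#endif
#ifndef HWCAP_SVE
#define HWCAP_SVE (1UL << 22)
#endif
#ifndef HWCAP2_SVE2
#define HWCAP2_SVE2 (1UL << 1)
#endif
#endif

/* Hypothetical flag bits for this sketch only. */
enum {
  kSketchHasNeon = 1 << 0,
  kSketchHasDotProd = 1 << 1,
  kSketchHasSVE = 1 << 2,
  kSketchHasSVE2 = 1 << 3
};

static int DetectArmFeatures(void) {
  int flags = 0;
#if defined(__linux__) && defined(__aarch64__)
  unsigned long hwcap = getauxval(AT_HWCAP);   /* 0 if unavailable */
  unsigned long hwcap2 = getauxval(AT_HWCAP2); /* 0 if unavailable */
  flags |= kSketchHasNeon; /* Advanced SIMD is part of the AArch64 baseline */
  if (hwcap & HWCAP_ASIMDDP) flags |= kSketchHasDotProd;
  if (hwcap & HWCAP_SVE) flags |= kSketchHasSVE;
  if (hwcap2 & HWCAP2_SVE2) flags |= kSketchHasSVE2;
#endif
  return flags;
}
```

Keeping every Linux-specific include inside the `__linux__` guard is the point of the re-land: the original attempt failed on Windows and macOS because `sys/auxv.h` does not exist there.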
George Steed	356232b687	[AArch64] Replace UQXTN{,2} with UZP2 in Convert16To8Row_NEON The existing code makes use of a pair of shifts to put the bits we want in the low part of each vector lane and then a pair of UQXTN and UQXTN2 instructions to perform a saturating cast down from 16-bit elements to 8-bit elements. We can instead achieve the same thing by adding eight to the first shift amount so that the bits we want appear in the high half of the lane, doing the saturation at the same time, and then simply use UZP2 to pull out the high halves of each lane in a single instruction. Reduction in runtime for Convert16To8Row_NEON: Cortex-A55: -19.7% Cortex-A510: -23.5% Cortex-A76: -35.4% Cortex-X2: -34.1% Bug: libyuv:976 Change-Id: I9a80c0f4f2c6b5203f23e422c0970d3167052f91 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463950 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 21:23:55 +00:00
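The equivalence behind this change can be checked with a simplified scalar model. The model assumes that the original sequence amounts to a right shift by r followed by a saturating narrow to 8 bits. The replacement instead uses a saturating left shift by (8 - r), which places the wanted bits in the high byte of the 16-bit lane and saturates in the same step, and then keeps that high byte (the UZP2 step). Both helper names are hypothetical.

```c
#include <stdint.h>

/* Model of the old path: shift right by r, then saturate to 8 bits (UQXTN). */
static uint8_t NarrowOld(uint16_t v, int r) {
  uint32_t t = (uint32_t)v >> r;
  return (uint8_t)(t > 255 ? 255 : t);
}

/* Model of the new path: saturating left shift by (8 - r) into the
 * 16-bit lane, then take the high byte. Valid for 0 <= r <= 8. */
static uint8_t NarrowNew(uint16_t v, int r) {
  uint32_t t = (uint32_t)v << (8 - r);
  if (t > 0xFFFF) t = 0xFFFF; /* UQSHL-style saturation to 16 bits */
  return (uint8_t)(t >> 8);   /* keep the high byte, as UZP2 selects it */
}
```

The two agree because v >> r exceeds 255 exactly when v << (8 - r) no longer fits in 16 bits. In that case the saturated lane is 0xFFFF, whose high byte is 255.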
George Steed	4f52235a67	[AArch64] Replace SHRN{,2} pair by UZP2 in DivideRow_16_NEON Shift instructions have worse throughput than other permute instructions on some micro-architectures, and we can avoid the need for two separate narrowing instructions by taking the high halves of each lane directly through use of the UZP2 instruction. Reduction in runtime for DivideRow_16_NEON: Cortex-A55: -6.2% Cortex-A510: -30.0% Cortex-A76: -11.9% Cortex-X2: -46.8% Bug: libyuv:976 Change-Id: I4aa06eab06ab6134bb80bc3af5328a1a83b3d249 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463949 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 21:21:52 +00:00
George Steed	53b65220da	[AArch64] Add Neon dot-product implementation of SumSquareError The Neon dot-product instructions perform two widening steps rather than one, saving us the need to widen the absolute difference to 16 bits before accumulating. Additionally, the dot-product instructions tend to have better performance characteristics than traditional widening multiply instructions like SMLAL used in the existing SumSquareError_NEON code. Observed reduction in runtimes compared to the existing Neon kernel: Cortex-A55: -9.1% Cortex-A510: -36.7% Cortex-A76: -37.6% Cortex-A720: -48.8% Cortex-X1: -56.1% Cortex-X2: -42.6% Bug: libyuv:977 Change-Id: Ie20c69040cc47a803d8e95620d31e0bf1e1dac12 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463945 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 20:54:48 +00:00
George Steed	9e223c3fc0	[AArch64] Replace instances of ORR with MOV where possible The MOV instruction is an alias of ORR where both registers are the same and should be preferred. On some micro-architectures neither ORR nor MOV is zero-cost, so there may be better ways to express these kernels, but this is left for a later commit. Bug: libyuv:975 Change-Id: I29b7f182a57a61855cb7f8a867691080f153b10b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5332385 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-25 20:48:16 +00:00
Frank Barchard	fe51553f5f	Revert "[AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection" This reverts commit ba0bba5b2b7e38c9365a5d152b4efa0458863213. Reason for revert: breaks builds on windows and mac Step _compile_ failed. Error logs are shown below: [1/104] CXX obj/libyuv_internal/cpu_id.o FAILED: obj/libyuv_internal/cpu_id.o ../../buildtools/reclient/rewrapper -cfg=../../buildtools/reclient_cfgs/chromium-browser-clang/rewra...(too long) ../../source/cpu_id.cc:25:10: fatal error: 'sys/auxv.h' file not found 25 | #include <sys/auxv.h>  // For getauxval() | ^~~~~~~~~~~~ 1 error generated. More information in raw_io.output_text[failure_summary] Original change's description: > [AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection > > This has the advantage of also working under emulation where > faking /proc/cpuinfo is not supported. > > For the Chromium sandbox, getauxval is supported since API version 18. > The minimum supported API version at time of writing is 21 so we should > be able to use getauxval unconditionally. On the off-chance the call > fails it will return 0 and we will correctly fall-back to using only > Neon. > > Change-Id: Ibbaa9caec1915ac0725c42d6cd2abc7ce19786c7 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5453620 > Reviewed-by: Frank Barchard <fbarchard@chromium.org> Change-Id: Ic0f764217af7b4d998f19a8f78fc04ca85a45a3b No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5463918 Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:52:22 +00:00
George Steed	73f6e82b1a	[AArch64] Add missing clobber, fix zero-init for compare kernels The "memory" clobber needs to be present even if the asm does not store anything to memory, since otherwise the compiler would be allowed to reorder earlier stores to the pointers after they would be needed by the asm. Also fix up the zero-initialisation of accumulators in SumSquareError_NEON, since EOR'ing a register by itself is not a recognised zeroing idiom on most AArch64 micro-architectures. Bug: libyuv:976 Change-Id: I3175367abf6f59db8371b4478f1156950277d7c5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378705 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:38:06 +00:00
George Steed	ba0bba5b2b	[AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection This has the advantage of also working under emulation where faking /proc/cpuinfo is not supported. For the Chromium sandbox, getauxval is supported since API version 18. The minimum supported API version at time of writing is 21 so we should be able to use getauxval unconditionally. On the off-chance the call fails it will return 0 and we will correctly fall-back to using only Neon. Change-Id: Ibbaa9caec1915ac0725c42d6cd2abc7ce19786c7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5453620 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:37:04 +00:00
George Steed	4838e7a194	[AArch64] Load full vectors in ARGB{Add,Subtract}Row Using full vectors for Add and Subtract is a win across the board. Using full vectors for the multiply is less obviously a win, especially for smaller cores like Cortex-A53 or Cortex-A57, so is not considered for this change. Observed changes in performance with this change compared to the existing Neon code: Core | ARGBAddRow_NEON | ARGBSubtractRow_NEON; Cortex-A55 | -5.1% | -5.1%; Cortex-A510 | -18.4% | -18.4%; Cortex-A76 | -28.9% | -28.7%; Cortex-A720 | -36.1% | -36.2%; Cortex-X1 | -14.2% | -14.4%; Cortex-X2 | -12.5% | -12.5% Bug: libyuv:976 Change-Id: I85316d4399c93b53baa62d0d43b2fa453517f5b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5457433 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 19:02:43 +00:00
George Steed	90070986ae	[AArch64] Improve RGB565TOARGB using SRI instructions The existing code performs a lot of shifts and combines the R and B components into a single vector unnecessarily. We can express this much more cleanly by making use of the SRI instruction to insert and replace shifted bits into the original data, performing the 5/6-bit to 8-bit expansion in a single instruction if the source bits are already in the high bits of the byte. We still need a single separate XTN instruction to narrow the B component before the left shift since Neon does not have a narrowing left shift instruction. Reduction in runtime for selected kernels: Kernel | Cortex-A55 | Cortex-A76 | Cortex-X2; RGB565ToYRow_NEON | -22.1% | -23.4% | -25.1%; RGB565ToUVRow_NEON | -26.8% | -20.5% | -18.8%; RGB565ToARGBRow_NEON | -38.9% | -32.0% | -23.5% Bug: libyuv:976 Change-Id: I77b8d58287b70dbb9549451fc15ed3dd0d2a4dda Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5374286 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-04-18 19:01:26 +00:00
George Steed	1ca7c4e1cc	[AArch64] Avoid lane-indexed loads for UV when loading I444/I422 Most micro-architectures seem to prefer an additional ZIP1 instruction in READYUV422 to needing a lane-indexed LD1 load instruction. We introduce a new macro to handle the YUV to RGB conversion where the U and V components are in separate vectors. This avoids causing a slowdown for the UV-interleaved input format kernels (NV12 and NV21) where we do not want to separate them. Reduction in runtime for selected kernels on Cortex cores (no performance difference observed on Cortex-A55): Kernel | A510 | A76 | A720 | X1 | X2; I422AlphaToARGBRow_NEON | -4.3% | -7.3% | -10.1% | -4.0% | -4.4%; I422ToARGB1555Row_NEON | -4.5% | +0.4% | -7.9% | -4.8% | -3.9%; I422ToARGB4444Row_NEON | -7.7% | -2.6% | -4.1% | -1.9% | -1.3%; I422ToARGBRow_NEON | -3.7% | -2.9% | -10.2% | -3.8% | -4.4%; I422ToRGB24Row_NEON | -5.9% | +5.4% | -3.2% | -4.3% | -4.3%; I422ToRGB565Row_NEON | -4.8% | -2.8% | -8.5% | -3.8% | -4.6%; I422ToRGBARow_NEON | -3.7% | +4.6% | -10.5% | -3.0% | -4.5%; I444AlphaToARGBRow_NEON | -3.5% | +2.7% | -3.7% | -5.0% | -8.2%; I444ToARGBRow_NEON | -1.8% | -15.1% | -3.5% | -6.5% | -8.1%; I444ToRGB24Row_NEON | -2.0% | -6.8% | +0.1% | -4.7% | +1.2%. A few cases are slower on Cortex-A76, but there are significant speedups elsewhere. Bug: libyuv:976 Change-Id: Ib3b4ef81f7bfc1d7ff9c4c24aef9ad86741410ff Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465580 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:46:59 +00:00
George Steed	bfedc8bc11	[AArch64] Improve ARGB{,1}555TOARGB using SRI instructions The existing transformations can be more cleanly expressed by using SRI instructions to perform a shift and simultaneously merge into an existing value. Reduction in runtime for selected kernels: Kernel | Cortex-A55 | Cortex-A76 | Cortex-X2; ARGB1555ToYRow_NEON | -26.2% | -14.9% | -28.2%; ARGB1555ToUVRow_NEON | -25.2% | -18.4% | -20.9%; ARGB1555ToARGBRow_NEON | -43.6% | -32.8% | -19.7% Bug: libyuv:976 Change-Id: Id07ac6f2cd3eb9bb70f9e29fc1f4b29fe26156ec Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5383444 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:46:10 +00:00
George Steed	95b0a3326c	[AArch64] Improve ARGBTOARGB4444 using SRI instructions The existing sequence to convert from 8-bit ARGB to 4-bit ARGB4444 makes use of a lot of shifts and bit-clears before ORR'ing the pairs together. This is unnecessary since we can do the same with the SRI instruction, so use that instead. Reduction in runtime for selected kernels: Kernel | Cortex-A55 | Cortex-A76; ARGBToARGB4444Row_NEON | -15.3% | -16.6%; I422ToARGB4444Row_NEON | -2.7% | -11.9% Bug: libyuv:976 Change-Id: I86cd86c7adf1105558787a679272179821f31a9d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5383443 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:26:27 +00:00
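To see why one SRI can perform the 5/6-bit to 8-bit expansion, the scalar model below treats SRI on a single byte as follows: the top n bits of the destination are kept, and the remaining bits are replaced by the source shifted right by n. When the source bits already sit in the top of the byte and the value is inserted into itself, the high bits are replicated into the low bits. This is the usual (x << 3) | (x >> 2) expansion for 5 bits. `Sri8`, `Expand5` and `Expand6` are hypothetical helper names.

```c
#include <stdint.h>

/* Scalar model of SRI (shift right and insert) on one byte. */
static uint8_t Sri8(uint8_t dst, uint8_t src, int n) {
  uint8_t keep = (uint8_t)~(0xFFu >> n); /* top n bits of dst survive */
  return (uint8_t)((dst & keep) | (src >> n));
}

/* 5-bit value stored as x << 3: one SRI by 5 gives (x << 3) | (x >> 2). */
static uint8_t Expand5(uint8_t top5) { return Sri8(top5, top5, 5); }

/* 6-bit value stored as x << 2: one SRI by 6 gives (x << 2) | (x >> 4). */
static uint8_t Expand6(uint8_t top6) { return Sri8(top6, top6, 6); }
```

The mapping sends 0 to 0 and the maximum 5- or 6-bit value to 255, so full-scale colours stay full-scale after expansion.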
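A scalar model of the packing shows why the bit-clear and ORR steps become redundant. Each output byte of a little-endian ARGB4444 pixel holds two 4-bit channels. An SRI by 4 keeps the top nibble of one channel and inserts the top nibble of the other channel below it, all in one operation. The helper names are hypothetical, and the input byte order (B, G, R, A in memory) is an assumption of this sketch.

```c
#include <stdint.h>

/* Scalar model of SRI (shift right and insert) on one byte. */
static uint8_t Sri8(uint8_t dst, uint8_t src, int n) {
  uint8_t keep = (uint8_t)~(0xFFu >> n); /* top n bits of dst survive */
  return (uint8_t)((dst & keep) | (src >> n));
}

/* Packs one pixel into ARGB4444 (A in bits 12-15, R 8-11, G 4-7, B 0-3). */
static uint16_t PackARGB4444(uint8_t b, uint8_t g, uint8_t r, uint8_t a) {
  uint8_t lo = Sri8(g, b, 4); /* G nibble : B nibble */
  uint8_t hi = Sri8(a, r, 4); /* A nibble : R nibble */
  return (uint16_t)(lo | (hi << 8));
}
```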
George Steed	b265c311b7	[AArch64] Avoid unnecessary work in READYUV400 The value of UV components in the vector are known and the vectors are never overwritten, so we can hoist the UV-specific parts of the calculation out of the loop. Reduction in runtimes for I400ToARGBRow_NEON: Cortex-A55: -10.0% Cortex-A510: -3.7% Cortex-A76: -19.3% Cortex-X2: -14.4% Bug: libyuv:976 Change-Id: I17d6de4e1790f71407e12ff84548568cc3ebbe1a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5457434 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-17 16:47:58 +00:00
George Steed	ea56460300	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBMultiplyRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBMultiplyRow_NEON: Cortex-A55: -22.3% Cortex-A510: -56.6% Cortex-A76: -45.5% Cortex-X2: -54.6% Change-Id: I9103111a109a4d87d358e06eb513746314aaf66a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454832 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:28:56 +00:00
George Steed	7266cda79c	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBSubtractRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBSubtractRow_NEON: Cortex-A55: -15.0% Cortex-A510: -59.8% Cortex-A76: -54.4% Cortex-X2: -70.4% Change-Id: Ifbfce9e6a45159932c09d9b0229215a36fa22f43 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454833 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:04:43 +00:00
George Steed	e646991347	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBAddRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBAddRow_NEON: Cortex-A55: -15.0% Cortex-A510: -59.8% Cortex-A76: -54.4% Cortex-X2: -70.4% Change-Id: Id04e5259d8e5e7511dad5df85cdf9759b392cb99 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454831 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:03:44 +00:00
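The argument in these three commits is that the same per-byte operation is applied to every channel, so the channel order of the interleaved bytes never matters. The scalar sketch below applies a saturating add (the ARGBAdd case) directly over the interleaved buffer. It has no notion of channels at all, which is what lets a contiguous LD1/ST1 replace de-interleaving LD4/ST4. The function name is hypothetical.

```c
#include <stdint.h>

/* Saturating add over interleaved ARGB bytes. Every byte gets the same
 * treatment, so no channel separation is needed. */
static void AddRowInterleaved(const uint8_t* a, const uint8_t* b, uint8_t* dst, int width) {
  for (int i = 0; i < width * 4; ++i) { /* 4 bytes per ARGB pixel */
    int sum = a[i] + b[i];
    dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
  }
}
```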
Cosmina Dunca	9d200b704f	[AArch64] Optimize ScaleARGBRowDown2Box_NEON Use a pair of LD2s to load data interleaved and perform a couple of additions on the registers in order to avoid needing LD4 and ST4 instructions, since these are costly on some micro-architectures. Reduction in run times: Cortex-A55: -20.5% Cortex-A510: -28.3% Cortex-A76: -21.5% Bug: libyuv:976 Change-Id: If66e1e148b031c2cd288ff412f351d7a0b9b91e7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5371774 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-10 20:07:22 +00:00
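For reference, a 2x2 box downscale computes each output pixel as the rounded average of a 2x2 block of input pixels, channel by channel. Whatever load strategy the Neon kernel uses has to reproduce this. The scalar sketch below uses a hypothetical name and assumes the rounding (sum + 2) >> 2.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference 2x2 box filter for ARGB rows: src points at the top row and
 * stride is the byte offset to the row below. */
static void ARGBDown2BoxRef(const uint8_t* src, ptrdiff_t stride, uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    const uint8_t* s = src + x * 8; /* two ARGB pixels per output pixel */
    const uint8_t* t = s + stride;  /* the matching pixels one row down */
    for (int c = 0; c < 4; ++c) {
      dst[x * 4 + c] = (uint8_t)((s[c] + s[c + 4] + t[c] + t[c + 4] + 2) >> 2);
    }
  }
}
```

Loading two-lane interleaved data with 32-bit elements places even-indexed pixels in one register and odd-indexed pixels in another. The horizontal pair sums then become plain lane-wise additions, with no need to split pixels into separate channels.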
Cosmina Dunca	9441ddd883	[AArch64] Optimize ScaleARGBRowDownEven_NEON Replace indexed LD1 instructions with LDRs to avoid loop-carried dependencies on unused lanes between consecutive iterations of the loop. Reduction in run times: Cortex-A55: -10.9% Cortex-A510: -70.7% Cortex-A76: -56.8% Bug: libyuv:976 Change-Id: Ia767e76002c7823177e80163ebf034e023e9a6cc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5371771 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-04-10 20:03:39 +00:00
George Steed	e52007eff9	[AArch64] Add SVE2 implementation for I444ToARGBRow Being able to use SVE2 functionality for these kernels has a number of performance wins compared to the existing Neon code: * For the Y component calculation we are able to use UMULH, versus the existing UMULL x2 + UZP2 sequence in Neon. * For the RGBTORGBA8 calculation we are able to take advantage of interleaving narrowing instructions, allowing us to use ST2 rather than ST4 for the store. This is a big performance win on some micro-architectures where ST4 is costly. * The use of predication means we do not need to add "any" kernels, we can simply rerun the calculation with a not-full predicate for the final iteration. To avoid the overhead of generating a predicate register on every iteration we duplicate the loop body and only generate a predicate on the final iteration of the loop. This costs a small amount on the final iteration but should still be significantly quicker than the overhead of a function call needed by the "any" cases. Duplicating the loop body to reduce the use of the WHILELT instruction improves little core performance by ~12% by itself but has negligible impact on other micro-architectures. Reduction in runtime for the new SVE2 implementation compared to the existing Neon implementation on selected micro-architectures: Cortex-A510: -36.5% Cortex-A720: -17.3% Cortex-X2: -11.3% Bug: libyuv:973 Change-Id: I2a485f0dfa077a56f96b80a667ad38bbea47b4b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424739 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:11:01 +00:00
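The loop shape described above can be modelled in scalar C. The main loop runs with all lanes active and never builds a predicate. Only the final partial iteration computes a WHILELT-style mask, where lane k is active iff i + k < width, and it touches only the active lanes. `vl` stands in for the hardware vector length, and the per-element operation (adding one) is a placeholder; both are assumptions of this sketch, as is the function name.

```c
#include <stdint.h>

/* Scalar model of an SVE loop with a single predicated tail iteration. */
static void AddOneRowPredicated(const uint8_t* src, uint8_t* dst, int width, int vl) {
  int i = 0;
  /* Duplicated body, all-true predicate: no per-iteration WHILELT. */
  for (; i + vl <= width; i += vl) {
    for (int lane = 0; lane < vl; ++lane) {
      dst[i + lane] = (uint8_t)(src[i + lane] + 1);
    }
  }
  /* Final iteration: build the partial predicate once. */
  if (i < width) {
    for (int lane = 0; lane < vl; ++lane) {
      int active = (i + lane) < width; /* WHILELT-style predicate bit */
      if (active) dst[i + lane] = (uint8_t)(src[i + lane] + 1);
    }
  }
}
```

Inactive lanes neither load nor store. That is why no separate "any" wrapper with a temporary buffer is needed for widths that are not a multiple of the vector length.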
George Steed	9a8be20def	[AArch64] Add :libyuv_sve library in preparation for SVE kernels This commit only adds the bare minimum to get the new library building through GN, the actual content of row_sve.cc is empty for now until we start porting some kernels across. Bug: libyuv:973 Change-Id: Ibdf4fc258761f3e507d700f27a405099c667ac75 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424738 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:10:01 +00:00
George Steed	f2e78e1304	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow Using the dot-product instructions here allows us to avoid needing LD4 for loading individual colour channels, which gives a big benefit on some micro-architectures where such instructions perform significantly worse than LD1. In addition the dot-product instructions have higher throughput compared to the Neon Observed reduction in runtimes for selected kernels moving from _NEON to _NEON_DotProd: Kernel | Cortex-A55 | Cortex-A510 | Cortex-A76 | Cortex-X2; ABGRToYJRow | -6.5% | -22.5% | -43.5% | -71.2%; ABGRToYRow | -6.5% | -22.5% | -43.5% | -68.3%; ARGBToYJRow | -6.5% | -22.5% | -43.5% | -68.1%; ARGBToYRow | -6.5% | -22.5% | -43.5% | -68.1%; BGRAToYRow | -6.5% | -22.5% | -42.3% | -68.4%; RGBAToYJRow | -6.5% | -22.5% | -42.2% | -73.7%; RGBAToYRow | -6.5% | -22.5% | -42.3% | -64.9% Bug: libyuv:977 Change-Id: If244190a7bdacf7e6e6b16af7e6853ee13ff6585 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424737 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:09:36 +00:00
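To see why LD4 becomes unnecessary, note that luma is a weighted sum of the channels. It can therefore be computed as a 4-way dot product of one pixel's interleaved bytes with a coefficient vector whose alpha weight is zero, which is what one 32-bit lane of UDOT produces. The sketch assumes memory order B, G, R, A and BT.601 limited-range weights (25, 129, 66) with bias 0x1080. The function name and constants are illustrative and are not taken from this commit.

```c
#include <stdint.h>

/* Luma of one ARGB pixel (bytes B, G, R, A) as a dot product of the
 * interleaved bytes with a coefficient vector, as one UDOT lane would. */
static uint8_t ARGBPixelToY(const uint8_t bgra[4]) {
  static const uint8_t kCoeffs[4] = {25, 129, 66, 0}; /* B, G, R, A weights */
  uint32_t dot = 0;
  for (int k = 0; k < 4; ++k) {
    dot += (uint32_t)bgra[k] * kCoeffs[k];
  }
  return (uint8_t)((dot + 0x1080) >> 8); /* 0x1080 = 16.5 in 8.8 fixed point */
}
```

Supporting a different input order (ABGR, BGRA, RGBA) then only requires permuting `kCoeffs`, not changing the load or the arithmetic.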
George Steed	a038cda7b8	[AArch64] Enable detection of additional architecture features In particular there are a few extensions that are interesting for us: * FEAT_DotProd adds 4-way dot-product instructions which are useful in e.g. ARGBToY. * FEAT_I8MM adds additional mixed-sign dot-product instructions which could be useful in e.g. ARGBToUV. * FEAT_SVE and FEAT_SVE2 add support for the Scalable Vector Extension, which adds an array of new instructions including new widening loads and narrowing stores for dealing with mixed-width integer arithmetic efficiently and predication for avoiding the need for "any" cleanup loops. This commit simply adds support for detecting the presence of these features by extending the existing /proc/cpuinfo parsing, splitting it into separate Arm and AArch64 functions for simplicity. Since we have no space left in the bitset entries between Arm and X86 entries, we reuse some of the X86 entries for new AArch64 extensions. This doesn't seem obviously problematic as long as we avoid setting kCpuHasX86. Bug: libyuv:973 Bug: libyuv:977 Change-Id: I8e256225fe12a4ba5da24460f54061e16eab6c57 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378150 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-05 17:48:22 +00:00
George Steed	ba796a32e7	[AArch64] Remove out of date TODO around ARGBMultiplyRow_NEON The comment refers to the code needing to be re-enabled but as far as I can tell it is already enabled, so simply remove the comment. Change-Id: Id014e8b7f5cd43c8211e1d38758299de2fad49de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5387650 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-25 22:44:45 +00:00
George Steed	5d694bec38	[AArch64] Replace UQSHRN{,2} pair by UZP2 in YUVTORGB The existing Neon code makes use of a pair of UQSHRN and UQSHRN2 instructions to extract the top half of a widened multiply result. These instructions would ordinarily saturate; however, saturation can never happen in this case: since we are shifting by 16 to get the top half of each element, the top bits remain as-is. We could move this to using a slightly simpler non-saturating shift, however in this case it is simpler and faster to just use UZP2 to extract the top half of each 32-bit lane directly. Reduction in runtime for selected kernels: Kernel | Cortex-A55 | Cortex-A76 | Cortex-X2; I400ToARGBRow_NEON | -9.4% | -14.9% | -13.9%; I422AlphaToARGBRow_NEON | -7.9% | -11.4% | -11.5%; I422ToARGB1555Row_NEON | -7.3% | -17.2% | -14.7%; I422ToARGB4444Row_NEON | -7.6% | -17.9% | -13.7%; I422ToARGBRow_NEON | -8.2% | -9.8% | -11.9%; I422ToRGB24Row_NEON | -8.0% | -13.3% | -12.8%; I422ToRGB565Row_NEON | -7.5% | -15.1% | -14.6%; I422ToRGBARow_NEON | -8.3% | -13.1% | -12.2%; I444AlphaToARGBRow_NEON | -8.3% | -7.6% | -12.7%; I444ToARGBRow_NEON | -8.6% | -3.5% | -13.5%; I444ToRGB24Row_NEON | -8.5% | -7.8% | -13.4%; NV12ToARGBRow_NEON | -8.8% | -1.4% | -12.0%; NV12ToRGB24Row_NEON | -8.5% | -11.5% | -12.3%; NV12ToRGB565Row_NEON | -7.9% | -15.0% | -15.7%; NV21ToARGBRow_NEON | -8.7% | -1.6% | -12.3%; NV21ToRGB24Row_NEON | -8.4% | -11.5% | -12.0%; UYVYToARGBRow_NEON | -8.8% | -8.9% | -11.9%; YUY2ToARGBRow_NEON | -8.7% | -10.8% | -13.3% Bug: libyuv:976 Change-Id: I6c505fe722e5f91f93718b85fe881ad056d8602d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366653 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-14 20:04:46 +00:00
George Steed	8d0d885c2f	[AArch64] Avoid LD2 in YUY2ToARGBRow_NEON In this case we have an LD2 instruction followed by a pair of permutes (ZIP1 and TBL). On some micro-architectures LD2 involves use of the vector pipelines, so in these cases it is preferable to do an LD1 and then a different pair of permutes (TRN + TBL) instead to avoid the extra vector pipeline usage. Reduction in runtime on selected kernels (no observed performance delta on Cortex-A55): Kernel | Cortex-A76 | Cortex-X2; UYVYToARGBRow_NEON | -2.6% | -8.8%; YUY2ToARGBRow_NEON | -6.2% | -4.9% Bug: libyuv:976 Change-Id: I7ca45e0c7bf7cb50cc5ab37c6a01215d9689039a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366652 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-14 19:51:05 +00:00
George Steed	188e4e3afb	[AArch64] Avoid unnecessary lane-indexed loads in READYUV The existing code makes use of a pair of lane-indexed load instructions to fill the two halves of the input vector, however this has the effect of introducing an unnecessary dependency on the value of the vector from the previous loop iteration. This doesn't really seem to affect little core performance since these cores never execute enough work concurrently to hit the bottleneck, however we can improve performance on mid and big cores quite a bit by using LDR instead of LD1 to load the low lane, zeroing the upper portion of the vector rather than keeping the previous value. Reduction in runtime for select kernels (no observed performance delta on Cortex-A55): Kernel | Cortex-A76 | Cortex-X2; I422ToARGB4444Row_NEON | -23.1% | -49.3%; I422ToARGBRow_NEON | -1.2% | -2.5%; I422ToRGB24Row_NEON | -11.7% | -7.0%; I422ToRGBARow_NEON | -4.7% | -3.4%; I444AlphaToARGBRow_NEON | -1.1% | -2.4%; I444ToARGBRow_NEON | -1.6% | -3.2%; I444ToRGB24Row_NEON | -9.6% | -6.8% Bug: libyuv:976 Change-Id: I8c9413e0e6ed97b8f060ce42b6e8abdfb77914b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5365868 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-03-13 18:35:31 +00:00
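The claim that UZP2 can replace the narrowing shift can be checked in scalar C. The value x >> 16 of a 32-bit lane always fits in 16 bits, so a saturating narrow by 16 never saturates and simply returns the upper half. On a little-endian machine that upper half is the odd-indexed 16-bit element when the lane is viewed as two 16-bit halves, which is exactly what UZP2 selects. The sketch assumes a little-endian host, and both function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Narrowing-shift view: take x >> 16 (cannot saturate, always < 2^16). */
static void HighHalvesViaShift(const uint32_t* src, uint16_t* dst, int n) {
  for (int i = 0; i < n; ++i) dst[i] = (uint16_t)(src[i] >> 16);
}

/* Unzip view: reinterpret each 32-bit lane as two 16-bit lanes and keep
 * the odd (upper) one. Assumes a little-endian host. */
static void HighHalvesViaUnzip(const uint32_t* src, uint16_t* dst, int n) {
  for (int i = 0; i < n; ++i) {
    uint16_t halves[2];
    memcpy(halves, &src[i], sizeof(halves)); /* halves[1] is the upper 16 bits */
    dst[i] = halves[1];
  }
}
```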
George Steed	772bddaed7	Add missing memory/cc clobbers to AArch64 Neon kernels There are a few functions in source/scale_neon64.cc which write memory and set condition flags despite not declaring this in the asm clobber list, so add the missing clobbers. Also move a couple of memory/cc clobbers to the start of the clobber list to match other kernels. Bug: libyuv:974 Change-Id: I85f5ff5718e78a4481f7bc53cedaeceb14438895 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5309254 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-03-04 10:22:51 +00:00
Frank Barchard	b66c42d4a8	Revert "AMX detect OS support for linux kernel" This reverts commit 8c8a33762d64b916ae8469cc3fc602a64080a23a. Reason for revert: breaks sandbox Original change's description: > AMX detect OS support for linux kernel > > Bug: b/327013106 > Change-Id: Ie1784249f3a121c52e6504ff502bdc3eb245d858 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5329907 > Commit-Queue: Frank Barchard <fbarchard@chromium.org> > Reviewed-by: richard winterton <rrwinterton@gmail.com> Bug: b/327013106 Change-Id: If54bb84bc1167177c1869763f6ccfdf1f92fbe09 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5332617 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-02-29 00:33:29 +00:00
Frank Barchard	8c8a33762d	AMX detect OS support for linux kernel Bug: b/327013106 Change-Id: Ie1784249f3a121c52e6504ff502bdc3eb245d858 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5329907 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2024-02-28 03:13:44 +00:00
Frank Barchard	a6a2ec654b	Add AMXINT8 cpu detect sde -spr -- libyuv_test -- --gunit_filter=Cpu Note: Google Test filter = Cpu [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 3 tests from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x57fff9 Has X86 0x8 Has SSE2 0x10 Has SSSE3 0x20 Has SSE41 0x40 Has SSE42 0x80 Has AVX 0x100 Has AVX2 0x200 Has ERMS 0x400 Has FMA3 0x800 Has F16C 0x1000 Has AVX512BW 0x2000 Has AVX512VL 0x4000 Has AVX512VNNI 0x8000 Has AVX512VBMI 0x10000 Has AVX512VBMI2 0x20000 Has AVX512VBITALG 0x40000 Has AVX10 0x0 HAS AVXVNNI 0x100000 Has AVXVNNIINT8 0x0 Has AMXINT8 0x400000 [ OK ] LibYUVBaseTest.TestCpuHas (34 ms) Bug: b/324356616 Change-Id: I5129b8946363a501bdd570e6dba3936c54aacd6c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5283433 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-02-15 21:44:47 +00:00
Hans Wennborg	2f2c04c157	Drop TARGET_IPHONE_SIMULATOR macro check Recent versions of Clang always define these TARGET_ macros (to 0 or 1 as appropriate) for Apple targets. https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5249072 made the code correctly check the value of the macro rather than whether it was defined or not. However, the code was still broken when actually targeting the iOS simulator (where the macro is now 1). It seems the use of this macro was just incorrect, and the code only worked because it was never defined at all. The original use of the macro in this file was added in `2c8108e6c2`, but it's not quite clear to me why. All other uses have subsequently been removed, e.g. in `6a1d01220a`. This removes the last instance, and should fix the iOS simulator builds. Bug: chromium:1519899 Change-Id: Iaf44d2c37086f1153096044df5d9b61797f66a4f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5272224 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-02-06 17:38:45 +00:00
Hans Wennborg	d359a9f922	Correctly check the TARGET_IPHONE_SIMULATOR macro The macro may be defined to 0; the code needs to check the value, not just whether it's defined. Recent Clang versions will define all Apple "target OS" macros by default (see bug). Bug: chromium:1519899 Change-Id: I3d61f1b23de06d7db7db7916182a789f26345bce Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5249072 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-01-31 19:33:56 +00:00
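A minimal C sketch of the difference between testing whether a macro is defined and testing its value. TARGET_EXAMPLE_SIMULATOR is a hypothetical stand-in for an Apple TARGET_ macro that a recent Clang defines to 0 on non-simulator targets; the two functions exist only to make the preprocessor outcome observable.

```c
/* Hypothetical stand-in for a TARGET_ macro that the compiler defines to 0. */
#ifndef TARGET_EXAMPLE_SIMULATOR
#define TARGET_EXAMPLE_SIMULATOR 0
#endif

/* Wrong: true whenever the macro exists, even when it is defined to 0. */
int ExampleSimulatorByIfdef(void) {
#ifdef TARGET_EXAMPLE_SIMULATOR
  return 1;
#else
  return 0;
#endif
}

/* Right: tests the value; the defined() guard keeps -Wundef quiet when the
   macro is absent entirely. */
int ExampleSimulatorByValue(void) {
#if defined(TARGET_EXAMPLE_SIMULATOR) && TARGET_EXAMPLE_SIMULATOR
  return 1;
#else
  return 0;
#endif
}
```

With the macro defined to 0, the `#ifdef` form still takes the simulator branch, while the value check does not.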
Frank Barchard	3e435fe6d4	YUY2ToARGB use ymm6/7 for shuffle constants - 1 load and 2 shuffles from registers replace 2 loads and 2 memory shuffles - vbroadcastf128 16-byte shuffler replaces 32-byte shufflers - bump version and apply clang-format libyuv_test '--gunit_filter=*.???2ToARGB_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 AMD Zen2 I422ToARGB_Opt (272 ms) NV12ToARGB_Opt (255 ms) YUY2ToARGB_Opt (208 ms) Was YUY2ToARGB_Opt (214 ms) Change-Id: I1fa4d462d04536c877d1cab1a14586be8ed1b2f2 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5218447 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2024-01-22 21:47:23 +00:00
Frank Barchard	914624f0b8	YUY2ToARGBMatrix and UYVYToARGBMatrix added to allow any color matrix Bug: libyuv:971 Change-Id: If15d4598d75500a3717f07d02c0c295fdc58254e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5214453 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-01-19 21:21:37 +00:00
Frank Barchard	5625f42424	I444ToI420 and I422ToI420 check U and V pointers and return -1 if NULL. - Add detect linux kernel version number in util/cpuid adbrun -- blaze-bin/third_party/libyuv/cpuid Kernel Version 4.14 Cpu Flags 0x7 Has ARM 0x2 Bug: libyuv:970 Change-Id: I655ed598db3655ca8448be08f1d71fbc328ced66 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5207990 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-01-18 21:56:11 +00:00
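A sketch of the kind of early argument check described in this commit, written as a hypothetical helper (ExampleCheckI444ToI420Args is not a libyuv function): a NULL U or V plane, or an invalid size, yields -1 before any memory is touched. A negative height is let through because libyuv-style APIs use it to mean "invert the image".

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: returns -1 for a NULL chroma plane or invalid size,
   0 when the conversion could proceed. */
int ExampleCheckI444ToI420Args(const uint8_t* src_u, const uint8_t* src_v,
                               uint8_t* dst_u, uint8_t* dst_v,
                               int width, int height) {
  if (!src_u || !src_v || !dst_u || !dst_v || width <= 0 || height == 0) {
    return -1;
  }
  return 0;
}
```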
Frank Barchard	af6ac8265b	AVX10 cpuid detect added Replace unused popcount feature bit Bug: libyuv:911 Change-Id: Icd88fcc732751d39b0950d5f09a58bc9ac2c4e30 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5179911 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-01-10 00:08:22 +00:00
Hao Chen	ee53a66c5c	Fix compilation errors. Fix the narrowing conversion error from ‘long unsigned int’ to ‘long long int’ that occurs when using the new compiler on the LoongArch platform. Bug: libyuv:913 Change-Id: Ic535946a2453bc48840bab05355854670c52114f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5161066 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-01-03 19:15:56 +00:00
Bruce Lai	1dcbc30553	Add HAS_SCALEARGBROWDOWNEVEN_RVV macro and disable it by default HAS_SCALEARGBROWDOWNEVEN_RVV wasn't defined, so we could not use ScaleARGBRowDownEven_RVV & ScaleARGBRowDownEvenBox_RVV. - Separate into two conditional statements when selecting DownEven or DownEvenBox. - Also, add HAS_SCALEARGBROWDOWNEVEN_RVV and disable it by default. Bug: libyuv:965 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Change-Id: Ic7ec40520b64131a456c6f3eea0639b3620f11ae Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4882441 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-12-07 22:54:23 +00:00
Frank Barchard	def473f501	malloc return 1 for failures and assert for internal functions Bug: libyuv:968 Change-Id: Iea2f907061532d2e00347996124bc80d079a7bdc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5010874 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-12-04 22:55:20 +00:00
Wan-Teh Chang	fb6341d326	Change ScalePlane,ScalePlane_16,... to return int Change ScalePlane(), ScalePlane_16(), and ScalePlane_12() to return int so that they can report memory allocation failures (by returning 1). BUG=libyuv:968 Change-Id: Ie5c183ee42e3d595302671f9ecb7b3472dc8fdb5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5005031 Commit-Queue: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-11-03 23:53:24 +00:00
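A sketch of the pattern these allocation changes describe: a function that needs scratch memory returns int, reporting 0 on success and 1 when the allocation fails, instead of dereferencing NULL. ExampleScaleWithScratch is hypothetical, and its memcpy calls stand in for real scaling work.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical: returns 0 on success, 1 if the scratch row cannot be
   allocated, so the caller can propagate the failure. */
int ExampleScaleWithScratch(const uint8_t* src, uint8_t* dst, int width) {
  if (width <= 0) {
    return 0; /* nothing to do */
  }
  uint8_t* row = (uint8_t*)malloc((size_t)width);
  if (row == NULL) {
    return 1; /* allocation failed: report it instead of crashing */
  }
  memcpy(row, src, (size_t)width); /* stand-in for filtering into the row */
  memcpy(dst, row, (size_t)width); /* stand-in for writing the result */
  free(row);
  return 0;
}
```

A caller checks the result, e.g. `if (ExampleScaleWithScratch(src, dst, w) != 0) return 1;`, so the failure reaches the public API instead of being swallowed.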
Frank Barchard	31e1d6f896	Check allocations that return NULL and return early BUG=libyuv:968 Change-Id: I9e8594440a6035958511f9c50072820131331fc8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4977552 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-10-27 17:41:36 +00:00
Frank Barchard	331c361581	AVX-VNNI detect - Add kCpuHasAVXVNNI flag - Remove deprecated GFNI detect to make space. https://bugs.chromium.org/p/libyuv/issues/detail?id=967 Meteor Lake has AVX-VNNI but not AVX512 ~/intelsde/sde -mtl -- blaze-bin/third_party/libyuv/libyuv_test --gunit_filter=CpuHas doyuv3 Note: Google Test filter = CpuHas [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x203ff1 Has X86 0x10 Has SSE2 0x20 Has SSSE3 0x40 Has SSE41 0x80 Has SSE42 0x100 Has AVX 0x200 Has AVX2 0x400 Has ERMS 0x800 Has FMA3 0x1000 Has F16C 0x2000 Has AVX512BW 0x0 Has AVX512VL 0x0 Has AVX512VNNI 0x0 Has AVX512VBMI 0x0 Has AVX512VBMI2 0x0 Has AVX512VBITALG 0x0 Has AVX512VPOPCNTDQ 0x0 HAS AVXVNNI 0x200000 Has AVXVNNIINT8 0x0 Running on all cpus the following report avx-vnni grep 'AVXVNNI 0x2' / adl/libyuv64.txt:HAS AVXVNNI 0x200000 gnr/libyuv64.txt:HAS AVXVNNI 0x200000 grr/libyuv64.txt:HAS AVXVNNI 0x200000 mtl/libyuv64.txt:HAS AVXVNNI 0x200000 rpl/libyuv64.txt:HAS AVXVNNI 0x200000 spr/libyuv64.txt:HAS AVXVNNI 0x200000 srf/libyuv64.txt:HAS AVXVNNI 0x200000 while these support avx512 vnni grep 'VNNI 0x1' / clx/libyuv64.txt:Has AVX512VNNI 0x10000 cpx/libyuv64.txt:Has AVX512VNNI 0x10000 gnr/libyuv64.txt:Has AVX512VNNI 0x10000 icl/libyuv64.txt:Has AVX512VNNI 0x10000 icx/libyuv64.txt:Has AVX512VNNI 0x10000 spr/libyuv64.txt:Has AVX512VNNI 0x10000 tgl/libyuv64.txt:Has AVX512VNNI 0x10000 and these support avx-vnni-int8 grep AVXVNNIINT8.0x4 / grr/libyuv64.txt:Has AVXVNNIINT8 0x400000 srf/libyuv64.txt:Has AVXVNNIINT8 0x400000 Bug: libyuv:967 Change-Id: I84cd71d1b320e7c284173eb695fc1d3b72d14ddb Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4912017 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-10-05 21:24:09 +00:00
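A sketch of reading the flag word printed in the Meteor Lake log: each feature is one bit, and the log's "Has X 0x..." lines show the flag word masked with that bit. In libyuv the public entry point for this is TestCpuFlag() in cpu_id.h; the helper and constants below are hypothetical stand-ins so the example compiles on its own. The bit values are copied from the October 2023 logs; the February 2024 AMXINT8 commit in this list prints different values, so they are not stable constants.

```c
#include <stdint.h>

/* Values copied from the October 2023 test logs (not a stable ABI). */
enum {
  kExampleHasAVX2 = 0x400,
  kExampleHasAVX512VNNI = 0x10000,
  kExampleHasAVXVNNI = 0x200000
};

/* Returns the masked bit: nonzero means the feature is present. */
uint32_t ExampleTestFlag(uint32_t cpu_flags, uint32_t flag) {
  return cpu_flags & flag;
}
```

With the -mtl flag word 0x203ff1, masking with 0x200000 gives 0x200000 (AVX-VNNI present) and masking with 0x10000 gives 0 (no AVX512-VNNI), matching the log.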
Frank Barchard	709d60e6ee	VNNI-INT8 detect - Add kCpuHasAVXVNNIINT8 flag - Move mips flags up a bit to make space. ~/intelsde/sde -srf -- blaze-bin/third_party/libyuv/libyuv_test --gunit_filter=CpuHas Note: Google Test filter = CpuHas [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x403ff1 Has X86 0x10 Has SSE2 0x20 Has SSSE3 0x40 Has SSE41 0x80 Has SSE42 0x100 Has AVX 0x200 Has AVX2 0x400 Has ERMS 0x800 Has FMA3 0x1000 Has F16C 0x2000 Has AVX512BW 0x0 Has AVX512VL 0x0 Has AVX512VNNI 0x0 Has AVX512VBMI 0x0 Has AVX512VBMI2 0x0 Has AVX512VBITALG 0x0 Has AVX512VPOPCNTDQ 0x0 Has AVXVNNIINT8 0x400000 Has GFNI 0x0 [ OK ] LibYUVBaseTest.TestCpuHas (32 ms) INT8 supported on srf and grr -srf Set chip-check and CPUID for Intel(R) Sierra Forest CPU -grr Set chip-check and CPUID for Intel(R) Grand Ridge CPU Bug: b/303434603 Change-Id: I628007929ff0518b2b36e1469b4d9aed71a9fa8f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4912015 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-10-04 16:31:36 +00:00
Yannis Guyon	a3b9c36eb9	Fix unused arg errors in ScalePlane*() in Release src_width parameter is used for assertions and unused with NDEBUG. Fix the warning treated as an error when -Wall -Wextra -Werror is used to build that part of the code. BUG=libyuv:967 Change-Id: I4c02ab013e8e2684b3bed5ce9693e1493d7751b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4905033 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-10-03 15:19:25 +00:00
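One common way to silence this kind of warning, shown as a sketch: cast the assertion-only parameter to void. ExampleCopyCols is hypothetical, and the source does not say which exact fix the commit applied.

```c
#include <assert.h>

/* Hypothetical column copy: src_width is read only by assert(), which expands
   to nothing when NDEBUG is defined. The (void) cast marks the parameter as
   used, so -Wall -Wextra -Werror release builds still compile. */
void ExampleCopyCols(const unsigned char* src, int src_width,
                     unsigned char* dst, int dst_width) {
  (void)src_width;
  assert(dst_width <= src_width);
  for (int x = 0; x < dst_width; ++x) {
    dst[x] = src[x];
  }
}
```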
Bruce Lai	ec2e9ca000	[RVV] Support AR64ToAB64 and RGBA-family color conversions Add scalar code for AR64ToAB64, ARGBToRGBA, ARGBToBGRA, ARGBToABGR, RGBAToARGB, BGRAToARGB, and ABGRToARGB. They were originally implemented by ARGBShuffle. This CL implements them independently and, for now, enables them only for RISC-V. This CL also adds RVV implementations for `RGBA-family <-> RGBA-family` color conversions. * Run on SiFive internal FPGA (VLEN=128): Test Case Speedup AR64ToAB64_Opt x4.6 ARGBToRGBA_Opt x6 ARGBToBGRA_Opt x6 ARGBToABGR_Opt x6 RGBAToARGB_Opt x6 Change-Id: Ie0630901046084aa259699fcdeccc64170d7103f Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4797451 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-09-05 22:44:48 +00:00
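A scalar sketch of AR64ToAB64, assuming libyuv's little-endian naming convention, under which AR64 stores each pixel as B, G, R, A 16-bit words in memory and AB64 as R, G, B, A; the conversion then swaps the first and third words. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar row: swaps the R and B 16-bit channels of each pixel.
   Channels are read into locals first, so src and dst may alias. */
void ExampleAR64ToAB64Row(const uint16_t* src_ar64, uint16_t* dst_ab64,
                          int width) {
  for (int x = 0; x < width; ++x) {
    uint16_t b = src_ar64[0], g = src_ar64[1];
    uint16_t r = src_ar64[2], a = src_ar64[3];
    dst_ab64[0] = r;
    dst_ab64[1] = g;
    dst_ab64[2] = b;
    dst_ab64[3] = a;
    src_ar64 += 4;
    dst_ab64 += 4;
  }
}
```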
Frank Barchard	696e619571	RVV check __riscv_v_intrinsic version Bug: libyuv:965 Change-Id: I9b02abd13ab3345288655fa7a16383f59cf66bb8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4750230 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-08-04 18:39:27 +00:00
Wan-Teh Chang	a8a37a25c9	Eliminate a common subexpression in YPixel() Save the value of a common subexpression in a local variable. Change-Id: I5724fcf341900cb2a65eb37b505194b8d3c3da9a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4735651 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-07-31 20:53:54 +00:00
Bruce Lai	c60ac4025c	[RVV] Enable ScaleRowDown38_RVV & ScaleRowDown38_{2,3}_Box_RVV * Run on SiFive internal FPGA: Test Case Speedup I420ScaleDownBy3by8_None 4.2 I420ScaleDownBy3by8_Linear 1.7 I420ScaleDownBy3by8_Bilinear 1.7 I420ScaleDownBy3by8_Box 1.7 I444ScaleDownBy3by8_None 4.2 I444ScaleDownBy3by8_Linear 1.8 I444ScaleDownBy3by8_Bilinear 1.8 I444ScaleDownBy3by8_Box 1.8 Change-Id: Ic2e98de2494d9e7b25f5db115a7f21c618eaefed Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4711857 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-07-27 02:59:47 +00:00
Darren Hsieh	10de943a12	[RVV] Enable ScaleRowUp2_(Bi)linear_RVV/ScaleUVRowUp2_(Bi)linear_RVV ScaleUVRowUp2_(Bi)linear_RVV is equivalent to other platforms' ScaleRowUp2_(Bi)linear_Any_XXX: we process the entire row in this function, whereas other platforms implement only the non-edge part of the image and process the edges with scalar code. ScaleRowUp2_(Bi)linear_Any_XXX combines ScaleRowUp2_(Bi)linear_XXX (non-edge) with ScaleRowUp2_(Bi)linear_C (edge) via SBUH2LANY/SU2BLANY. * Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleFrom640x360_Bilinear ScaleRowUp2_Bilinear_RVV 8.21 I444ScaleFrom640x360_Linear ScaleRowUp2_Linear_RVV 8.08 UVScaleFrom640x360_Bilinear ScaleUVRowUp2_Bilinear_RVV 7.80 UVScaleFrom640x360_Linear ScaleUVRowUp2_Linear_RVV 7.03 Change-Id: I539245ce51858f077506a78f0e7e82377ac6a95d Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4666062 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-26 18:05:50 +00:00
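The interior/edge split described in this commit can be seen in a scalar sketch of a whole-row 2x linear upsampler: interior output pixels blend two neighbouring source samples with 3:1 and 1:3 weights, while the two edge pixels have only one neighbour and are copied. The function name and edge handling are illustrative assumptions, not libyuv's code.

```c
#include <stdint.h>

/* Hypothetical full-row 2x linear upsampler; dst holds 2 * src_width bytes. */
void ExampleRowUp2LinearFullRow(const uint8_t* src, uint8_t* dst,
                                int src_width) {
  if (src_width <= 0) {
    return;
  }
  dst[0] = src[0]; /* left edge: only one neighbour */
  for (int x = 0; x + 1 < src_width; ++x) { /* interior pairs */
    dst[2 * x + 1] = (uint8_t)((src[x] * 3 + src[x + 1] + 2) >> 2);
    dst[2 * x + 2] = (uint8_t)((src[x] + src[x + 1] * 3 + 2) >> 2);
  }
  dst[2 * src_width - 1] = src[src_width - 1]; /* right edge */
}
```

In the split arrangement, a vector kernel would compute only the interior loop, and scalar code would fill `dst[0]` and the last pixel.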
Bruce Lai	d33edd2373	[RVV] Enable ARGBBlendRow_RVV/BlendPlaneRow_RVV * Run on SiFive internal FPGA: Test case Speedup ARGBBlend_Opt 4.60 BlendPlane_Opt 5.96 I420Blend_Opt 5.83 - Also, add code to use ScaleRowDown2Box_RVV in I420Blend Change-Id: Icc75e05d26b3427a98269d2a33c4474074033264 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4681100 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-25 16:38:55 +00:00
Darren Hsieh	aed6dbef17	[RVV] Enable NV{12,21}To{ARGB,RGB24}Row_RVV * Run on SiFive internal FPGA(w/ -march=rv64gcv): Test Case Speedup NV12ToARGB_Opt 12.0 NV21ToARGB_Opt 12.1 NV12ToABGR_Opt 12.6 NV21ToABGR_Opt 12.0 NV12ToRGB24_Opt 12.5 NV21ToRGB24_Opt 11.7 NV12ToRAW_Opt 12.1 NV21ToRAW_Opt 11.4 Change-Id: Icae2bac2b4ebbd4c5a89e847fde9a74fe6481878 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4707804 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-24 17:07:01 +00:00
Frank Barchard	650be7496f	Fix warnings for missing prototypes - Add static to internal scale and rotate functions - Remove unittest that tested an internal scale function - Remove unused private functions - Include missing scale_argb.h header - Bump version and apply clang format Bug: libyuv:830 Change-Id: I45bab0423b86334f9707f935aedd0c6efc442dd4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4658956 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-06-30 17:46:56 +00:00
Frank Barchard	a34a0ba687	ARGBExtractAlpha rename variables to match format Bug: libyuv:956 Change-Id: I31070791754fc69b72c6dcc61be2e038d2676ed9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4646636 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2023-06-27 03:50:35 +00:00
Bruce Lai	873d0db989	[RVV] Fix TestARGBInterpolate test fail Root cause: InterpolateRow_RVV doesn't set the rounding mode to round-to-nearest-up when y1_fraction == 128. ARGBAttenuateRow_RVV sets the rounding mode register to round-down, so InterpolateRow_RVV (y1_fraction == 128) then runs in round-down mode, and its output differs from the round-to-nearest-up result. Solved by: ensuring InterpolateRow_RVV uses the correct rounding mode. Also, remove the unnecessary rounding mode setup in ARGBAttenuateRow_RVV. Bug: libyuv:956 Change-Id: Ib5265d42bad76b036e42b8f91ee42a9afe1f768d Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4624492 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-19 16:49:52 +00:00
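Why the leftover rounding mode mattered: with y1_fraction == 128 the interpolation is an even blend of two rows, i.e. a divide-by-2 average whose result depends on the rounding mode whenever the sum is odd. A scalar model of the two vxrm settings, assuming the 128 path is an averaging add; all names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of RVV's rnu (round-to-nearest-up) and rdn (round-down). */
typedef enum { kExampleRnu, kExampleRdn } ExampleVxrm;

/* Averages two 8-bit samples; rnu adds back the bit shifted out, rdn drops it. */
uint8_t ExampleAverageRows(uint8_t a, uint8_t b, ExampleVxrm mode) {
  unsigned sum = (unsigned)a + b;
  unsigned round = (mode == kExampleRnu) ? (sum & 1u) : 0u;
  return (uint8_t)((sum >> 1) + round);
}
```

For rows holding 10 and 11, rnu gives 11 but a stale rdn setting gives 10, the kind of off-by-one the test caught.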
Bruce Lai	4472b5b849	[RVV] Update ARGBAttenuateRow_RVV implementation Bug: libyuv:956 Change-Id: Ib539c2196767e88fa6e419ed2f22d95b6deaf406 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4623172 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-17 15:50:34 +00:00
Bruce Lai	7939e039e7	[RVV] Fix compile warning in row_rvv 1. Fix compile warning in row_rvv.cc 2. Avoid compile row_rvv.cc/scale_rvv.cc when using GCC There is no RVV segment load & store on GCC. Hence, avoid compiling rvv code on GCC temporarily. 3. Add several compile options to cmake build flow -Wno-sign-compare -Wno-unused-function -Wunused-variable -Wuninitialized Bug: libyuv:956 Change-Id: I9577f98190fc9b28fb6fde65d82d0c67ce54f9ee Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4615441 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-17 15:41:45 +00:00
Frank Barchard	a366ad714a	ARGBAttenuate use (a + b + 255) >> 8 - Makes ARM and Intel match and fixes some off by 1 cases - Add ARGBToUV444MatrixRow_NEON - Add ConvertFP16ToFP32Column_NEON - scale_rvv fix intrinsic build error - disable row_win version of ARGBAttenuate/Unattenuate Bug: libyuv:936, libyuv:956 Change-Id: Ied99aaad3a11a8eb69212b628c58f86ec0723c38 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4617013 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-16 21:37:53 +00:00
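Attenuation premultiplies each color channel by alpha. The sketch below reads the title's formula as the product form (c * a + 255) >> 8, which is an assumption on our part; the function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar attenuate: scales channel c by alpha a with a +255
   bias before the divide-by-256 shift. */
uint8_t ExampleAttenuate(uint8_t c, uint8_t a) {
  return (uint8_t)(((uint32_t)c * a + 255) >> 8);
}
```

Because 255 * (c + 1) = 256 * c + (255 - c), alpha 255 returns c exactly for every 8-bit c, and alpha 0 always gives 0, which a plain `(c * a) >> 8` would not guarantee at the top end.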
Bruce Lai	04821d1e7d	[RVV] Enable ARGBExtractAlphaRow/ARGBCopyYToAlphaRow * Run on SiFive internal FPGA: TestARGBExtractAlpha(~3.2x vs scalar) TestARGBCopyYToAlpha(~1.6x vs scalar) Change-Id: I36525c67e8ac3f71ea9d1a58c7dc15a4009d9da1 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4617955 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-15 23:45:24 +00:00
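Scalar sketches of what the two row functions compute, assuming libyuv's ARGB byte order in memory (B, G, R, A, so alpha sits at offset 3 of each 4-byte pixel). The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical: copies the alpha byte of each ARGB pixel into a plane. */
void ExampleExtractAlphaRow(const uint8_t* src_argb, uint8_t* dst_a,
                            int width) {
  for (int x = 0; x < width; ++x) {
    dst_a[x] = src_argb[4 * x + 3];
  }
}

/* Hypothetical: writes a Y plane into the alpha byte, leaving B, G, R intact. */
void ExampleCopyYToAlphaRow(const uint8_t* src_y, uint8_t* dst_argb,
                            int width) {
  for (int x = 0; x < width; ++x) {
    dst_argb[4 * x + 3] = src_y[x];
  }
}
```

Both are strided single-byte gathers or scatters, which is why they map naturally onto RVV segment loads and stores.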
Darren Hsieh	552571e8b2	[RVV] Enable ScaleRowDown34_RVV & ScaleRowDown34_{0,1}_Box_RVV Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleDownBy3by4_None ScaleRowDown34_RVV 5.8 I444ScaleDownBy3by4_Linear ScaleRowDown34_0/1_Box_RVV 6.5 I444ScaleDownBy3by4_Bilinear ScaleRowDown34_0/1_Box_RVV 6.3 Bug: libyuv:956 Change-Id: I8ef221ab14d631e14f1ba1aaa25d2b30d4e710db Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4607777 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-14 00:57:00 +00:00
Frank Barchard	2a5d7e2fbc	FilterRows_NEON - remove unused function - same as InterpolateRow_NEON - Bump version to 1872 - Add scale_rvv to build files Bug: libyuv:956 Change-Id: Ib9e9fd840a0774bd35bcdcca55a2596f33272383 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4608519 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-13 15:20:02 +00:00
Darren Hsieh	873eaa3bbf	[RVV] Enable Scale{ARGB,UV}RowDown{2,4,EVEN}_RVV Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleDownBy3_Box ScaleAddRow_RVV+ScaleAddCols(scalar) 2.8 ARGBScaleDownBy2_None ScaleARGBRowDown2_RVV 2.2 ARGBScaleDownBy2_Linear ScaleARGBRowDown2Linear_RVV 5.0 ARGBScaleDownBy2_Box ScaleARGBRowDown2Box_RVV 4.3 ARGBScaleDownBy4_None ScaleARGBRowDownEven_RVV 1.2 ARGBScaleDownBy8_Box ScaleARGBRowDownEvenBox_RVV 3.2 ARGBScaleDownBy4_Box ScaleARGBRowDown2Box_RVV 4.5 I444ScaleDownBy2_None ScaleRowDown2_RVV 5.8 I444ScaleDownBy2_Linear ScaleRowDown2Linear_RVV 6.1 I444ScaleDownBy2_Box ScaleRowDown2Box_RVV 5.0 I444ScaleDownBy4_None ScaleRowDown4_RVV 3.6 I444ScaleDownBy4_Box ScaleRowDown4Box_RVV 3.5 UVScaleDownBy2_None ScaleUVRowDown2_RVV 5.8 UVScaleDownBy2_Linear ScaleUVRowDown2Linear_RVV 5.6 UVScaleDownBy2_Box ScaleUVRowDown2Box_RVV 4.1 UVScaleDownBy4_None ScaleUVRowDown4_RVV 1.7 UVScaleDownBy4_Box ScaleUVRowDown2Box_RVV 4.5 avg-speedup: 4 Note: Specialize ScaleUVRowDown with step_size=4 by ScaleUVRowDown4_RVV. Bug: libyuv:956 Change-Id: If9604a6aadf681193f282507602c57c726332202 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4601684 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-13 00:40:39 +00:00
Frank Barchard	b08ccb6a83	FP16 to FP32 float conversion row function Bug: None Change-Id: I97aab6aafd41c3bf36bfbf33fdcc424e5b3fd6e3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4590225 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2023-06-07 00:02:40 +00:00
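A scalar sketch of the per-element work such a row function performs: widening an IEEE 754 binary16 value to binary32 by rebiasing the exponent (15 to 127), shifting the 10-bit mantissa into the 23-bit field, and normalizing half-precision subnormals. This is a generic bit-level conversion, not libyuv's code; the function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical: converts one half-precision value to single precision. */
float ExampleHalfToFloat(uint16_t h) {
  uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
  uint32_t exp = (h >> 10) & 0x1Fu;
  uint32_t mant = h & 0x3FFu;
  uint32_t bits;
  if (exp == 0x1Fu) {
    bits = sign | 0x7F800000u | (mant << 13);          /* Inf or NaN */
  } else if (exp != 0) {
    bits = sign | ((exp + 112u) << 23) | (mant << 13); /* normal: 127 - 15 */
  } else if (mant == 0) {
    bits = sign;                                       /* +0 or -0 */
  } else {
    exp = 113u;                                        /* subnormal: normalize */
    while ((mant & 0x400u) == 0) {
      mant <<= 1;
      --exp;
    }
    bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
  }
  float f;
  memcpy(&f, &bits, sizeof f); /* bit-cast without aliasing issues */
  return f;
}
```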
Frank Barchard	157b153b60	Fix tidy warning that uint32_t dither4 should not be const - Remove const from uint32_t dither4 parameter to fix clang-tidy warning - Apply clang format - Bump version - Remove unused MMI source; superseded by MSA Bug: None Change-Id: Id49991db25bca4e99590b415312542d917471c62 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4581882 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-02 00:42:02 +00:00
Vignesh Venkatasubramanian	c0f64c14ca	Add I412/I212 to I420 functions They re-use the same method as I410/I210 to I420 with a depth value of 12 instead of 10. Bug: b/268505204 Change-Id: I299862b4556461d8c95f0fc1dcd5260e1c1f25cd Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4581867 Commit-Queue: Vignesh Venkatasubramanian <vigneshv@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-01 19:50:16 +00:00
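Going from 12-bit to 8-bit samples amounts to dropping the low depth - 8 bits, so the 10-bit and 12-bit paths can share code and differ only in a parameter. A sketch using a multiply-by-scale form that handles any depth from 8 to 16 in one code path; the name is hypothetical, and libyuv's own rounding and clamping details may differ.

```c
#include <stdint.h>

/* Hypothetical: reduces a depth-bit sample (8..16) to 8 bits by a truncating
   down-shift expressed as a multiply, clamping out-of-range input to 255. */
uint8_t ExampleConvert16To8(uint16_t v, int depth) {
  uint32_t scale = 1u << (24 - depth); /* 4096 for 12-bit, 16384 for 10-bit */
  uint32_t out = ((uint32_t)v * scale) >> 16;
  return (uint8_t)(out > 255 ? 255 : out);
}
```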
Bruce Lai	4b6373d189	[RVV] Use LMUL=2 for I4{44,22}To{ARGB,RGB24,RGBA} conversion Replace vv+m1 (LMUL=1) with vx+m2 (LMUL=2). Some kernels' asm code might contain 1~2 register spills. Change-Id: Ie3655f250d17f37c1ba9039474ece43ede98ede0 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4573159 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-30 09:42:10 +00:00
Darren Hsieh	d14bd701c8	[RVV] Enable CopyRow_RVV, InterpolateRow_RVV, {Merge,Split}UVRow_RVV * Run on SiFive internal FPGA: MergeUVPlane_Opt(~6x vs scalar) SplitUVPlane_Opt(~6x vs scalar) TestCopyPlane(~8x vs scalar) ARGBInterpolate0_Opt(~10x vs scalar) ARGBInterpolate64_Opt(~9x vs scalar) ARGBInterpolate168_Opt(~9x vs scalar) ARGBInterpolate192_Opt(~8.5x vs scalar) ARGBInterpolate255_Opt(~8x vs scalar) Bug: libyuv:956 Change-Id: I8372341865f75f42e30371ef943d5c2e4be7b79a Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4574186 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-30 09:10:35 +00:00
Frank Barchard	78d168054b	Remove extraneous quote from clobber list Bug: None Change-Id: Ie20574d0f9c8c2f074247405b294b49c3406448d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4568770 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-05-30 09:03:05 +00:00
Justin Green	0e111d2c58	Wrap neon registers in {} for the neon MT2T unpack implementation. Some compilers throw a syntax error otherwise. Change-Id: Ic169dcfe4d9bb9bf6d0dcae977d6cf510a7a60bf Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4568904 Commit-Queue: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-26 17:12:02 +00:00
Frank Barchard	22c7a51452	Fix SplitRGB clobber list to include all registers used Bug: None Change-Id: Icac4becb0537903ab87495fb0e2a2b750e1eca4f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4563355 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: David Gao <davidgao@google.com>	2023-05-24 21:44:59 +00:00
Wan-Teh Chang	dcbe082070	Save boxwidth - minboxwidth in a local variable Avoid repetitions of the expression boxwidth - minboxwidth. Change-Id: Ib53fb6b06a926b80ff9a64cc5d499aeef0894c99 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408062 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-22 19:10:13 +00:00
Bruce Lai	de3e7fd147	Manually remove rounding value inside yb(yuvconstant) in row_rvv.cc After libyuv:961 is completed, yb (yuvconstant) will no longer contain the +32 rounding bias for fixed-point. This CL manually removes the rounding bias (-32) in row_rvv.cc. Hence, the rounding mode of all fixed-point code in row_rvv.cc is changed to round-to-nearest-up "0". Also, replace vwmul+vnsrl w/ vmulh in I400ToARGBRow_RVV. Bug: libyuv:956, libyuv:961 Change-Id: I10e34668a2332e38393e9d68414f07aafb6c7cf7 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4550591 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-22 18:15:27 +00:00
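The equivalence this change relies on, assuming a right shift by 6 (a +32 bias is half of 1 << 6): a round-down shift of (y + 32) gives the same result as a round-to-nearest-up shift of y, so removing the baked-in bias and switching the mode to rnu preserves the output. A scalar model of the two modes with hypothetical names; shift must be at least 1.

```c
#include <stdint.h>

/* Round-down (rdn) shift: the dropped low bits are discarded. */
uint32_t ExampleShiftRdn(uint32_t v, unsigned shift) {
  return v >> shift;
}

/* Round-to-nearest-up (rnu) shift: adds back the most significant dropped
   bit, i.e. bit (shift - 1) of v. */
uint32_t ExampleShiftRnu(uint32_t v, unsigned shift) {
  return (v >> shift) + ((v >> (shift - 1)) & 1u);
}
```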
Wan-Teh Chang	179b0203e5	Enable {J400/I400}ToARGBRow_RVV Run on SiFive internal FPGA*: I400ToARGB_Opt (~8x vs scalar) J400ToARGB_Opt (~10x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Bug: libyuv:956, libyuv:961 Change-Id: If4e21ec85c4ff79083ec16a6faae0e457129a8de Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4544972 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-05-20 23:29:33 +00:00
Lu Wang	8670bcf17f	Optimize the following 19 functions with LSX in row_lsx.cc. UYVYToYRow_LSX, UYVYToUVRow_LSX, UYVYToUV422Row_LSX, ARGBToUVRow_LSX, ARGBToRGB24Row_LSX, ARGBToRAWRow_LSX, ARGBToRGB565Row_LSX, ARGBToARGB1555Row_LSX, ARGBToARGB4444Row_LSX, ARGBToUV444Row_LSX, ARGBMultiplyRow_LSX, ARGBAddRow_LSX, ARGBSubtractRow_LSX, ARGBAttenuateRow_LSX, ARGBToRGB565DitherRow_LSX, ARGBShuffleRow_LSX, ARGBShadeRow_LSX, ARGBGrayRow_LSX, ARGBSepiaRow_LSX Bug: libyuv:913 Change-Id: I02c0c9d68b229c4a66c96837e9b928c2f5dda1f3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4546814 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-19 18:55:58 +00:00
Frank Barchard	a37799344d	ARGBToI420Alpha function to convert ARGB to I420 with Alpha Bug: b/281866362 Change-Id: Ic1093a887fb483f134c78909cf1ee7495e7345ba Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4534100 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2023-05-17 00:23:24 +00:00
Bruce Lai	11d4536002	Enable I{422,444}AlphaToARGBRow_RVV & ARGBAttenuateRow_RVV Run on SiFive internal FPGA: I444AlphaToARGB_Opt (~16x vs scalar) I422AlphaToARGB_Opt (~10x vs scalar) ARGBAttenuate_Opt (~3x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: I0046eb7af8104bc8e13cee1cb91a19f90940d5b0 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4535657 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-16 19:20:49 +00:00
Frank Barchard	6a68b18a96	Bump version and apply clang format Bug: libyuv:956 Change-Id: I2375a02583789af2a5f13f8dba6c663d5975aaa9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4522352 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-11 11:27:28 +00:00
Bruce Lai	59eae49f17	Enable ARGBToYMatrixRow_RVV/RGBAToYMatrixRow_RVV/RGBToYMatrixRow_RVV Run on SiFive internal FPGA: ARGBToJ400_Opt (~6x vs scalar) RGBAToJ400_Opt (~6x vs scalar) RGB24ToJ400_Opt (~5.5x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: Ia3ce8cea7962fbd8618cc23e850a7913c9cabf4f Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4521783 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-11 10:17:51 +00:00
Darren Hsieh	497ea35688	Enable I444To{ARGB,RGB24}Row_RVV Run on SiFive internal FPGA: I444ToARGB_Opt (~16x vs scalar) I444ToRGB24_Opt (~10x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: Idae7dc46ef648beaa14b58ba3eb56b67b17c9b3b Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4520761 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-10 19:50:56 +00:00
Darren Hsieh	964d963afb	Enable I422To{ARGB,RGBA,RGB24}Row_RVV Run on SiFive internal FPGA: I422ToARGB_Opt (~10x vs scalar) I422ToRGBA_Opt (~10x vs scalar) I420ToRGB24_Opt (~8x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 This CL sets the rounding mode manually, since we use the fixed-point vector narrowing clip and the default value of the fixed-point rounding mode is not defined. https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#38-vector-fixed-point-rounding-mode-register-vxrm The behavior could differ across platforms, so to avoid unexpected behavior we set the rounding mode explicitly. Change-Id: I90f0dcb90c37f7da7caab8eb1df6c9c7a3c874a8 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4512373 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-10 00:29:20 +00:00
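The rounding-mode concern in the commit above can be made concrete with a scalar model. The RVV specification defines four fixed-point rounding modes selected by the vxrm CSR, and narrowing clips such as vnclipu round according to whichever mode is active, so the same operation on 26 >> 2 can yield 6 or 7. The sketch below follows the spec's roundoff_unsigned rule; the function and enum names are hypothetical and this is not libyuv code.

```c
#include <assert.h>
#include <stdint.h>

/* vxrm encodings from the RVV spec: round-to-nearest-up, round-to-nearest-even,
 * round-down (truncate), round-to-odd. */
enum vxrm_mode { VXRM_RNU = 0, VXRM_RNE = 1, VXRM_RDN = 2, VXRM_ROD = 3 };

/* Scalar model of the spec's roundoff_unsigned(v, d): shift right by d and add
 * a rounding increment r chosen by the mode. Assumes 0 <= d < 32. */
static uint32_t roundoff_unsigned(uint32_t v, unsigned d, enum vxrm_mode mode) {
  assert(d < 32);
  uint32_t r = 0;
  if (d > 0) {
    uint32_t half = (v >> (d - 1)) & 1u;                     /* v[d-1] */
    uint32_t below_half = (v & ((1u << (d - 1)) - 1u)) != 0; /* v[d-2:0] != 0 */
    uint32_t any_dropped = (v & ((1u << d) - 1u)) != 0;      /* v[d-1:0] != 0 */
    uint32_t kept_lsb = (v >> d) & 1u;                       /* v[d] */
    switch (mode) {
      case VXRM_RNU: r = half; break;
      case VXRM_RNE: r = half & (below_half | kept_lsb); break;
      case VXRM_RDN: r = 0; break;
      case VXRM_ROD: r = (kept_lsb ^ 1u) & any_dropped; break;
    }
  }
  return (v >> d) + r;
}

/* Scalar model of one vnclipu lane narrowing to 8 bits: round, then saturate. */
static uint8_t nclipu8(uint32_t v, unsigned d, enum vxrm_mode mode) {
  uint32_t x = roundoff_unsigned(v, d, mode);
  return (uint8_t)(x > 255u ? 255u : x);
}
```

With vxrm set explicitly (for example to round-to-nearest-up), every platform takes the same branch of this switch, which is the point of setting it manually.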
Lu Wang	1d940cc570	Optimize the following functions with LSX. MirrorRow_LSX, MirrorUVRow_LSX, ARGBMirrorRow_LSX, I422ToYUY2Row_LSX, I422ToUYVYRow_LSX, I422ToARGBRow_LSX, I422ToRGBARow_LSX, I422AlphaToARGBRow_LSX, I422ToRGB24Row_LSX, I422ToRGB565Row_LSX, I422ToARGB4444Row_LSX, I422ToARGB1555Row_LSX, YUY2ToYRow_LSX, YUY2ToUVRow_LSX, YUY2ToUV422Row_LSX Bug: libyuv:913 Change-Id: I46cec605001d7ddd73846eed6d0a77f936b6dc53 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4515191 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-10 00:25:48 +00:00
James Zern	b372510c56	row_win.cc: fix ARM64EC build include intrin.h rather than emmintrin.h; fixes: C:\...\VC\Tools\MSVC\14.35.32215\include\emmintrin.h(28,1): fatal error C1189: #error: this header should only be included through Change-Id: Ief9c81f6f1971e552c8aac301d678b64fe5bd7cc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4513825 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-09 19:56:35 +00:00
shaodiwei	4c209d264d	Make the MergeUVRow_AVX2 implementation consistent between row_win.cc and row_gcc.cc; this fixes an out-of-bounds memory write Change-Id: I4b771a46fc853effc4c0fa3ae8032322a8369dc9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4514810 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-09 18:54:25 +00:00
Bruce Lai	f4bd840794	Fix compile error for riscv scalar & simplify cmake cross build flow 1. Fix compile error when building riscv without using vector 2. Fix run_qemu.sh misusing v=true for USE_RVV=OFF case 3. [cmake] Fix warning by renaming TEST to UNIT_TEST Warning log: CMake Warning (dev) at CMakeLists.txt:57 (if): [54/1931] Policy CMP0064 is not set: Support new TEST if() operator. Run "cmake --help-policy CMP0064" for policy details. Use the cmake_policy command to set the policy and suppress this warning. TEST will be interpreted as an operator when the policy is set to NEW. Since the policy is not set the OLD behavior will be used. This warning is for project developers. Use -Wno-dev to suppress it. 4. [cmake] Simplify logic for cross-build Bug: libyuv:956 Change-Id: I120402fc7d6d86403e7d974180b81f4f9c663e36 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4486239 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-04 18:09:00 +00:00
Bruce Lai	8811ad8ba1	Fix TestLinuxRVV test failure Fail log: [ RUN ] LibYUVBaseTest.TestLinuxRVV Note: testing to load "../../unit_test/testdata/riscv64.txt" /scratch/brucel/libyuv/src/unit_test/cpu_test.cc:290: Failure Expected equality of these values: kCpuHasRVV | kCpuHasRVVZVFH Which is: 1610612736 RiscvCpuCaps("../../unit_test/testdata/riscv64_rvv_zvfh.txt") Which is: 536870912 [ FAILED ] LibYUVBaseTest.TestLinuxRVV (17 ms) Reason: The root cause is that the ext variable may contain "\n": the last extension substring ends with "\n". For instance, in test case riscv64_rvv_zvfh.txt, the last substring is "zvfh\n" instead of "zvfh". Fixed this failure by removing the "\n" at the end of the line. NOTE: We avoid using strstr() here, because strstr() would violate the parsing rule if a future extension name contains "zvfh" (e.g. zvfhxxx). Log after modification: [ RUN ] LibYUVBaseTest.TestLinuxRVV Note: testing to load "../../unit_test/testdata/riscv64.txt" [ OK ] LibYUVBaseTest.TestLinuxRVV (38 ms) Change-Id: I7b7db98dbc5388cbc148423da6892b8f0be64599 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4498101 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-04 03:26:25 +00:00
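A minimal sketch of the parsing rule described in the commit above, assuming the ISA string has the /proc/cpuinfo-style form "rv64imafdcv_zvfh" with multi-letter extensions separated by '_'. The helper name is hypothetical and this is not libyuv's parser; it only illustrates the two points from the commit: ignore the trailing newline, and compare whole tokens instead of using strstr().

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: returns true if `ext` appears as a complete
 * '_'-separated token in an ISA string such as "rv64imafdcv_zvfh\n".
 * Everything from the first '\r' or '\n' on is ignored, so the last token
 * compares as "zvfh" rather than "zvfh\n". Whole-token comparison keeps
 * "zvfh" from matching a longer name such as "zvfhmin". */
static bool isa_has_multiletter_ext(const char* isa, const char* ext) {
  size_t ext_len = strlen(ext);
  size_t len = strcspn(isa, "\r\n"); /* length up to the first newline */
  const char* end = isa + len;
  /* Skip the base ISA (e.g. "rv64imafdcv"); extensions follow '_'. */
  const char* p = memchr(isa, '_', len);
  while (p != NULL && p < end) {
    const char* tok = p + 1;
    const char* next = memchr(tok, '_', (size_t)(end - tok));
    size_t tok_len = (size_t)((next ? next : end) - tok);
    if (tok_len == ext_len && memcmp(tok, ext, ext_len) == 0) return true;
    p = next;
  }
  return false;
}
```

Usage: `isa_has_multiletter_ext(line, "zvfh")` on a line read with fgets() succeeds for "rv64imafdcv_zvfh\n" but not for "rv64imafdcv_zvfhmin\n".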
Darren Hsieh	1b3c4c12d4	Add Split/Merge RGB/ARGB/XRGB Row_RVV * Run on SiFive internal FPGA: SplitRGBPlane_Opt (~6.87x vs scalar) SplitARGBPlane_Opt (~10.77x vs scalar) SplitXRGBPlane_Opt (~18.69x vs scalar) MergeRGBPlane_Opt (~3.63x vs scalar) MergeARGBPlane_Opt (~3.50x vs scalar) MergeXRGBPlane_Opt (~2.90x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 - include a fix to avoid an implicit conversion warning between size_t & int. Bug: libyuv:956 Change-Id: Icd79b282b04ea3981e7fd4e6d547da6708d82516 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4443411 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-04-28 18:34:46 +00:00
Frank Barchard	7c6a7e5737	cpuid for arm/mips/riscv initialize buffer - change cpu printf to hex to better show flags util/cpuid: Cpu Flags 0x30000001 Has RISCV 0x10000000 Has RVV 0x20000000 [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x30000001 Has RISCV 0x10000000 Has RVV 0x20000000 Has RVVZVFH 0x0 [ OK ] LibYUVBaseTest.TestCpuHas (1 ms) [ RUN ] LibYUVBaseTest.TestCompilerMacros __ATOMIC_RELAXED 0 __cplusplus 201703 __clang_major__ 9999 __clang_minor__ 0 __GNUC__ 4 __GNUC_MINOR__ 2 __riscv 1 __riscv_vector 1 __clang__ 1 __llvm__ 1 __pic__ 2 INT_TYPES_DEFINED __has_feature Bug: libyuv:956 Change-Id: Iee4f1f34799434390e756de1e6c2c4596d82ace5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4484957 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-27 22:46:27 +00:00
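The flag values printed in the commit above decode directly, which is why hex output is easier to read than the decimal output in the qemu log of commit c994782086 (805306369 is 0x30000001). The enum names below are hypothetical labels taken from the log text. CPU_HAS_RVVZVFH = 0x40000000 is inferred from the TestLinuxRVV failure log (1610612736 = 0x60000000 = RVV | RVVZVFH), and treating 0x1 as an "initialized" bit is an assumption.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values as they appear in the logged output; names are illustrative.
 * CPU_INITIALIZED (0x1) is assumed to mark that detection has run. */
enum {
  CPU_INITIALIZED = 0x1,
  CPU_HAS_RISCV = 0x10000000,
  CPU_HAS_RVV = 0x20000000,
  CPU_HAS_RVVZVFH = 0x40000000,
};

/* Prints the mask and each feature bit in hex, so set bits are obvious. */
static void print_cpu_flags(FILE* out, uint32_t flags) {
  fprintf(out, "Cpu Flags 0x%x\n", (unsigned)flags);
  fprintf(out, "Has RISCV 0x%x\n", (unsigned)(flags & CPU_HAS_RISCV));
  fprintf(out, "Has RVV 0x%x\n", (unsigned)(flags & CPU_HAS_RVV));
  fprintf(out, "Has RVVZVFH 0x%x\n", (unsigned)(flags & CPU_HAS_RVVZVFH));
}
```

Calling `print_cpu_flags(stdout, 0x30000001u)` reproduces the four lines shown in the log for the first util/cpuid run.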
Frank Barchard	cf21b5ea5c	Rename variables to match layout of ABGR Bug: None Change-Id: Ia1d596b6e108307fe042a03c34162b25152293d4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4461967 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-26 16:57:33 +00:00
Bruce Lai	1330a79e9f	Optimized AR64/AB64 <-> ARGB with RVV * Run on SiFive internal FPGA: ARGBToAR64_Opt (~13.7x vs scalar) ARGBToAB64_Opt (~5.81x vs scalar) AR64ToARGB_Opt (~15.8x vs scalar) AB64ToARGB_Opt (~2.40x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Bug: libyuv:956 Change-Id: Ida642a5077f59d25fb7c5328f671956b2293dadd Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4442913 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-20 19:49:55 +00:00
Frank Barchard	c994782086	Enable RVV if qemu is detected - include a fix for jpeg unittests to do at least 1 iteration - include a fix for scale uv to only use linearup2 if filter is linear Tested on qemu with Intel host: [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 805306369 Has RISCV 268435456 Has RVV 536870912 Has RVVZVFH 0 Has X86 0 Bug: libyuv:956, libyuv:959, libyuv:960 Change-Id: I4a1b66f83d82ba127780f52526153d586db90111 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4429570 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Randall Bosetti <rlb@google.com>	2023-04-18 20:29:04 +00:00
Darren Hsieh	44396e6e9a	Add ARGBToRAWRow_RVV, ARGBToRGB24Row_RVV, RGB24ToARGBRow_RVV * Run on SiFive internal FPGA: ARGBToRAW_Opt (~1.55x vs scalar) ARGBToRGB24_Opt (~1.44x vs scalar) RGB24ToARGB_Opt (~1.77x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Bug: libyuv:956 Change-Id: I26722f6848cd68684d95d9a7ee06ce0416e7985d Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4413083 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-13 19:33:16 +00:00
Frank Barchard	68659d0d68	UVScale down by 2 fix for C and optimize for NEON - update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open. Bug: libyuv:958 Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-12 22:49:20 +00:00
Frank Barchard	ee3e71c7ce	Any functions use memset(vin, 0, sizeof(vin)) for GCC warning fix - Fix -Wmemset-elt-size warning for GCC - Use vin for inputs and vout for outputs Bug: None Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2023-04-10 20:44:10 +00:00
Darren Hsieh	724e7aee03	Fix macro define typo in scale_uv.cc The correct define can be found in scale_row.h Change-Id: I633ed47006c7bd8014038493005c2d934489ff18 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4411353 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-10 16:55:48 +00:00
James Zern	0200037a5a	row_any,ANYDETILE: fix -Wmemset-elt-size warning under gcc 12.2.0 using -Wall: source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const uint16_t*, ptrdiff_t, uint16_t*, int)’: source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size] 2287 | memset(temp, 0, 16 * BPP); /* for msan */ | ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’ 2308 | ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15) This increases the memset to the full buffer size, which may not be strictly necessary. Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-04-08 19:01:02 +00:00
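The -Wmemset-elt-size fixes in the commit above and in ee3e71c7ce come down to passing memset a byte count, not an element count. A hedged sketch of the general "Any" wrapper pattern follows: run the fast kernel on the largest multiple of its block size, then finish the tail through a zeroed temporary buffer. The kernel and wrapper names are hypothetical, and the "fast" kernel is plain C standing in for a SIMD row function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a SIMD row function that only accepts widths
 * that are a multiple of 16 elements. It just doubles each value. */
static void DoubleRow16_Fast(const uint16_t* src, uint16_t* dst, int width) {
  assert(width % 16 == 0);
  for (int i = 0; i < width; ++i) dst[i] = (uint16_t)(src[i] * 2);
}

/* "Any" wrapper: fast kernel on the multiple-of-16 prefix, then the
 * remainder is copied into a zeroed 16-element block, processed as a full
 * block, and only the valid results are copied back. */
static void DoubleRow16_Any(const uint16_t* src, uint16_t* dst, int width) {
  uint16_t vin[16];
  uint16_t vout[16];
  int n = width & ~15; /* largest multiple of 16 <= width */
  int r = width & 15;  /* leftover elements */
  if (n > 0) DoubleRow16_Fast(src, dst, n);
  if (r > 0) {
    /* sizeof(vin) is 32 bytes; memset(vin, 0, 16) would clear only half. */
    memset(vin, 0, sizeof(vin));
    memcpy(vin, src + n, (size_t)r * sizeof(vin[0]));
    DoubleRow16_Fast(vin, vout, 16);
    memcpy(dst + n, vout, (size_t)r * sizeof(vout[0]));
  }
}
```

Zeroing the whole temporary block keeps tools like MSan from flagging the unused lanes the kernel reads, at the cost of clearing a few more bytes than strictly needed, the same trade-off the commit message notes.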
Darren Hsieh	e8af6cb2e4	Add RAWToARGBRow_RVV,RAWToRGBARow_RVV,RAWToRGB24Row_RVV * Run on SiFive internal FPGA: RAWToARGB_Opt (~2x vs scalar) RAWToRGBA_Opt (~2x vs scalar) RAWToRGB24_Opt (~1.5x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: I21a13d646589ea2aa3822cb9225f5191068c285b Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408357 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-07 18:45:08 +00:00
Darren Hsieh	aa47d668d8	Add riscv cpu info detection. * Supports: * The standard single-letter Vector detection. * Vector fp16 detection. Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Change-Id: Ia7ee1bd8ec1a990f1b2b1700805942e99c0aa87b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4401738 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-04-06 15:58:29 +00:00
Wan-Teh Chang	ec48e4328e	Add assertions for the Clang static analyzer The Clang static analyzer (scan-build) in LLVM 14 warns about an out-of-bounds array index in scaletbl[boxwidth - minboxwidth] in ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two elements. It's not clear that the index boxwidth - minboxwidth is always either 0 or 1. Change-Id: I072476e86950154beffe6b1a89915755118b3cbd Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-04-05 21:58:22 +00:00
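The invariant behind that assertion holds for non-negative 16.16 fixed-point positions. Writing x = a * 65536 + f and dx = b * 65536 + g with 0 <= f, g < 65536, the box width ((x + dx) >> 16) - (x >> 16) equals b, plus 1 when f + g >= 65536, so it is always dx >> 16 or (dx >> 16) + 1. A simplified, hypothetical single-row box filter that relies on the resulting two-entry table (not the libyuv function itself):

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical sketch: averages boxes of 8-bit `src` whose edges step by
 * `dx` in 16.16 fixed point, using a two-entry reciprocal table indexed by
 * boxwidth - minboxwidth. Assumes dx >= 0x10000 (downscaling) and that src
 * holds at least (dst_width * dx) >> 16 elements. */
static void box_average_cols(const uint8_t* src, uint8_t* dst, int dst_width,
                             int dx) {
  assert(dx >= 0x10000);
  int minboxwidth = dx >> 16;
  /* 1 / boxwidth in 16.16 for the only two possible box widths. */
  int scaletbl[2] = {65536 / minboxwidth, 65536 / (minboxwidth + 1)};
  int x = 0;
  for (int i = 0; i < dst_width; ++i) {
    int ix = x >> 16;
    x += dx;
    int boxwidth = (x >> 16) - ix;
    /* Documents the invariant for the analyzer: the index is 0 or 1. */
    assert(boxwidth - minboxwidth == 0 || boxwidth - minboxwidth == 1);
    int sum = 0;
    for (int k = 0; k < boxwidth; ++k) sum += src[ix + k];
    dst[i] = (uint8_t)((sum * scaletbl[boxwidth - minboxwidth]) >> 16);
  }
}
```

With dx = 1.5 (98304), boxes alternate between widths 1 and 2, so both table entries get used.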
Frank Barchard	464c51a035	AArch32 YUVTORGB_SETUP use load and dup to avoid modifying pointer - Allows code to be optimized with clang 17 -flto-thin - Bump version number to 1864 to allow detection of fix - Apply clang format to standardize formatting; No impact on code generated Bug: chromium:1424089 Change-Id: Ib745836b27915a5e4cb1d7d928ee52659360612b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370052 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2023-03-24 19:32:30 +00:00
Frank Barchard	1a971f8cc3	clang 17 -flto-thin bug fix for Neon YUVtoRGB and ARGBToRGB565Dither - YUV to RGB AArch32 kRGBCoeffBias rewind pointer - ARGBToRGB565Dither declare width and source pointers as modified Bug: chromium:1424089 Change-Id: I987180652331bab16ce27d8d166399a687ee890e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370099 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-03-24 10:59:40 +00:00
Frank Barchard	3f219a3501	GCC warning fix for MT2T - Fix redundant assignment compile warning in GCC - Apply clang-format - Bump version to 1863 Bug: libyuv:955 Change-Id: If2b6588cd5a7f068a1745fe7763e90caa7277101 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4344729 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-03-16 06:57:20 +00:00
Justin Green	76468711d5	M2T2 Unpack fixes Fix the algorithm for unpacking the lower 2 bits of M2T2 pixels. Bug: b:258474032 Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978 Commit-Queue: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-03-14 14:59:26 +00:00
Frank Barchard	f9b23b9cc0	Transpose 4x4 for SSE2 and AVX2 Skylake Xeon AVX2 Transpose4x4_Opt (290 ms) SSE2 Transpose4x4_Opt (302 ms) C Transpose4x4_Opt (522 ms) AMD Zen2 AVX2 Transpose4x4_Opt (136 ms) SSE2 Transpose4x4_Opt (137 ms) C Transpose4x4_Opt (431 ms) Bug: None Change-Id: I4997dbd5c5387c22bfd6c5960b421504e4bc8a2a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4292946 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-03-03 17:46:23 +00:00
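The commit does not describe the kernel itself. Assuming Transpose4x4 operates on 32-bit elements, the usual SSE2 approach is two rounds of unpacks: first interleave 32-bit lanes of row pairs, then combine 64-bit halves. A sketch with a scalar fallback for non-SSE2 targets; the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Transposes one 4x4 block of 32-bit values. Strides are in elements.
 * The 32-bit element size is an assumption of this sketch. */
static void transpose4x4_u32(const uint32_t* src, int src_stride,
                             uint32_t* dst, int dst_stride) {
#if defined(__SSE2__)
  __m128i r0 = _mm_loadu_si128((const __m128i*)(src + 0 * src_stride));
  __m128i r1 = _mm_loadu_si128((const __m128i*)(src + 1 * src_stride));
  __m128i r2 = _mm_loadu_si128((const __m128i*)(src + 2 * src_stride));
  __m128i r3 = _mm_loadu_si128((const __m128i*)(src + 3 * src_stride));
  /* Interleave 32-bit lanes of row pairs: t0 = a0 b0 a1 b1, t1 = c0 d0 c1 d1,
   * t2 = a2 b2 a3 b3, t3 = c2 d2 c3 d3. */
  __m128i t0 = _mm_unpacklo_epi32(r0, r1);
  __m128i t1 = _mm_unpacklo_epi32(r2, r3);
  __m128i t2 = _mm_unpackhi_epi32(r0, r1);
  __m128i t3 = _mm_unpackhi_epi32(r2, r3);
  /* Combine 64-bit halves: each output row is one input column. */
  _mm_storeu_si128((__m128i*)(dst + 0 * dst_stride), _mm_unpacklo_epi64(t0, t1));
  _mm_storeu_si128((__m128i*)(dst + 1 * dst_stride), _mm_unpackhi_epi64(t0, t1));
  _mm_storeu_si128((__m128i*)(dst + 2 * dst_stride), _mm_unpacklo_epi64(t2, t3));
  _mm_storeu_si128((__m128i*)(dst + 3 * dst_stride), _mm_unpackhi_epi64(t2, t3));
#else
  /* Scalar fallback: dst[j][i] = src[i][j]. */
  for (int i = 0; i < 4; ++i)
    for (int j = 0; j < 4; ++j) dst[j * dst_stride + i] = src[i * src_stride + j];
#endif
}
```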
Frank Barchard	88b050f337	MergeUV AVX512BW use assembly - Convert MergeUVRow_AVX512BW to assembly - Enable MergeUVRow_AVX512BW for Windows with clangcl - MergeUVRow_AVX2 use vpmovzxbw and vpsllw - MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V AMD Zen 4 640x360 100000 iterations Was AVX512 MergeUVPlane_Opt (884 ms) AVX2 MergeUVPlane_Opt (945 ms) AVX2 MergeUVPlane_16_Opt (2167 ms) Now AVX512 MergeUVPlane_Opt (865 ms) AVX2 MergeUVPlane_Opt (943 ms) SSE2 MergeUVPlane_Opt (973 ms) AVX2 MergeUVPlane_16_Opt (2102 ms) Bug: None Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230 Reviewed-by: Fritz Koenig <frkoenig@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-02-22 21:19:08 +00:00
Frank Barchard	2bdc210be9	MergeUV_AVX512BW for I420ToNV12 On Skylake Xeon 640x360 100000 iterations AVX512 MergeUVPlane_Opt (1196 ms) AVX2 MergeUVPlane_Opt (1565 ms) SSE2 MergeUVPlane_Opt (1780 ms) Pixel 7 MergeUVPlane_Opt (1177 ms) Bug: None Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-02-13 20:14:57 +00:00
Sergio Garcia Murillo	b2528b0be9	Add support for odd width and height in I410ToI420 Bug: libyuv:950 Change-Id: Ic9a094463af875aefd927023f730b5f35f8551de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4154630 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-01-23 19:05:00 +00:00
Hao Chen	0809713775	Refine some functions on the LoongArch platform. Add ARGBToYMatrixRow_LSX/LASX, RGBAToYMatrixRow_LSX/LASX and RGBToYMatrixRow_LSX/LASX functions with an RgbConstants argument. Bug: libyuv:912 Change-Id: I956e639d1f0da4a47a55b79c9d41dcd29e29bdc5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4167860 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-18 18:54:14 +00:00
Frank Barchard	0faf8dd0e0	Fix for DivideRow_NEON functions - was dup of 8h but mul of 4s. now use umull Bug: libyuv:951 Change-Id: If6cb01f5f006c2235886b81ce120642d7e24a9bb Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166563 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-18 00:30:05 +00:00
Frank Barchard	541d8efbaf	Fix for divide row functions used by P010ToI010 Bug: libyuv:951 Change-Id: Id323656cb6f99b1be0be7aaa854d3cc15feeba69 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166562 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-17 21:40:45 +00:00
Frank Barchard	d5aa3d4a76	P010ToI010 and P012ToI012 conversion functions - Convert 10 and 12 bit biplanar formats to planar. - Shift 10 MSB to 10 LSB - P010 is similar to NV12 in layout, but uses 10 MSB of 16 bit values. - I010 is similar to I420 in layout, but uses 10 LSB of 16 bit values. Bug: libyuv:951 Change-Id: I16a1bc64239d0fa4f41810910da448bf5720935f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166560 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-13 19:20:12 +00:00
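Per sample, the conversion described in the commit above is a right shift by 16 - depth (6 for P010, 4 for P012) plus splitting the interleaved UV plane. This sketch writes the shift as a multiply followed by >> 16, on the assumption that this is the form the DivideRow kernels in the two fixes above use. The product needs a 32-bit intermediate, which is what a widening multiply such as umull provides. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Splits one interleaved MSB-aligned UV row (P010/P012) into LSB-aligned U
 * and V rows (I010/I012). `depth` is 10 or 12. */
static void split_msb_uv_to_lsb(const uint16_t* src_uv, uint16_t* dst_u,
                                uint16_t* dst_v, int width, int depth) {
  /* (v * 2^depth) >> 16 == v >> (16 - depth). The product needs 32 bits:
   * 0xFFC0 * 1024 does not fit in 16. */
  uint32_t scale = 1u << depth;
  for (int x = 0; x < width; ++x) {
    dst_u[x] = (uint16_t)(((uint32_t)src_uv[2 * x + 0] * scale) >> 16);
    dst_v[x] = (uint16_t)(((uint32_t)src_uv[2 * x + 1] * scale) >> 16);
  }
}
```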
Frank Barchard	6e4b0acb4b	I422Rotate take stride for temporary buffers - Minor variable name changes first/last to top/bottom - Comments explaining rotate temporary buffers usage - Add asserts for scale parameter - Use NULL and stddef.h instead of 0 - Use void * for allocation in row.h - Add () around size parameter in macros Bug: libyuv:926, libyuv:949 Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-04 23:11:52 +00:00
Sergio Garcia Murillo	f8626a7224	Add 10 bit rotate methods. This initial implementation is based on current unoptimized code in webrtc using just plain for loops. Bug: libyuv:949 Change-Id: Ic87ee49c3a0b62edbaaa4255c263c1f7be4ea02b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110782 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-04 21:10:01 +00:00
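A plain-loop rotation of the kind described in the commit above can be sketched for a 16-bit plane as follows. The function name and the choice of a 90-degree clockwise rotation are illustrative assumptions, not the code from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Rotates a 16-bit plane 90 degrees clockwise with plain loops. Strides are
 * in elements; the destination is `height` wide and `width` tall. */
static void rotate_plane90_u16(const uint16_t* src, int src_stride,
                               uint16_t* dst, int dst_stride,
                               int width, int height) {
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      /* Source row y becomes destination column (height - 1 - y). */
      dst[x * dst_stride + (height - 1 - y)] = src[y * src_stride + x];
    }
  }
}
```

The scattered writes (one per destination row) are what make a naive loop slow on large planes, which is why later optimized versions typically transpose small blocks instead.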
Sergio Garcia Murillo	22a579c438	Use ScalePlaneDown2_16To8 for avoiding the 2 step process Bug: libyuv:950 Change-Id: I5a77bca9a0230fe00abd810939e217833a14683f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4134524 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-03 21:41:21 +00:00
Sergio Garcia Murillo	f583b1b4b8	Add I410Copy and I410ToI420 methods The I410ToI420 implementation uses a two-step approach for scaling down and 10-to-8-bit conversion, using the Y plane as temporary storage. Bug: libyuv:950 Change-Id: I3d35fad4b99e17253230456233fbd947e013c0ec Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110783 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-01-03 20:27:28 +00:00
Frank Barchard	3abd6f36b6	Casting for scale functions - MT2T support for source strides added, but only works for positive values. - Reduced casting in row_common - one cast per assignment. - scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t Bug: libyuv:948, b/257266635, b/262468594 Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Ernest Hua <ernesthua@google.com>	2022-12-15 22:34:22 +00:00
Frank Barchard	610e0cdead	MT2T Warning fixes for fuchsia Bug: b/258474032, b/257266635 Change-Id: Ic5cbbc60e2e1463361e359a2fe3e97976c1ea929 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4081348 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2022-12-06 19:54:40 +00:00
Frank Barchard	ea26d7adb1	DetilePlane_16 AVX version - fix ifdefs for DetilePlane_16 to use 16 bit versions, not 8 bit. (no functional change) Bug: b/258474032 Change-Id: Ic07e02d9801e21126ebee0ceb5779aa712a493ce Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4034812 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-11-18 23:59:06 +00:00
Frank Barchard	8713ba3f0b	Add vzeroupper to AVX row functions - move power of two macro to planar functions source - revert row.h IS_ALIGNED change Bug: b/258474032 Change-Id: If87bb8d55c9b9930dd3e378614f8e4faae0870e9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4035166 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-11-17 23:00:08 +00:00
Frank Barchard	2d2cee418a	Add Detile_16 planar function for 10 bit MT2T format - Neon and SSE2 - Any for odd widths Pixel 2 little core AArch32 build C TestDetilePlane_16 (1275 ms) TestDetilePlane (1203 ms) Neon TestDetilePlane_16 (693 ms) TestDetilePlane (660 ms) Bug: b/258474032 Change-Id: Idbd09c5e9324e4deef5f1d54090d4b63cc7db812 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4031848 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-11-17 02:47:57 +00:00
Frank Barchard	fe9ced6e3c	ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows - Preserve xmm7 in ScaleRowUp2_Bilinear_12_SSSE3 - Previously xmm7 was used in ScaleRowUp2_Bilinear_12_SSSE3 without being preserved, which violates the Windows x64 calling conventions and can cause undefined behavior. Bug: libyuv:945, 1218384 Change-Id: If18b292b588573355f9b4ba8c5b9c3fbe143d36b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3972137 Reviewed-by: Bruce Dawson <brucedawson@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-10-21 19:35:17 +00:00
Frank Barchard	3da24c3ca3	Detile vld for gcc build fix - add {} around loaded register Bug: libyuv:944 Change-Id: I0d916e37beb50bda0838e4867742eb7afa57e1cc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3957634 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-10-14 19:12:53 +00:00
Frank Barchard	cb35d5f90e	BGRAToI420 use SSSE3 for Y but C for UV when LIBYUV_BIT_EXACT enabled - Previously was C for both Y and UV. Was BGRAToI420_Opt (17780 ms) Now BGRAToI420_Opt (9546 ms) Bug: b/253491233 Change-Id: Id103d8d5ba0fed0f7a427dd5955e1830275eff6b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3953131 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-10-14 03:09:56 +00:00
Jeremy Maitin-Shepard	c365da9c6c	Use `find_package(JPEG)` in place of `include(FindJPEG)` The former allows the package to be overridden with a local build by `FetchContent`. Also includes the fix to _MSC_VER conditions from: https://aomedia.googlesource.com/aom/+/refs/heads/main/third_party/libyuv/README.libaom Change-Id: I666e591becb3efaa7b5b68d27476319a2909b88e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3933570 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-10-03 22:19:28 +00:00
Frank Barchard	00950840d1	YUY2ToNV12 using YUY2ToY and YUY2ToNVUV - Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps - Was SplitUV, memcpy Y, InterpolateUV - Now YUY2ToY, YUY2ToNVUV - rollback LIBYUV_UNLIMITED_DATA 3840x2160 1000 iterations: Pixel 2 Cortex A73 Was YUY2ToNV12_Opt (6515 ms) Now YUY2ToNV12_Opt (3350 ms) AB7 Mediatek P35 Cortex A53 Was YUY2ToNV12_Opt (6435 ms) Now YUY2ToNV12_Opt (3301 ms) Skylake AVX2 x64 Was YUY2ToNV12_Opt (1872 ms) Now YUY2ToNV12_Opt (1657 ms) SSE2 x64 Was YUY2ToNV12_Opt (2008 ms) Now YUY2ToNV12_Opt (1691 ms) Windows Skylake AVX2 32 bit x86 Was YUY2ToNV12_Opt (2161 ms) Now YUY2ToNV12_Opt (1628 ms) Bug: libyuv:943 Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843 Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-09-30 22:41:21 +00:00
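A scalar sketch of the two steps named in the commit above, assuming libyuv's YUY2 byte order (Y0 U Y1 V). Step one copies every even byte to the Y plane. Step two averages U and V across two rows and leaves them interleaved, which is already NV12's UV layout. The function names and the round-half-up average are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Step 1: luma is every even byte of a YUY2 row. */
static void yuy2_to_y_row(const uint8_t* src_yuy2, uint8_t* dst_y, int width) {
  for (int x = 0; x < width; ++x) dst_y[x] = src_yuy2[2 * x];
}

/* Step 2: average U and V of two YUY2 rows (4:2:2 -> 4:2:0), keeping them
 * interleaved. `width` is in pixels and assumed even; `stride` in bytes. */
static void yuy2_to_nvuv_row(const uint8_t* src_yuy2, ptrdiff_t stride,
                             uint8_t* dst_uv, int width) {
  for (int x = 0; x < width; x += 2) {
    const uint8_t* a = src_yuy2 + 2 * x; /* Y0 U Y1 V, first row */
    const uint8_t* b = a + stride;       /* same pixel pair, next row */
    dst_uv[x + 0] = (uint8_t)((a[1] + b[1] + 1) >> 1);
    dst_uv[x + 1] = (uint8_t)((a[3] + b[3] + 1) >> 1);
  }
}
```

Each source row is read once for Y and once (paired with its neighbor) for UV, with no intermediate planes; the commit reports the resulting two-step path roughly halving runtime on the Arm cores it lists.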
Frank Barchard	b9adaef113	Enable unlimited data for YUV to RGB - Provide LIBYUV_LIMITED_DATA macro for backwards compatibility Bug: b/474156256 Change-Id: I5d5d7fb640d51ae3c5ad363f2a28c8bfbd3048a5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3912081 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-09-23 12:51:37 +00:00
Frank Barchard	f9fda6e7d8	Fix shift amount for SSSE3 assembly for I012 format conversions Bug: libyuv:938, libyuv:942 Change-Id: I6fb6e7e17fa941785e398bc630f465baf72fcabd Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906091 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-09-20 23:07:53 +00:00
Frank Barchard	8fc02134c8	10/12 bit YUV replicate upper bits to low bits before converting to RGB - shift high bits of 10 and 12 bit into lower bits Bug: libyuv:941, libyuv:942, Change-Id: I14381dbf226ef27dcce06893ea88860835639baa Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906085 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-09-20 20:56:43 +00:00
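Replicating the upper bits into the low bits maps the full 10- or 12-bit range onto the full 16-bit range, so the maximum code becomes 0xFFFF rather than 0xFFC0 or 0xFFF0. A minimal scalar sketch of that expansion; the function name is hypothetical and the SIMD kernels may implement it differently.

```c
#include <assert.h>
#include <stdint.h>

/* Expands a `depth`-bit sample to 16 bits: shift it to the top, then copy
 * its high bits into the vacated low bits. Valid for 8 <= depth <= 16. */
static uint16_t replicate_to_16(uint16_t v, int depth) {
  assert(depth >= 8 && depth <= 16);
  int shift = 16 - depth;
  return (uint16_t)((v << shift) | (v >> (depth - shift)));
}
```

For 10-bit input this is (v << 6) | (v >> 4); for 12-bit input, (v << 4) | (v >> 8).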
Frank Barchard	e4b1ddd8fe	Fix immediate offsets for row_neon build on gcc Bug: libyuv:942 Change-Id: I7d2dc87a44cc1cc5c79c37f407583e0c907dc2de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906088 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-09-20 20:16:13 +00:00
Frank Barchard	248172e2ba	I422ToRGB24, I422ToRAW, I422ToRGB24MatrixFilter conversion functions added. - YUV to RGB use linear for first and last row. - add assert(yuvconstants) - rename pointers to match row functions. - use macros that match row functions. - use 12 bit upsampler for conversions of 10 and 12 bits Cortex A53 AArch32 I420ToRGB24_Opt (3627 ms) I422ToRGB24_Opt (4099 ms) I444ToRGB24_Opt (4186 ms) I420ToRGB24Filter_Opt (5451 ms) I422ToRGB24Filter_Opt (5430 ms) AVX2 Was I420ToRGB24Filter_Opt (583 ms) Now I420ToRGB24Filter_Opt (560 ms) Neon Cortex A7 Was I420ToRGB24Filter_Opt (5447 ms) Now I420ToRGB24Filter_Opt (5439 ms) Bug: libyuv:938 Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-09-20 02:00:52 +00:00
Frank Barchard	f71c83552d	I420ToRGB24MatrixFilter function added - Implemented as 3 steps: Upsample UV to 4:4:4, I444ToARGB, ARGBToRGB24 - Fix some build warnings for missing prototypes. Pixel 4 I420ToRGB24_Opt (743 ms) I420ToRGB24Filter_Opt (1331 ms) Windows with skylake xeon: x86 32 bit I420ToRGB24_Opt (387 ms) I420ToRGB24Filter_Opt (571 ms) x64 64 bit I420ToRGB24_Opt (384 ms) I420ToRGB24Filter_Opt (582 ms) Bug: libyuv:938, libyuv:830 Change-Id: Ie27f70816ec084437014f8a1c630ae011ee2348c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3900298 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-09-16 19:46:47 +00:00
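The first of the three steps, upsampling UV to 4:4:4, needs 2x interpolation horizontally (and, for 4:2:0, vertically). A sketch of the horizontal half using 3:1 and 1:3 weights with edge replication; the weights, rounding, and function name are assumptions for illustration rather than a copy of libyuv's kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Doubles the width of one chroma row. Output samples between src[i] and
 * src[i + 1] sit 1/4 and 3/4 of the way across; the first and last samples
 * are replicated. dst must hold 2 * src_width samples; src_width >= 1. */
static void upsample_row_2x_linear(const uint8_t* src, uint8_t* dst,
                                   int src_width) {
  assert(src_width >= 1);
  dst[0] = src[0];
  for (int i = 0; i + 1 < src_width; ++i) {
    dst[2 * i + 1] = (uint8_t)((3 * src[i] + src[i + 1] + 2) >> 2);
    dst[2 * i + 2] = (uint8_t)((src[i] + 3 * src[i + 1] + 2) >> 2);
  }
  dst[2 * src_width - 1] = src[src_width - 1];
}
```

Applying the same weights between adjacent rows gives the vertical half; the extra passes are one reason the Filter variants run slower than the plain conversions in the timings above.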
Frank Barchard	3e38ce5058	SSE2 MM21->YUY2 conversion Add SSE2 optimization for MM21ToYUY2 conversion. Bug: b/238137982 Change-Id: I189f712514308322f651b082b496bce9c015c4ee Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832525 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2022-08-17 18:39:05 +00:00
Frank Barchard	65e7c9d570	MM21ToYUY2 and ABGRToJ420 conversion MM21 to YUY2 use zip1 for performance Cortex A510 Was MM21ToYUY2 (612 ms) Now MM21ToYUY2 (573 ms) Prefetches help Cortex A53 Was MM21ToYUY2 (4998 ms) Now MM21ToYUY2 (1900 ms) Pixel 4 Cortex A76 Was MM21ToYUY2 (215 ms) Now MM21ToYUY2 (173 ms) ABGRToJ420 - NEON, SSSE3 and AVX2 row functions - J400, J420 and J422 formats. - Added AVX2 for UV on ARGBToJ420. Was SSSE3 Same code/performance as ARGBToJ420 but with constants re-ordered. Pixel 4 ABGRToJ420_Opt (623 ms) ABGRToJ422_Opt (702 ms) ABGRToJ400_Opt (238 ms) Skylake Xeon With LIBYUV_BIT_EXACT which uses C for UV ABGRToJ420_Opt (988 ms) ABGRToJ422_Opt (1872 ms) ABGRToJ400_Opt (186 ms) Skylake Xeon using AVX2 ABGRToJ420_Opt (251 ms) ABGRToJ422_Opt (245 ms) ABGRToJ400_Opt (184 ms) Skylake Xeon using SSSE3 ABGRToJ420_Opt (328 ms) ABGRToJ422_Opt (362 ms) ABGRToJ400_Opt (185 ms) Bug: b/238137982 Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-08-16 22:07:38 +00:00
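The note that ABGRToJ420 is "the same code/performance as ARGBToJ420 but with constants re-ordered" can be sketched as one row kernel parameterized by channel offsets. In libyuv's naming, ARGB is stored in memory as B, G, R, A and ABGR as R, G, B, A. The 77/150/29 full-range BT.601 weights and the function names are assumptions for this sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Full-range ("J") BT.601 luma in 8-bit fixed point: 0.299, 0.587, 0.114
 * scaled by 256 and rounded to 77, 150, 29. They sum to 256, so white maps
 * to 255. */
static uint8_t rgb_to_yj(uint8_t r, uint8_t g, uint8_t b) {
  return (uint8_t)((77 * r + 150 * g + 29 * b + 128) >> 8);
}

/* One kernel for several 4-byte layouts: only the byte offsets of R, G and
 * B change (ABGR: 0, 1, 2; ARGB: 2, 1, 0). */
static void row_to_yj(const uint8_t* src, uint8_t* dst_y, int width,
                      int r_off, int g_off, int b_off) {
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src + 4 * x;
    dst_y[x] = rgb_to_yj(p[r_off], p[g_off], p[b_off]);
  }
}
```

A SIMD version would do the same by loading a different coefficient vector, which matches the commit's observation that performance is identical.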
Frank Barchard	1c5a8bb17a	AB64ToARGB fix for inplace conversion - add tests for all single plane formats that reduce or stay same in size Bug: b/242233673 Change-Id: Ic25d808114f11995ac56ea9c31b99f66ba36d345 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3828485 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-08-12 01:28:13 +00:00
Vignesh Venkatasubramanian	a5a1102a60	Add I422ToRGB565Matrix The code already exists to use a specific matrix. This CL simply adds a function to use a generic YUV matrix for the conversion. Bug: b/241451603 Change-Id: I0eea7e96a891d045905a9c963b56c053097029ec Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3820903 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-08-09 20:15:44 +00:00
Frank Barchard	d53f1beecd	RAWToJ400 require multiple of 16 pixels for NEON - fix crash when width is not a multiple of 16 - apply clang format - bump version Bug: libyuv:940, b/240094327 Change-Id: Ic18e5b7b64f78f26e8b7d8440bf490a679bda200 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3812594 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-08-04 22:55:48 +00:00
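A common way to handle widths that are not a multiple of a SIMD kernel's step is an "Any" wrapper. It runs the kernel on the largest multiple of 16, copies the leftover pixels into a zero-padded temporary buffer, runs the kernel once more on that buffer, and copies back only the valid outputs. The C sketch below shows this pattern in simplified form. `RgbToGrayRow16` and `RgbToGrayRow_Any` are hypothetical names. The kernel is a scalar stand-in for a NEON row function, and its plain channel average is not libyuv's J400 weighting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a SIMD row function that must only be called with
 * a width that is a multiple of 16. Converts packed 3-byte pixels to gray with
 * a plain channel average (illustrative only). */
static void RgbToGrayRow16(const uint8_t* src_rgb, uint8_t* dst_y, int width) {
  assert(width % 16 == 0);
  for (int x = 0; x < width; ++x) {
    const uint8_t* p = src_rgb + 3 * x;
    dst_y[x] = (uint8_t)((p[0] + p[1] + p[2]) / 3);
  }
}

/* Hypothetical "Any" wrapper: accepts any non-negative width. */
static void RgbToGrayRow_Any(const uint8_t* src_rgb, uint8_t* dst_y, int width) {
  uint8_t temp_src[16 * 3];
  uint8_t temp_dst[16];
  int r = width & 15;   /* leftover pixels */
  int n = width & ~15;  /* pixels the kernel can process directly */
  if (n > 0) {
    RgbToGrayRow16(src_rgb, dst_y, n);
  }
  if (r > 0) {
    memset(temp_src, 0, sizeof(temp_src));
    memcpy(temp_src, src_rgb + 3 * n, (size_t)(3 * r));
    RgbToGrayRow16(temp_src, temp_dst, 16);  /* never reads past src */
    memcpy(dst_y + n, temp_dst, (size_t)r);  /* never writes past dst */
  }
}
```

Calling `RgbToGrayRow16` directly with a width of 19 would trip its assert. A real SIMD kernel without that check would typically run a full extra 16-pixel iteration and touch memory past the end of the row.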
Vignesh Venkatasubramanian	394436b289	row_neon*: Explicitly initialize pad in RgbConstants Explicitly initialize the 'pad' field of RgbConstants to 0. This prevents the following warning/error in some compilers: error: missing field 'pad' initializer [-Werror,-Wmissing-field-initializers] Bug: b/241008246 Change-Id: Id6a0beb75c5c709404290c75915049f8a3898c83 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3808044 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-08-04 18:19:46 +00:00
Wan-Teh Chang	9892d70c96	Fix MSVC warnings by adding casts Fix the following MSVC warnings: src\source\row_win.cc(117): warning C4309: 'argument': truncation of constant value src\source\row_win.cc(136): warning C4309: 'argument': truncation of constant value src\source\row_win.cc(155): warning C4309: 'argument': truncation of constant value src\source\row_win.cc(174): warning C4309: 'argument': truncation of constant value src\source\row_common.cc(1712): warning C4244: 'initializing': conversion from 'uint16_t' to 'int8_t', possible loss of data src\source\row_common.cc(1731): warning C4244: 'initializing': conversion from 'int16_t' to 'int8_t', possible loss of data src\source\row_common.cc(1786): warning C4244: 'initializing': conversion from 'uint16_t' to 'int8_t', possible loss of data src\source\row_common.cc(1805): warning C4244: 'initializing': conversion from 'uint16_t' to 'int8_t', possible loss of data Bug: libyuv:939 Change-Id: Ie87ba6e716732d1ff1ae5c236dfd9cfdac13439d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3807105 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-08-03 21:24:21 +00:00
Yuan Tong	98ec7c28d5	Fix SSE2 version of ScalePlaneUp2_16_Bilinear - Define HAS_SCALEROWUP2_BILINEAR_16_SSE2: it's now fixed. - Correct function name to ScaleRowUp2_Bilinear_16_Any_SSE2: this row function uses only SSE2 instructions. Bug: libyuv:882 Change-Id: Ib1c7ac5b09997cb5b32bc54109d8c566af762433 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3800842 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-08-02 20:35:48 +00:00
Frank Barchard	b028453ba6	Disable bilinear 16 bit scale up for SSE2 - Undefine HAS_SCALEROWUP2_BILINEAR_16_SSE2 - Save XMM7 in ScaleRowUp2_Bilinear_16_SSE2(). - Rename HAS_SCALEROWUP2LINEAR_xxx to HAS_SCALEROWUP2_LINEAR_xxx - DetileSplitUVRow_C() is implemented using SplitUVRow_C(). - Changes to unit_test/planar_test.cc. Bug: libyuv:882 Change-Id: I0a8e8e5fb43bdf58ded87244e802343eacb789f2 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3795063 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-08-01 22:54:48 +00:00
Frank Barchard	6900494d90	Merge/SplitRGB fix -mcmodel=large x86 and InterpolateRow_16To8_NEON MergeRGB and SplitRGB use a register to point to 9 shuffle tables. - fixes an out of registers error with -mcmodel=large InterpolateRow_16To8_NEON improves performance for I210ToI420: On Pixel 4 for 720p x1000 images Was I210ToI420_Opt (608 ms) Now I210ToI420_Opt (336 ms) On Skylake Xeon Was I210ToI420_Opt (259 ms) Now I210ToI420_Opt (209 ms) Bug: libyuv:931, libyuv:930 Change-Id: I20f8244803f06da511299bf1a2ffc7945eb35221 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717054 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2022-06-29 00:00:46 +00:00
Frank Barchard	fe4a50df8e	Bilinear scale up msan fix - Avoid stepping to height + 1 for bilinear filter 2nd row for last row of source - Box filter ubsan fix for 3/4 and 3/8 scaling for 16 bit planar - Height 1 asan fixes Bug: libyuv:935, b/206716399 Change-Id: I56088520f2a884a37b987ee5265def175047673e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717263 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-06-22 00:11:49 +00:00
Frank Barchard	e906ba9fe9	InterpolateRow_Any: test if fraction is 0 and don't memcpy 2nd row. Bug: b/228605787 Change-Id: Ia8912e4c1599401320ee82882a2593e78bf56582 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3708833 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-06-17 18:15:09 +00:00
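A row interpolation of this kind blends two source rows with a fraction `f` in 1/256 units. When `f` is 0 the output equals the first row, so the second row never needs to be read or copied. Below is a minimal scalar sketch that assumes 8-bit samples and round-to-nearest. `InterpolateRowSketch` is a hypothetical name, not libyuv's `InterpolateRow_C`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Blend two rows: dst = (src0 * (256 - f) + src1 * f + 128) >> 8,
 * where f (0..255) is the weight of the second row in 1/256 units.
 * When f == 0 the result is exactly row 0, so src1 is never dereferenced. */
static void InterpolateRowSketch(uint8_t* dst, const uint8_t* src0,
                                 const uint8_t* src1, int width, int f) {
  if (f == 0) {
    memcpy(dst, src0, (size_t)width);
    return;
  }
  for (int x = 0; x < width; ++x) {
    dst[x] = (uint8_t)((src0[x] * (256 - f) + src1[x] * f + 128) >> 8);
  }
}
```

Because the `f == 0` path returns before the blend loop, `src1` may be `NULL` or refer to a row that does not exist, for example the row after the last source row.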
Frank Barchard	30f9b28048	Add I210ToI420 Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 Change-Id: Ib135d0b4ff17665f6a4ab60edb782a7b314219a4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3696042 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2022-06-09 08:07:50 +00:00
Frank Barchard	baef414478	Convert16To8Row_NEON use shift without rounding Fixes chromium PaintCanvasVideoRendererTest.HighBitDepth sqdmulh was creating a 9 bit value with rounding, and then shifted it right 1 with no rounding. The rounding had an off by 1 impact in some tests. Pixel 3 C I010ToI420_Opt (749 ms) Was sqdmulh I010ToI420_Opt (370 ms) Now ushl I010ToI420_Opt (324 ms) Pixel 4 C I010ToI420_Opt (581 ms) Was sqdmulh I010ToI420_Opt (240 ms) Now ushl I010ToI420_Opt (231 ms) Bug: b/216321733, b/233233302 Change-Id: I26f673bb411401d1e4a8126bf22d61c649223e9b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3694143 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-06-08 19:40:30 +00:00
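The off-by-one described above depends on whether the 10-bit to 8-bit reduction rounds. The sketch below compares a plain truncating shift with a rounding one for 10-bit input. It uses hypothetical function names and is only an illustration. It does not model the `sqdmulh` or `ushl` instruction semantics.

```c
#include <assert.h>
#include <stdint.h>

/* 10-bit -> 8-bit by a truncating shift: the 2 low bits are dropped. */
static uint8_t Convert10To8Truncate(uint16_t v) {
  return (uint8_t)(v >> 2);
}

/* 10-bit -> 8-bit with round-half-up, clamped so that 1023 maps to 255. */
static uint8_t Convert10To8Round(uint16_t v) {
  unsigned r = (v + 2u) >> 2;
  return (uint8_t)(r > 255u ? 255u : r);
}
```

For a value such as 3, the two versions return 0 and 1. Mismatches of this size between a SIMD path and a reference path are what cause tests to fail.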
Frank Barchard	d011314f14	Revert "I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix" This reverts commit 60254a1d846a93a4d7559009004cdd91bcc04d82. Reason for revert: breaks PaintCanvasVideoRendererTest.HighBitDepth Original change's description: > I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix > > - Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit > - Add NEON InterpolateRow_16 for fast 10 bit scaling > - When scaling up, set step to interpolate toward height - 1 to avoid buffer overread > - When scaling down, center the 2 rows used for source to achieve filtering. > - CopyPlane check for 0 size and return > > Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 > Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552 > Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> > Reviewed-by: richard winterton <rrwinterton@gmail.com> Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 Change-Id: Icc05bb340db0e7fe864061fb501d0a861c764116 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3692886 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2022-06-07 09:16:05 +00:00
Frank Barchard	60254a1d84	I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix - Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit - Add NEON InterpolateRow_16 for fast 10 bit scaling - When scaling up, set step to interpolate toward height - 1 to avoid buffer overread - When scaling down, center the 2 rows used for source to achieve filtering. - CopyPlane check for 0 size and return Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2022-06-07 01:41:56 +00:00
Joe Downing	c0c8c40b31	Update CopyPlane to handle 0 width and height dimensions If a width, height, and src/dst strides passed in are all 0, height is updated to 1 which means some CPU optimized functions may try to copy data when the dst rect is not valid. Bug: b:234340482 Change-Id: I63be1c6ba05d669d67f5079d812acbec09c8f6c9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3689909 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-06-07 01:20:14 +00:00
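The row-coalescing shortcut behind this bug works as follows. When both strides equal the width, the rows are contiguous, so the plane can be copied as one long row of `width * height` bytes. If width, height, and strides are all 0, the stride test still passes, so the size check has to come first. `CopyPlaneSketch` is a hypothetical name. For simplicity it rejects negative heights, whereas libyuv's API uses a negative height to request a vertically flipped image. It also uses `memcpy` where libyuv would call an optimized row function, and such a function might not tolerate a zero-width call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy a width x height plane, coalescing contiguous rows into one row. */
static void CopyPlaneSketch(const uint8_t* src, int src_stride,
                            uint8_t* dst, int dst_stride,
                            int width, int height) {
  /* Must precede coalescing: with all-zero inputs, stride == width holds. */
  if (width <= 0 || height <= 0) {
    return;
  }
  if (src_stride == width && dst_stride == width) {
    width *= height;  /* rows are contiguous: copy them as a single row */
    height = 1;
  }
  for (int y = 0; y < height; ++y) {
    memcpy(dst + (size_t)y * dst_stride, src + (size_t)y * src_stride,
           (size_t)width);
  }
}
```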
Frank Barchard	eb2c88e499	Convert16To8 NEON Pixel 3 Was C I010ToI420_Opt (749 ms) Now NEON I010ToI420_Opt (356 ms) Pixel 4 Was C I010ToI420_Opt (581 ms) Now NEON I010ToI420_Opt (163 ms) Bug: b/233233302, b/233634772 Change-Id: I60a84648a66f77d97c0a7822b29bd18b8e3a3355 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3661401 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-05-24 18:07:16 +00:00
Frank Barchard	715150b5aa	Add UYVYToY function. This function reads 2-byte values and writes the 2nd byte to the destination. It turns out this is useful for P010ToNV12 as well, so adding the planar function allows a higher level to call it. It also adds UYVY support for something YUY2 already had, which writes the 1st byte. Bug: b/233233302, b/233634772 Change-Id: I10a9454cb4f5b2c4ac5532fa86feddf78284d8b8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3659055 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-05-24 01:42:31 +00:00
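Keeping the 2nd byte of every 2-byte pair serves both formats mentioned here. In UYVY (U0 Y0 V0 Y1 ...), the 2nd byte of each pair is luma. In little-endian P010, each 10-bit sample occupies the upper bits of a 16-bit word, so the 2nd byte holds the sample's top 8 bits. Below is a scalar sketch under a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Copy the 2nd byte of every 2-byte pair in src to dst (count pairs). */
static void SecondByteRow(const uint8_t* src, uint8_t* dst, int count) {
  for (int i = 0; i < count; ++i) {
    dst[i] = src[2 * i + 1];
  }
}
```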
Frank Barchard	d62ee21e66	UVScale fix for vertical-only scaling Bug: b/228841445 Change-Id: I0342856e1bfcea69851d718459d66926bb170219 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3595240 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Miguel Casas-Sanchez <mcasas@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-20 01:27:33 +00:00
Frank Barchard	3c0f408607	Enable HAS_DETILESPLITUVROW_NEON On Pixel 4 Was C AArch64 TestDetileSplitUVPlane_Benchmark (935 ms) AArch32 TestDetileSplitUVPlane_Benchmark (787 ms) Now NEON AArch64 TestDetileSplitUVPlane_Benchmark (248 ms) AArch32 TestDetileSplitUVPlane_Benchmark (256 ms) Bug: libyuv:915, b/228518489 Change-Id: Ib82b702c1321285738c044ad8c2a7805b16f074a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3594524 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-19 21:17:03 +00:00
Frank Barchard	eec8dd37e8	Change ScaleUVRowUp2_Bilinear_16_SSE2 to SSE41 Bug: libyuv:928 xed -i scale_gcc.o: SYM ScaleUVRowUp2_Linear_16_SSE2: XDIS 0: LOGICAL SSE2 660FEFED pxor xmm5, xmm5 XDIS 4: SSE SSE2 660F76E4 pcmpeqd xmm4, xmm4 XDIS 8: SSE SSE2 660F72D41F psrld xmm4, 0x1f XDIS d: SSE SSE2 660F72F401 pslld xmm4, 0x1 XDIS 12: DATAXFER SSE2 F30F7E07 movq xmm0, qword ptr [rdi] XDIS 16: DATAXFER SSE2 F30F7E4F04 movq xmm1, qword ptr [rdi+0x4] XDIS 1b: SSE SSE2 660F61C5 punpcklwd xmm0, xmm5 XDIS 1f: SSE SSE2 660F61CD punpcklwd xmm1, xmm5 XDIS 23: DATAXFER SSE2 660F6FD0 movdqa xmm2, xmm0 XDIS 27: DATAXFER SSE2 660F6FD9 movdqa xmm3, xmm1 XDIS 2b: SSE SSE2 660F70D24E pshufd xmm2, xmm2, 0x4e XDIS 30: SSE SSE2 660F70DB4E pshufd xmm3, xmm3, 0x4e XDIS 35: SSE SSE2 660FFED4 paddd xmm2, xmm4 XDIS 39: SSE SSE2 660FFEDC paddd xmm3, xmm4 XDIS 3d: SSE SSE2 660FFED0 paddd xmm2, xmm0 XDIS 41: SSE SSE2 660FFED9 paddd xmm3, xmm1 XDIS 45: SSE SSE2 660FFEC0 paddd xmm0, xmm0 XDIS 49: SSE SSE2 660FFEC9 paddd xmm1, xmm1 XDIS 4d: SSE SSE2 660FFEC2 paddd xmm0, xmm2 XDIS 51: SSE SSE2 660FFECB paddd xmm1, xmm3 XDIS 55: SSE SSE2 660F72D002 psrld xmm0, 0x2 XDIS 5a: SSE SSE2 660F72D102 psrld xmm1, 0x2 XDIS 5f: SSE SSE4 660F382BC1 packusdw xmm0, xmm1 XDIS 64: DATAXFER SSE2 F30F7F06 movdqu xmmword ptr [rsi], xmm0 XDIS 68: MISC BASE 488D7F08 lea rdi, ptr [rdi+0x8] XDIS 6c: MISC BASE 488D7610 lea rsi, ptr [rsi+0x10] XDIS 70: BINARY BASE 83EA04 sub edx, 0x4 XDIS 73: COND_BR BASE 7F9D jnle 0x12 <ScaleUVRowUp2_Linear_16_SSE2+0x12> XDIS 75: RET BASE C3 ret SYM ScaleUVRowUp2_Bilinear_16_SSE2: XDIS 0: LOGICAL SSE2 660FEFFF pxor xmm7, xmm7 XDIS 4: SSE SSE2 660F76F6 pcmpeqd xmm6, xmm6 XDIS 8: SSE SSE2 660F72D61F psrld xmm6, 0x1f XDIS d: SSE SSE2 660F72F603 pslld xmm6, 0x3 XDIS 12: DATAXFER SSE2 F30F7E07 movq xmm0, qword ptr [rdi] XDIS 16: DATAXFER SSE2 F30F7E4F04 movq xmm1, qword ptr [rdi+0x4] XDIS 1b: SSE SSE2 660F61C7 punpcklwd xmm0, xmm7 XDIS 1f: SSE SSE2 660F61CF punpcklwd xmm1, xmm7 XDIS 23: DATAXFER SSE2 660F6FD0 movdqa xmm2, xmm0 XDIS 27: DATAXFER SSE2 660F6FD9 movdqa xmm3, xmm1 XDIS 2b: SSE SSE2 660F70D24E pshufd xmm2, xmm2, 0x4e XDIS 30: SSE SSE2 660F70DB4E pshufd xmm3, xmm3, 0x4e XDIS 35: SSE SSE2 660FFED0 paddd xmm2, xmm0 XDIS 39: SSE SSE2 660FFED9 paddd xmm3, xmm1 XDIS 3d: SSE SSE2 660FFEC0 paddd xmm0, xmm0 XDIS 41: SSE SSE2 660FFEC9 paddd xmm1, xmm1 XDIS 45: SSE SSE2 660FFEC2 paddd xmm0, xmm2 XDIS 49: SSE SSE2 660FFECB paddd xmm1, xmm3 XDIS 4d: DATAXFER SSE2 F30F7E1477 movq xmm2, qword ptr [rdi+rsi*2] XDIS 52: DATAXFER SSE2 F30F7E5C7704 movq xmm3, qword ptr [rdi+rsi*2+0x4] XDIS 58: SSE SSE2 660F61D7 punpcklwd xmm2, xmm7 XDIS 5c: SSE SSE2 660F61DF punpcklwd xmm3, xmm7 XDIS 60: DATAXFER SSE2 660F6FE2 movdqa xmm4, xmm2 XDIS 64: DATAXFER SSE2 660F6FEB movdqa xmm5, xmm3 XDIS 68: SSE SSE2 660F70E44E pshufd xmm4, xmm4, 0x4e XDIS 6d: SSE SSE2 660F70ED4E pshufd xmm5, xmm5, 0x4e XDIS 72: SSE SSE2 660FFEE2 paddd xmm4, xmm2 XDIS 76: SSE SSE2 660FFEEB paddd xmm5, xmm3 XDIS 7a: SSE SSE2 660FFED2 paddd xmm2, xmm2 XDIS 7e: SSE SSE2 660FFEDB paddd xmm3, xmm3 XDIS 82: SSE SSE2 660FFED4 paddd xmm2, xmm4 XDIS 86: SSE SSE2 660FFEDD paddd xmm3, xmm5 XDIS 8a: DATAXFER SSE2 660F6FE0 movdqa xmm4, xmm0 XDIS 8e: DATAXFER SSE2 660F6FEA movdqa xmm5, xmm2 XDIS 92: SSE SSE2 660FFEE0 paddd xmm4, xmm0 XDIS 96: SSE SSE2 660FFEEE paddd xmm5, xmm6 XDIS 9a: SSE SSE2 660FFEE0 paddd xmm4, xmm0 XDIS 9e: SSE SSE2 660FFEE5 paddd xmm4, xmm5 XDIS a2: SSE SSE2 660F72D404 psrld xmm4, 0x4 XDIS a7: DATAXFER SSE2 660F6FEA movdqa xmm5, xmm2 XDIS ab: SSE SSE2 660FFEEA paddd xmm5, xmm2 XDIS af: SSE SSE2 660FFEC6 paddd xmm0, xmm6 XDIS b3: SSE SSE2 660FFEEA paddd xmm5, xmm2 XDIS b7: SSE SSE2 660FFEE8 paddd xmm5, xmm0 XDIS bb: SSE SSE2 660F72D504 psrld xmm5, 0x4 XDIS c0: DATAXFER SSE2 660F6FC1 movdqa xmm0, xmm1 XDIS c4: DATAXFER SSE2 660F6FD3 movdqa xmm2, xmm3 XDIS c8: SSE SSE2 660FFEC1 paddd xmm0, xmm1 XDIS cc: SSE SSE2 660FFED6 paddd xmm2, xmm6 XDIS d0: SSE SSE2 660FFEC1 paddd xmm0, xmm1 XDIS d4: SSE SSE2 660FFEC2 paddd xmm0, xmm2 XDIS d8: SSE SSE2 660F72D004 psrld xmm0, 0x4 XDIS dd: DATAXFER SSE2 660F6FD3 movdqa xmm2, xmm3 XDIS e1: SSE SSE2 660FFED3 paddd xmm2, xmm3 XDIS e5: SSE SSE2 660FFECE paddd xmm1, xmm6 XDIS e9: SSE SSE2 660FFED3 paddd xmm2, xmm3 XDIS ed: SSE SSE2 660FFED1 paddd xmm2, xmm1 XDIS f1: SSE SSE2 660F72D204 psrld xmm2, 0x4 XDIS f6: SSE SSE4 660F382BE0 packusdw xmm4, xmm0 XDIS fb: DATAXFER SSE2 F30F7F22 movdqu xmmword ptr [rdx], xmm4 XDIS ff: SSE SSE4 660F382BEA packusdw xmm5, xmm2 XDIS 104: DATAXFER SSE2 F30F7F2C4A movdqu xmmword ptr [rdx+rcx*2], xmm5 XDIS 109: MISC BASE 488D7F08 lea rdi, ptr [rdi+0x8] XDIS 10d: MISC BASE 488D5210 lea rdx, ptr [rdx+0x10] XDIS 111: BINARY BASE 4183E804 sub r8d, 0x4 XDIS 115: COND_BR BASE 0F8FF7FEFFFF jnle 0x12 <ScaleUVRowUp2_Bilinear_16_SSE2+0x12> XDIS 11b: RET BASE C3 ret Change-Id: Ia20860e9c3c45368822cfd8877167ff0bf973dcc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3587602 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-15 18:46:09 +00:00
Wan-Teh Chang	18f9110516	Avoid AVX instructions in ScaleRowUp2_Linear_SSSE3 The "vpackuswb %%xmm2,%%xmm0,%%xmm0" and "vmovdqu %%xmm0,(%1)" instructions in ScaleRowUp2_Linear_SSSE3() are AVX instructions. They cause an illegal instruction exception on CPUs that do not support AVX. Bug: libyuv:927 Bug: chromium:1312551 Change-Id: I87b2aaf041e7d185e7e8fb07172d4f37482e9d08 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3585881 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2022-04-15 00:18:39 +00:00
Frank Barchard	2ad73733d9	I422Rotate update to remove name space for ios build warning - Remove libyuv:: from within libyuv to resolve a build warning on IOS. - Check src_y parameter is not NULL if there is a dst_y parameter - Apply clang-format - Bump version Performance on Intel Skylake Xeon ARGBRotate90_Opt (795 ms) I420Rotate90_Opt (283 ms) I422Rotate90_Opt (867 ms) <-- scales and rotates I444Rotate90_Opt (565 ms) NV12Rotate90_Opt (289 ms) Performance on Pixel 4 (Cortex A76) ARGBRotate90_Opt (4208 ms) I420Rotate90_Opt (273 ms) I422Rotate90_Opt (1207 ms) I444Rotate90_Opt (718 ms) NV12Rotate90_Opt (282 ms) Bug: libyuv:926 Change-Id: I42e1b93a9595f6ed075918e91bed977dd3d23f6f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3576778 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-07 21:06:44 +00:00
Sergio Garcia Murillo	a77d615e10	Add tentative I422Rotate. When doing a 90 or 270 degree rotation, we need to rotate and scale the UV planes. As there are no optimized helper functions for this, we use the Y plane as temporary memory and perform each transform independently: First the U plane is rotated, putting the result in the Y plane. After the rotation, the output has double the samples horizontally and half the samples vertically, so it is scaled into the final U plane. The same process is done with the V plane. Last, the Y plane is rotated without scaling. It would be great to have an optimized version for this, but maybe this is helpful for triggering the discussions. Bug: libyuv:926 Change-Id: I188af103c4d0e3f9522021b4bf2b63c9d5de8b93 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3568424 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-06 23:49:35 +00:00
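The rotate-then-scale steps for one chroma plane can be sketched in C as follows. The sketch rests on several assumptions. `RotatePlane90Sketch` and `HalveWidthDoubleHeight` are hypothetical names. The scaler simply averages horizontal pairs and repeats rows, and libyuv's actual scaling filter may differ. A separate temporary buffer stands in for the borrowed Y plane.

```c
#include <assert.h>
#include <stdint.h>

/* Rotate a w x h plane 90 degrees clockwise into an h x w plane. */
static void RotatePlane90Sketch(const uint8_t* src, int w, int h,
                                uint8_t* dst) {
  for (int y = 0; y < w; ++y) {    /* dst has w rows ...  */
    for (int x = 0; x < h; ++x) {  /* ... of h pixels each */
      dst[y * h + x] = src[(h - 1 - x) * w + y];
    }
  }
}

/* Halve the width (average pixel pairs) and double the height (repeat rows).
 * src is sw x sh with sw even; dst is (sw / 2) x (sh * 2). */
static void HalveWidthDoubleHeight(const uint8_t* src, int sw, int sh,
                                   uint8_t* dst) {
  int dw = sw / 2;
  for (int y = 0; y < sh * 2; ++y) {
    const uint8_t* row = src + (y / 2) * sw;
    for (int x = 0; x < dw; ++x) {
      dst[y * dw + x] = (uint8_t)((row[2 * x] + row[2 * x + 1] + 1) >> 1);
    }
  }
}
```

For an 8x2 I422 image, the U plane is 4x2. After rotation it is 2x4, and the rotated 2x8 image needs a 1x8 U plane. The second step produces exactly that shape.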
Sergio Garcia Murillo	4589081cea	Add I422 and I210 functions Bug: webrtc:13826 Change-Id: I68235a668abecf76133f7b89472b192b1442bed4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3557217 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-31 15:30:53 +00:00
Wan-Teh Chang	f4d2530846	Declare RgbConstants structs as static const These RgbConstants structs were added in https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023. They should be declared as static const. These four symbols were detected by Chrome's android-binary-size trybot as "Mutable Constants". Change-Id: I3b4d4ff4b32e261ba528c07647b9d69ac368ab5b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3553035 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2022-03-25 20:39:38 +00:00
Wan-Teh Chang	ebd9e130f0	Fix bugs in I010AlphaToARGBMatrixBilinear() Add a missing increment of src_a and ARGBAttenuateRow() call. Bug: libyuv:922 Change-Id: I26e04e70c6a1a231cbe54b60c249f4c2e8af112a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3549976 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2022-03-25 15:39:52 +00:00
Wan-Teh Chang	173ed374c0	Add null pointer checks for the src_a parameters Change-Id: Icc96e18eab07080c18b6542171a340c97f059c78 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3550016 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>	2022-03-25 15:14:38 +00:00
Frank Barchard	124bf08fee	RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24 1920x1080 to/from 1280x720 to ARGB on Intel Skylake Xeon RGBScaleTo1920x1080_Bilinear (2625 ms) RGBScaleFrom1920x1080_Bilinear (2115 ms) ARGBScaleTo1920x1080_Bilinear (1668 ms) ARGBScaleFrom1920x1080_Bilinear (1164 ms) Bug: b/224814071 Change-Id: Ifc7611b597409771728b13c9c39e5a7e06131021 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3537341 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-19 01:44:06 +00:00
Frank Barchard	95b14b2446	RAWToJ400 faster version for ARM - Unrolled to 16 pixels - Take constants via structure, allowing different colorspace and channel order - Use ADDHN to add 16.5 and take upper 8 bits of 16 bit values, narrowing to 8 bits - clang-format applied, affecting mips code On Cortex A510 Was RAWToJ400_Opt (1623 ms) Now RAWToJ400_Opt (862 ms) C RAWToJ400_Opt (1627 ms) Bug: b/220171611 Change-Id: I06a9baf9650ebe2802fb6ff6dfbd524e2c06ada0 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-18 07:22:36 +00:00
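ADDHN adds two vectors of 16-bit lanes and writes the high 8 bits of each sum to an 8-bit lane. A single instruction therefore adds the bias and performs the `>> 8`. Below is a scalar model of one lane that assumes 8.8 fixed-point weights. The `LumaWeights` struct and its values are illustrative assumptions, not libyuv's `RgbConstants` layout. In 8.8 fixed point a bias of 16.5 is 0x1080, which supplies the limited-range offset of 16 plus 0.5 for rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of AArch64 ADDHN on one lane: 16-bit add, keep the high byte. */
static uint8_t AddHighNarrow16(uint16_t a, uint16_t b) {
  return (uint8_t)((uint16_t)(a + b) >> 8);
}

/* Hypothetical weight set: 8.8 fixed-point coefficients plus a bias. */
typedef struct {
  uint8_t kr, kg, kb;
  uint16_t bias;
} LumaWeights;

/* BT.601 limited range: about 0.257 R + 0.504 G + 0.098 B, bias 16.5. */
static const LumaWeights kLimited = {66, 129, 25, 0x1080};
/* Full range (JPEG): about 0.299 R + 0.587 G + 0.114 B, bias 0.5. */
static const LumaWeights kFull = {77, 150, 29, 0x0080};

static uint8_t RgbToLuma(uint8_t r, uint8_t g, uint8_t b,
                         const LumaWeights* w) {
  /* For both weight sets, products plus bias stay below 65536. */
  uint16_t acc = (uint16_t)(w->kr * r + w->kg * g + w->kb * b);
  return AddHighNarrow16(acc, w->bias);
}
```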
Yuan Tong	25d0a5110b	Fix FilterMode enum type. When used in C, the enum keyword can't be omitted. Bug: libyuv:872 Change-Id: Iacff5a8bd84ec7caa1f90889e48f81ffc10071ae Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3513317 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-09 21:25:44 +00:00
Yuan Tong	ebb27d6916	Add YUV to RGB conversion function with filter parameter Add the following functions: I420ToARGBMatrixFilter I422ToARGBMatrixFilter I010ToAR30MatrixFilter I210ToAR30MatrixFilter I010ToARGBMatrixFilter I210ToARGBMatrixFilter I420AlphaToARGBMatrixFilter I422AlphaToARGBMatrixFilter I010AlphaToARGBMatrixFilter I210AlphaToARGBMatrixFilter Bug: libyuv:872 Change-Id: Ib33b09fd7d304688c5e06c55e0a576a964665a51 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3430334 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-09 11:50:35 +00:00
Hao Chen	91bae707e1	Optimize functions for LASX in row_lasx.cc. 1. Optimize 18 functions in source/row_lasx.cc file. 2. Make small modifications to LSX. 3. Remove some unnecessary content. Bug: libyuv:912 Change-Id: Ifd1d85366efb9cdb3b99491e30fa450ff1848661 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3507640 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-09 08:52:54 +00:00
Frank Barchard	42d76a342f	RAWToJNV21 function with 2 step conversion RAWToJ420 + J420ToNV21 on row level Pixel 6 RAWToJNV21_Opt (320 ms) Skylake Xeon RAWToJNV21_Opt (302 ms) Bug: b/220171611 Change-Id: I39dcce9cf56c576b95666bb4fb1baccf9fbc7f7a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3495876 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-01 19:33:49 +00:00
Hao Chen	2dd3ea6f39	Fix Bugs on mips platform V2. This patch adds some deleted control macros so that these MSA optimization functions can be called normally on mips platform. There are also some modifications to adapt to the clang compiler. Bug: libyuv:918 Change-Id: I6ffadc6582682b5eaeae2e0f4033d66d370b48b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3494667 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-01 13:16:31 +00:00
Frank Barchard	e77531f6f1	Fix RotatePlane by 90 on Neon when source width is not a multiple of 8 Bug: b/220888716, b/218875554, b/220205245 Change-Id: I17e118ac9b9a7013386a5f0ad27a2dd249474ae5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3483576 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-02-23 19:16:53 +00:00
Hao Chen	3b8c86d23a	Fix bugs on mips platform. This patch fixes compilation errors caused by the removal of kUVBias and two failed test cases of LibYUVConvertTest.RGB565ToI420_Opt and LibYUVConvertTest.ARGB1555ToI420_Opt. Bug: libyuv:918 Change-Id: I1a66bcd7ef616aacbeca5b4015013015ccdf0f18 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3477416 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-02-22 18:24:02 +00:00
Justin Green	b4ddbaf549	Add support for MM21. Add support for MM21 to NV12 and I420 conversion, and add SIMD optimizations for arm, aarch64, SSE2, and SSSE3 machines. Bug: libyuv:915, b/215425056 Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Justin Green <greenjustin@google.com>	2022-02-03 17:01:49 +00:00
Frank Barchard	804980bbab	DetilePlane and unittest for NEON Bug: libyuv:915, b/215425056 Change-Id: Iccab1ed3f6d385f02895d44faa94d198ad79d693 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3424820 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-31 20:05:55 +00:00
Frank Barchard	2c6bfc02d5	Remove MMI support Bug: libyuv:916 Change-Id: I345b7e271ceb4b32fe91e292915e66be40812810 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3415817 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-26 08:41:33 +00:00
Hao Chen	2f87e9a713	Add optimization functions in scale_lsx.cc file. Optimize 20 functions in source/scale_lsx.cc file. All test cases passed on loongarch platform. Bug: libyuv:913 Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-01-21 01:34:38 +00:00
Hao Chen	f8e2da48ae	Add optimization functions in rotate_lsx.cc file. Optimize two functions in source/rotate_lsx.cc file. All test cases passed on loongarch platform. Bug: libyuv:913 Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-01-21 01:34:38 +00:00
Hao Chen	dfe046d272	Add optimization functions in row_lsx.cc file. Optimize 44 functions in source/row_lsx.cc file. All test cases passed on loongarch platform. Bug: libyuv:913 Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-01-21 01:34:38 +00:00
Hao Chen	de8ae8c679	Add optimization functions in row_lasx.cc file. Optimize 32 functions in source/row_lasx.cc file. All test cases passed on loongarch platform. Bug: libyuv:912 Signed-off-by: Hao Chen <chenhao@loongson.cn> Change-Id: I7d3f649f753f72ca9bd052d5e0562dbc6f6ccfed Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351466 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-21 01:34:38 +00:00
Hao Chen	51de1e16f2	Add supports for loongarch LSX and LASX. 1. Add supports for LSX and LASX. 2. Three optimization functions are added in loongarch/row_lasx.cc file: I422ToARGBRow_LASX,I422ToRGBARow_LASX,I422AlphaToARGBRow_LASX. Bug: libyuv:912, Bug: libyuv:913 Change-Id: I043c2704f99a5215724b5c0b7f97e6bf5f7a199b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329189 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-20 19:25:38 +00:00
Frank Barchard	90ffd5cba9	I420ToARGB for AVX512 On Skylake Xeon AVX512 I420ToARGB_Opt (2050 ms) AVX2 I420ToARGB_Opt (2533 ms) SSSE3 I420ToARGB_Opt (3688 ms) Bug: libyuv:911 Change-Id: I2214cc15dec24b06541895ca59d88990edbb2216 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3382100 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-14 09:17:33 +00:00
Frank Barchard	cdd62da670	VNNI detect Bug: libyuv:911 Change-Id: Ic4e7720b4d5c20010470f06a7021d1a2426e765f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3381495 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-12 07:08:20 +00:00
Frank Barchard	78625492cb	InterpolateRow_AVX2 use AVX2 instead of ERMS for 100% Bug: b/210066781 Change-Id: I709e403f03bd6b9f8fe693b165b242b784076fe0 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329072 Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-12-15 03:54:18 +00:00
Frank Barchard	fdc71956bd	InterpolateRow_AVX2 - extend width count to 64 bits Bug: b/210066781 Change-Id: Ib9052d8edfce29b95ca02a6f7254d3ff35d2b64d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329070 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-12-10 04:11:18 +00:00
Frank Barchard	d7a2d5da87	J400ToARGB optimized for Exynos using ZIP+ST1 Bug: 204562143 Change-Id: I56c98198c02bd0dd1283f1c14837730c92832c39 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3328702 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-12-10 01:00:07 +00:00
Frank Barchard	000806f373	NV21ToYUV24 replace ST3 with ST1. ARGBToAR64 replace ST2 with ST1 On Samsung S8 Exynos M2 Was ST3 NV21ToYUV24_Opt (769 ms) Now ST1 NV21ToYUV24_Opt (473 ms) Was ST2 ARGBToAR64_Opt (1759 ms) Now ST1 ARGBToAR64_Opt (987 ms) Skylake Xeon, AVX2 version: Was NV21ToYUV24_Opt (885 ms) Now NV21ToYUV24_Opt (194 ms) Bug: b/204562143, b/124413599 Change-Id: Icc9cb64d822cd11937789a4e04fbb773b3e33aa3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3290664 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-11-24 07:38:49 +00:00
Frank Barchard	a04e4f87fb	Fix scale any mask parameter bug for NV12Scale Bug: None Change-Id: Ib4e174c086162ee709faf4b04c7d5d5847a7de3d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3267488 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-11-08 20:00:04 +00:00
Frank Barchard	fa043c7a64	Android420ToI420Rotate function to convert with rotation - adapted from Android420ToI420, adding a rotation parameter - SplitRotateUV added to rotate and split the UV channel of NV12 or NV21 - rename RotateUV functions to SplitRotateUV Bug: b/203549508 Change-Id: I6774da5fb5908fdf1fc12393f0001f41bbda9851 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3251282 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-10-28 22:38:04 +00:00
Frank Barchard	b179f1847a	Enable SIMD for exact RGB to Y conversions Bug: libyuv:908, b/202888439 Change-Id: Icc5470b85d91b441ded9958ee04b4f32246646f0 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3230489 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-10-19 07:54:50 +00:00
Frank Barchard	f0cfc1f1c8	ubsan friendly unaligned tests - ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C. Although current Intel, ARM, Mips and PPC can do unaligned load/store, its not guaranteed and could crash a CPU that doesnt support it. - unaligned tests use offset of 2 or 4, which ubsan accepts. - unittest fills in random buffer with 2 bytes at a time instead of a short. - row common functions for int16 types use 2 shorts instead of 1 int. Bug: libyuv:908, b/203243873 Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-10-18 18:03:28 +00:00
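The alignment point can be shown in C. A store through a `uint32_t*` requires 4-byte alignment, while two `uint16_t` stores need only 2-byte alignment, which an offset of 2 or 4 from an aligned buffer satisfies. The sketch uses hypothetical helper names. `memcpy` appears as a common alignment-agnostic alternative, not as what libyuv does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Two 16-bit stores: valid whenever dst is 2-byte aligned. A single store
 * through a uint32_t pointer would be undefined behavior (and flagged by
 * UBSan) if dst were not also 4-byte aligned. */
static void StorePairU16(uint16_t* dst, uint16_t a, uint16_t b) {
  dst[0] = a;
  dst[1] = b;
}

/* Alignment-agnostic 32-bit load via memcpy; compilers typically lower this
 * to a plain load on targets that permit unaligned access. */
static uint32_t LoadU32Unaligned(const void* p) {
  uint32_t v;
  memcpy(&v, p, sizeof(v));
  return v;
}
```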
Frank Barchard	55b97cb48f	BIT_EXACT for unattenuate and attenuate. - reenable Intel SIMD unaffected by BIT_EXACT - add bit exact version of ARGBAttenuate, which uses ARM version of formula. - add bit exact version of ARGBUnattenuate, which mimics the AVX code. Apply clang format to cleanup code. Bug: libyuv:908, b/202888439 Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-10-15 19:46:02 +00:00
Frank Barchard	11cbf8f976	Add LIBYUV_BIT_EXACT macro to force C to match SIMD - C code uses ARM path, so NEON and C match - C used on Intel platforms, disabling AVX. Bug: libyuv:908, b/202888439 Change-Id: Ie035a150a60d3cf4ee7c849a96819d43640cf020 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3223507 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-10-14 20:37:39 +00:00
Frank Barchard	daf9778a24	Fix for failed compile with armv7-a neon gcc Bug: libyuv:907 Change-Id: I955e83c72b57ce5ba45730030b32f337be610a21 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3216739 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-10-12 18:17:50 +00:00
Frank Barchard	b92a60320f	ConvertFromI420 respect destination stride for NV12 and NV21 Bug: libyuv:904 Change-Id: Ie1fd39c693e64661eb52f75492a261384db70776 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3176483 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-09-22 19:44:06 +00:00
Frank Barchard	33a68ec779	JPEG decoder remove assert when out of data Bug: b/186665202 Change-Id: I406cc2ef8cfa2cdf987d41c4bd85d3024aedfaab Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3166710 Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-09-16 23:11:14 +00:00
Frank Barchard	ed5a9c81de	change ld1 to ldr for memory references to allow GCC to use an offset Bug: chromium:819294, libyuv:903 Change-Id: I1cd19cc5a068c421d1112c9ea6090e18fb002a4c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3152821 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-09-09 23:46:46 +00:00
Stephan Hartmann	c6ed1b8f0e	GCC: force memory address without offset on aarch64 With "m" GCC generates a memory address with offset which is not allowed with ld1 on aarch64. Change constraint to "Q" to force address without offset. Bug: chromium:819294, libyuv:903 Change-Id: Iaae24bc6882cdef823259040a37fdbfc31f91185 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922146 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-09-09 22:20:27 +00:00
Frank Barchard	639dd4ea76	Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x. - swap U and V when crop x is odd - document YUY2 and UYVY formats - apply clang-format Bug: libyuv:902 Change-Id: I045e44c907f4a9eb625d7c024b669bb308055f32 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3039549 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-07-19 22:22:22 +00:00
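The sketch below reads one YUY2 macropixel (Y0 U Y1 V) after cropping, which shows why an odd crop_x swaps the chroma bytes. ReadCroppedYuy2 and Yuy2Pair are hypothetical names for illustration; the commit itself only states that U and V are swapped when crop_x is odd.

```c
#include <stdint.h>

/* One YUY2 macropixel: two pixels packed into 4 bytes as Y0 U Y1 V. */
typedef struct {
  uint8_t y0, u, y1, v;
} Yuy2Pair;

/* Hypothetical helper: read the first macropixel of a YUY2 row after
   cropping crop_x pixels from the left. Each pixel occupies 2 bytes, so
   pixel x starts at byte 2 * x. */
static Yuy2Pair ReadCroppedYuy2(const uint8_t* row, int crop_x) {
  const uint8_t* p = row + 2 * crop_x;
  Yuy2Pair m = {p[0], p[1], p[2], p[3]};
  if (crop_x & 1) {
    /* p[0] is the second luma of a pair, so p[1] holds that pair's V
       and p[3] holds the next pair's U: swap to restore U/V order. */
    uint8_t t = m.u;
    m.u = m.v;
    m.v = t;
  }
  return m;
}
```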
Frank Barchard	d19f69d9df	Update Android.bp to always enable NEON Relax Cpu unittest to allow ARM emulator to run. Bug: libyuv:863, libyuv:877, b/178283356 Change-Id: I3c751574219fdf731a3f9d4a79934a349acba446 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2950938 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2021-06-10 19:31:48 +00:00
Stephan Hartmann	6ea7647b6e	GCC: replace mov .8h with mov .16b mov Vy.8h, Vx.8h isn't a valid instruction. Clang/LLVM automatically replace it with mov Vy.16b, Vx.16b. Bug: chromium:819294 Change-Id: I8a0cbf2e6c4efcc6c1e38812cee949bde7e99b11 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922147 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-06-01 17:44:56 +00:00
Frank Barchard	5b3351bd07	Fix ARGB1555ToI420 odd width bug in C code. Was [ RUN ] LibYUVConvertTest.ARGB1555ToI420_Any third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure Expected equality of these values: dst_uv_c[i * kStrideUV + j] Which is: '\x8B' (139) dst_uv_opt[i * kStrideUV + j] Which is: '\x92' (146) third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure [ FAILED ] LibYUVConvertTest.ARGB1555ToI420_Any Now [ RUN ] LibYUVConvertTest.ARGB1555ToI420_Any [ OK ] LibYUVConvertTest.ARGB1555ToI420_Any (0 ms) Bug: libyuv:894, b/155722711 Change-Id: I12dcacd0ecfff4ede5693a2554e9bb10dc8586c1 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2870484 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-05-04 19:03:22 +00:00
Frank Barchard	49ebc996aa	Make 2 step transitive tests measure 2 step time. Add tests of all macros used by libyuv public headers When a 1 step conversion is added, a 2 step test can compare the old 2 step method to the 1 step. A 1 step unittest is also added which compares C to SIMD. Making the 2 step conversions measure performance of the 2 steps allows the old 2 step performance to be compared to 1 step. All macros used in public headers are added to an ifdef test. Showing them in a unittest allows some diagnostics when a test is failing. Bug: libyuv:901 Change-Id: I7ffa6ed0cb3b506fa1b7fd4b7b1b729658c3c266 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2857916 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-04-30 18:14:57 +00:00
Yuan Tong	99cddd8051	Fix ARM YuvConstants value R=fbarchard@chromium.org Bug: libyuv:901 Change-Id: Ie2f9ac214a2a7462cc613f510b64308d3b861b74 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2856225 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-04-29 15:57:20 +00:00
Yuan Tong	c9843de02a	Optimize unlimited data for Intel Use unsigned coefficient and signed UV value in YUVTORGB. R=fbarchard@chromium.org Bug: libyuv:862, libyuv:863 Change-Id: I32e58b2cee383fb98104c055beb0867a7ad05bfe Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2850016 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-04-27 20:35:27 +00:00
Frank Barchard	5e05f26a2b	Switch win32 to row_gcc for clangcl. Bug: libyuv:900, libyuv:848, b/178283356, b/185922513 Change-Id: I7697953753391c555a778198db36412c853fb29e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2844962 Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Dale Curtis <dalecurtis@chromium.org>	2021-04-22 19:32:32 +00:00
Yuan Tong	8c8d907d29	Unlimited data for Windows Port unlimited data YUVToRGB code to windows. Disable MIPS YUVToRGB assembly for now to get correct result. R=fbarchard@chromium.org Bug: libyuv:862, libyuv:863 Change-Id: Ib3e99c98082badfef4eb671205a151dd1de56b67 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2839383 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-04-22 01:48:48 +00:00
Frank Barchard	5e83cac0d5	Disable win32 SIMD Bug: libyuv:900, libyuv:848, b/178283356, b/185922513 Change-Id: Iee7d9970c7991856c8f51158cd12ec72ee9c57eb Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2844779 Reviewed-by: Dale Curtis <dalecurtis@chromium.org>	2021-04-21 21:37:44 +00:00
Yuan Tong	a1814576bf	Unlimited data for Intel Use unsigned coefficients on Intel. Make C, NEON and AVX2 match under LIBYUV_UNLIMITED_DATA. Bug: libyuv:862, libyuv:863 Change-Id: I6c02147ea3c1875c4fc23863435aea86dcf5880a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2830180 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-04-19 20:29:10 +00:00
Yuan Tong	590c17ce40	Refactor NEON YUVToRGB, Remove subsampling Refactor NEON YUVToRGB Assembly to support HBD data as input and output. Work on YUV444 internally, remove subsampling in I444ToARGB. libyuv_unittest --gtest_filter=.NV??ToARGB_Opt:UYVYToARGB_Opt:YUY2ToARGB_Opt:I4*ToARGB_Opt Bug: libyuv:895, libyuv:862, libyuv:863 Change-Id: I05b56ea8ea56d9e523720b842fa6e4b122ed4115 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810060 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-04-15 19:13:10 +00:00
Frank Barchard	287158925b	use width + 1 for odd width tests Bug: libyuv:894, libyuv:898, libyuv:899 Change-Id: Ieba8eaeb8b06f0323824967776673e339b263220 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2809701 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-04-09 20:17:55 +00:00
Yuan Tong	2cd098f83b	Fix MergeAR64Plane on odd width R=fbarchard@chromium.org Bug: libyuv:898 Change-Id: I031e008ea91baba1c7598efa0eda70750cbfce85 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810066 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-04-08 09:18:34 +00:00
Frank Barchard	d1bfc6ead6	gcc fix for row_gcc.cc vbroadcastss Bug: libyuv:893 Change-Id: I5b70e6a94356878deb348cbd19c9e1e50b2a18aa Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2808793 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-04-06 21:31:29 +00:00
Frank Barchard	60db98b6fa	clang-tidy applied Bug: libyuv:886, libyuv:889 Change-Id: I2d14d03c19402381256d3c6d988e0b7307bdffd8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2800147 Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-04-01 21:42:47 +00:00
Mirko Bonadei	34bf48e160	Check if LIBYUV_UNLIMITED_DATA is defined to avoid -Wundef. No-Try: True Bug: None Change-Id: I32f6da42c82628210f82ce446d4ec69e2013a2ff Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2799761 Commit-Queue: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-04-01 18:41:25 +00:00
Yuan Tong	8a13626e42	Add MergeAR30Plane, MergeAR64Plane, MergeARGB16To8Plane These functions merge high bit depth planar RGB pixels into packed format. Change-Id: I506935a164b069e6b2fed8bf152cb874310c0916 Bug: libyuv:886, libyuv:889 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780468 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-03-31 20:46:02 +00:00
Frank Barchard	312c02a5aa	Fixes for SplitUVPlane_16 and MergeUVPlane_16 Planar functions pass depth instead of scale factor. Row functions pass shift instead of depth. Add assert to C. AVX shift instruction expects a single shift value in XMM. Neon pass shift as input (not output). Split Neon reimplemented as left shift on shorts by negative to achieve right shift. Add planar unittests Bug: libyuv:888 Change-Id: I8fe62d3d777effc5321c361cd595c58b7f93807e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2782086 Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-03-24 21:37:10 +00:00
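A C sketch of the depth-versus-shift convention described above, assuming MSB-aligned interleaved samples (as in P010) so that splitting shifts right and merging shifts left by 16 - depth. The function names are hypothetical and this is not the libyuv row code. C has no defined left shift by a negative count, so the Neon approach of left-shifting by a negative amount (USHL) to get a right shift is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>

/* Plane-level code takes a bit depth; row-level code takes a shift. */
static int DepthToShift(int depth) {
  assert(depth >= 1 && depth <= 16);
  return 16 - depth;
}

/* Split interleaved, MSB-aligned UV into LSB-aligned U and V rows. */
static void SplitUVRow16(const uint16_t* src_uv, uint16_t* dst_u,
                         uint16_t* dst_v, int shift, int width) {
  for (int x = 0; x < width; ++x) {
    dst_u[x] = (uint16_t)(src_uv[2 * x] >> shift);
    dst_v[x] = (uint16_t)(src_uv[2 * x + 1] >> shift);
  }
}

/* Merge LSB-aligned U and V rows into interleaved, MSB-aligned UV. */
static void MergeUVRow16(const uint16_t* src_u, const uint16_t* src_v,
                         uint16_t* dst_uv, int shift, int width) {
  for (int x = 0; x < width; ++x) {
    dst_uv[2 * x] = (uint16_t)(src_u[x] << shift);
    dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << shift);
  }
}
```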
Frank Barchard	d8f1bfc981	Add RAWToJ420 Add J420 output from RAW. Optimize RGB24 and RAW To J420 on ARM by using NEON for the 2 step conversion. Also fix sign-compare warning that was breaking Windows build Bug: libyuv:887, b/183534734 Change-Id: I8c39334552dc0b28414e638708db413d6adf8d6e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2783382 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2021-03-23 23:45:54 +00:00
Frank Barchard	b046131c0b	Replace MOV .4s with MOV .16b for GCC compatibility MOV Vy.4s, Vx.4s is not a valid instruction form (even though LLVM allows it). It should be MOV Vy.16b, Vx.16b (.8b for 64-bit variants) Bug: None Change-Id: I3c3b42288a0ebc275962fa3adad707b351d00d4c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780155 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-03-23 18:03:06 +00:00
Yuan Tong	f37014fcff	Add support for AR64 format Add following conversions: ARGB,ABGR <-> AR64,AB64 AR64 <-> AB64 R=fbarchard@chromium.org Change-Id: I5ca5b40a98bffea11981e136afae4a511ba6c564 Bug: libyuv:886 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2746780 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-03-13 20:55:21 +00:00
Yuan Tong	d47031c0d4	Fix x86 windows build error Correct rule for marking relevant functions as available. Fix some clang-tidy issues. R=fbarchard@chromium.org Change-Id: I66fa0d7ae5a681356f94bfc1bc82b7f1f407d5df Bug: libyuv:884, libyuv:885 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2738414 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-03-05 03:24:44 +00:00
Frank Barchard	ba033a11e3	Add 12 bit YUV to 10 bit RGB Bug: libyuv:843 Change-Id: I0104c8fcaeed09e83d2fd654c6a5e7d41bcb74cf Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2727775 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2021-03-05 01:09:37 +00:00
Martin Storsjö	95ff456c33	Fix the mask for odd widths for ScaleRowUp2_Linear*_Any_NEON These NEON functions produce 16 pixels per iteration each, thus use the mask 15, not 7. Change-Id: I1f3eb691a9ca4af705393b2842b18b65f6878926 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2731801 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-03-03 16:19:17 +00:00
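The mask feeds an "any" wrapper: the SIMD kernel gets the largest prefix whose width is a multiple of its step, and C handles the remainder. The sketch below is a generic illustration under stated assumptions: InvertRow_Simd, InvertRow_C and InvertRow_Any are hypothetical stand-ins (a byte invert rather than the actual linear upsample, and the mask is a runtime parameter only to expose the bug). The point is that the mask must equal step - 1.

```c
#include <stdint.h>

#define KERNEL_STEP 16             /* pixels produced per SIMD iteration */
#define ANY_MASK (KERNEL_STEP - 1) /* 15: must match the step, not 7 */

/* Stand-in for a SIMD kernel. A real vector loop always writes whole
   16-pixel blocks, so a width that is not a multiple of 16 would write
   past the end; this stand-in reports that case as -1 instead. */
static int InvertRow_Simd(const uint8_t* src, uint8_t* dst, int width) {
  if (width % KERNEL_STEP != 0) return -1;
  for (int x = 0; x < width; ++x) dst[x] = (uint8_t)(255 - src[x]);
  return 0;
}

/* Plain C fallback that handles any width. */
static void InvertRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) dst[x] = (uint8_t)(255 - src[x]);
}

/* "Any" wrapper: SIMD for width & ~mask, C for the width & mask tail. */
static int InvertRow_Any(const uint8_t* src, uint8_t* dst, int width,
                         int mask) {
  int n = width & ~mask; /* part handled by the SIMD kernel */
  int r = width & mask;  /* remainder handled by C */
  if (n > 0 && InvertRow_Simd(src, dst, n) != 0) return -1;
  InvertRow_C(src + n, dst + n, r);
  return 0;
}
```

With a width of 40, mask 15 splits it into 32 SIMD pixels plus 8 in C, while mask 7 would hand all 40 pixels to a kernel that can only produce multiples of 16.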
Yuan Tong	cdabad5bfa	Add more 10 bit YUV To RGB functions The following functions are added: planar YUV: I410ToAR30, I410ToARGB planar YUVA: I010AlphaToARGB, I210AlphaToARGB, I410AlphaToARGB biplanar YUV: P010ToARGB, P210ToARGB P010ToAR30, P210ToAR30 biplanar functions can also handle 12 bit and 16 bit samples. libyuv_unittest --gtest_filter=LibYUVConvertTest.10ToA:LibYUVConvertTest.P?1?ToA* R=fbarchard@chromium.org Bug: libyuv:751, libyuv:844 Change-Id: I2be02244dfa23335e1e7bc241fb0613990208de5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2707003 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-03-03 15:48:47 +00:00
Yuan Tong	c41eabe3d4	Add full 16 bit scaling up by 2x function R=fbarchard@chromium.org Change-Id: I4a869aefdc16e34357a615727711594c5d8e3a80 Bug: libyuv:882 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2719842 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-03-02 19:29:02 +00:00
Yuan Tong	a8c181050c	Add 10/12 bit YUV To YUV functions The following functions (and their 12 bit variant) are added: planar, 10->10: I410ToI010, I210ToI010 planar, 10->8: I410ToI444, I210ToI422 planar<->biplanar, 10->10: I010ToP010, I210ToP210, I410ToP410 P010ToI010, P210ToI210, P410ToI410 R=fbarchard@chromium.org Change-Id: I9aa2bafa0d6a6e1e38ce4e20cbb437e10f9b0158 Bug: libyuv:834, libyuv:873 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2709822 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-02-25 23:16:54 +00:00
Frank Barchard	08815a2976	Scale 12 functions that are scale 16 but with only low 12 bits valid Rename yuvconstants to .c and use round from math.h Bug: libyuv:882, b/180472591 Change-Id: I70720bf3e0833ba00df0d721f12020fba0b07a03 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2706966 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-02-19 18:04:48 +00:00
Frank Barchard	d768774299	add yuvconstants util miscellaneous cleanup of other code/comments Bug: libyuv:873, libyuv:877 Change-Id: I0d8caf9a65908ff8898b25494f7c724775f84fa3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2692930 Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-02-12 19:45:16 +00:00
Yuan Tong	d4ecb70610	Add P010ToP410 and P210ToP410 These are 16 bit bi-planar convert functions to scale UV plane to Y plane's size using (bi)linear filter. libyuv_unittest --gtest_filter=ToP41 R=fbarchard@chromium.org Bug: libyuv:872 Change-Id: I3cb4fafe2b2c9eedd0d91cf4c619abb9ee107bc1 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2690102 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-02-12 14:55:24 +00:00
Frank Barchard	12a4a2372c	Rounding added to scaling upsampler Bug: libyuv:872, b/178521093 Change-Id: I86749f73f5e55d5fd8b87ea6938084cbacb1cda7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2686945 Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-02-10 18:51:02 +00:00
Yuan Tong	f7fc83f46d	Add NV12ToNV24 and NV16ToNV24 These are bi-planar convert functions to scale UV plane to Y plane's size using (bi)linear filter. libyuv_unittest --gtest_filter=ToNV24 R=fbarchard@chromium.org Change-Id: I3d98f833feeef00af3c903ac9ad0e41bdcbcb51f Bug: libyuv:872 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2682152 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-02-09 07:38:40 +00:00
Frank Barchard	942c508448	BT.2020 Full Range yuvconstants new color util to compute constants needed based on white point. [ RUN ] LibYUVColorTest.TestFullYUVV hist -2 -1 0 1 2 red 0 1627136 13670144 1479936 0 green 319285 3456836 9243059 3440771 317265 blue 0 1561088 14202112 1014016 0 Bug: libyuv:877, b/178283356 Change-Id: If432ebfab76b01302fdb416a153c4f26ca0832d6 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2678859 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-02-06 00:26:55 +00:00
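For reference, the standard derivation of full-range YUV to RGB coefficients from the luma weights Kr and Kb is sketched below. This is an assumption about the kind of math such a utility performs, not the yuvconstants code itself (the commit describes its input in terms of the white point). The BT.2020 weights Kr = 0.2627 and Kb = 0.0593 come from the standard, and U and V are taken as centered on zero. YuvCoeffs and FullRangeCoeffs are hypothetical names.

```c
/* Full-range YUV -> RGB coefficients derived from luma weights Kr, Kb,
   with Kg = 1 - Kr - Kb and U, V centered on zero:
     R = Y + vr * V
     G = Y - ug * U - vg * V
     B = Y + ub * U                                                  */
typedef struct {
  double ub, ug, vg, vr;
} YuvCoeffs;

static YuvCoeffs FullRangeCoeffs(double kr, double kb) {
  double kg = 1.0 - kr - kb;
  YuvCoeffs c;
  c.vr = 2.0 * (1.0 - kr);              /* from V = (R - Y) / (2(1 - Kr)) */
  c.ub = 2.0 * (1.0 - kb);              /* from U = (B - Y) / (2(1 - Kb)) */
  c.ug = 2.0 * kb * (1.0 - kb) / kg;    /* solve Y = Kr R + Kg G + Kb B */
  c.vg = 2.0 * kr * (1.0 - kr) / kg;    /* for G after substituting R, B */
  return c;
}
```

For BT.2020 this gives approximately R = Y + 1.4746 V, G = Y - 0.16455 U - 0.57135 V, B = Y + 1.8814 U.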
Yuan Tong	fc61dde1eb	Add special optimization for I420ToI444 and I422ToI444 These functions use (bi)linear filter, to scale U and V planes to the size of Y plane. This will help enhance the quality of YUV to RGB conversion. Also added 10bit and 12bit version: I010ToI410 I210ToI410 I012ToI412 I212ToI412 libyuv_unittest --gtest_filter=LibYUVConvertTest.I42ToI444:LibYUVConvertTest.I1ToI41* R=fbarchard@chromium.org Change-Id: Ie4a711a5ba28f2ff1f44c021f7a5c149022264c5 Bug: libyuv:872 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2658097 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-02-03 10:53:02 +00:00
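A minimal sketch of 2x horizontal linear upsampling for one chroma row, with 3:1 weights and +2 rounding before the >> 2 (compare the rounding change in 12a4a2372c). It assumes chroma centered between luma samples and edge replication at both ends; UpsampleRow2x_Linear is a hypothetical name, and libyuv's own row functions and their SIMD variants are organized differently. Applying the same filter vertically gives the bilinear case.

```c
#include <stdint.h>

/* Hypothetical helper: upsample src_width samples to 2 * src_width.
   Each output sits 1/4 of the way toward a neighbor, so it is weighted
   3:1 toward the nearer source sample; +2 rounds to nearest before the
   divide by 4. Edges replicate the nearest source sample. */
static void UpsampleRow2x_Linear(const uint8_t* src, uint8_t* dst,
                                 int src_width) {
  for (int x = 0; x < src_width; ++x) {
    int left = src[x > 0 ? x - 1 : 0];
    int cur = src[x];
    int right = src[x + 1 < src_width ? x + 1 : src_width - 1];
    dst[2 * x] = (uint8_t)((3 * cur + left + 2) >> 2);
    dst[2 * x + 1] = (uint8_t)((3 * cur + right + 2) >> 2);
  }
}
```

For example, the source row {0, 100} becomes {0, 25, 75, 100}.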
Frank Barchard	c28d404936	win32 build fix for I422ToRGBA Bug: libyuv:877, b/178713286 Change-Id: Iad55df99083b9a4bb9306e052e0e687e58570d96 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2657701 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-01-29 10:07:08 +00:00
Frank Barchard	39240f7149	Fix in row_gcc.cc to change subq to sub subq is only available for x64 sub works for both 32 bit x86 and 64 bit x64 Fix in row_gcc.cc for 32 bit x86 running out of registers. Fix in row_neon.cc for split function argb parameter name. Bug: libyuv:877, b/178283356, b/178713286 Change-Id: If2b12a2d6168eab08005a2cdf2c17a470a924dd1 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2656771 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2021-01-28 19:34:29 +00:00
Yuan Tong	a85cc26fde	Add MergeARGBPlane and SplitARGBPlane These functions convert between planar and interleaved ARGB, optionally fill 255 to alpha / discard alpha. This can help handle YUV(A) with Identity matrix, which is basically planar ARGB. libyuv_unittest --gtest_filter=LibYUVPlanarTest.ARGBPlane:LibYUVPlanarTest.XRGBPlane R=fbarchard@google.com Change-Id: I522a189b434f490ba1723ce51317727e7c5eb112 Bug: libyuv:877 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2649887 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-01-27 19:33:51 +00:00
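The merge and split described here amount to a per-pixel interleave. The sketch assumes libyuv's ARGB memory order of B, G, R, A bytes (little-endian ARGB) and uses a NULL alpha pointer to mean "fill 255" on merge and "discard" on split. MergeARGBRow and SplitARGBRow are hypothetical names, not the library's row functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Interleave R, G, B (and optional A) planes into B, G, R, A bytes.
   A NULL src_a fills alpha with 255 (the XRGB case). */
static void MergeARGBRow(const uint8_t* src_r, const uint8_t* src_g,
                         const uint8_t* src_b, const uint8_t* src_a,
                         uint8_t* dst_argb, int width) {
  for (int x = 0; x < width; ++x) {
    dst_argb[4 * x + 0] = src_b[x];
    dst_argb[4 * x + 1] = src_g[x];
    dst_argb[4 * x + 2] = src_r[x];
    dst_argb[4 * x + 3] = src_a != NULL ? src_a[x] : (uint8_t)255;
  }
}

/* De-interleave B, G, R, A bytes into planes. A NULL dst_a discards
   the alpha channel. */
static void SplitARGBRow(const uint8_t* src_argb, uint8_t* dst_r,
                         uint8_t* dst_g, uint8_t* dst_b, uint8_t* dst_a,
                         int width) {
  for (int x = 0; x < width; ++x) {
    dst_b[x] = src_argb[4 * x + 0];
    dst_g[x] = src_argb[4 * x + 1];
    dst_r[x] = src_argb[4 * x + 2];
    if (dst_a != NULL) dst_a[x] = src_argb[4 * x + 3];
  }
}
```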
Frank Barchard	37480f12c6	Add BT.709 Full Range yuv constants. MAKEYUVCONSTANTS macro to generate struct for YUV to RGB Fix I444AlphaToARGB unit test for ARM by adjusting C version to match Neon implementation. Bug: libyuv:879, libyuv:878, libyuv:877, libyuv:862, b/178283356 Change-Id: Iedb171fbf668316e7d45ab9e3481de6205ed31e2 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2646472 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2021-01-26 18:36:56 +00:00
Yuan Tong	08d0dce5fc	Add I422AlphaToARGB and I444AlphaToARGB Bug: libyuv:878 Change-Id: I64c314326ac7ae5242acc64e20016e30adc6d17f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2639439 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-01-23 00:40:33 +00:00
Frank Barchard	93b1b332cd	NV12 Bilinear upsampling bug fix Reenable InterpolateRow_AVX2 Bug: libyuv:838, b/68638384, b/176195584 Change-Id: I990fcc204d89ee9b8f5264184558a08aa21d6a9f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2626067 Reviewed-by: Eugene Zemtsov <eugene@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-01-12 23:10:42 +00:00
Frank Barchard	1d3f901aa0	Scale bug fix with msan when scaling up in height and down in width with box filter. runyuv3 Scale*Rotate_Box --libyuv_width=200 --libyuv_height=50 Bug: chromium:1158178, libyuv:875, b/176195584 Change-Id: Ic9a380179433bf3dffb951e7b5563491592d5aa5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2603877 Reviewed-by: Eugene Zemtsov <eugene@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2020-12-26 20:23:13 +00:00
Evan Shrubsole	dfaf7534e0	NV12 Copy, include scale_uv.h Bug: None Change-Id: I8148def3f1253913eb62fcc000e5f72704262a17 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2569748 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2020-12-08 18:54:16 +00:00
Frank Barchard	b7a1c5ee5d	Scale by even factor low level row function Bug: b/171884264 Change-Id: I6a94bde0aa05e681bb4590ea8beec33a61ddbfc9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2518361 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2020-11-03 21:25:18 +00:00
Frank Barchard	cec28e7088	PlaneScale, UVScale and ARGBScale test 3x and 4x down sample. Intel SkylakeX UVTest3x (1925 ms) UVTest4x (2915 ms) PlaneTest3x (2040 ms) PlaneTest4x (4292 ms) ARGBTest3x (2079 ms) ARGBTest4x (1854 ms) Pixel 2 ARGBTest3x (3602 ms) ARGBTest4x (4064 ms) PlaneTest3x (3331 ms) PlaneTest4x (8977 ms) UVTest3x (3473 ms) UVTest4x (6970 ms) Bug: b/171798872, b/171884264 Change-Id: Iebc70fed907857b6cb71a9baf2aba9861ef1e3f7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2505601 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2020-10-28 20:41:59 +00:00
Frank Barchard	5c4dc242f4	MJPGToNV12 added and build files sorted Bug: None Change-Id: I87aa64a14bb3f0785f984f492e56fcf2313431ce Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2502780 Reviewed-by: Evan Shrubsole <eshr@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2020-10-28 16:24:38 +00:00


1997 Commits