libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-02-07 18:26:43 +08:00

Author	SHA1	Message	Date
George Steed	73f6e82b1a	[AArch64] Add missing clobber, fix zero-init for compare kernels The "memory" clobber needs to be present even if the asm does not store anything to memory, since otherwise the compiler would be allowed to reorder earlier stores to the pointers after they would be needed by the asm. Also fix up the zero-initialisation of accumulators in SumSquareError_NEON, since EOR'ing a register by itself is not a recognised zeroing idiom on most AArch64 micro-architectures. Bug: libyuv:976 Change-Id: I3175367abf6f59db8371b4478f1156950277d7c5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378705 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:38:06 +00:00
George Steed	ba0bba5b2b	[AArch64] Use getauxval(AT_HWCAP{,2}) for feature detection This has the advantage of also working under emulation where faking /proc/cpuinfo is not supported. For the Chromium sandbox, getauxval is supported since API version 18. The minimum supported API version at time of writing is 21 so we should be able to use getauxval unconditionally. On the off-chance the call fails it will return 0 and we will correctly fall-back to using only Neon. Change-Id: Ibbaa9caec1915ac0725c42d6cd2abc7ce19786c7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5453620 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-19 06:37:04 +00:00
George Steed	4838e7a194	[AArch64] Load full vectors in ARGB{Add,Subtract}Row Using full vectors for Add and Subtract is a win across the board. Using full vectors for the multiply is less obviously a win, especially for smaller cores like Cortex-A53 or Cortex-A57, so is not considered for this change. Observed changes in performance with this change compared to the existing Neon code: \| ARGBAddRow_NEON \| ARGBSubtractRow_NEON Cortex-A55 \| -5.1% \| -5.1% Cortex-A510 \| -18.4% \| -18.4% Cortex-A76 \| -28.9% \| -28.7% Cortex-A720 \| -36.1% \| -36.2% Cortex-X1 \| -14.2% \| -14.4% Cortex-X2 \| -12.5% \| -12.5% Bug: libyuv:976 Change-Id: I85316d4399c93b53baa62d0d43b2fa453517f5b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5457433 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 19:02:43 +00:00
George Steed	90070986ae	[AArch64] Improve RGB565TOARGB using SRI instructions The existing code performs a lot of shifts and combines the R and B components into a single vector unnecessarily. We can express this much more cleanly by making use of the SRI instruction to insert and replace shifted bits into the original data, performing the 5/6-bit to 8-bit expansion in a single instruction if the source bits are already in the high bits of the byte. We still need a single separate XTN instruction to narrow the B component before the left shift since Neon does not have a narrowing left shift instruction. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 \| Cortex-X2 RGB565ToYRow_NEON \| -22.1% \| -23.4% \| -25.1% RGB565ToUVRow_NEON \| -26.8% \| -20.5% \| -18.8% RGB565ToARGBRow_NEON \| -38.9% \| -32.0% \| -23.5% Bug: libyuv:976 Change-Id: I77b8d58287b70dbb9549451fc15ed3dd0d2a4dda Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5374286 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-04-18 19:01:26 +00:00
George Steed	1ca7c4e1cc	[AArch64] Avoid lane-indexed loads for UV when loading I444/I422 Most micro-architectures seem to prefer an additional ZIP1 instruction in READYUV422 to needing a lane-indexed LD1 load instruction. We introduce a new macro to handle the YUV to RGB conversion where the U and V components are in separate vectors. This avoids causing a slowdown for the UV-interleaved input format kernels (NV12 and NV21) where we do not want to separate them. Reduction in runtime for selected kernels on Cortex cores (no performance difference observed on Cortex-A55): A510 A76 A720 X1 X2 I422AlphaToARGBRow_NEON -4.3% -7.3% -10.1% -4.0% -4.4% I422ToARGB1555Row_NEON -4.5% +0.4% -7.9% -4.8% -3.9% I422ToARGB4444Row_NEON -7.7% -2.6% -4.1% -1.9% -1.3% I422ToARGBRow_NEON -3.7% -2.9% -10.2% -3.8% -4.4% I422ToRGB24Row_NEON -5.9% +5.4% -3.2% -4.3% -4.3% I422ToRGB565Row_NEON -4.8% -2.8% -8.5% -3.8% -4.6% I422ToRGBARow_NEON -3.7% +4.6% -10.5% -3.0% -4.5% I444AlphaToARGBRow_NEON -3.5% +2.7% -3.7% -5.0% -8.2% I444ToARGBRow_NEON -1.8% -15.1% -3.5% -6.5% -8.1% I444ToRGB24Row_NEON -2.0% -6.8% +0.1% -4.7% +1.2% There are a few cases which are slower on Cortex-A76, but significant speedups elsewhere. Bug: libyuv:976 Change-Id: Ib3b4ef81f7bfc1d7ff9c4c24aef9ad86741410ff Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5465580 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:46:59 +00:00
George Steed	bfedc8bc11	[AArch64] Improve ARGB{,1}555TOARGB using SRI instructions The existing transformations can be more cleanly expressed by using SRI instructions to perform a shift and simultaneously merge in to an existing value. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 \| Cortex-X2 ARGB1555ToYRow_NEON \| -26.2% \| -14.9% \| -28.2% ARGB1555ToUVRow_NEON \| -25.2% \| -18.4% \| -20.9% ARGB1555ToARGBRow_NEON \| -43.6% \| -32.8% \| -19.7% Bug: libyuv:976 Change-Id: Id07ac6f2cd3eb9bb70f9e29fc1f4b29fe26156ec Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5383444 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:46:10 +00:00
George Steed	95b0a3326c	[AArch64] Improve ARGBTOARGB4444 using SRI instructions The existing sequence to convert from 8-bit ARGB to 4-bit ARGB4444 makes use of a lot of shifts and bit-clears before ORR'ing the pairs together. This is unnecessary since we can do the same with the SRI instruction, so use that instead. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 ARGBToARGB4444Row_NEON \| -15.3% \| -16.6% I422ToARGB4444Row_NEON \| -2.7% \| -11.9% Bug: libyuv:976 Change-Id: I86cd86c7adf1105558787a679272179821f31a9d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5383443 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-18 18:26:27 +00:00
George Steed	b265c311b7	[AArch64] Avoid unnecessary work in READYUV400 The value of UV components in the vector are known and the vectors are never overwritten, so we can hoist the UV-specific parts of the calculation out of the loop. Reduction in runtimes for I400ToARGBRow_NEON: Cortex-A55: -10.0% Cortex-A510: -3.7% Cortex-A76: -19.3% Cortex-X2: -14.4% Bug: libyuv:976 Change-Id: I17d6de4e1790f71407e12ff84548568cc3ebbe1a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5457434 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-17 16:47:58 +00:00
George Steed	ea56460300	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBMultiplyRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBMultiplyRow_NEON: Cortex-A55: -22.3% Cortex-A510: -56.6% Cortex-A76: -45.5% Cortex-X2: -54.6% Change-Id: I9103111a109a4d87d358e06eb513746314aaf66a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454832 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:28:56 +00:00
George Steed	7266cda79c	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBSubtractRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBSubtractRow_NEON: Cortex-A55: -15.0% Cortex-A510: -59.8% Cortex-A76: -54.4% Cortex-X2: -70.4% Change-Id: Ifbfce9e6a45159932c09d9b0229215a36fa22f43 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454833 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:04:43 +00:00
George Steed	e646991347	[AArch64] Use LD1/ST1 rather than LD4/ST4 in ARGBAddRow_NEON There is no need to de-interleave channels here since we are applying the same operation across all lanes. LD4 and ST4 are known to be significantly slower than LD1/ST1 on some micro-architectures so we should prefer to avoid them where possible. Reduction in runtimes observed for ARGBAddRow_NEON: Cortex-A55: -15.0% Cortex-A510: -59.8% Cortex-A76: -54.4% Cortex-X2: -70.4% Change-Id: Id04e5259d8e5e7511dad5df85cdf9759b392cb99 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5454831 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-16 07:03:44 +00:00
Cosmina Dunca	9d200b704f	[AArch64] Optimize ScaleARGBRowDown2Box_NEON Use a pair of LD2s to load data interleaved and perform a couple of additions on the registers in order to avoid needing LD4 and ST4 instructions, since these are costly on some micro-architectures. Reduction in run times: Cortex-A55: -20.5% Cortex-A510: -28.3% Cortex-A76: -21.5% Bug: libyuv:976 Change-Id: If66e1e148b031c2cd288ff412f351d7a0b9b91e7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5371774 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-04-10 20:07:22 +00:00
Cosmina Dunca	9441ddd883	[AArch64] Optimize ScaleARGBRowDownEven_NEON Replace indexed LD1 instructions with LDRs to avoid loop-carried dependencies on unused lanes between consecutive iterations of the loop. Reduction in run times: Cortex-A55: -10.9% Cortex-A510: -70.7% Cortex-A76: -56.8% Bug: libyuv:976 Change-Id: Ia767e76002c7823177e80163ebf034e023e9a6cc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5371771 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-04-10 20:03:39 +00:00
George Steed	e52007eff9	[AArch64] Add SVE2 implementation for I444ToARGBRow Being able to use SVE2 functionality for these kernels has a number of performance wins compared to the existing Neon code: * For the Y component calculation we are able to use UMULH, versus the existing UMULL x2 + UZP2 sequence in Neon. * For the RGBTORGBA8 calculation we are able to take advantage of interleaving narrowing instructions, allowing us to use ST2 rather than ST4 for the store. This is a big performance win on some micro-architectures where ST4 is costly. * The use of predication means we do not need to add "any" kernels, we can simply rerun the calculation with a not-full predicate for the final iteration. To avoid the overhead of generating a predicate register on every iteration we duplicate the loop body and only generate a predicate on the final iteration of the loop. This costs a small amount on the final iteration but should still be significantly quicker than the overhead of a function call needed by the "any" cases. Duplicating the loop body to reduce the use of the WHILELT instruction improves little core performance by ~12% by itself but has negligible impact on other micro-architectures. Reduction in runtime for the new SVE2 implementation compared to the existing Neon implementation on selected micro-architectures: Cortex-A510: -36.5% Cortex-A720: -17.3% Cortex-X2: -11.3% Bug: libyuv:973 Change-Id: I2a485f0dfa077a56f96b80a667ad38bbea47b4b4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424739 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:11:01 +00:00
George Steed	9a8be20def	[AArch64] Add :libyuv_sve library in preparation for SVE kernels This commit only adds the bare minimum to get the new library building through GN, the actual content of row_sve.cc is empty for now until we start porting some kernels across. Bug: libyuv:973 Change-Id: Ibdf4fc258761f3e507d700f27a405099c667ac75 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424738 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:10:01 +00:00
George Steed	f2e78e1304	[AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow Using the dot-product instructions here allows us to avoid needing LD4 for loading individual colour channels, which gives a big benefit on some micro-architectures where such instructions perform significantly worse than LD1. In addition the dot-product instructions have higher throughput compared to the Neon Observed reduction in runtimes for selected kernels moving from _NEON to _NEON_DotProd: Kernel \| Cortex-A55 \| Cortex-A510 \| Cortex-A76 \| Cortex-X2 ABGRToYJRow \| -6.5% \| -22.5% \| -43.5% \| -71.2% ABGRToYRow \| -6.5% \| -22.5% \| -43.5% \| -68.3% ARGBToYJRow \| -6.5% \| -22.5% \| -43.5% \| -68.1% ARGBToYRow \| -6.5% \| -22.5% \| -43.5% \| -68.1% BGRAToYRow \| -6.5% \| -22.5% \| -42.3% \| -68.4% RGBAToYJRow \| -6.5% \| -22.5% \| -42.2% \| -73.7% RGBAToYRow \| -6.5% \| -22.5% \| -42.3% \| -64.9% Bug: libyuv:977 Change-Id: If244190a7bdacf7e6e6b16af7e6853ee13ff6585 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424737 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-09 03:09:36 +00:00
George Steed	a038cda7b8	[AArch64] Enable detection of additional architecture features In particular there are a few extensions that are interesting for us: * FEAT_DotProd adds 4-way dot-product instructions which are useful in e.g. ARGBToY. * FEAT_I8MM adds additional mixed-sign dot-product instructions which could be useful in e.g. ARGBToUV. * FEAT_SVE and FEAT_SVE2 add support for the Scalable Vector Extension, which adds an array of new instructions including new widening loads and narrowing stores for dealing with mixed-width integer arithmetic efficiently and predication for avoiding the need for "any" cleanup loops. This commit simply adds support for detecting the presence of these features by extending the existing /proc/cpuinfo parsing, splitting it into separate Arm and AArch64 functions for simplicity. Since we have no space left in the bitset entries between Arm and X86 entries, we reuse some of the X86 entries for new AArch64 extensions. This doesn't seem obviously problematic as long as we avoid setting kCpuHasX86. Bug: libyuv:973 Bug: libyuv:977 Change-Id: I8e256225fe12a4ba5da24460f54061e16eab6c57 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5378150 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-04-05 17:48:22 +00:00
George Steed	ba796a32e7	[AArch64] Remove out of date TODO around ARGBMultiplyRow_NEON The comment refers to the code needing to be re-enabled but as far as I can tell it is already enabled, so simply remove the comment. Change-Id: Id014e8b7f5cd43c8211e1d38758299de2fad49de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5387650 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-25 22:44:45 +00:00
George Steed	5d694bec38	[AArch64] Replace UQSHRN{,2} pair by UZP2 in YUVTORGB The existing Neon code makes use of a pair of UQSHRN and UQSHRN2 instructions to extract the top half of a widened multiply result. These instructions would ordinarily saturate, however saturation can never happen in this case since we are shifting by 16 to get the top half of each element, the top bits remain as-is. We could move this to using a slightly simpler non-saturating shift, however in this case it is simpler and faster to just use UZP2 to extract the top half of each 32-bit lane directly. Reduction in runtime for selected kernels: Kernel \| Cortex-A55 \| Cortex-A76 \| Cortex-X2 I400ToARGBRow_NEON \| -9.4% \| -14.9% \| -13.9% I422AlphaToARGBRow_NEON \| -7.9% \| -11.4% \| -11.5% I422ToARGB1555Row_NEON \| -7.3% \| -17.2% \| -14.7% I422ToARGB4444Row_NEON \| -7.6% \| -17.9% \| -13.7% I422ToARGBRow_NEON \| -8.2% \| -9.8% \| -11.9% I422ToRGB24Row_NEON \| -8.0% \| -13.3% \| -12.8% I422ToRGB565Row_NEON \| -7.5% \| -15.1% \| -14.6% I422ToRGBARow_NEON \| -8.3% \| -13.1% \| -12.2% I444AlphaToARGBRow_NEON \| -8.3% \| -7.6% \| -12.7% I444ToARGBRow_NEON \| -8.6% \| -3.5% \| -13.5% I444ToRGB24Row_NEON \| -8.5% \| -7.8% \| -13.4% NV12ToARGBRow_NEON \| -8.8% \| -1.4% \| -12.0% NV12ToRGB24Row_NEON \| -8.5% \| -11.5% \| -12.3% NV12ToRGB565Row_NEON \| -7.9% \| -15.0% \| -15.7% NV21ToARGBRow_NEON \| -8.7% \| -1.6% \| -12.3% NV21ToRGB24Row_NEON \| -8.4% \| -11.5% \| -12.0% UYVYToARGBRow_NEON \| -8.8% \| -8.9% \| -11.9% YUY2ToARGBRow_NEON \| -8.7% \| -10.8% \| -13.3% Bug: libyuv:976 Change-Id: I6c505fe722e5f91f93718b85fe881ad056d8602d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366653 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-14 20:04:46 +00:00
George Steed	8d0d885c2f	[AArch64] Avoid LD2 in YUY2ToARGBRow_NEON In this case we have an LD2 instruction followed by a pair of permutes (ZIP1 and TBL). On some micro-architectures LD2 involves use of the vector pipelines, so in these cases it is preferable to do an LD1 and then a different pair of permutes (TRN + TBL) instead to avoid the extra vector pipeline usage. Reduction in runtime on selected kernels (no observed performance delta on Cortex-A55): Kernel \| Cortex-A76 \| Cortex-X2 UYVYToARGBRow_NEON \| -2.6% \| -8.8% YUY2ToARGBRow_NEON \| -6.2% \| -4.9% Bug: libyuv:976 Change-Id: I7ca45e0c7bf7cb50cc5ab37c6a01215d9689039a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5366652 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-03-14 19:51:05 +00:00
George Steed	188e4e3afb	[AArch64] Avoid unnecessary lane-indexed loads in READYUV The existing code makes use of a pair of lane-indexed load instructions to fill the two halves of the input vector, however this has the effect of introducing an unnecessary dependency on the value of the vector from the previous loop iteration. This doesn't really seem to affect little core performance since these cores never execute enough work concurrently to hit the bottleneck, however we can improve performance on mid and big cores quite a bit by using LDR instead of LD1 to load the low lane, zeroing the upper portion of the vector rather than keeping the previous value. Reduction in runtime for select kernels (no observed performance delta on Cortex-A55): Kernel \| Cortex-A76 \| Cortex-X2 I422ToARGB4444Row_NEON \| -23.1% \| -49.3% I422ToARGBRow_NEON \| -1.2% \| -2.5% I422ToRGB24Row_NEON \| -11.7% \| -7.0% I422ToRGBARow_NEON \| -4.7% \| -3.4% I444AlphaToARGBRow_NEON \| -1.1% \| -2.4% I444ToARGBRow_NEON \| -1.6% \| -3.2% I444ToRGB24Row_NEON \| -9.6% \| -6.8% Bug: libyuv:976 Change-Id: I8c9413e0e6ed97b8f060ce42b6e8abdfb77914b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5365868 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-03-13 18:35:31 +00:00
George Steed	772bddaed7	Add missing memory/cc clobbers to AArch64 Neon kernels There are a few functions in source/scale_neon64.cc which write memory and set condition flags despite not declaring this in the asm clobber list, so add the missing clobbers. Also move a couple of memory/cc clobbers to the start of the clobber list to match other kernels. Bug: libyuv:974 Change-Id: I85f5ff5718e78a4481f7bc53cedaeceb14438895 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5309254 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-03-04 10:22:51 +00:00
Frank Barchard	b66c42d4a8	Revert "AMX detect OS support for linux kernel" This reverts commit 8c8a33762d64b916ae8469cc3fc602a64080a23a. Reason for revert: breaks sandbox Original change's description: > AMX detect OS support for linux kernel > > Bug: b/327013106 > Change-Id: Ie1784249f3a121c52e6504ff502bdc3eb245d858 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5329907 > Commit-Queue: Frank Barchard <fbarchard@chromium.org> > Reviewed-by: richard winterton <rrwinterton@gmail.com> Bug: b/327013106 Change-Id: If54bb84bc1167177c1869763f6ccfdf1f92fbe09 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5332617 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Bot-Commit: Rubber Stamper <rubber-stamper@appspot.gserviceaccount.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-02-29 00:33:29 +00:00
Frank Barchard	8c8a33762d	AMX detect OS support for linux kernel Bug: b/327013106 Change-Id: Ie1784249f3a121c52e6504ff502bdc3eb245d858 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5329907 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2024-02-28 03:13:44 +00:00
Frank Barchard	a6a2ec654b	Add AMXINT8 cpu detect sde -spr -- libyuv_test -- --gunit_filter=Cpu Note: Google Test filter = Cpu [==========] Running 4 tests from 2 test suites. [----------] Global test environment set-up. [----------] 3 tests from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x57fff9 Has X86 0x8 Has SSE2 0x10 Has SSSE3 0x20 Has SSE41 0x40 Has SSE42 0x80 Has AVX 0x100 Has AVX2 0x200 Has ERMS 0x400 Has FMA3 0x800 Has F16C 0x1000 Has AVX512BW 0x2000 Has AVX512VL 0x4000 Has AVX512VNNI 0x8000 Has AVX512VBMI 0x10000 Has AVX512VBMI2 0x20000 Has AVX512VBITALG 0x40000 Has AVX10 0x0 HAS AVXVNNI 0x100000 Has AVXVNNIINT8 0x0 Has AMXINT8 0x400000 [ OK ] LibYUVBaseTest.TestCpuHas (34 ms) Bug: b/324356616 Change-Id: I5129b8946363a501bdd570e6dba3936c54aacd6c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5283433 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-02-15 21:44:47 +00:00
Hans Wennborg	2f2c04c157	Drop TARGET_IPHONE_SIMULATOR macro check Recent versions of Clang always define these TARGET_ macros (to 0 or 1 as appropriate) for Apple targets. https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5249072 made the code correctly check the value of the macro rather than whether it was defined or not. However, the code was still broken when actually targeting the iOS simulator (where the macro is now 1). It seems the use of this macro was just incorrect, and the code only worked since it was never defined at all. The original use of the macro in this file was added in `2c8108e6c2` but it's not quite clear to me why. All other uses have subsequently been removed, e.g. in `6a1d01220a`. This removes the last instance, and should fix the iOS simulator builds. Bug: chromium:1519899 Change-Id: Iaf44d2c37086f1153096044df5d9b61797f66a4f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5272224 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-02-06 17:38:45 +00:00
Hans Wennborg	d359a9f922	Correctly check the TARGET_IPHONE_SIMULATOR macro The macro may be defined to 0; the code needs to check the value, not just whether it's defined. Recent Clang versions will define all Apple "target OS" macros by default (see bug). Bug: chromium:1519899 Change-Id: I3d61f1b23de06d7db7db7916182a789f26345bce Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5249072 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-01-31 19:33:56 +00:00
Frank Barchard	3e435fe6d4	YUY2ToARGB use ymm6/7 for shuffle constants - 1 load and 2 shuffles from registers replaces 2 loads and 2 memory shuffles - vbroadcastf128 16 byte shuffler replaces 32 byte shufflers - bump version and apply clang-format libyuv_test '--gunit_filter=*.???2ToARGB_Opt' --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 AMD Zen2 I422ToARGB_Opt (272 ms) NV12ToARGB_Opt (255 ms) YUY2ToARGB_Opt (208 ms) Was YUY2ToARGB_Opt (214 ms) Change-Id: I1fa4d462d04536c877d1cab1a14586be8ed1b2f2 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5218447 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2024-01-22 21:47:23 +00:00
Frank Barchard	914624f0b8	YUY2ToARGBMatrix and UYVYToARGBMatrix added to allow any color matrix Bug: libyuv:971 Change-Id: If15d4598d75500a3717f07d02c0c295fdc58254e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5214453 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-01-19 21:21:37 +00:00
Frank Barchard	5625f42424	I444ToI420 and I422ToI420 check U and V pointers and return -1 if NULL. - Add detect linux kernel version number in util/cpuid adbrun -- blaze-bin/third_party/libyuv/cpuid Kernel Version 4.14 Cpu Flags 0x7 Has ARM 0x2 Bug: libyuv:970 Change-Id: I655ed598db3655ca8448be08f1d71fbc328ced66 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5207990 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-01-18 21:56:11 +00:00
Frank Barchard	af6ac8265b	AVX10 cpuid detect added Replace unused popcount feature bit Bug: libyuv:911 Change-Id: Icd88fcc732751d39b0950d5f09a58bc9ac2c4e30 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5179911 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2024-01-10 00:08:22 +00:00
Hao Chen	ee53a66c5c	Fix compilation errors. Fix the narrowing conversion error from ‘long unsigned int’ to ‘long long int’ that occurs when using the new compiler on the LoongArch platform. Bug: libyuv:913 Change-Id: Ic535946a2453bc48840bab05355854670c52114f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5161066 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-01-03 19:15:56 +00:00
Bruce Lai	1dcbc30553	Add HAS_SCALEARGBROWDOWNEVEN_RVV macro and disable it by default HAS_SCALEARGBROWDOWNEVEN_RVV wasn't defined, so we cannot use ScaleARGBRowDownEven_RVV & ScaleARGBRowDownEvenBox_RVV. - Separate into two conditional statements when selecting DownEven or DownEvenBox. - Also, add HAS_SCALEARGBROWDOWNEVEN_RVV and disable it by default. Bug: libyuv:965 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Change-Id: Ic7ec40520b64131a456c6f3eea0639b3620f11ae Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4882441 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-12-07 22:54:23 +00:00
Frank Barchard	def473f501	malloc return 1 for failures and assert for internal functions Bug: libyuv:968 Change-Id: Iea2f907061532d2e00347996124bc80d079a7bdc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5010874 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-12-04 22:55:20 +00:00
Wan-Teh Chang	fb6341d326	Change ScalePlane,ScalePlane_16,... to return int Change ScalePlane(), ScalePlane_16(), and ScalePlane_12() to return int so that they can report memory allocation failures (by returning 1). BUG=libyuv:968 Change-Id: Ie5c183ee42e3d595302671f9ecb7b3472dc8fdb5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5005031 Commit-Queue: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-11-03 23:53:24 +00:00
Frank Barchard	31e1d6f896	Check allocations that return NULL and return early BUG=libyuv:968 Change-Id: I9e8594440a6035958511f9c50072820131331fc8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4977552 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-10-27 17:41:36 +00:00
Frank Barchard	331c361581	AVX-VNNI detect - Add kCpuHasAVXVNNI flag - Remove deprecated GFNI detect to make space. https://bugs.chromium.org/p/libyuv/issues/detail?id=967 Meteor Lake has AVX-VNNI but not AVX512 ~/intelsde/sde -mtl -- blaze-bin/third_party/libyuv/libyuv_test --gunit_filter=CpuHas doyuv3 Note: Google Test filter = CpuHas [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x203ff1 Has X86 0x10 Has SSE2 0x20 Has SSSE3 0x40 Has SSE41 0x80 Has SSE42 0x100 Has AVX 0x200 Has AVX2 0x400 Has ERMS 0x800 Has FMA3 0x1000 Has F16C 0x2000 Has AVX512BW 0x0 Has AVX512VL 0x0 Has AVX512VNNI 0x0 Has AVX512VBMI 0x0 Has AVX512VBMI2 0x0 Has AVX512VBITALG 0x0 Has AVX512VPOPCNTDQ 0x0 HAS AVXVNNI 0x200000 Has AVXVNNIINT8 0x0 Running on all cpus the following report avx-vnni grep 'AVXVNNI 0x2' / adl/libyuv64.txt:HAS AVXVNNI 0x200000 gnr/libyuv64.txt:HAS AVXVNNI 0x200000 grr/libyuv64.txt:HAS AVXVNNI 0x200000 mtl/libyuv64.txt:HAS AVXVNNI 0x200000 rpl/libyuv64.txt:HAS AVXVNNI 0x200000 spr/libyuv64.txt:HAS AVXVNNI 0x200000 srf/libyuv64.txt:HAS AVXVNNI 0x200000 while these support avx512 vnni grep 'VNNI 0x1' / clx/libyuv64.txt:Has AVX512VNNI 0x10000 cpx/libyuv64.txt:Has AVX512VNNI 0x10000 gnr/libyuv64.txt:Has AVX512VNNI 0x10000 icl/libyuv64.txt:Has AVX512VNNI 0x10000 icx/libyuv64.txt:Has AVX512VNNI 0x10000 spr/libyuv64.txt:Has AVX512VNNI 0x10000 tgl/libyuv64.txt:Has AVX512VNNI 0x10000 and these support avx-vnni-int8 grep AVXVNNIINT8.0x4 / grr/libyuv64.txt:Has AVXVNNIINT8 0x400000 srf/libyuv64.txt:Has AVXVNNIINT8 0x400000 Bug: libyuv:967 Change-Id: I84cd71d1b320e7c284173eb695fc1d3b72d14ddb Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4912017 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-10-05 21:24:09 +00:00
Frank Barchard	709d60e6ee	VNNI-INT8 detect - Add kCpuHasAVXVNNIINT8 flag - Move mips flags up a bit to make space. ~/intelsde/sde -srf -- blaze-bin/third_party/libyuv/libyuv_test --gunit_filter=CpuHas Note: Google Test filter = CpuHas [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from LibYUVBaseTest [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x403ff1 Has X86 0x10 Has SSE2 0x20 Has SSSE3 0x40 Has SSE41 0x80 Has SSE42 0x100 Has AVX 0x200 Has AVX2 0x400 Has ERMS 0x800 Has FMA3 0x1000 Has F16C 0x2000 Has AVX512BW 0x0 Has AVX512VL 0x0 Has AVX512VNNI 0x0 Has AVX512VBMI 0x0 Has AVX512VBMI2 0x0 Has AVX512VBITALG 0x0 Has AVX512VPOPCNTDQ 0x0 Has AVXVNNIINT8 0x400000 Has GFNI 0x0 [ OK ] LibYUVBaseTest.TestCpuHas (32 ms) INT8 supported on srf and grr -srf Set chip-check and CPUID for Intel(R) Sierra Forest CPU -grr Set chip-check and CPUID for Intel(R) Grand Ridge CPU Bug: b/303434603 Change-Id: I628007929ff0518b2b36e1469b4d9aed71a9fa8f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4912015 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-10-04 16:31:36 +00:00
Yannis Guyon	a3b9c36eb9	Fix unused arg errors in ScalePlane*() in Release src_width parameter is used for assertions and unused with NDEBUG. Fix the warning treated as an error when -Wall -Wextra -Werror is used to build that part of the code. BUG=libyuv:967 Change-Id: I4c02ab013e8e2684b3bed5ce9693e1493d7751b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4905033 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-10-03 15:19:25 +00:00
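A parameter that is read only inside `assert()` becomes unused once `NDEBUG` removes the assertion, which `-Wextra -Werror` turns into a build failure. One common way to handle this is a `(void)` cast; the sketch below assumes that approach and uses a hypothetical function, since the commit's actual change is not shown here.

```c
#include <assert.h>

/* Hypothetical helper: src_width is only read by the assertion, so with
 * NDEBUG defined the compiler would otherwise report it as unused. */
int demo_rows_per_output(int src_width, int src_height, int dst_height) {
  (void)src_width; /* keeps the parameter "used" when assert() compiles out */
  assert(src_width > 0);
  /* Ceiling division: source rows covered by each destination row. */
  return (src_height + dst_height - 1) / dst_height;
}
```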
Bruce Lai	ec2e9ca000	[RVV] Support AR64ToAB64 and RGBA-family color conversions Add scalar code for AR64ToAB64, ARGBToRGBA, ARGBToBGRA, ARGBToABGR, RGBAToARGB, BGRAToARGB, and ABGRToARGB. They were originally implemented by ARGBShuffle. This CL implements them independently, and only enables them for RISC-V for now. This CL also adds an RVV implementation for `RGBA-family <-> RGBA-family` color conversions. * Run on SiFive internal FPGA(VLEN=128): Test Case Speedup AR64ToAB64_Opt x4.6 ARGBToRGBA_Opt x6 ARGBToBGRA_Opt x6 ARGBToABGR_Opt x6 RGBAToARGB_Opt x6 Change-Id: Ie0630901046084aa259699fcdeccc64170d7103f Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4797451 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-09-05 22:44:48 +00:00
Frank Barchard	696e619571	RVV check __riscv_v_intrinsic version Bug: libyuv:965 Change-Id: I9b02abd13ab3345288655fa7a16383f59cf66bb8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4750230 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-08-04 18:39:27 +00:00
Wan-Teh Chang	a8a37a25c9	Eliminate a common subexpression in YPixel() Save the value of a common subexpression in a local variable. Change-Id: I5724fcf341900cb2a65eb37b505194b8d3c3da9a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4735651 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-07-31 20:53:54 +00:00
Bruce Lai	c60ac4025c	[RVV] Enable ScaleRowDown38_RVV & ScaleRowDown38_{2,3}_Box_RVV * Run on SiFive internal FPGA: Test Case Speedup I420ScaleDownBy3by8_None 4.2 I420ScaleDownBy3by8_Linear 1.7 I420ScaleDownBy3by8_Bilinear 1.7 I420ScaleDownBy3by8_Box 1.7 I444ScaleDownBy3by8_None 4.2 I444ScaleDownBy3by8_Linear 1.8 I444ScaleDownBy3by8_Bilinear 1.8 I444ScaleDownBy3by8_Box 1.8 Change-Id: Ic2e98de2494d9e7b25f5db115a7f21c618eaefed Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4711857 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-07-27 02:59:47 +00:00
Darren Hsieh	10de943a12	[RVV] Enable ScaleRowUp2_(Bi)linear_RVV/ScaleUVRowUp2_(Bi)linear_RVV ScaleUVRowUp2_(Bi)linear_RVV function is equal to other platforms' ScaleRowUp2_(Bi)linear_Any_XXX. We process entire row in this function. Other platforms only implement non-edge part of image and process edge with scalar. ScaleRowUp2_(Bi)linear_Any_XXX: Combine ScaleRowUp2_(Bi)linear_XXX(non-edge) + ScaleRowUp2_(Bi)linear_C(edge) by SBUH2LANY/SU2BLANY. * Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleFrom640x360_Bilinear ScaleRowUp2_Bilinear_RVV 8.21 I444ScaleFrom640x360_Linear ScaleRowUp2_Linear_RVV 8.08 UVScaleFrom640x360_Bilinear ScaleUVRowUp2_Bilinear_RVV 7.80 UVScaleFrom640x360_Linear ScaleUVRowUp2_Linear_RVV 7.03 Change-Id: I539245ce51858f077506a78f0e7e82377ac6a95d Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4666062 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-26 18:05:50 +00:00
Bruce Lai	d33edd2373	[RVV] Enable ARGBBlendRow_RVV/BlendPlaneRow_RVV * Run on SiFive internal FPGA: Test case Speedup ARGBBlend_Opt 4.60 BlendPlane_Opt 5.96 I420Blend_Opt 5.83 - Also, add code to use ScaleRowDown2Box_RVV in I420Blend Change-Id: Icc75e05d26b3427a98269d2a33c4474074033264 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4681100 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-25 16:38:55 +00:00
Darren Hsieh	aed6dbef17	[RVV] Enable NV{12,21}To{ARGB,RGB24}Row_RVV * Run on SiFive internal FPGA(w/ -march=rv64gcv): Test Case Speedup NV12ToARGB_Opt 12.0 NV21ToARGB_Opt 12.1 NV12ToABGR_Opt 12.6 NV21ToABGR_Opt 12.0 NV12ToRGB24_Opt 12.5 NV21ToRGB24_Opt 11.7 NV12ToRAW_Opt 12.1 NV21ToRAW_Opt 11.4 Change-Id: Icae2bac2b4ebbd4c5a89e847fde9a74fe6481878 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4707804 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-24 17:07:01 +00:00
Frank Barchard	650be7496f	Fix warnings for missing prototypes - Add static to internal scale and rotate functions - Remove unittest that tested an internal scale function - Remove unused private functions - Include missing scale_argb.h header - Bump version and apply clang format Bug: libyuv:830 Change-Id: I45bab0423b86334f9707f935aedd0c6efc442dd4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4658956 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-06-30 17:46:56 +00:00
Frank Barchard	a34a0ba687	ARGBExtractAlpha rename variables to match format Bug: libyuv:956 Change-Id: I31070791754fc69b72c6dcc61be2e038d2676ed9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4646636 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2023-06-27 03:50:35 +00:00
Bruce Lai	873d0db989	[RVV] Fix TestARGBInterpolate test failure Root cause: InterpolateRow_RVV doesn't set the rounding mode to round-to-nearest-up when y1_fraction == 128. The rounding mode register is set to round-down in ARGBAttenuateRow_RVV, which causes InterpolateRow_RVV (y1_fraction == 128) to run in round-down mode, so its output differs from round-to-nearest-up mode. Solved by: ensuring the correct rounding mode is used in InterpolateRow_RVV. Also remove the unnecessary rounding mode setup in ARGBAttenuateRow_RVV. Bug: libyuv:956 Change-Id: Ib5265d42bad76b036e42b8f91ee42a9afe1f768d Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4624492 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-19 16:49:52 +00:00
Bruce Lai	4472b5b849	[RVV] Update ARGBAttenuateRow_RVV implementation Bug: libyuv:956 Change-Id: Ib539c2196767e88fa6e419ed2f22d95b6deaf406 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4623172 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-17 15:50:34 +00:00
Bruce Lai	7939e039e7	[RVV] Fix compile warning in row_rvv 1. Fix compile warning in row_rvv.cc 2. Avoid compiling row_rvv.cc/scale_rvv.cc when using GCC There is no RVV segment load & store on GCC. Hence, avoid compiling rvv code on GCC temporarily. 3. Add several compile options to cmake build flow -Wno-sign-compare -Wno-unused-function -Wunused-variable -Wuninitialized Bug: libyuv:956 Change-Id: I9577f98190fc9b28fb6fde65d82d0c67ce54f9ee Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4615441 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-17 15:41:45 +00:00
Frank Barchard	a366ad714a	ARGBAttenuate use (a + b + 255) >> 8 - Makes ARM and Intel match and fixes some off by 1 cases - Add ARGBToUV444MatrixRow_NEON - Add ConvertFP16ToFP32Column_NEON - scale_rvv fix intrinsic build error - disable row_win version of ARGBAttenuate/Unattenuate Bug: libyuv:936, libyuv:956 Change-Id: Ied99aaad3a11a8eb69212b628c58f86ec0723c38 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4617013 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-16 21:37:53 +00:00
Bruce Lai	04821d1e7d	[RVV] Enable ARGBExtractAlphaRow/ARGBCopyYToAlphaRow * Run on SiFive internal FPGA: TestARGBExtractAlpha(~3.2x vs scalar) TestARGBCopyYToAlpha(~1.6x vs scalar) Change-Id: I36525c67e8ac3f71ea9d1a58c7dc15a4009d9da1 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4617955 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-15 23:45:24 +00:00
Darren Hsieh	552571e8b2	[RVV] Enable ScaleRowDown34_RVV & ScaleRowDown34_{0,1}_Box_RVV Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleDownBy3by4_None ScaleRowDown34_RVV 5.8 I444ScaleDownBy3by4_Linear ScaleRowDown34_0/1_Box_RVV 6.5 I444ScaleDownBy3by4_Bilinear ScaleRowDown34_0/1_Box_RVV 6.3 Bug: libyuv:956 Change-Id: I8ef221ab14d631e14f1ba1aaa25d2b30d4e710db Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4607777 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-14 00:57:00 +00:00
Frank Barchard	2a5d7e2fbc	FilterRows_NEON - remove unused function - same as InterpolateRow_NEON - Bump version to 1872 - Add scale_rvv to build files Bug: libyuv:956 Change-Id: Ib9e9fd840a0774bd35bcdcca55a2596f33272383 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4608519 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-13 15:20:02 +00:00
Darren Hsieh	873eaa3bbf	[RVV] Enable Scale{ARGB,UV}RowDown{2,4,EVEN}_RVV Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleDownBy3_Box ScaleAddRow_RVV+ScaleAddCols(scalar) 2.8 ARGBScaleDownBy2_None ScaleARGBRowDown2_RVV 2.2 ARGBScaleDownBy2_Linear ScaleARGBRowDown2Linear_RVV 5.0 ARGBScaleDownBy2_Box ScaleARGBRowDown2Box_RVV 4.3 ARGBScaleDownBy4_None ScaleARGBRowDownEven_RVV 1.2 ARGBScaleDownBy8_Box ScaleARGBRowDownEvenBox_RVV 3.2 ARGBScaleDownBy4_Box ScaleARGBRowDown2Box_RVV 4.5 I444ScaleDownBy2_None ScaleRowDown2_RVV 5.8 I444ScaleDownBy2_Linear ScaleRowDown2Linear_RVV 6.1 I444ScaleDownBy2_Box ScaleRowDown2Box_RVV 5.0 I444ScaleDownBy4_None ScaleRowDown4_RVV 3.6 I444ScaleDownBy4_Box ScaleRowDown4Box_RVV 3.5 UVScaleDownBy2_None ScaleUVRowDown2_RVV 5.8 UVScaleDownBy2_Linear ScaleUVRowDown2Linear_RVV 5.6 UVScaleDownBy2_Box ScaleUVRowDown2Box_RVV 4.1 UVScaleDownBy4_None ScaleUVRowDown4_RVV 1.7 UVScaleDownBy4_Box ScaleUVRowDown2Box_RVV 4.5 avg-speedup: 4 Note: Specialize ScaleUVRowDown with step_size=4 by ScaleUVRowDown4_RVV. Bug: libyuv:956 Change-Id: If9604a6aadf681193f282507602c57c726332202 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4601684 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-13 00:40:39 +00:00
Frank Barchard	b08ccb6a83	FP16 to FP32 float conversion row function Bug: None Change-Id: I97aab6aafd41c3bf36bfbf33fdcc424e5b3fd6e3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4590225 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2023-06-07 00:02:40 +00:00
Frank Barchard	157b153b60	Fix tidy warning that uint32_t dither4 should not be const - Remove const from uint32_t dither4 parameter to fix clang-tidy warning - Apply clang format - Bump version - Remove unused MMI source; superseded by MSA Bug: None Change-Id: Id49991db25bca4e99590b415312542d917471c62 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4581882 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-02 00:42:02 +00:00
Vignesh Venkatasubramanian	c0f64c14ca	Add I412/I212 to I420 functions They re-use the same method as I410/I210 to I420 with a depth value of 12 instead of 10. Bug: b/268505204 Change-Id: I299862b4556461d8c95f0fc1dcd5260e1c1f25cd Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4581867 Commit-Queue: Vignesh Venkatasubramanian <vigneshv@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-06-01 19:50:16 +00:00
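Sharing one method across bit depths works when the depth is just a parameter of the narrowing step. The sketch below assumes a fixed-point scale of `1 << (24 - depth)` followed by a 16-bit right shift and a clamp to 255; that formula and the name `demo_narrow_to_8` are illustrative assumptions, not a quotation of the library's code.

```c
#include <stdint.h>

/* Narrows one sample of the given bit depth (e.g. 10 or 12) to 8 bits.
 * Hypothetical helper: multiply by 1 << (24 - depth), shift right by 16,
 * then clamp, which is equivalent to dropping the low (depth - 8) bits. */
uint8_t demo_narrow_to_8(uint16_t sample, int depth) {
  const uint32_t scale = (uint32_t)1 << (24 - depth);
  uint32_t v = ((uint32_t)sample * scale) >> 16;
  return (uint8_t)(v > 255u ? 255u : v);
}
```

Under this assumption, a full-scale 12-bit sample (4095) and a full-scale 10-bit sample (1023) both map to 255, so only the `depth` argument changes between the two cases.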
Bruce Lai	4b6373d189	[RVV] Use LMUL=2 for I4{44,22}To{ARGB,RGB24,RGBA} conversion Replace vv+m1(LMUL=1) with vx+m2(LMUL=2). Some kernels' asm code might contain register spill(1~2). Change-Id: Ie3655f250d17f37c1ba9039474ece43ede98ede0 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4573159 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-30 09:42:10 +00:00
Darren Hsieh	d14bd701c8	[RVV] Enable CopyRow_RVV, InterpolateRow_RVV, {Merge,Split}UVRow_RVV * Run on SiFive internal FPGA: MergeUVPlane_Opt(~6x vs scalar) SplitUVPlane_Opt(~6x vs scalar) TestCopyPlane(~8x vs scalar) ARGBInterpolate0_Opt(~10x vs scalar) ARGBInterpolate64_Opt(~9x vs scalar) ARGBInterpolate168_Opt(~9x vs scalar) ARGBInterpolate192_Opt(~8.5x vs scalar) ARGBInterpolate255_Opt(~8x vs scalar) Bug: libyuv:956 Change-Id: I8372341865f75f42e30371ef943d5c2e4be7b79a Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4574186 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-30 09:10:35 +00:00
Frank Barchard	78d168054b	Remove extraneous quote from clobber list Bug: None Change-Id: Ie20574d0f9c8c2f074247405b294b49c3406448d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4568770 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-05-30 09:03:05 +00:00
Justin Green	0e111d2c58	Wrap neon registers in {} for the neon MT2T unpack implementation. Some compilers throw a syntax error otherwise. Change-Id: Ic169dcfe4d9bb9bf6d0dcae977d6cf510a7a60bf Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4568904 Commit-Queue: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-26 17:12:02 +00:00
Frank Barchard	22c7a51452	Fix SplitRGB clobber list to include all registers used Bug: None Change-Id: Icac4becb0537903ab87495fb0e2a2b750e1eca4f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4563355 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: David Gao <davidgao@google.com>	2023-05-24 21:44:59 +00:00
Wan-Teh Chang	dcbe082070	Save boxwidth - minboxwidth in a local variable Avoid repetitions of the expression boxwidth - minboxwidth. Change-Id: Ib53fb6b06a926b80ff9a64cc5d499aeef0894c99 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408062 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-22 19:10:13 +00:00
Bruce Lai	de3e7fd147	Manually remove rounding value inside yb(yuvconstant) in row_rvv.cc After libyuv:961 is completed, yb(yuvconstant) will no longer contain rounding bias +32 for fixed-point. This CL removes rounding bias(-32) manually in row_rvv.cc. Hence, all fixed-point related codes' rounding mode is changed to round-to-nearest-up "0" in row_rvv.cc. Also, replace vwmul+vnsrl w/ vmulh in I400ToARGBRow_RVV. Bug: libyuv:956, libyuv:961 Change-Id: I10e34668a2332e38393e9d68414f07aafb6c7cf7 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4550591 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-22 18:15:27 +00:00
Wan-Teh Chang	179b0203e5	Enable {J400/I400}ToARGBRow_RVV Run on SiFive internal FPGA*: I400ToARGB_Opt (~8x vs scalar) J400ToARGB_Opt (~10x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Bug: libyuv:956, libyuv:961 Change-Id: If4e21ec85c4ff79083ec16a6faae0e457129a8de Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4544972 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-05-20 23:29:33 +00:00
Lu Wang	8670bcf17f	Optimize the following 19 functions with LSX in row_lsx.cc. UYVYToYRow_LSX, UYVYToUVRow_LSX, UYVYToUV422Row_LSX, ARGBToUVRow_LSX, ARGBToRGB24Row_LSX, ARGBToRAWRow_LSX, ARGBToRGB565Row_LSX, ARGBToARGB1555Row_LSX, ARGBToARGB4444Row_LSX, ARGBToUV444Row_LSX, ARGBMultiplyRow_LSX, ARGBAddRow_LSX, ARGBSubtractRow_LSX, ARGBAttenuateRow_LSX, ARGBToRGB565DitherRow_LSX, ARGBShuffleRow_LSX, ARGBShadeRow_LSX, ARGBGrayRow_LSX, ARGBSepiaRow_LSX Bug: libyuv:913 Change-Id: I02c0c9d68b229c4a66c96837e9b928c2f5dda1f3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4546814 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-19 18:55:58 +00:00
Frank Barchard	a37799344d	ARGBToI420Alpha function to convert ARGB to I420 with Alpha Bug: b/281866362 Change-Id: Ic1093a887fb483f134c78909cf1ee7495e7345ba Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4534100 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2023-05-17 00:23:24 +00:00
Bruce Lai	11d4536002	Enable I{422,444}AlphaToARGBRow_RVV & ARGBAttenuateRow_RVV Run on SiFive internal FPGA: I444AlphaToARGB_Opt (~16x vs scalar) I422AlphaToARGB_Opt (~10x vs scalar) ARGBAttenuate_Opt (~3x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: I0046eb7af8104bc8e13cee1cb91a19f90940d5b0 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4535657 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-16 19:20:49 +00:00
Frank Barchard	6a68b18a96	Bump version and apply clang format Bug: libyuv:956 Change-Id: I2375a02583789af2a5f13f8dba6c663d5975aaa9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4522352 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-11 11:27:28 +00:00
Bruce Lai	59eae49f17	Enable ARGBToYMatrixRow_RVV/RGBAToYMatrixRow_RVV/RGBToYMatrixRow_RVV Run on SiFive internal FPGA: ARGBToJ400_Opt (~6x vs scalar) RGBAToJ400_Opt (~6x vs scalar) RGB24ToJ400_Opt (~5.5x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: Ia3ce8cea7962fbd8618cc23e850a7913c9cabf4f Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4521783 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-11 10:17:51 +00:00
Darren Hsieh	497ea35688	Enable I444To{ARGB,RGB24}Row_RVV Run on SiFive internal FPGA: I444ToARGB_Opt (~16x vs scalar) I444ToRGB24_Opt (~10x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: Idae7dc46ef648beaa14b58ba3eb56b67b17c9b3b Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4520761 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-10 19:50:56 +00:00
Darren Hsieh	964d963afb	Enable I422To{ARGB,RGBA,RGB24}Row_RVV Run on SiFive internal FPGA: I422ToARGB_Opt (~10x vs scalar) I422ToRGBA_Opt (~10x vs scalar) I420ToRGB24_Opt (~8x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 This CL manually sets rounding mode, since we use fixed-point vector narrowing clip. There is no definition about default value for fixed-point rounding mode. https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#38-vector-fixed-point-rounding-mode-register-vxrm The behavior could be different on different platforms. To avoid unexpected behavior, we set rounding mode manually. Change-Id: I90f0dcb90c37f7da7caab8eb1df6c9c7a3c874a8 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4512373 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-10 00:29:20 +00:00
Lu Wang	1d940cc570	Optimize the following functions with LSX. MirrorRow_LSX, MirrorUVRow_LSX, ARGBMirrorRow_LSX, I422ToYUY2Row_LSX, I422ToUYVYRow_LSX, I422ToARGBRow_LSX, I422ToRGBARow_LSX, I422AlphaToARGBRow_LSX, I422ToRGB24Row_LSX, I422ToRGB565Row_LSX, I422ToARGB4444Row_LSX, I422ToARGB1555Row_LSX, YUY2ToYRow_LSX, YUY2ToUVRow_LSX, YUY2ToUV422Row_LSX Bug: libyuv:913 Change-Id: I46cec605001d7ddd73846eed6d0a77f936b6dc53 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4515191 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-10 00:25:48 +00:00
James Zern	b372510c56	row_win.cc: fix ARM64EC build include intrin.h rather than emmintrin.h; fixes: C:\...\VC\Tools\MSVC\14.35.32215\include\emmintrin.h(28,1): fatal error C1189: #error: this header should only be included through Change-Id: Ief9c81f6f1971e552c8aac301d678b64fe5bd7cc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4513825 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-09 19:56:35 +00:00
shaodiwei	4c209d264d	Make the MergeUVRow_AVX2 implementation consistent between row_win.cc and row_gcc.cc; this fixes an out-of-bounds memory write Change-Id: I4b771a46fc853effc4c0fa3ae8032322a8369dc9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4514810 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-09 18:54:25 +00:00
Bruce Lai	f4bd840794	Fix compile error for riscv scalar & simplify cmake cross build flow 1. Fix compile error when build riscv without using vector 2. Fix run_qemu.sh misused v=true for USE_RVV=OFF case 3. [cmake] Fix warning by rename TEST to UNIT_TEST Warning log: CMake Warning (dev) at CMakeLists.txt:57 (if): [54/1931] Policy CMP0064 is not set: Support new TEST if() operator. Run "cmake --help-policy CMP0064" for policy details. Use the cmake_policy command to set the policy and suppress this warning. TEST will be interpreted as an operator when the policy is set to NEW. Since the policy is not set the OLD behavior will be used. This warning is for project developers. Use -Wno-dev to suppress it. 4. [cmake] Simplify logic for cross-build Bug: libyuv:956 Change-Id: I120402fc7d6d86403e7d974180b81f4f9c663e36 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4486239 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-04 18:09:00 +00:00
Bruce Lai	8811ad8ba1	Fix TestLinuxRVV test fail Fail log: [ RUN ] LibYUVBaseTest.TestLinuxRVV Note: testing to load "../../unit_test/testdata/riscv64.txt" /scratch/brucel/libyuv/src/unit_test/cpu_test.cc:290: Failure Expected equality of these values: kCpuHasRVV \| kCpuHasRVVZVFH Which is: 1610612736 RiscvCpuCaps("../../unit_test/testdata/riscv64_rvv_zvfh.txt") Which is: 536870912 [ FAILED ] LibYUVBaseTest.TestLinuxRVV (17 ms) Reason: The root cause is that "\n" may be contained in the ext variable. The last extension substring contains "\n". For instance, in test case riscv64_rvv_zvfh.txt, the last substring is "zvfh\n" instead of "zvfh". Solved this failure by removing the "\n" at the end of the line. NOTE: We avoid using strstr() to solve the problem here, because using strstr() would violate the parsing rule if a future extension contains "zvfh" (e.g. zvfhxxx). Log after modification: [ RUN ] LibYUVBaseTest.TestLinuxRVV Note: testing to load "../../unit_test/testdata/riscv64.txt" [ OK ] LibYUVBaseTest.TestLinuxRVV (38 ms) Change-Id: I7b7db98dbc5388cbc148423da6892b8f0be64599 Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4498101 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-04 03:26:25 +00:00
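The parsing rule described in this commit (strip the trailing newline, then compare whole extension names rather than substrings) can be sketched as follows. This is an illustration only: `demo_has_extension` and the `_`-separated input format are assumptions for the example, not the code in cpu_id.

```c
#include <string.h>

/* Returns 1 if the '_'-separated extension list contains `ext` as a whole
 * token, ignoring a trailing newline. Hypothetical helper for illustration. */
int demo_has_extension(const char* ext_list, const char* ext) {
  size_t ext_len = strlen(ext);
  size_t len = strlen(ext_list);
  size_t start = 0;
  /* Drop the newline that a line-based read leaves on the last token. */
  while (len > 0 && (ext_list[len - 1] == '\n' || ext_list[len - 1] == '\r')) {
    --len;
  }
  while (start <= len) {
    size_t end = start;
    while (end < len && ext_list[end] != '_') {
      ++end;
    }
    /* Exact match: same length and same bytes, so "zvfhxxx" != "zvfh". */
    if (end - start == ext_len &&
        memcmp(ext_list + start, ext, ext_len) == 0) {
      return 1;
    }
    start = end + 1;
  }
  return 0;
}
```

A plain `strstr()` search would report a hit for "zvfh" inside "zvfhxxx", which is the false positive the commit note warns about; the whole-token comparison rejects it.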
Darren Hsieh	1b3c4c12d4	Add Split/Merge RGB/ARGB/XRGB Row_RVV * Run on SiFive internal FPGA: SplitRGBPlane_Opt (~6.87x vs scalar) SplitARGBPlane_Opt (~10.77x vs scalar) SplitXRGBPlane_Opt (~18.69x vs scalar) MergeRGBPlane_Opt (~3.63x vs scalar) MergeARGBPlane_Opt (~3.50x vs scalar) MergeXRGBPlane_Opt (~2.90x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 - include a fix to avoid implicit conversion warning between size_t & int. Bug: libyuv:956 Change-Id: Icd79b282b04ea3981e7fd4e6d547da6708d82516 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4443411 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-04-28 18:34:46 +00:00
Frank Barchard	7c6a7e5737	cpuid for arm/mips/riscv initialize buffer - change cpu printf to hex to better show flags util/cpuid: Cpu Flags 0x30000001 Has RISCV 0x10000000 Has RVV 0x20000000 [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 0x30000001 Has RISCV 0x10000000 Has RVV 0x20000000 Has RVVZVFH 0x0 [ OK ] LibYUVBaseTest.TestCpuHas (1 ms) [ RUN ] LibYUVBaseTest.TestCompilerMacros __ATOMIC_RELAXED 0 __cplusplus 201703 __clang_major__ 9999 __clang_minor__ 0 __GNUC__ 4 __GNUC_MINOR__ 2 __riscv 1 __riscv_vector 1 __clang__ 1 __llvm__ 1 __pic__ 2 INT_TYPES_DEFINED __has_feature Bug: libyuv:956 Change-Id: Iee4f1f34799434390e756de1e6c2c4596d82ace5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4484957 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-27 22:46:27 +00:00
Frank Barchard	cf21b5ea5c	Rename variables to match layout of ABGR Bug: None Change-Id: Ia1d596b6e108307fe042a03c34162b25152293d4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4461967 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-26 16:57:33 +00:00
Bruce Lai	1330a79e9f	Optimized AR64/AB64 <-> ARGB with RVV * Run on SiFive internal FPGA: ARGBToAR64_Opt (~13.7x vs scalar) ARGBToAB64_Opt (~5.81x vs scalar) AR64ToARGB_Opt (~15.8x vs scalar) AB64ToARGB_Opt (~2.40x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Bug: libyuv:956 Change-Id: Ida642a5077f59d25fb7c5328f671956b2293dadd Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4442913 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-20 19:49:55 +00:00
Frank Barchard	c994782086	Enable RVV if qemu is detected - include a fix for jpeg unittests to do at least 1 iteration - include a fix for scale uv to only use linearup2 if filter is linear Tested on qemu with Intel host: [ RUN ] LibYUVBaseTest.TestCpuHas Cpu Flags 805306369 Has RISCV 268435456 Has RVV 536870912 Has RVVZVFH 0 Has X86 0 Bug: libyuv:956, libyuv:959, libyuv:960 Change-Id: I4a1b66f83d82ba127780f52526153d586db90111 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4429570 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Randall Bosetti <rlb@google.com>	2023-04-18 20:29:04 +00:00
Darren Hsieh	44396e6e9a	Add ARGBToRAWRow_RVV, ARGBToRGB24Row_RVV, RGB24ToARGBRow_RVV * Run on SiFive internal FPGA: ARGBToRAW_Opt (~1.55x vs scalar) ARGBToRGB24_Opt (~1.44x vs scalar) RGB24ToARGB_Opt (~1.77x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Bug: libyuv:956 Change-Id: I26722f6848cd68684d95d9a7ee06ce0416e7985d Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4413083 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-13 19:33:16 +00:00
Frank Barchard	68659d0d68	UVScale down by 2 fix for C and optimize for NEON - update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open. Bug: libyuv:958 Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-12 22:49:20 +00:00
Frank Barchard	ee3e71c7ce	Any functions use memset(vin, 0, sizeof(vin)) for GCC warning fix - Fix -Wmemset-elt-size warning for GCC - Use vin for inputs and vout for outputs Bug: None Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2023-04-10 20:44:10 +00:00
Darren Hsieh	724e7aee03	Fix macro define typo in scale_uv.cc The correct define can be found in scale_row.h Change-Id: I633ed47006c7bd8014038493005c2d934489ff18 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4411353 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-10 16:55:48 +00:00
James Zern	0200037a5a	row_any,ANYDETILE: fix -Wmemset-elt-size warning under gcc 12.2.0 using -Wall: source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const uint16_t, ptrdiff_t, uint16_t, int)’: source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to number of elements without multiplication by element size [-Wmemset-elt-size] 2287 \| memset(temp, 0, 16 * BPP); /* for msan */ \| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~ source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’ 2308 \| ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15) This increases the memset to the full buffer size, which may not be strictly necessary. Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-04-08 19:01:02 +00:00
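GCC's `-Wmemset-elt-size` fires when the length passed to `memset` equals the array's element count rather than its size in bytes. A minimal sketch of the safer pattern, clearing with `sizeof` so the whole buffer is covered whatever the element type is. The function `demo_clear_temp` and its 16-element buffer are hypothetical and are not the `ANYDETILE` macro itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Clears a 16-element uint16_t scratch buffer, stores its last element in
 * *out_last, and returns how many bytes were cleared. Illustration only. */
size_t demo_clear_temp(uint16_t* out_last) {
  uint16_t temp[16];
  memset(temp, 0xff, sizeof(temp)); /* poison first so the clear is visible */
  /* memset(temp, 0, 16) would clear only 16 bytes, i.e. 8 of 16 elements. */
  memset(temp, 0, sizeof(temp));
  *out_last = temp[15];
  return sizeof(temp);
}
```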
Darren Hsieh	e8af6cb2e4	Add RAWToARGBRow_RVV,RAWToRGBARow_RVV,RAWToRGB24Row_RVV * Run on SiFive internal FPGA: RAWToARGB_Opt (~2x vs scalar) RAWToRGBA_Opt (~2x vs scalar) RAWToRGB24_Opt (~1.5x vs scalar) LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10 Change-Id: I21a13d646589ea2aa3822cb9225f5191068c285b Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408357 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-04-07 18:45:08 +00:00
Darren Hsieh	aa47d668d8	Add riscv cpu info detection. * Supports: * The standard single-letter Vector detection. * Vector fp16 detection. Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Change-Id: Ia7ee1bd8ec1a990f1b2b1700805942e99c0aa87b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4401738 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-04-06 15:58:29 +00:00
Wan-Teh Chang	ec48e4328e	Add assertions for the Clang static analyzer The Clang static analyzer (scan-build) in LLVM 14 warns about array index out of bounds in scaletbl[boxwidth - minboxwidth] in ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two elements. It's not clear the index boxwidth - minboxwidth is either 0 or 1. Change-Id: I072476e86950154beffe6b1a89915755118b3cbd Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-04-05 21:58:22 +00:00
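When an index is computed from two values, a static analyzer cannot always prove it stays inside a small table. An assertion states the invariant explicitly. The sketch below assumes a two-entry table indexed by `boxwidth - minboxwidth`, as the message describes; the table contents and the name `demo_scale_lookup` are made up for illustration.

```c
#include <assert.h>

/* Hypothetical two-entry lookup modelled on the description above. */
int demo_scale_lookup(int boxwidth, int minboxwidth) {
  static const int scaletbl[2] = {100, 200}; /* placeholder values */
  int index = boxwidth - minboxwidth;
  /* States the invariant the analyzer could not infer: index is 0 or 1. */
  assert(index == 0 || index == 1);
  return scaletbl[index];
}
```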
Frank Barchard	464c51a035	AArch32 YUVTORGB_SETUP use load and dup to avoid modifying pointer - Allows code to be optimized with clang 17 -flto-thin - Bump version number to 1864 to allow detection of fix - Apply clang format to standardize formatting; No impact on code generated Bug: chromium:1424089 Change-Id: Ib745836b27915a5e4cb1d7d928ee52659360612b Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370052 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2023-03-24 19:32:30 +00:00
Frank Barchard	1a971f8cc3	clang 17 -flto-thin bug fix for Neon YUVtoRGB and ARGBToRGB565Dither - YUV to RGB AArch32 kRGBCoeffBias rewind pointer - ARGBToRGB565Dither declare width and source pointers as modified Bug: chromium:1424089 Change-Id: I987180652331bab16ce27d8d166399a687ee890e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370099 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-03-24 10:59:40 +00:00
Frank Barchard	3f219a3501	GCC warning fix for MT2T - Fix redundant assignment compile warning in GCC - Apply clang-format - Bump version to 1863 Bug: libyuv:955 Change-Id: If2b6588cd5a7f068a1745fe7763e90caa7277101 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4344729 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-03-16 06:57:20 +00:00
Justin Green	76468711d5	M2T2 Unpack fixes Fix the algorithm for unpacking the lower 2 bits of M2T2 pixels. Bug: b:258474032 Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978 Commit-Queue: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-03-14 14:59:26 +00:00
Frank Barchard	f9b23b9cc0	Transpose 4x4 for SSE2 and AVX2 Skylake Xeon AVX2 Transpose4x4_Opt (290 ms) SSE2 Transpose4x4_Opt (302 ms) C Transpose4x4_Opt (522 ms) AMD Zen2 AVX2 Transpose4x4_Opt (136 ms) SSE2 Transpose4x4_Opt (137 ms) C Transpose4x4_Opt (431 ms) Bug: None Change-Id: I4997dbd5c5387c22bfd6c5960b421504e4bc8a2a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4292946 Reviewed-by: Justin Green <greenjustin@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-03-03 17:46:23 +00:00
Frank Barchard	88b050f337	MergeUV AVX512BW use assembly - Convert MergeUVRow_AVX512BW to assembly - Enable MergeUVRow_AVX512BW for Windows with clangcl - MergeUVRow_AVX2 use vpmovzxbw and vpsllw - MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V AMD Zen 4 640x360 100000 iterations Was AVX512 MergeUVPlane_Opt (884 ms) AVX2 MergeUVPlane_Opt (945 ms) AVX2 MergeUVPlane_16_Opt (2167 ms) Now AVX512 MergeUVPlane_Opt (865 ms) AVX2 MergeUVPlane_Opt (943 ms) SSE2 MergeUVPlane_Opt (973 ms) AVX2 MergeUVPlane_16_Opt (2102 ms) Bug: None Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230 Reviewed-by: Fritz Koenig <frkoenig@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-02-22 21:19:08 +00:00
Frank Barchard	2bdc210be9	MergeUV_AVX512BW for I420ToNV12 On Skylake Xeon 640x360 100000 iterations AVX512 MergeUVPlane_Opt (1196 ms) AVX2 MergeUVPlane_Opt (1565 ms) SSE2 MergeUVPlane_Opt (1780 ms) Pixel 7 MergeUVPlane_Opt (1177 ms) Bug: None Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-02-13 20:14:57 +00:00
Sergio Garcia Murillo	b2528b0be9	Add support for odd width and height in I410ToI420 Bug: libyuv:950 Change-Id: Ic9a094463af875aefd927023f730b5f35f8551de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4154630 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-01-23 19:05:00 +00:00
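
Note on 0200037a5a (-Wmemset-elt-size): the warning fires when the length passed to memset equals the number of elements rather than the number of bytes. The sketch below is not libyuv code. The buffer size and the function names are hypothetical and chosen only for illustration. It shows why an element-count length leaves part of a uint16_t buffer untouched, while a full-buffer byte length clears all of it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical buffer size, chosen only for illustration.
constexpr std::size_t kElems = 32;

// Fills a uint16_t buffer with 0xFFFF, clears the first `bytes` bytes with
// memset, and returns how many elements are still non-zero afterwards.
std::size_t NonZeroAfterMemset(std::size_t bytes) {
  uint16_t temp[kElems];
  for (std::size_t i = 0; i < kElems; ++i) {
    temp[i] = 0xFFFF;
  }
  std::memset(temp, 0, bytes);
  std::size_t remaining = 0;
  for (std::size_t i = 0; i < kElems; ++i) {
    if (temp[i] != 0) {
      ++remaining;
    }
  }
  return remaining;
}

// Length equal to the element count: only half of the buffer is cleared,
// because each element is two bytes wide.
std::size_t ClearByElementCount() {
  return NonZeroAfterMemset(kElems);
}

// Length equal to the full buffer size in bytes: every element is cleared.
std::size_t ClearByFullSize() {
  return NonZeroAfterMemset(kElems * sizeof(uint16_t));
}
```

With a literal array in scope, writing the length as sizeof(temp) is the usual way to get the full-buffer byte count. As the commit message itself notes, clearing the whole buffer may be more than is strictly necessary.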

1830 Commits