libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2025-12-07 01:06:46 +08:00

Author	SHA1	Message	Date
George Steed	c2e7f8389a	[AArch64] Add SME implementations of InterpolateRow{,_16,_16To8} InterpolateRow_SME and InterpolateRow_16_SME need special cases to handle if source_y_fraction is 256 since this would overflow a byte and can just be a call to memcpy instead. InterpolateRow_16To8_SME is never called with a source_y_fraction value of 256 so there is no need for a special case here. Change-Id: I67805b5db2c411acb93ada626cf414b35620f467 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6074375 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-12-12 03:03:41 -08:00
runzezhang	192b8c2238	Add NV24 scaling support to libyuv Some projects require scaling support for the NV24 format, but libyuv currently lacks this functionality. This commit adds a scaling function for NV24, enabling its use in projects that require NV24 format processing. Change-Id: I6e6b2bea342e1df7f387056ab3bc5003da983bb7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6068715 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-12-12 02:46:11 -08:00
George Steed	85331e00cc	[AArch64] Add SME impls of ScaleRowDown2{,Linear,Box}_16 Mostly just straightforward copies of the Neon code ported to Streaming-SVE, these follow the same pattern as the prior ScaleRowDown2 SME kernels, but operating on 16-bit data rather than 8-bit. There is no benefit from this kernel when the SVE vector length is only 128 bits, so skip writing a non-streaming SVE implementation. Change-Id: I7bad0719d24cdb1760d1039c63c0e77726b28a54 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6070784 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-12-12 01:21:08 -08:00
George Steed	9a9752134e	[AArch64] Add Neon implementation of ScaleRowDown2Linear_16 Reduction in runtime observed relative to the auto-vectorized C implementation compiled with LLVM 19: Cortex-A55: -13.7% Cortex-A510: -49.0% Cortex-A520: -32.0% Cortex-A76: -34.3% Cortex-A710: -56.7% Cortex-A715: -45.4% Cortex-A720: -44.7% Cortex-X1: -70.6% Cortex-X2: -67.9% Cortex-X3: -72.2% Cortex-X4: -40.0% Cortex-X925: -24.1% Bug: b/42280942 Change-Id: I977899a2239e752400c9901f4d8482a76841269a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6040154 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-11-25 21:10:26 +00:00
George Steed	11c57f4f12	[AArch64] Add Neon implementation of ScaleRowDown2_16_NEON The auto-vectorized implementation unrolls to process 32 elements per iteration, so unroll the new Neon implementation to match and avoid a performance regression on little cores. Performance relative to the auto-vectorized C implementation compiled with LLVM 19: Cortex-A55: -35.8% Cortex-A510: -20.4% Cortex-A520: -22.1% Cortex-A76: -54.8% Cortex-A710: -44.5% Cortex-A715: -31.1% Cortex-A720: -31.4% Cortex-X1: -48.5% Cortex-X2: -47.8% Cortex-X3: -47.6% Cortex-X4: -51.1% Cortex-X925: -14.6% Bug: b/42280942 Change-Id: Ib4e89ba230d554f2717052e934ca0e8a109ccc42 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6040153 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-11-25 21:10:05 +00:00
George Steed	952d6a282f	[AArch64] Enable use of ScaleRowDown2Box_16_NEON The #ifdef surrounding the use of this kernel is never defined and ScaleRowDown2_16_NEON does not exist, so add the missing #define and remove the use of ScaleRowDown2_16_NEON for now. Additionally since there is no implementation of this kernel for 32-bit Arm, restrict the define to only be present on AArch64. Bug: b/42280942 Change-Id: Icc35c145c1bad1c0df2933a2d8bc7dcf7fe63cb7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/6040152 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-11-24 19:58:00 +00:00
George Steed	aec4b4e22e	[AArch64] Add SME implementation of ScaleRowDown2Box There is no benefit from an SVE version of this kernel for devices with an SVE vector length of 128-bits, so skip directly to SME instead. We do not use the ZA tile here, so this is a purely streaming-SVE (SSVE) implementation. Change-Id: I5021aeda30f4c5f1aa4cc6326c8d7886851d2c09 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913885 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-11-07 18:42:21 +00:00
George Steed	51d07554a0	[AArch64] Add SME implementation of ScaleRowDown2Linear There is no benefit from an SVE version of this kernel for devices with an SVE vector length of 128-bits, so skip directly to SME instead. We do not use the ZA tile here, so this is a purely streaming-SVE (SSVE) implementation. Change-Id: Ie6b91bd4407130ba2653838088e81e72e4460f68 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913884 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-10-30 17:57:15 +00:00
George Steed	593965cea2	[AArch64] Add SME implementation of ScaleRowDown2 Including associated changes for adding a new scale_sme.cc file. There is no benefit from an SVE version of this kernel for devices with an SVE vector length of 128-bits, so skip directly to SME instead. We do not use the ZA tile here, so this is a purely streaming-SVE (SSVE) implementation. Change-Id: I47d149613fbabd8c203605a809811f1a668e8fb7 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5913883 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2024-10-30 17:56:41 +00:00
Wan-Teh Chang	77f3acade4	ScalePlaneDown34: test dst_width%24 == 0 for armv7 In ScalePlaneDown34(), check if dst_width % 24 == 0 for armv7, and check if dst_width % 48 == 0 for aarch64. No-Try: True Bug: b/369963535, b/366045177 Change-Id: I7dc1227517c83c97a1d1052ef2230d5cec41da10 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5896492 Commit-Queue: Wan-Teh Chang <wtc@google.com> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2024-09-27 23:00:19 +00:00
George Steed	2d62d8d22a	[AArch64] Unroll ScaleRowDown4_NEON We can use wider load/store instructions here which is mostly an improvement across the board. Reduction in runtimes observed compared to the existing Neon implementation: Cortex-A55: +4.9% (!) Cortex-A510: -46.3% Cortex-A520: -49.0% Cortex-A76: -12.2% Cortex-A715: -15.5% Cortex-A720: -15.0% Cortex-X1: -12.4% Cortex-X2: -12.5% Cortex-X3: -12.3% Cortex-X4: +0.3% Bug: b/42280945 Change-Id: Id8af6499c63919924c2a954dfe7765b703ce4820 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5785970 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2024-09-16 04:30:04 +00:00
Frank Barchard	4620f17058	ScalePlane crash fix for 3/4 scaling - Scaling 48 pixels at a time, but calling code checked for 24 pixels - Added test for scaling to 1080x1920 libyuv_test --gunit_filter=LibYUVScaleTest.I420ScaleTo1080x1920_Box* --libyuv_width=1440 --libyuv_height=2560 Was libyuv_test --gunit_filter=LibYUVScaleTest.I420ScaleTo1080x1920_Box* --libyuv_width=1440 --libyuv_height=2560 [ RUN ] LibYUVScaleTest.I420ScaleTo1080x1920_Box Segmentation fault Traceback (most recent call last): Now [ RUN ] LibYUVScaleTest.I420ScaleTo1080x1920_Box filter 3 - 6741 us C - 3566 us OPT [ OK ] LibYUVScaleTest.I420ScaleTo1080x1920_Box (43 ms) Bug: b/366045177 Change-Id: I0ea6c2d6a32b2e7ca44cd030abc9f248115be44a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5857554 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2024-09-13 01:20:39 +00:00
Frank Barchard	def473f501	malloc return 1 for failures and assert for internal functions Bug: libyuv:968 Change-Id: Iea2f907061532d2e00347996124bc80d079a7bdc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5010874 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-12-04 22:55:20 +00:00
Wan-Teh Chang	fb6341d326	Change ScalePlane,ScalePlane_16,... to return int Change ScalePlane(), ScalePlane_16(), and ScalePlane_12() to return int so that they can report memory allocation failures (by returning 1). BUG=libyuv:968 Change-Id: Ie5c183ee42e3d595302671f9ecb7b3472dc8fdb5 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5005031 Commit-Queue: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-11-03 23:53:24 +00:00
Frank Barchard	31e1d6f896	Check allocations that return NULL and return early BUG=libyuv:968 Change-Id: I9e8594440a6035958511f9c50072820131331fc8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4977552 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-10-27 17:41:36 +00:00
Yannis Guyon	a3b9c36eb9	Fix unused arg errors in ScalePlane*() in Release src_width parameter is used for assertions and unused with NDEBUG. Fix the warning treated as an error when -Wall -Wextra -Werror is used to build that part of the code. BUG=libyuv:967 Change-Id: I4c02ab013e8e2684b3bed5ce9693e1493d7751b9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4905033 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-10-03 15:19:25 +00:00
Bruce Lai	c60ac4025c	[RVV] Enable ScaleRowDown38_RVV & ScaleRowDown38_{2,3}_Box_RVV * Run on SiFive internal FPGA: Test Case Speedup I420ScaleDownBy3by8_None 4.2 I420ScaleDownBy3by8_Linear 1.7 I420ScaleDownBy3by8_Bilinear 1.7 I420ScaleDownBy3by8_Box 1.7 I444ScaleDownBy3by8_None 4.2 I444ScaleDownBy3by8_Linear 1.8 I444ScaleDownBy3by8_Bilinear 1.8 I444ScaleDownBy3by8_Box 1.8 Change-Id: Ic2e98de2494d9e7b25f5db115a7f21c618eaefed Signed-off-by: Bruce Lai <bruce.lai@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4711857 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-07-27 02:59:47 +00:00
Darren Hsieh	10de943a12	[RVV] Enable ScaleRowUp2_(Bi)linear_RVV/ScaleUVRowUp2_(Bi)linear_RVV ScaleUVRowUp2_(Bi)linear_RVV function is equal to other platforms' ScaleRowUp2_(Bi)linear_Any_XXX. We process entire row in this function. Other platforms only implement non-edge part of image and process edge with scalar. ScaleRowUp2_(Bi)linear_Any_XXX: Combine ScaleRowUp2_(Bi)linear_XXX(non-edge) + ScaleRowUp2_(Bi)linear_C(edge) by SBUH2LANY/SU2BLANY. * Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleFrom640x360_Bilinear ScaleRowUp2_Bilinear_RVV 8.21 I444ScaleFrom640x360_Linear ScaleRowUp2_Linear_RVV 8.08 UVScaleFrom640x360_Bilinear ScaleUVRowUp2_Bilinear_RVV 7.80 UVScaleFrom640x360_Linear ScaleUVRowUp2_Linear_RVV 7.03 Change-Id: I539245ce51858f077506a78f0e7e82377ac6a95d Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4666062 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-07-26 18:05:50 +00:00
Frank Barchard	650be7496f	Fix warnings for missing prototypes - Add static to internal scale and rotate functions - Remove unittest that tested an internal scale function - Remove unused private functions - Include missing scale_argb.h header - Bump version and apply clang format Bug: libyuv:830 Change-Id: I45bab0423b86334f9707f935aedd0c6efc442dd4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4658956 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2023-06-30 17:46:56 +00:00
Darren Hsieh	552571e8b2	[RVV] Enable ScaleRowDown34_RVV & ScaleRowDown34_{0,1}_Box_RVV Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleDownBy3by4_None ScaleRowDown34_RVV 5.8 I444ScaleDownBy3by4_Linear ScaleRowDown34_0/1_Box_RVV 6.5 I444ScaleDownBy3by4_Bilinear ScaleRowDown34_0/1_Box_RVV 6.3 Bug: libyuv:956 Change-Id: I8ef221ab14d631e14f1ba1aaa25d2b30d4e710db Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4607777 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-14 00:57:00 +00:00
Darren Hsieh	873eaa3bbf	[RVV] Enable Scale{ARGB,UV}RowDown{2,4,EVEN}_RVV Run on SiFive internal FPGA: Test case RVV function Speedup I444ScaleDownBy3_Box ScaleAddRow_RVV+ScaleAddCols(scalar) 2.8 ARGBScaleDownBy2_None ScaleARGBRowDown2_RVV 2.2 ARGBScaleDownBy2_Linear ScaleARGBRowDown2Linear_RVV 5.0 ARGBScaleDownBy2_Box ScaleARGBRowDown2Box_RVV 4.3 ARGBScaleDownBy4_None ScaleARGBRowDownEven_RVV 1.2 ARGBScaleDownBy8_Box ScaleARGBRowDownEvenBox_RVV 3.2 ARGBScaleDownBy4_Box ScaleARGBRowDown2Box_RVV 4.5 I444ScaleDownBy2_None ScaleRowDown2_RVV 5.8 I444ScaleDownBy2_Linear ScaleRowDown2Linear_RVV 6.1 I444ScaleDownBy2_Box ScaleRowDown2Box_RVV 5.0 I444ScaleDownBy4_None ScaleRowDown4_RVV 3.6 I444ScaleDownBy4_Box ScaleRowDown4Box_RVV 3.5 UVScaleDownBy2_None ScaleUVRowDown2_RVV 5.8 UVScaleDownBy2_Linear ScaleUVRowDown2Linear_RVV 5.6 UVScaleDownBy2_Box ScaleUVRowDown2Box_RVV 4.1 UVScaleDownBy4_None ScaleUVRowDown4_RVV 1.7 UVScaleDownBy4_Box ScaleUVRowDown2Box_RVV 4.5 avg-speedup: 4 Note: Specialize ScaleUVRowDown with step_size=4 by ScaleUVRowDown4_RVV. Bug: libyuv:956 Change-Id: If9604a6aadf681193f282507602c57c726332202 Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4601684 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-06-13 00:40:39 +00:00
Darren Hsieh	d14bd701c8	[RVV] Enable CopyRow_RVV, InterpolateRow_RVV, {Merge,Split}UVRow_RVV * Run on SiFive internal FPGA: MergeUVPlane_Opt(~6x vs scalar) SplitUVPlane_Opt(~6x vs scalar) TestCopyPlane(~8x vs scalar) ARGBInterpolate0_Opt(~10x vs scalar) ARGBInterpolate64_Opt(~9x vs scalar) ARGBInterpolate168_Opt(~9x vs scalar) ARGBInterpolate192_Opt(~8.5x vs scalar) ARGBInterpolate255_Opt(~8x vs scalar) Bug: libyuv:956 Change-Id: I8372341865f75f42e30371ef943d5c2e4be7b79a Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4574186 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-05-30 09:10:35 +00:00
Wan-Teh Chang	dcbe082070	Save boxwidth - minboxwidth in a local variable Avoid repetitions of the expression boxwidth - minboxwidth. Change-Id: Ib53fb6b06a926b80ff9a64cc5d499aeef0894c99 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408062 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-05-22 19:10:13 +00:00
Wan-Teh Chang	ec48e4328e	Add assertions for the Clang static analyzer The Clang static analyzer (scan-build) in LLVM 14 warns about array index out of bounds in scaletbl[boxwidth - minboxwidth] in ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two elements. It's not clear the index boxwidth - minboxwidth is either 0 or 1. Change-Id: I072476e86950154beffe6b1a89915755118b3cbd Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Wan-Teh Chang <wtc@google.com>	2023-04-05 21:58:22 +00:00
Frank Barchard	2bdc210be9	MergeUV_AVX512BW for I420ToNV12 On Skylake Xeon 640x360 100000 iterations AVX512 MergeUVPlane_Opt (1196 ms) AVX2 MergeUVPlane_Opt (1565 ms) SSE2 MergeUVPlane_Opt (1780 ms) Pixel 7 MergeUVPlane_Opt (1177 ms) Bug: None Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2023-02-13 20:14:57 +00:00
Sergio Garcia Murillo	b2528b0be9	Add support for odd width and height in I410ToI420 Bug: libyuv:950 Change-Id: Ic9a094463af875aefd927023f730b5f35f8551de Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4154630 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2023-01-23 19:05:00 +00:00
Frank Barchard	6e4b0acb4b	I422Rotate take stride for temporary buffers - Minor variable name changes first/last to top/bottom - Comments explaining rotate temporary buffers usage - Add asserts for scale parameter - Use NULL and stddef.h instead of 0 - Use void * for allocation in row.h - Add () around size parameter in macros Bug: libyuv:926, libyuv:949 Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-04 23:11:52 +00:00
Sergio Garcia Murillo	22a579c438	Use ScalePlaneDown2_16To8 for avoiding the 2 step process Bug: libyuv:950 Change-Id: I5a77bca9a0230fe00abd810939e217833a14683f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4134524 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2023-01-03 21:41:21 +00:00
Frank Barchard	3abd6f36b6	Casting for scale functions - MT2T support for source strides added, but only works for positive values. - Reduced casting in row_common - one cast per assignment. - scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t Bug: libyuv:948, b/257266635, b/262468594 Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Ernest Hua <ernesthua@google.com>	2022-12-15 22:34:22 +00:00
Frank Barchard	f71c83552d	I420ToRGB24MatrixFilter function added - Implemented as 3 steps: Upsample UV to 4:4:4, I444ToARGB, ARGBToRGB24 - Fix some build warnings for missing prototypes. Pixel 4 I420ToRGB24_Opt (743 ms) I420ToRGB24Filter_Opt (1331 ms) Windows with skylake xeon: x86 32 bit I420ToRGB24_Opt (387 ms) I420ToRGB24Filter_Opt (571 ms) x64 64 bit I420ToRGB24_Opt (384 ms) I420ToRGB24Filter_Opt (582 ms) Bug: libyuv:938, libyuv:830 Change-Id: Ie27f70816ec084437014f8a1c630ae011ee2348c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3900298 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-09-16 19:46:47 +00:00
Frank Barchard	3e38ce5058	SSE2 MM21->YUY2 conversion Add SSE2 optimization for MM21ToYUY2 conversion. Bug: b/238137982 Change-Id: I189f712514308322f651b082b496bce9c015c4ee Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832525 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Justin Green <greenjustin@google.com>	2022-08-17 18:39:05 +00:00
Frank Barchard	65e7c9d570	MM21ToYUY2 and ABGRToJ420 conversion MM21 to YUY2 use zip1 for performance Cortex A510 Was MM21ToYUY2 (612 ms) Now MM21ToYUY2 (573 ms) Prefetches help Cortex A53 Was MM21ToYUY2 (4998 ms) Now MM21ToYUY2 (1900 ms) Pixel 4 Cortex A76 Was MM21ToYUY2 (215 ms) Now MM21ToYUY2 (173 ms) ABGRToJ420 - NEON, SSSE3 and AVX2 row functions - J400, J420 and J422 formats. - Added AVX2 for UV on ARGBToJ420. Was SSSE3 Same code/performance as ARGBToJ420 but with constants re-ordered. Pixel 4 ABGRToJ420_Opt (623 ms) ABGRToJ422_Opt (702 ms) ABGRToJ400_Opt (238 ms) Skylake Xeon With LIBYUV_BIT_EXACT which uses C for UV ABGRToJ420_Opt (988 ms) ABGRToJ422_Opt (1872 ms) ABGRToJ400_Opt (186 ms) Skylake Xeon using AVX2 ABGRToJ420_Opt (251 ms) ABGRToJ422_Opt (245 ms) ABGRToJ400_Opt (184 ms) Skylake Xeon using SSSE3 ABGRToJ420_Opt (328 ms) ABGRToJ422_Opt (362 ms) ABGRToJ400_Opt (185 ms) Bug: b/238137982 Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111 Reviewed-by: Justin Green <greenjustin@google.com> Reviewed-by: Wan-Teh Chang <wtc@google.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-08-16 22:07:38 +00:00
Yuan Tong	98ec7c28d5	Fix SSE2 version of ScalePlaneUp2_16_Bilinear - Define HAS_SCALEROWUP2_BILINEAR_16_SSE2: it's now fixed. - Correct function name to ScaleRowUp2_Bilinear_16_Any_SSE2: this row function uses only SSE2 instructions. Bug: libyuv:882 Change-Id: Ib1c7ac5b09997cb5b32bc54109d8c566af762433 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3800842 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-08-02 20:35:48 +00:00
Frank Barchard	b028453ba6	Disable bilinear 16 bit scale up for SSE2 - Undefine HAS_SCALEROWUP2_BILINEAR_16_SSE2 - Save XMM7 in ScaleRowUp2_Bilinear_16_SSE2(). - Rename HAS_SCALEROWUP2LINEAR_xxx to HAS_SCALEROWUP2_LINEAR_xxx - DetileSplitUVRow_C() is implemented using SplitUVRow_C(). - Changes to unit_test/planar_test.cc. Bug: libyuv:882 Change-Id: I0a8e8e5fb43bdf58ded87244e802343eacb789f2 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3795063 Reviewed-by: Wan-Teh Chang <wtc@google.com>	2022-08-01 22:54:48 +00:00
Frank Barchard	fe4a50df8e	Bilinear scale up msan fix - Avoid stepping to height + 1 for bilinear filter 2nd row for last row of source - Box filter ubsan fix for 3/4 and 3/8 scaling for 16 bit planar - Height 1 asan fixes Bug: libyuv:935, b/206716399 Change-Id: I56088520f2a884a37b987ee5265def175047673e Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717263 Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-06-22 00:11:49 +00:00
Frank Barchard	30f9b28048	Add I210ToI420 Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 Change-Id: Ib135d0b4ff17665f6a4ab60edb782a7b314219a4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3696042 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2022-06-09 08:07:50 +00:00
Frank Barchard	d011314f14	Revert "I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix" This reverts commit 60254a1d846a93a4d7559009004cdd91bcc04d82. Reason for revert: breaks PaintCanvasVideoRendererTest.HighBitDepth Original change's description: > I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix > > - Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit > - Add NEON InterpolateRow_16 for fast 10 bit scaling > - When scaling up, set step to interpolate toward height - 1 to avoid buffer overread > - When scaling down, center the 2 rows used for source to achieve filtering. > - CopyPlane check for 0 size and return > > Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 > Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1 > Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552 > Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> > Reviewed-by: richard winterton <rrwinterton@gmail.com> Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 Change-Id: Icc05bb340db0e7fe864061fb501d0a861c764116 No-Presubmit: true No-Tree-Checks: true No-Try: true Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3692886 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2022-06-07 09:16:05 +00:00
Frank Barchard	60254a1d84	I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix - Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit - Add NEON InterpolateRow_16 for fast 10 bit scaling - When scaling up, set step to interpolate toward height - 1 to avoid buffer overread - When scaling down, center the 2 rows used for source to achieve filtering. - CopyPlane check for 0 size and return Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482 Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2022-06-07 01:41:56 +00:00
Frank Barchard	d62ee21e66	UVScale fix for vertical-only scaling Bug: b/228841445 Change-Id: I0342856e1bfcea69851d718459d66926bb170219 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3595240 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Miguel Casas-Sanchez <mcasas@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-20 01:27:33 +00:00
Frank Barchard	2ad73733d9	I422Rotate update to remove name space for ios build warning - Remove libyuv:: from within libyuv to resolve a build warning on IOS. - Check src_y parameter is not NULL if there is a dst_y parameter - Apply clang-format - Bump version Performance on Intel Skylake Xeon ARGBRotate90_Opt (795 ms) I420Rotate90_Opt (283 ms) I422Rotate90_Opt (867 ms) <-- scales and rotates I444Rotate90_Opt (565 ms) NV12Rotate90_Opt (289 ms) Performance on Pixel 4 (Cortex A76) ARGBRotate90_Opt (4208 ms) I420Rotate90_Opt (273 ms) I422Rotate90_Opt (1207 ms) I444Rotate90_Opt (718 ms) NV12Rotate90_Opt (282 ms) Bug: libyuv:926 Change-Id: I42e1b93a9595f6ed075918e91bed977dd3d23f6f Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3576778 Reviewed-by: Mirko Bonadei <mbonadei@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-04-07 21:06:44 +00:00
Sergio Garcia Murillo	4589081cea	Add I422 and I210 functions Bug: webrtc:13826 Change-Id: I68235a668abecf76133f7b89472b192b1442bed4 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3557217 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-03-31 15:30:53 +00:00
Frank Barchard	2c6bfc02d5	Remove MMI support Bug: libyuv:916 Change-Id: I345b7e271ceb4b32fe91e292915e66be40812810 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3415817 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2022-01-26 08:41:33 +00:00
Hao Chen	2f87e9a713	Add optimization functions in scale_lsx.cc file. Optimize 20 functions in source/scale_lsx.cc file. All test cases passed on loongarch platform. Bug: libyuv:913 Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-01-21 01:34:38 +00:00
Hao Chen	dfe046d272	Add optimization functions in row_lsx.cc file. Optimize 44 functions in source/row_lsx.cc file. All test cases passed on loongarch platform. Bug: libyuv:913 Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2022-01-21 01:34:38 +00:00
Frank Barchard	f0cfc1f1c8	ubsan friendly unaligned tests - ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C. Although current Intel, ARM, Mips and PPC can do unaligned load/store, it's not guaranteed and could crash a CPU that doesn't support it. - unaligned tests use offset of 2 or 4, which ubsan accepts. - unittest fills in random buffer with 2 bytes at a time instead of a short. - row common functions for int16 types use 2 shorts instead of 1 int. Bug: libyuv:908, b/203243873 Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>	2021-10-18 18:03:28 +00:00
Frank Barchard	55b97cb48f	BIT_EXACT for unattenuate and attenuate. - reenable Intel SIMD unaffected by BIT_EXACT - add bit exact version of ARGBAttenuate, which uses ARM version of formula. - add bit exact version of ARGBUnattenuate, which mimics the AVX code. Apply clang format to cleanup code. Bug: libyuv:908, b/202888439 Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-10-15 19:46:02 +00:00
Frank Barchard	11cbf8f976	Add LIBYUV_BIT_EXACT macro to force C to match SIMD - C code use ARM path, so NEON and C match - C used on Intel platforms, disabling AVX. Bug: libyuv:908, b/202888439 Change-Id: Ie035a150a60d3cf4ee7c849a96819d43640cf020 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3223507 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-10-14 20:37:39 +00:00
Frank Barchard	60db98b6fa	clang-tidy applied Bug: libyuv:886, libyuv:889 Change-Id: I2d14d03c19402381256d3c6d988e0b7307bdffd8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2800147 Reviewed-by: richard winterton <rrwinterton@gmail.com>	2021-04-01 21:42:47 +00:00
Frank Barchard	ba033a11e3	Add 12 bit YUV to 10 bit RGB Bug: libyuv:843 Change-Id: I0104c8fcaeed09e83d2fd654c6a5e7d41bcb74cf Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2727775 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Wan-Teh Chang <wtc@google.com>	2021-03-05 01:09:37 +00:00
Yuan Tong	c41eabe3d4	Add full 16 bit scaling up by 2x function R=fbarchard@chromium.org Change-Id: I4a869aefdc16e34357a615727711594c5d8e3a80 Bug: libyuv:882 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2719842 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2021-03-02 19:29:02 +00:00
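
Note on commit c2e7f8389a: the message says a source_y_fraction of 256 would overflow a byte and can be handled with a plain memcpy. The scalar C sketch below illustrates that idea only; it is not the SME kernel and not libyuv's code. The function name interpolate_row_sketch, the 8-bit weights, the +128 rounding term, and the choice that a fraction of 256 copies the second row are all assumptions made for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar sketch (not libyuv's implementation).
 * Blends two rows with weights (256 - fraction) and fraction, out of 256.
 * Assumption: fraction is in [0, 256]; 0 selects row0, 256 selects row1. */
void interpolate_row_sketch(uint8_t* dst, const uint8_t* row0,
                            const uint8_t* row1, int width, int fraction) {
  if (fraction == 0) {
    /* Weight on row0 would be 256, which does not fit in a byte. */
    memcpy(dst, row0, (size_t)width);
    return;
  }
  if (fraction == 256) {
    /* Weight on row1 would be 256, which does not fit in a byte. */
    memcpy(dst, row1, (size_t)width);
    return;
  }
  /* Both weights are now in 1..255 and fit in 8 bits. */
  uint8_t w1 = (uint8_t)fraction;
  uint8_t w0 = (uint8_t)(256 - fraction);
  for (int i = 0; i < width; ++i) {
    /* Rounded fixed-point blend; the maximum sum is 255 * 256 + 128. */
    dst[i] = (uint8_t)((row0[i] * w0 + row1[i] * w1 + 128) >> 8);
  }
}
```

With the two endpoint cases peeled off, a vector kernel that keeps the weights in byte lanes never sees a value of 256, which is the overflow the commit message describes.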

256 Commits