libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2025-12-06 16:56:55 +08:00

Author	SHA1	Message	Date
Frank Barchard	cf101116c9	Remove initialize to zero on output variables for inline. Inline assembly that uses temporary variables currently initializes them to 0 and passes them in as output "+r". This CL replaces the output constraint with "=&r" for most of them, meaning an output with early write (before inputs). This allows the initialize to zero step to be removed, saving 1 instruction. BUG=libyuv:580 TESTED=local libyuv build on gcc/linux and try bots R=harryjin@google.com Review URL: https://codereview.chromium.org/1895743008 .	2016-04-18 16:24:26 -07:00
Frank Barchard	9c53ff2c57	Fix temporary stride for ConvertToARGB with rotation. BUG=libyuv:578 TESTED=local unittests pass R=harryjin@google.com Review URL: https://codereview.chromium.org/1879783002 .	2016-04-11 15:21:04 -07:00
Frank Barchard	3c862e3d29	Fix stride bug for msan on I420Interpolate. When using C version of I420Interpolate for msan, a 50% interpolation would cause stride to be cast to int, which could cause erroneous memory reads on 64 bit build. This CL makes the stride use ptrdiff_t for HalfRow_C BUG=libyuv:582 TESTED=try bots tests R=dhrosa@google.com Review URL: https://codereview.chromium.org/1872953002 .	2016-04-08 15:58:53 -07:00
Frank Barchard	c7372a323a	add if defined(_MSC_FULL_VER) for NaCL TBR=kjellander@chromium.org BUG=libyuv:573 TESTED=try bots Review URL: https://codereview.chromium.org/1850053002 .	2016-04-01 17:48:23 -07:00
Frank Barchard	76aee8ced7	Remove most clang-cl special cases from cpu_id.cc They are not needed, and due to them there was a call to _xgetbv() without a declaration of the function. This used to work because we implicitly included intrin.h in all translation units with clang-cl, but we want to stop doing that. BUG=chromium:592745 R=fbarchard@google.com Review URL: https://codereview.chromium.org/1780473003 .	2016-03-10 14:01:26 -08:00
Frank Barchard	ee99b85126	Port ARGBToRGB565 from aarch64 neon to 32 bit. Ports the 64 bit version of ARGBToRGB565 to 32 bit. The 64 bit version uses sri, which shifts and inserts, saving some masking. The instruction is available for neon 32 bit as well. R=magjed@chromium.org, harryjin@google.com BUG=libyuv:571 Review URL: https://codereview.chromium.org/1724393002 .	2016-02-29 12:22:25 -08:00
Frank Barchard	22e062a448	Port ARGBToJ420 to AVX2 ARGBToJ420 had an SSSE3 version, but not AVX2. ARGBToI420 had an AVX2, so adapt that code to J420. TBR=harryjin@google.com BUG=libyuv:553 Review URL: https://codereview.chromium.org/1702373004 .	2016-02-17 23:16:39 -08:00
Frank Barchard	127ff512b3	add perf data files to ignores document play services update R=jkellander@chromium.org BUG=none Review URL: https://codereview.chromium.org/1712463002 .	2016-02-17 21:37:09 -08:00
Frank Barchard	cc33dc68c7	Port I411ToARGBRow to AVX2. An SSSE3 version already exists, and an AVX2 version is available for Visual C. This ports the function to AVX2 completing the AVX2 ports of all YUV to RGB functions for AVX2 on gcc. TBR=harryjin@google.com BUG=libyuv:555 Review URL: https://codereview.chromium.org/1687253002 .	2016-02-12 10:26:10 -08:00
Frank Barchard	0e554b18fe	port NV12ToRGB565Row_AVX2 to gcc NV12ToRGB565Row for Intel is implemented as a 2 step conversion: NV12ToARGBRow_SSSE3 and ARGBToRGB565Row_SSE2 NV12ToARGBRow has an AVX2 version, so this CL implements NV12ToRGB565Row_AVX2 with call to NV12ToARGBRow_AVX2 and ARGBToRGB565Row_SSE2. R=harryjin@google.com BUG=libyuv:554 Review URL: https://codereview.chromium.org/1687953002 .	2016-02-10 11:13:41 -08:00
Frank Barchard	c39509c8e5	add avx2 wrappers for functions that can call I422ToARGBRow_AVX2 R=harryjin@google.com BUG=libyuv:557 Review URL: https://codereview.chromium.org/1687713002 .	2016-02-09 17:14:29 -08:00
Frank Barchard	0d880e5bc0	rename MIPS_DSPR2 to DSPR2 for consistency When attempting to normalize function names to end in Row_SIMD it was made harder with MIPS_DSPR2 naming convention. Other CPUs do not include the vendor. This should be named consistently. Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other processors. TBR=harryjin@google.com BUG=libyuv:562 Review URL: https://codereview.chromium.org/1677633002 .	2016-02-05 14:49:54 -08:00
Frank Barchard	05ed0c539c	rework scale code for ubsan For more info on ubsan, see http://dev.chromium.org/developers/testing/undefinedbehaviorsanitizer TESTED=Passing compilation using: GYP_DEFINES="ubsan=1" GYP_DEFINES="ubsan_vptr=1" R=harryjin@google.com, pbos@webrtc.org BUG=libyuv:563 Review URL: https://codereview.chromium.org/1654253004 .	2016-02-02 11:01:49 -08:00
Frank Barchard	9e39c1f271	ubsan overflow fix for multiply by 0x01010101 This is an UBSan error reported by libjingle [ RUN ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride [000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73 ../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int' [8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms) [9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms) Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes. Changing the constant to 0x01010101u should avoid overflow. R=harryjin@google.com TBR=harryjin@google.com BUG=libyuv:563 Review URL: https://codereview.chromium.org/1657533005 .	2016-02-01 12:29:04 -08:00
Frank Barchard	58cb534962	Fix memory overwrite in YUY2ToNV12 odd widths. When width was odd, the Y channel wrote an extra pixel. This change splits the Y from UV into a temporary buffer and memcpy's to the destination. Performance is slower. Was YUY2ToNV12_Any (307 ms) YUY2ToNV12_Unaligned (213 ms) TestYUY2ToNV12 (181 ms) YUY2ToNV12_Opt (177 ms) YUY2ToNV12_Invert (177 ms) Now YUY2ToNV12_Any (300 ms) YUY2ToNV12_Unaligned (226 ms) YUY2ToNV12_Invert (206 ms) TestYUY2ToNV12 (184 ms) YUY2ToNV12_Opt (181 ms) TBR=harryjin@google.com BUG=libyuv:545 Review URL: https://codereview.chromium.org/1593833002 .	2016-01-19 11:28:09 -08:00
Frank Barchard	8377c798fb	Fix I420ToNV21 for wrong dst_stride_y parameter. I420ToNV21 passes the wrong dst_stride_y when it calls I420ToNV12; parameter 8 (convert_from.cc:448) is src_stride_y but should be dst_stride_y. This causes image corruption when converting I420 -> NV21 with mismatched luminance strides. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:547 Review URL: https://codereview.chromium.org/1582793008 .	2016-01-14 17:38:54 -08:00
Frank Barchard	081475b3c8	refactor ARGBToI422 using ARGBToI420 internally R=harryjin@google.com BUG=libyuv:546 Review URL: https://codereview.chromium.org/1574253004 .	2016-01-12 17:05:49 -08:00
Frank Barchard	23c6a83561	Fix ifdef mismatch for mirroruv. Macro define and macro ifdef didn't match, leading to C code being used. Make macro match function name. TBR=harryjin@google.com BUG=libyuv:543 Review URL: https://codereview.chromium.org/1579023002 .	2016-01-11 16:33:36 -08:00
Frank Barchard	0e462e6f45	Remove use_sysroot=0. use_sysroot=0 is required for webrtc on linux intel builds, but libyuv doesn't use the affected libraries, so removing this. R=harryjin@google.com, sbc@chromium.org BUG=libyuv:534,libyuv:542 Review URL: https://codereview.chromium.org/1566303002 .	2016-01-11 14:57:50 -08:00
Frank Barchard	fc52d8ded2	Odd width variation of scale down by 2 for subsampling R=dhrosa@google.com, harryjin@google.com BUG=libyuv:538 Review URL: https://codereview.chromium.org/1558093003 .	2016-01-06 15:12:17 -08:00
Frank Barchard	36615d62a0	fix for InterpolateRow_AVX2 port scaledownby4_avx2 to gcc TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1546763002 .	2015-12-22 12:29:54 -08:00
Frank Barchard	71deb7ba3a	bug fix - remove shift from InterpolateRow_AVX2 TBR=harryjin@google.com BUG=libyuv:537 Review URL: https://codereview.chromium.org/1547703002 .	2015-12-22 10:28:48 -08:00
Frank Barchard	2cb2e9e1ad	fix for InterpolateRow_AVX2 TBR=harryjin@google.com BUG=libyuv:535 Review URL: https://codereview.chromium.org/1543773002 .	2015-12-21 18:35:12 -08:00
Frank Barchard	3f4d86053e	avx2 interpolate use 8 bit BUG=libyuv:535 R=dhrosa@google.com Review URL: https://codereview.chromium.org/1535833003 .	2015-12-21 10:57:32 -08:00
Frank Barchard	f4447745ae	Add rounding to InterpolateRow for improved quality and consistency. Remove inaccurate specializations for 1/4 and 3/4, since they round incorrectly. Specialize for 100% and 50% are kept due to performance. Make C and ARM code match SSSE3. Make unittests expect zero difference. BUG=libyuv:535 R=harryjin@google.com Review URL: https://codereview.chromium.org/1533643005 .	2015-12-17 15:24:06 -08:00
Frank Barchard	1ccbf8fb7b	use memory for loop counter to work around nearly out of registers TBR=harryjin@google.com BUG=libyuv:533 Review URL: https://codereview.chromium.org/1535433003 .	2015-12-16 17:13:37 -08:00
Frank Barchard	80ca4514ef	change scale down by 4 to use rounding. TBR=harryjin@google.com BUG=libyuv:447 Review URL: https://codereview.chromium.org/1525033005 .	2015-12-15 21:25:18 -08:00
Frank Barchard	70445ef2ef	avx2 scale down by 2 for gcc R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1520423003 .	2015-12-15 10:59:20 -08:00
Frank Barchard	ae55e41851	use rounding in scaledown by 2. When scaling down by 2 the formula should round consistently: (a+b+c+d+2)/4. The C version did, but the SSE2 version was doing 2 averages: avg(avg(a,b),avg(c,d)). This change uses a sum, then rounds. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:447,libyuv:527 Review URL: https://codereview.chromium.org/1513183004 .	2015-12-14 17:25:36 -08:00
Frank Barchard	b3bbcc1f4e	add ifdef for AVX2 so vs2010 can still compile R=harryjin@google.com BUG=libyuv:531 Review URL: https://codereview.chromium.org/1515503005 .	2015-12-09 15:23:51 -08:00
Frank Barchard	cb44936403	fix typo in avx2 gcc blend. was using wrong register on 32 pixel version. R=harryjin@google.com, dhrosa@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1511433006 .	2015-12-09 10:38:46 -08:00
Frank Barchard	353ffbab80	fix for gcc compile error: variable duplicate define TBR=harryjin@google.com BUG=libyuv:529 Review URL: https://codereview.chromium.org/1512793002 .	2015-12-08 19:03:43 -08:00
Frank Barchard	a2ea905679	BlendPlane any width. Benchmark out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=Blend \| sortms Was I420Blend_Any (2321 ms) I420Blend_Unaligned (1684 ms) I420Blend_Opt (1675 ms) I420Blend_Invert (1653 ms) BlendPlane_Invert (1556 ms) BlendPlane_Any (1552 ms) BlendPlane_Unaligned (1548 ms) BlendPlane_Opt (1535 ms) ARGBBlend_Unaligned (659 ms) ARGBBlend_Any (596 ms) ARGBBlend_Invert (591 ms) ARGBBlend_Opt (508 ms) BlendPlaneRow_Unaligned (186 ms) BlendPlaneRow_Opt (171 ms) Now ARGBBlend_Any (621 ms) ARGBBlend_Unaligned (585 ms) ARGBBlend_Invert (564 ms) ARGBBlend_Opt (512 ms) I420Blend_Unaligned (347 ms) I420Blend_Invert (345 ms) I420Blend_Any (337 ms) I420Blend_Opt (327 ms) BlendPlane_Unaligned (187 ms) BlendPlaneRow_Unaligned (187 ms) BlendPlane_Invert (186 ms) BlendPlane_Any (186 ms) BlendPlaneRow_Opt (173 ms) BlendPlane_Opt (171 ms) which is comparable to aligned case out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=Blend \| sortms ARGBBlend_Any (625 ms) ARGBBlend_Unaligned (602 ms) ARGBBlend_Invert (508 ms) ARGBBlend_Opt (506 ms) I420Blend_Any (353 ms) I420Blend_Unaligned (322 ms) I420Blend_Invert (304 ms) I420Blend_Opt (301 ms) BlendPlaneRow_Unaligned (188 ms) BlendPlane_Unaligned (186 ms) BlendPlane_Invert (185 ms) BlendPlane_Any (184 ms) BlendPlaneRow_Opt (173 ms) BlendPlane_Opt (169 ms) R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1513443002 .	2015-12-08 18:59:48 -08:00
Frank Barchard	dee77a4ebe	Optimize yuv alpha blend AVX2 code to do 32 pixels at time. out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt Was LibYUVPlanarTest.I420Blend_Opt (2335 ms) Now LibYUVPlanarTest.I420Blend_Opt (1937 ms) vs SSSE3 LibYUVPlanarTest.I420Blend_Opt (2599 ms) BUG=libyuv:527 R=dhrosa@google.com Review URL: https://codereview.chromium.org/1505673003 .	2015-12-08 18:20:30 -08:00
Frank Barchard	fae1a10545	Work around bug in xgetbv for Visual Studio. xgetbv is generating bad code, falsely disabling AVX2 and AVX512. disable optimization for the function affected on older versions of Visual C 32 bit. R=brucedawson@chromium.org, dhrosa@google.com, harryjin@google.com BUG=libyuv:529 Review URL: https://codereview.chromium.org/1503393004 .	2015-12-08 18:13:32 -08:00
Frank Barchard	2657688e70	Add support for odd height YUVA alpha blending. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1507683003 .	2015-12-07 12:03:20 -08:00
Frank Barchard	b0b22f88b9	Unroll C version of YUV blender for improved performance. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1502343003 .	2015-12-07 12:02:45 -08:00
Frank Barchard	48a919d86e	Bug fix for UYVYToNV12 odd height TBR=harryjin@google.com BUG=libyuv:528 Review URL: https://codereview.chromium.org/1506973002 .	2015-12-07 11:39:48 -08:00
Frank Barchard	bea690b3e0	AVX2 YUV alpha blender and improved unittests AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions. unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1505433002 .	2015-12-05 22:23:29 -08:00
Frank Barchard	fa2618ee26	Port BlendPlaneRow_SSSE3 to GCC R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1490273006 .	2015-12-04 11:19:41 -08:00
Frank Barchard	8af0ebf816	planar blend use signed images R=dhrosa@google.com, harryjin@google.com, jzern@chromium.org BUG=libyuv:527 Review URL: https://codereview.chromium.org/1491533002 .	2015-12-02 14:20:17 -08:00
Frank Barchard	b6f37bd8ec	Interpolate plane initial implementation. YUV version of interpolation between two images. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:526 Review URL: https://codereview.chromium.org/1479593002 .	2015-11-25 16:11:42 -08:00
Frank Barchard	526558b2d8	disable debug build of 411 to work around compiler bug TBR=harryjin@google.com BUG=libyuv:524 Review URL: https://codereview.chromium.org/1461013002 .	2015-11-19 02:25:00 -08:00
Frank Barchard	b7dfb72559	fix for I411 build error on 32 bit x86 TBR=harrjin@google.com BUG=libyuv:525 Review URL: https://codereview.chromium.org/1461693004 .	2015-11-19 01:45:14 -08:00
Frank Barchard	528356a128	syntax fix for gcc movzwl TBR=harryjin@google.com BUG=libyuv:525 Review URL: https://codereview.chromium.org/1460723003 .	2015-11-18 13:14:15 -08:00
Frank Barchard	50f8cb2db3	port I411 movzx 2 byte reader to gcc previously the I411 format used movd to read U, V pixels. But this reads 4 bytes, and can cause a memory exception. pinsrw can be used, but fails on drmemory 1.5, and is slow. So in this change a movzxw is used to read 2 bytes into EBX, then copy to xmm0 with movd. Slightly slower, but no memory exception Was LibYUVConvertTest.I411ToARGB_Opt (577 ms) Now LibYUVConvertTest.I411ToARGB_Opt (608 ms) TBR=harryjin@google.com BUG=libyuv:525 Review URL: https://codereview.chromium.org/1457783004 .	2015-11-18 13:05:39 -08:00
Frank Barchard	5eefbe2330	Fix for drmemory failure on I411ToARGB Before I420ToARGB_Opt (594 ms) I422ToARGB_Opt (483 ms) I411ToARGB_Opt (748 ms) * I444ToARGB_Opt (452 ms) I400ToARGB_Opt (218 ms) After I420ToARGB_Opt (591 ms) I422ToARGB_Opt (454 ms) I411ToARGB_Opt (502 ms) * I444ToARGB_Opt (441 ms) I400ToARGB_Opt (216 ms) TBR=harryjin@google.com BUG=libyuv:525 Review URL: https://codereview.chromium.org/1459513002 .	2015-11-17 18:00:52 -08:00
Frank Barchard	0815568a50	test for unaligned vs aligned for CopyRow_SSE2 improves performance on older CPUs where movdqa is faster. TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1455463002 .	2015-11-17 00:04:03 -08:00
Frank Barchard	1019e4537f	port I444ToARGB avx2 code from Visual C to GCC. SSSE3 Note: Google Test filter = I444ToARGB [==========] Running 8 tests from 1 test case. [----------] Global test environment set-up. [----------] 8 tests from LibYUVConvertTest [ RUN ] LibYUVConvertTest.I444ToARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms) [----------] 8 tests from LibYUVConvertTest (3389 ms total) AVX2 Note: Google Test filter = I444ToARGB [==========] Running 8 tests from 1 test case. [----------] Global test environment set-up. [----------] 8 tests from LibYUVConvertTest [ RUN ] LibYUVConvertTest.I444ToARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms) [----------] 8 tests from LibYUVConvertTest (2615 ms total) TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1445893002 .	2015-11-13 18:31:22 -08:00
Frank Barchard	60adcbaf32	scale with conversion using 2 steps with unittest a prototype function to implement the yuv to rgb with conversion and scale. replace with 1 step function in future version, using same API. R=harryjin@google.com BUG=libyuv:471 Review URL: https://codereview.chromium.org/1421553016 .	2015-11-13 11:25:56 -08:00
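
Note on commit ae55e41851 (rounding in scale down by 2): the message contrasts a single rounded sum, (a+b+c+d+2)/4, with two nested averages, avg(avg(a,b),avg(c,d)). A minimal C sketch of why the two can disagree follows. It assumes avg is a rounded byte average, (x+y+1)>>1, as the x86 pavgb instruction computes; the function names are hypothetical and are not taken from libyuv.

```c
#include <stdint.h>

/* Rounded average of two bytes: (x + y + 1) >> 1.
 * Assumption: this models the "avg" in the commit message. */
uint8_t Avg2(uint8_t x, uint8_t y) {
  return (uint8_t)((x + y + 1) >> 1);
}

/* Two-stage box filter: rounds up at each stage, so the
 * result can be biased upward by one. Hypothetical name. */
uint8_t BoxAvgOfAvg(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return Avg2(Avg2(a, b), Avg2(c, d));
}

/* Single-stage box filter: sum the four pixels, round once.
 * Implements (a + b + c + d + 2) / 4. Hypothetical name. */
uint8_t BoxSumRound(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}
```

For the pixel block 0, 0, 0, 1 the nested form yields 1 while the summed form yields 0, which is the kind of mismatch between the C and SSE2 paths that the commit describes removing.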
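
Note on commit 9e39c1f271 (ubsan overflow fix): the message says a uint8 source is multiplied by 0x01010101 to replicate the byte into 4 bytes, and that 128 * 16843009 overflows a signed int. A minimal C sketch of the unsigned form is below. It assumes a 32 bit int, and the function name is hypothetical rather than the actual code in row_gcc.cc.

```c
#include <stdint.h>

/* Replicates one byte into all four bytes of a 32 bit word.
 * v is promoted to int, but the u suffix on the constant makes
 * the multiplication unsigned, so it wraps modulo 2^32 instead
 * of overflowing a signed int. Hypothetical helper name. */
uint32_t ReplicateByte(uint8_t v) {
  return v * 0x01010101u;
}
```

With the unsigned constant, an input of 128 produces 0x80808080, the value the signed multiply could not represent.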

1226 Commits