libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2025-12-07 01:06:46 +08:00

Author	SHA1	Message	Date
Frank Barchard	bb3180ae80	Add I420ToAR30 10 bit RGB For more complete support of AR30 format, add I420ToAR30 allowing the new RGB 10 bit format to be used from standard 8 bit I420 format. Bug: libyuv:751 Test: I420ToAR30 unittest added Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa Reviewed-on: https://chromium-review.googlesource.com/823243 Reviewed-by: Weiyong Yao <braveyao@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2017-12-12 23:40:58 +00:00
Frank Barchard	c367751430	ARGBToAR30 SSSE3 use pmulhuw to replicate fields AR30 is optimized with 3 techniques 1. pmulhuw is used to replicate 8 bits to 10 bits. 2. Two channels are processed at a time. R and B, and A and G. 3. pshufb is used to shift and mask 2 channels of R and B Bug: libyuv:751 Test: ARGBToAR30_Opt Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b Reviewed-on: https://chromium-review.googlesource.com/822520 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2017-12-12 20:12:58 +00:00
Frank Barchard	11dd1b956f	ARGBToAR30 use vpmulhuw to replicate fields AR30 is optimized with 3 techniques 1. vpmulhuw is used to replicate 8 bits to 10 bits. 2. Two channels are processed at a time. R and B, and A and G. 3. vpshufb is used to shift and mask 2 channels of R and B Red Blue With the 8 bit value in the upper bits, vpmulhuw by (1024+4) will produce a 10 bit value in the low 10 bits of each 16 bit value. This is whats wanted for the blue channel. The red needs to be shifted 4 left, so multiply by (1024+4)16 for red. Alpha Green Alpha and Green are already in the high bits so vpand can zero out the other bits, keeping just 2 upper bits of alpha and 8 bit green. The same multiplier could be used for Green - (1024+4) putting the 10 bit green in the lsb. Alpha would be a simple multiplier to shift it into position. It wants a gap of 10 above the green. Green is 10 bits, so there are 6 bits in the low short. 4 more are needed, so a multiplier of 4 gets the 2 bits into the upper 16 bits, and then a shift of 4 is a multiply of 16, so (416) = 64. Then shift the result left 10 to position the A and G channels. Bug: libyuv:751 Test: ARGBToAR30_Opt Change-Id: Ie4f20dce18203bae7b75acb1fd5232db8a8a4f11 Reviewed-on: https://chromium-review.googlesource.com/820046 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-12-12 02:57:54 +00:00
Frank Barchard	0f98c3c1df	Add ARGBToAR30Row_SSE2 to speed up H010ToAR30 Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions but xmm registers and doing half as many pixels per loop. Bug: libyuv:751 Test: LibYUVConvertTest.ARGBToAR30_Opt Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673 Reviewed-on: https://chromium-review.googlesource.com/817918 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2017-12-09 00:11:20 +00:00
Frank Barchard	324fa32739	Convert16To8Row_SSSE3 port from AVX2 H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit. Then standard YUV conversion can be used. This improves performance on low end CPUs. Future CL will by pass this conversion allowing for 10 bit YUV source, but the function will be useful as a utility for YUV conversions. Bug: libyuv:559, libyuv:751 Test: out/Release/libyuv_unittest --gtest_filter=H010ToAR30 --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1 Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af Reviewed-on: https://chromium-review.googlesource.com/792334 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2017-11-28 19:22:39 +00:00
Lei Zhang	8445617191	Mark a bunch of kArray variables as const. This allows the linker to move the variables from the .data section to the .rodata section. Bug: libyuv:254 Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1 Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0 Reviewed-on: https://chromium-review.googlesource.com/777033 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2017-11-27 23:38:44 +00:00
Frank Barchard	26173eb73e	H010ToAR30 for 10 bit bt.709 YUV to 30 bit RGB This version of the H010ToAR30 provides a 3 step conversion Convert16To8Row_AVX2 H420ToARGB_AVX2 ARGBToAR30_AVX2 Low level function added to convert 16 bit to 8 bit using multiply to adjust 10 bit or other bit depths and then save the upper 16 bits. Bug: libyuv:751 Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03 Reviewed-on: https://chromium-review.googlesource.com/783554 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2017-11-22 23:58:30 +00:00
Frank Barchard	a98d6cdb17	ARGBToAR30 AVX2 conversion function Bug: libyuv:751 Test: LibYUVConvertTest.ARGBToAR30_Opt Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0 Reviewed-on: https://chromium-review.googlesource.com/780437 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2017-11-21 20:37:01 +00:00
Frank Barchard	46594be758	add ScalePlane_16 unit tests Tests ScalePlane vs ScalePlane_16 match. Bug: libyuv:749 Test: LibYUVScaleTest.ScalePlaneDownBy4_Box_16 Change-Id: I3f71748da404982d5d48bfb11bbd3ae95a1d021c Reviewed-on: https://chromium-review.googlesource.com/765045 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Weiyong Yao <braveyao@chromium.org>	2017-11-16 01:40:48 +00:00
Frank Barchard	49d1e3b036	MultiplyRow_16_AVX2 for converting 10 bit YUV When converting from lsb 10 bit formats to msb, the values need to be shifted to the top 10 bits. Using a multiply allows the different numbers of bits to be copied: // 128 = 9 bits // 64 = 10 bits // 16 = 12 bits // 1 = 16 bits Bug: libyuv:751 Test: LibYUVPlanarTest.MultiplyRow_16_Opt Change-Id: I9cf226053a164baa14155215cb175065b1c4f169 Reviewed-on: https://chromium-review.googlesource.com/762951 Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-11-10 22:02:32 +00:00
Frank Barchard	2f58d126b9	MergeUV10Row_AVX2 use multiply to handle different bit depths Instead of hardcoded shift, use a multiply by a parameter. 128 = 9 bits 64 = 10 bits 16 = 12 bits 1 = 16 bits Bug: libyuv:751 Test: LibYUVPlanarTest.MergeUV10Row_Opt Change-Id: Id925edfdbf91243370c90641b50eb8e7625ec329 Reviewed-on: https://chromium-review.googlesource.com/762523 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-11-10 03:38:07 +00:00
Frank Barchard	e26b0a7e0e	casting for c89 compatibility and lint cleanup Bug: libyuv:756 Test: CFLAGS="-m32 -static -std=gnu89 -mno-sse -O2" CXXFLAGS="-m32 -x c -static -std=gnu99 -mno-sse -O2" make -f linux.mk libyuv.a Change-Id: Ic362f93e01ccbb0bea14f361a58585e79297e7d2 Reviewed-on: https://chromium-review.googlesource.com/759423 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Patrik Höglund <phoglund@chromium.org> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-11-09 18:22:17 +00:00
Frank Barchard	735ace2ed3	Re-enable x86 assembly without requiring -msse2 clang does not require -msse2 or -msse for inline, except the "x" parameter. So change this to "m" for 32 bit. 64 bit requires sse2 so use "x" for 64 bit. gcc requires -msse for xmm registers in clobber list. Reduce compiler requirement from -msse2 to -msse for enabling assembly. Bug: libyuv:754, libyuv:757 Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255 Reviewed-on: https://chromium-review.googlesource.com/759303 Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-11-09 00:51:06 +00:00
Frank Barchard	d997ac287d	Revert "Enable SSE2 code without -msse" This reverts commit 01e994d74e4e3937ee1a3efdc048320a1e51f818. Change-Id: Ie76710d0f4e641e071889c5125fd3be23cdcdb59 Reviewed-on: https://chromium-review.googlesource.com/758499 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-11-08 19:33:09 +00:00
Frank Barchard	01e994d74e	Enable SSE2 code without -msse Bug: libyuv:754 Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk Change-Id: I74bf8d032013694e65ea7637bc38d3253db53ff2 Reviewed-on: https://chromium-review.googlesource.com/758043 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-11-08 02:54:41 +00:00
Frank Barchard	a0c32b9e49	MergeUV10Row_AVX2 for converting H010 to P010 H010 is 10 bit planar format with 10 bits in lower bits. P010 is 10 bit biplanar format with 10 bits in upper bits. This function weaves the U and V channels and shifts the bits into the upper bits. Bug: libyuv:751 Test: LibYUVPlanarTest.MergeUV10Row_Opt Change-Id: I4a0bac0ef1ff95aa1b8d68261ec8e8e86f2d1fbf Reviewed-on: https://chromium-review.googlesource.com/752692 Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-11-03 18:55:36 +00:00
Frank Barchard	1e16cb5c38	SplitRGBPlane and MergeRGBPlane functions added Converts packed RGB to planar and back. TBR=kjellander@chromium.org BUG=libyuv:728 TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc Reviewed-on: https://chromium-review.googlesource.com/658069 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-09-11 21:02:04 +00:00
Frank Barchard	6825b161d7	HalfFloat SSE2/AVX2 optimized port scheduling. Uses 1 add instead of 2 leas to reduce port pressure on ports 1 and 5 used for SIMD instructions. BUG=libyuv:670 TEST=~/iaca-lin64/bin/iaca.sh -arch HSW out/Release/obj/libyuv/row_gcc.o Change-Id: I3965ee5dcb49941a535efa611b5988d977f5b65c Reviewed-on: https://chromium-review.googlesource.com/433391 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-02-11 01:02:06 +00:00
Frank Barchard	76e7f104ae	documentation updates BUG=None TEST=Untested Change-Id: I8ab95654255d1aa9cf05a664ecf59ee6c0757e66 Reviewed-on: https://chromium-review.googlesource.com/434941 Reviewed-by: Henrik Kjellander <kjellander@chromium.org> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-02-02 18:31:32 +00:00
Frank Barchard	749e316ed8	Remove commented out code TEST=None BUG=libyuv:672 Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169 Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169 Reviewed-on: https://chromium-review.googlesource.com/430321 Reviewed-by: Aaron Gable <agable@chromium.org>	2017-01-20 02:03:12 +00:00
Frank Barchard	a7c87e19f0	add Intel Code Analyst markers add macros to enable/disable code analyst around blocks of code. Normally these macros should not be used, but if performance details are wanted for intel code, enable them around the code and then run via the iaca tool, available on the intel website. BUG=libyuv:670 TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest R=wangcheng@google.com Review-Url: https://codereview.chromium.org/2626193002 .	2017-01-13 15:50:24 -08:00
Frank Barchard	3028e1bd97	clang-format row_gcc.cc with some functions disabled BUG=libyuv:654 TEST=try bots build R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2484083003 .	2016-11-07 18:37:29 -08:00
Frank Barchard	e62309f259	clang-format libyuv BUG=libyuv:654 R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2469353005 .	2016-11-07 17:37:23 -08:00
Frank Barchard	c2073823b4	use __OPTIMIZE__ macro to determine debug vs release. Debug builds of x86 gcc/clang can run out of register. Previously NDEBUG or _DEBUG was used to detect a debug build. But those macros are not set by gentoo builds. This CL switches to the compiler predefine __OPTIMIZE__ which is built into clang and gcc. BUG=libyuv:602 TEST=untested R=wangcheng@google.com Review URL: https://codereview.chromium.org/2451503002 .	2016-10-24 18:02:48 -07:00
Frank Barchard	451af5e922	scale by 1 for neon implemented void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) { asm volatile ( "1: \n" MEMACCESS(0) "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts "subs %w2, %w2, #8 \n" // 8 pixels per loop "uxtl v2.4s, v1.4h \n" // 8 int's "uxtl2 v1.4s, v1.8h \n" "scvtf v2.4s, v2.4s \n" // 8 floats "scvtf v1.4s, v1.4s \n" "fcvtn v4.4h, v2.4s \n" // 8 floatsgit "fcvtn2 v4.8h, v1.4s \n" MEMACCESS(1) "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts "b.gt 1b \n" : "+r"(src), // %0 "+r"(dst), // %1 "+r"(width) // %2 : : "cc", "memory", "v1", "v2", "v4" ); } void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) { asm volatile ( "1: \n" MEMACCESS(0) "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts "subs %w2, %w2, #8 \n" // 8 pixels per loop "uxtl v2.4s, v1.4h \n" // 8 int's "uxtl2 v1.4s, v1.8h \n" "scvtf v2.4s, v2.4s \n" // 8 floats "scvtf v1.4s, v1.4s \n" "fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent "fmul v1.4s, v1.4s, %3.s[0] \n" "uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat "uqshrn2 v4.8h, v1.4s, #13 \n" MEMACCESS(1) "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts "b.gt 1b \n" : "+r"(src), // %0 "+r"(dst), // %1 "+r"(width) // %2 : "w"(scale * 1.9259299444e-34f) // %3 : "cc", "memory", "v1", "v2", "v4" ); } TEST=LibYUVPlanarTest.TestHalfFloatPlane_One BUG=libyuv:560 R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2430313008 .	2016-10-21 14:30:03 -07:00
Frank Barchard	550cf829fb	HalfFloat avx2 unpack bug fix. AVX unpack parameters were reverse ordered causing incorrect results on AVX2 hardware. TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=Half BUG=libyuv:560 R=wangcheng@google.com Review URL: https://codereview.chromium.org/2438893002 .	2016-10-20 15:49:00 -07:00
Frank Barchard	2d80fc3133	Port HalfFloatRow_SSE2 to AVX2 but not using F16C. R=wangcheng@google.com, hubbe@chromium.org BUG=libyuv:560 Review URL: https://codereview.chromium.org/2421993002 .	2016-10-14 19:01:41 -07:00
Frank Barchard	a5e93766a2	Add ARGBExtractAlpha_AVX2 function Port SSE2 version to AVX2. BUG=libyuv:572 TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=Extract R=wangcheng@google.com, magjed@chromium.org Review URL: https://codereview.chromium.org/2420553002 .	2016-10-13 16:03:43 -07:00
Frank Barchard	d363ea6527	Remove I411 support. YUV 411 is very uncommon format. Remove support. Update documentation to reflect that 411 is deprecated. Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now. BUG=libyuv:645 R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2406123002 .	2016-10-11 11:14:16 -07:00
Frank Barchard	aa197ee1a3	HalfFloat_SSE2 for Visual C Low level support for 12 bit 420, 422 and 444 YUV video frame conversion. BUG=libyuv:560, chromium:445071 TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows R=hubbe@chromium.org, wangcheng@google.com Review URL: https://codereview.chromium.org/2387713002 .	2016-10-03 10:33:38 -07:00
Frank Barchard	4a14cb2e81	HalfFloat_SSE2 port from C algorithm to SSE2 Low level support for 12 bit 420, 422 and 444 YUV video frame conversion. BUG=libyuv:560, chromium:445071 TEST=untested R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2381493006 .	2016-09-30 09:47:16 -07:00
Frank Barchard	7fc932ddd3	Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion. BUG=libyuv:560,chromium:445071 TEST=untested R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2371293002 .	2016-09-29 15:06:30 -07:00
Frank Barchard	fd3e676e91	android_full_debug x86 fix - use +rm for width count Work around for android full debug build runnign out of registers. 5 functions were running out of registers causing the compiler error error: 'asm' operand has impossible constraints These functions mostly have 4 pointers, a counter (width) and a tempory eax register. With fpic and debug using stackframes, 2 registers are unavailable. So a total of 8 registers are used. Although fpic and stack frame dont apply to assembly, the compiler reserves 2 registers. The optimized version builds, so its likely freeing up the registers once it knows they are not used. These functions used to build, so compile options and/or compiler may have updated.. likely fpic was turned on. An attribute can be done to disable each, and will avoid using the 2 GPR registers, but they are still reserved and unavailable in debug builds on current compilers (gcc 4.9 and clang 3.8). R=dhrosa@google.com BUG=libyuv:602 Review URL: https://codereview.chromium.org/2066933002 .	2016-06-14 15:25:28 -07:00
Magnus Jedvert	942db3016a	Add ARGBExtractAlpha function BUG=libyuv:572 R=fbarchard@google.com Review URL: https://codereview.chromium.org/1995293002 .	2016-05-26 10:30:57 +02:00
Frank Barchard	cf101116c9	Remove initialize to zero on output variables for inline. Inline that uses temporary variables is currently initializing them to 0 and passing in as output "+r". This CL replaces the output constraint to "=&r" for most meaning an output with early write (before inputs). This allows the initialize to zero step to be removed, saving 1 instruction. BUG=libyuv:580 TESTED=local libyuv build on gcc/linux and try bots R=harryjin@google.com Review URL: https://codereview.chromium.org/1895743008 .	2016-04-18 16:24:26 -07:00
Frank Barchard	22e062a448	Port ARGBToJ420 to AVX2 ARGBToJ420 had an SSSE3 version, but not AVX2. ARGBToI420 had an AVX2, so adapt that code to J420. TBR=harryjin@google.com BUG=libyuv:553 Review URL: https://codereview.chromium.org/1702373004 .	2016-02-17 23:16:39 -08:00
Frank Barchard	cc33dc68c7	Port I411ToARGBRow to AVX2. An SSSE3 version already exists, and an AVX2 version is available for Visual C. This ports the function to AVX2 completing the AVX2 ports of all YUV to RGB functions for AVX2 on gcc. TBR=harryjin@google.com BUG=libyuv:555 Review URL: https://codereview.chromium.org/1687253002 .	2016-02-12 10:26:10 -08:00
Frank Barchard	9e39c1f271	ubsan overflow fix for multiply by 0x01010101 This is an UBSan error reported by libjingle [ RUN ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride [000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73 ../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int' [8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms) [9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms) Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes. Changing the constant to 0x01010101u should avoid overflow. R=harryjin@google.com TBR=harryjin@google.com BUG=libyuv:563 Review URL: https://codereview.chromium.org/1657533005 .	2016-02-01 12:29:04 -08:00
Frank Barchard	081475b3c8	refactor ARGBToI422 using ARGBToI420 internally R=harryjin@google.com BUG=libyuv:546 Review URL: https://codereview.chromium.org/1574253004 .	2016-01-12 17:05:49 -08:00
Frank Barchard	23c6a83561	Fix ifdef mismatch for mirroruv Macro define and macro ifdef didnt match, leading to C code being used. Make macro match function name. TBR=harryjin@google.com BUG=libyuv:543 Review URL: https://codereview.chromium.org/1579023002 .	2016-01-11 16:33:36 -08:00
Frank Barchard	71deb7ba3a	bug fix - remove shift from InterpolateRow_AVX2 TBR=harryjin@google.com BUG=libyuv:537 Review URL: https://codereview.chromium.org/1547703002 .	2015-12-22 10:28:48 -08:00
Frank Barchard	2cb2e9e1ad	fix for InterpolateRow_AVX2 TBR=harryjin@google.com BUG=libyuv:535 Review URL: https://codereview.chromium.org/1543773002 .	2015-12-21 18:35:12 -08:00
Frank Barchard	3f4d86053e	avx2 interpolate use 8 bit BUG=libyuv:535 R=dhrosa@google.com Review URL: https://codereview.chromium.org/1535833003 .	2015-12-21 10:57:32 -08:00
Frank Barchard	f4447745ae	Add rounding to InterpolateRow for improved quality and consistency. Remove inaccurate specializations for 1/4 and 3/4, since they round incorrectly. Specialize for 100% and 50% are kept due to performance. Make C and ARM code match SSSE3. Make unittests expect zero difference. BUG=libyuv:535 R=harryjin@google.com Review URL: https://codereview.chromium.org/1533643005 .	2015-12-17 15:24:06 -08:00
Frank Barchard	1ccbf8fb7b	use memory for loop counter to work around nearly out of registers TBR=harryjin@google.com BUG=libyuv:533 Review URL: https://codereview.chromium.org/1535433003 .	2015-12-16 17:13:37 -08:00
Frank Barchard	cb44936403	fix typo in avx2 gcc blend. was using wrong register on 32 pixel version. R=harryjin@google.com, dhrosa@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1511433006 .	2015-12-09 10:38:46 -08:00
Frank Barchard	dee77a4ebe	Optimize yuv alpha blend AVX2 code to do 32 pixels at time. out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt Was LibYUVPlanarTest.I420Blend_Opt (2335 ms) Now LibYUVPlanarTest.I420Blend_Opt (1937 ms) vs SSSE3 LibYUVPlanarTest.I420Blend_Opt (2599 ms) BUG=libyuv:527 R=dhrosa@google.com Review URL: https://codereview.chromium.org/1505673003 .	2015-12-08 18:20:30 -08:00
Frank Barchard	bea690b3e0	AVX2 YUV alpha blender and improved unittests AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions. unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1505433002 .	2015-12-05 22:23:29 -08:00
Frank Barchard	fa2618ee26	Port BlendPlaneRow_SSSE3 to GCC R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1490273006 .	2015-12-04 11:19:41 -08:00
Frank Barchard	526558b2d8	disable debug build of 411 to work around compiler bug TBR=harryjin@google.com BUG=libyuv:524 Review URL: https://codereview.chromium.org/1461013002 .	2015-11-19 02:25:00 -08:00
Frank Barchard	b7dfb72559	fix for I411 build error on 32 bit x86 TBR=harrjin@google.com BUG=libyuv:525 Review URL: https://codereview.chromium.org/1461693004 .	2015-11-19 01:45:14 -08:00
Frank Barchard	528356a128	syntax fix for gcc movzwl TBR=harryjin@google.com BUG=libtyv:525 Review URL: https://codereview.chromium.org/1460723003 .	2015-11-18 13:14:15 -08:00
Frank Barchard	50f8cb2db3	port I411 movzx 2 byte reader to gcc previously the I411 format used movd to read U, V pixels. But this reads 4 bytes, and can cause a memory exception. pinsrw can be used, but fails on drmemory 1.5, and is slow. So in this change a movzxw is used to read 2 bytes into EBX, then copy to xmm0 with movd. Slightly slower, but no memory exception Was LibYUVConvertTest.I411ToARGB_Opt (577 ms) Now LibYUVConvertTest.I411ToARGB_Opt (608 ms) TBR=harryjin@google.com BUG=libyuv:525 Review URL: https://codereview.chromium.org/1457783004 .	2015-11-18 13:05:39 -08:00
Frank Barchard	0815568a50	test for unaligned vs aligned for CopyRow_SSE2 improves performance on older CPUs where movdqa is faster. TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1455463002 .	2015-11-17 00:04:03 -08:00
Frank Barchard	1019e4537f	port I444ToARGB avx2 code from Visual C to GCC. SSSE3 Note: Google Test filter = I444ToARGB [==========] Running 8 tests from 1 test case. [----------] Global test environment set-up. [----------] 8 tests from LibYUVConvertTest [ RUN ] LibYUVConvertTest.I444ToARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms) [----------] 8 tests from LibYUVConvertTest (3389 ms total) AVX2 Note: Google Test filter = I444ToARGB [==========] Running 8 tests from 1 test case. [----------] Global test environment set-up. [----------] 8 tests from LibYUVConvertTest [ RUN ] LibYUVConvertTest.I444ToARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Any [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Invert [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms) [ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Opt [ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms) [----------] 8 tests from LibYUVConvertTest (2615 ms total) TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1445893002 .	2015-11-13 18:31:22 -08:00
Frank Barchard	431cb3667a	YUV to RGB for x64 use registers instead of memory. On Arm the YVU to RGB conversions move constants into registers. This change does the same for 64 bit intel builds where additional registers are available. The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided. x64 builds using memory: AVX2 I420ToARGB_Opt (3059 ms) SSSE3 I420ToARGB_Opt (3959 ms) Now using registers AVX2 I420ToARGB_Opt (2906 ms) SSSE3 I420ToARGB_Opt (3928 ms) TBR=harryjin@google.com BUG=libyuv:520 Review URL: https://codereview.chromium.org/1407353010 .	2015-11-04 16:16:18 -08:00
Frank Barchard	ce4c2fad1d	Raw 24 bit RGB to RGB24 (bgr) Add unittests that do 1 step conversion vs 2 step conversion. Tests end swapping versions match direct conversions. R=harryjin@google.com BUG=libyuv:518 Review URL: https://codereview.chromium.org/1419103007 .	2015-11-03 10:30:30 -08:00
Frank Barchard	87926cec8b	remove store bgra, abgr, raw unused macros TBR=harryjin@google.com BUG=libyuv:518 Review URL: https://codereview.chromium.org/1420033004 .	2015-11-02 10:40:03 -08:00
Frank Barchard	2c7aa0070a	remove I422ToBGRA and use I422ToRGBA internally Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix. Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions. R=harryjin@google.com BUG=libyuv:518 Review URL: https://codereview.chromium.org/1427993004 .	2015-11-02 10:24:12 -08:00
Frank Barchard	cdbdf5b723	Fix debug compilation problems for gcc and 32 bit x86. In some methods with 7 arguments gcc fails to find enough registers to compile the assembler code when compiling debug. Simplest solution is to skip the assembler version in debug of those particular functions (I422Alpha -> ARBG/ABGR) R=harryjin@google.com,bratell@opera.com BUG=libyuv:517 Review URL: https://codereview.chromium.org/1423283002 .	2015-10-28 14:27:29 -07:00
Frank Barchard	b86dbf24d3	refactor I420AlphaToABGR to use I420AlphaToARGB internally swap U and V and transpose conversion matrix, so I420AlphaToARGB and I420AlphaToABGR share low level code. Having less code with same performance allows more focused optimization for future ARM versions. R=harryjin@google.com TBR=harryjin@chromium.org BUG=libyuv:473,libyuv:516 Review URL: https://codereview.chromium.org/1422263002 .	2015-10-27 14:17:21 -07:00
Frank Barchard	cf160cdbaa	implement I444ToABGR by swapping uv and transpose matrix U contributes to B and G. V contributes to R and G. By swapping U and V, they contribute to the opposite channels. Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the new B channel and vice versa. This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers. As a result the existing I444ToABGRRow functions are no longer needed and are removed. Previously this function was only Intel AVX2 optimized for Windwos. Now it is also optimized for Arm and GCC. ARMv7 Neon Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms) Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms) 20.6 times faster. R=xhwang@chromium.org BUG=libyuv:515 Review URL: https://codereview.chromium.org/1414133006 .	2015-10-27 10:21:21 -07:00
Frank Barchard	2e4466e282	change all pix parameters to width for consistency TBR=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1398633002 .	2015-10-07 22:30:36 -07:00
Frank Barchard	76a599ec3b	fix jpeg and bt.709 yuvconstants for neon64. yuv constants for bt.601 were previously ported to neon64, as well as the code to respect other color spaces. But the jpeg and bt.709 colour conversion constants were still in armv7 form. This changes the constants for aarch64 builds to be compatible with the code. yuv constants are now passed as const * Remove Yvu constants which were used for older version on nv21 but not new code. TBR=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1398623002 .	2015-10-07 19:46:56 -07:00
Frank Barchard	fae8e66d43	Fix for AVX2 dither function. Fix for 64 bit gcc parameter in dither function which requires m not r, when ABI uses register. BUG=none Review URL: https://codereview.chromium.org/1399463002 .	2015-10-07 19:17:56 -07:00
Frank Barchard	8f0cadede4	port ARGB to 565 dithering AVX2 code to GCC. Previously the assembly code was only available to Windows. This CL ports the AVX2 code to GCC syntax. TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1391273003 .	2015-10-07 19:13:59 -07:00
Frank Barchard	cc89e3a77b	port ARGB to 565 dithering SSE2 code to GCC. Previously the assembly code was only available to Windows. This CL ports the SSE2 code to GCC syntax. When running a profiler on all the unittests, this function was the slowest of all functions that still ran in C code. 3.71% libyuv_unittest libyuv_unittest [.] ARGBToRGB565DitherRow_C Was ARGBToRGB565Dither_Opt (2894 ms) Now ARGBToRGB565Dither_Opt (432 ms) TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1397673002 .	2015-10-07 18:24:50 -07:00
Frank Barchard	914a9856c7	Reimplement NV21ToARGB to allow different color matrix. Low level for NV21ToARGB written to accept yuv matrix used by other YUV to ARGB functions. Previously NV21 was implemented for Windows using NV12 with a different matrix that swapped U and V. But the Arm version of the low level does not allow the matrix U and V contributions to be swapped. Using a new low level function that reads NV21 and uses the same yuvconstants as other YUV conversion functions allows an Arm port of this function. TBR=harryjin@google.com BUG=libyuv:500 Review URL: https://codereview.chromium.org/1388273002 .	2015-10-06 20:34:44 -07:00
Frank Barchard	2cc1a2b233	Remove sse2 functions that also have ssse3 ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2 Since vast majority of CPUs have SSSE3 now, removing the SSE2 improves the performance of CPU dispatching. R=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1377053003 .	2015-09-30 14:24:44 -07:00
Frank Barchard	d039ad6e9b	Width use memory instead of register for 32 bit fpic. Code runs out of registers on 32 bit fpic builts. TBR=harryjin@google.com BUG=libyuv:496 Review URL: https://codereview.chromium.org/1369053002 .	2015-09-25 15:36:04 -07:00
Frank Barchard	9a0e12f5f1	AVX2 1 step I422AlphaToARGB for gcc and win. C I420AlphaToARGB_Opt (5169 ms) SSSE3 I420AlphaToARGB_Opt (432 ms) AVX2 I420AlphaToARGB_Opt (358 ms) and with premultiplication as 2 step process: I420AlphaToARGB_Premult (7029 ms) I420AlphaToARGB_Premult (757 ms) I420AlphaToARGB_Premult (508 ms) R=harryjin@google.com BUG=libyuv:496,libyuv:473 Review URL: https://codereview.chromium.org/1372653003 .	2015-09-25 13:37:42 -07:00
Frank Barchard	8fb2048e9f	Fix nv12 64 bit gcc increment. Should be 16 bytes, but was 0x16 causing memory corruption. TBR=harryjin@google.com BUG=libyuv:492 Review URL: https://codereview.chromium.org/1368693002 .	2015-09-24 10:19:17 -07:00
Frank Barchard	accc04e6d8	NV12ToARGB_AVX2 ported to gcc TBR=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1364913002 .	2015-09-23 15:54:16 -07:00
Frank Barchard	000cf89ca8	YUY2ToARGB avx2 in 1 step conversion. Includes UYVYToARGB ssse3 fix. Was YUY2ToARGB_Opt (433 ms) 69.79% libyuv_unittest libyuv_unittest [.] I422ToARGBRow_AVX2 20.73% libyuv_unittest libyuv_unittest [.] YUY2ToUV422Row_AVX2 6.04% libyuv_unittest libyuv_unittest [.] YUY2ToYRow_AVX2 0.77% libyuv_unittest libyuv_unittest [.] YUY2ToARGBRow_AVX2 Now YUY2ToARGB_Opt (280 ms) 95.66% libyuv_unittest libyuv_unittest [.] YUY2ToARGBRow_AVX2 BUG=libyuv:494 R=harryjin@google.com Review URL: https://codereview.chromium.org/1364813002 .	2015-09-23 11:15:18 -07:00
Frank Barchard	2b92ec8d0f	Fix git markers introduced on landing previous CL BUG=none Review URL: https://codereview.chromium.org/1359023003 .	2015-09-22 15:00:57 -07:00
Frank Barchard	5f3d4270d1	yuy2 to rgb gcc versions read in read function for yuv conversion R=harryjin@google.com BUG=libyuv:488 Review URL: https://codereview.chromium.org/1355393002 .	2015-09-22 14:27:33 -07:00
Frank Barchard	03cd8584e7	Read Y channel in read function for yuv conversion. Allows reader to support YUY2 format. Also contains fix for win64 build for yuv conversion. TBR=harryjin@google.com BUG=libyuv:488 Review URL: https://codereview.chromium.org/1355333002 .	2015-09-22 12:05:16 -07:00
Frank Barchard	f96890a0be	yuvconstants for all YUV to RGB conversion functions. R=harryjin@google.com BUG=libyuv:488 Review URL: https://codereview.chromium.org/1363503002 .	2015-09-22 10:26:03 -07:00
Frank Barchard	62c49dc811	move constants into common R=harryjin@google.com BUG=libyuv:488 Review URL: https://codereview.chromium.org/1359443005 .	2015-09-18 16:28:44 -07:00
Frank Barchard	0381673d19	port I444 to ARGB to matrix. Add I444 to ABGR. R=harryjin@google.com BUG=libyuv:488,libyuv:490 Review URL: https://codereview.chromium.org/1348763005 .	2015-09-18 14:36:15 -07:00
Frank Barchard	ed55d24d9f	H420 functionality R=harryjin@google.com BUG=libyuv:488 Review URL: https://webrtc-codereview.appspot.com/54869004 .	2015-09-06 11:01:40 -07:00
Frank Barchard	7060e0d826	I420ToABGRMatrix functions with J420ToABGR wrapper. Allows direct conversion from JPeg to ABGR for android. BUG=libyuv:488 R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/55719004 .	2015-09-03 10:42:36 -07:00
Frank Barchard	925c3d9e26	I420ToARGB conversion with matrix. Take color conversion constants as a parameter to row function for I420ToARGBMatrixRow_SSSE3. Allows future variations of color space using a single low level. R=harryjin@google.com BUG=libyuv:488 Review URL: https://webrtc-codereview.appspot.com/56669004 .	2015-09-02 10:45:42 -07:00
Frank Barchard	ee9aaea02f	i422torgb565 is asm for clangcl as well Merge branch 'master' of https://chromium.googlesource.com/libyuv/libyuv into convertcl allow lto for llvm but not gcc R=harryjin@google.com BUG=libyuv:469 Review URL: https://webrtc-codereview.appspot.com/52769004.	2015-08-19 10:46:30 -07:00
Frank Barchard	1f461f73d8	remove align directives R=harryjin@google.com BUG=none Review URL: https://webrtc-codereview.appspot.com/54809004.	2015-08-04 17:00:03 -07:00
Frank Barchard	0686f26938	blend remove alignment 1 pixel loop for less overhead. R=tpsiaki@google.com BUG=none TESTED=libyuvTest.ARGBBlend_Opt Review URL: https://webrtc-codereview.appspot.com/50289005.	2015-06-24 11:34:12 -07:00
fbarchard@google.com	2e9f3e5cf5	rename source files from row_posix.cc etc to row_gcc.cc to avoid gyp build filtering out source files from build when on windows with clang. The source code contained in row_gcc.cc is gcc syntax inline assembly available for any platform that supports gcc or clang for intel cpus. BUG=440 TESTED=try bots R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/56579004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1430 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-06-09 17:27:52 +00:00

1 2 3 4

187 Commits