84 Commits

Author SHA1 Message Date
Frank Barchard
0f98c3c1df Add ARGBToAR30Row_SSE2 to speed up H010ToAR30
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.

Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-09 00:11:20 +00:00
Frank Barchard
324fa32739 Convert16To8Row_SSSE3 port from AVX2
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used.  This improves performance
on low end CPUs.
Future CL will by pass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.

Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-28 19:22:39 +00:00
Lei Zhang
8445617191 Mark a bunch of kArray variables as const.
This allows the linker to move the variables from the .data section to
the .rodata section.

Bug: libyuv:254
Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1

Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0
Reviewed-on: https://chromium-review.googlesource.com/777033
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-27 23:38:44 +00:00
Frank Barchard
26173eb73e H010ToAR30 for 10 bit bt.709 YUV to 30 bit RGB
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2

Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.

Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-22 23:58:30 +00:00
Frank Barchard
a98d6cdb17 ARGBToAR30 AVX2 conversion function
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0
Reviewed-on: https://chromium-review.googlesource.com/780437
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-21 20:37:01 +00:00
Frank Barchard
46594be758 add ScalePlane_16 unit tests
Tests ScalePlane vs ScalePlane_16 match.

Bug: libyuv:749
Test: LibYUVScaleTest.ScalePlaneDownBy4_Box_16
Change-Id: I3f71748da404982d5d48bfb11bbd3ae95a1d021c
Reviewed-on: https://chromium-review.googlesource.com/765045
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-16 01:40:48 +00:00
Frank Barchard
49d1e3b036 MultiplyRow_16_AVX2 for converting 10 bit YUV
When converting from lsb 10 bit formats to msb, the values
need to be shifted to the top 10 bits.  Using a multiply
allows the different numbers of bits to be copied:
// 128 = 9 bits
// 64 = 10 bits
// 16 = 12 bits
// 1 = 16 bits
Bug: libyuv:751
Test: LibYUVPlanarTest.MultiplyRow_16_Opt
Change-Id: I9cf226053a164baa14155215cb175065b1c4f169
Reviewed-on: https://chromium-review.googlesource.com/762951
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 22:02:32 +00:00
Frank Barchard
2f58d126b9 MergeUV10Row_AVX2 use multiply to handle different bit depths
Instead of hardcoded shift, use a multiply by a parameter.
128 = 9 bits
64 = 10 bits
16 = 12 bits
1 = 16 bits

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: Id925edfdbf91243370c90641b50eb8e7625ec329
Reviewed-on: https://chromium-review.googlesource.com/762523
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 03:38:07 +00:00
Frank Barchard
e26b0a7e0e casting for c89 compatibility and lint cleanup
Bug: libyuv:756
Test: CFLAGS="-m32 -static -std=gnu89 -mno-sse -O2" CXXFLAGS="-m32 -x c -static -std=gnu99 -mno-sse -O2" make -f linux.mk libyuv.a
Change-Id: Ic362f93e01ccbb0bea14f361a58585e79297e7d2
Reviewed-on: https://chromium-review.googlesource.com/759423
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Patrik Höglund <phoglund@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 18:22:17 +00:00
Frank Barchard
735ace2ed3 Re-enable x86 assembly without requiring -msse2
clang does not require -msse2 or -msse for inline, except
the "x" parameter.  So change this to "m" for 32 bit.  64 bit
requires sse2 so use "x" for 64 bit.

gcc requires -msse for xmm registers in clobber list.
Reduce compiler requirement from -msse2 to -msse for enabling
assembly.

Bug: libyuv:754, libyuv:757
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255
Reviewed-on: https://chromium-review.googlesource.com/759303
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 00:51:06 +00:00
Frank Barchard
d997ac287d Revert "Enable SSE2 code without -msse"
This reverts commit 01e994d74e4e3937ee1a3efdc048320a1e51f818.

Change-Id: Ie76710d0f4e641e071889c5125fd3be23cdcdb59
Reviewed-on: https://chromium-review.googlesource.com/758499
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 19:33:09 +00:00
Frank Barchard
01e994d74e Enable SSE2 code without -msse
Bug: libyuv:754
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I74bf8d032013694e65ea7637bc38d3253db53ff2
Reviewed-on: https://chromium-review.googlesource.com/758043
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 02:54:41 +00:00
Frank Barchard
a0c32b9e49 MergeUV10Row_AVX2 for converting H010 to P010
H010 is 10 bit planar format with 10 bits in lower bits.
P010 is 10 bit biplanar format with 10 bits in upper bits.
This function weaves the U and V channels and shifts the bits
into the upper bits.

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: I4a0bac0ef1ff95aa1b8d68261ec8e8e86f2d1fbf
Reviewed-on: https://chromium-review.googlesource.com/752692
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-03 18:55:36 +00:00
Frank Barchard
1e16cb5c38 SplitRGBPlane and MergeRGBPlane functions added
Converts packed RGB to planar and back.

TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added

Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Frank Barchard
6825b161d7 HalfFloat SSE2/AVX2 optimized port scheduling.
Uses 1 add instead of 2 leas to reduce port pressure on ports 1 and 5
used for SIMD instructions.

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -arch HSW out/Release/obj/libyuv/row_gcc.o

Change-Id: I3965ee5dcb49941a535efa611b5988d977f5b65c
Reviewed-on: https://chromium-review.googlesource.com/433391
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-11 01:02:06 +00:00
Frank Barchard
76e7f104ae documentation updates
BUG=None
TEST=Untested

Change-Id: I8ab95654255d1aa9cf05a664ecf59ee6c0757e66
Reviewed-on: https://chromium-review.googlesource.com/434941
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-02 18:31:32 +00:00
Frank Barchard
749e316ed8 Remove commented out code
TEST=None
BUG=libyuv:672
Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169

Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169
Reviewed-on: https://chromium-review.googlesource.com/430321
Reviewed-by: Aaron Gable <agable@chromium.org>
2017-01-20 02:03:12 +00:00
Frank Barchard
a7c87e19f0 add Intel Code Analyst markers
add macros to enable/disable code analyst around blocks of code.

Normally these macros should not be used, but if performance
details are wanted for intel code, enable them around the code
and then run via the iaca tool, available on the intel website.

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com

Review-Url: https://codereview.chromium.org/2626193002 .
2017-01-13 15:50:24 -08:00
Frank Barchard
3028e1bd97 clang-format row_gcc.cc with some functions disabled
BUG=libyuv:654
TEST=try bots build
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2484083003 .
2016-11-07 18:37:29 -08:00
Frank Barchard
e62309f259 clang-format libyuv
BUG=libyuv:654
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
c2073823b4 use __OPTIMIZE__ macro to determine debug vs release.
Debug builds of x86 gcc/clang can run out of register.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.

BUG=libyuv:602
TEST=untested
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2451503002 .
2016-10-24 18:02:48 -07:00
Frank Barchard
451af5e922 scale by 1 for neon implemented
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fcvtn      v4.4h, v2.4s                   \n"  // 8 floatsgit
    "fcvtn2     v4.8h, v1.4s                   \n"
   MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
    "fmul       v1.4s, v1.4s, %3.s[0]          \n"
    "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
    "uqshrn2    v4.8h, v1.4s, #13              \n"
   MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  : "w"(scale * 1.9259299444e-34f)    // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
550cf829fb HalfFloat avx2 unpack bug fix.
AVX unpack parameters were reverse ordered causing incorrect results
on AVX2 hardware.

TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2438893002 .
2016-10-20 15:49:00 -07:00
Frank Barchard
2d80fc3133 Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00
Frank Barchard
a5e93766a2 Add ARGBExtractAlpha_AVX2 function
Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2420553002 .
2016-10-13 16:03:43 -07:00
Frank Barchard
d363ea6527 Remove I411 support.
YUV 411 is very uncommon format.  Remove support.

Update documentation to reflect that 411 is deprecated.

Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.

BUG=libyuv:645
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2406123002 .
2016-10-11 11:14:16 -07:00
Frank Barchard
aa197ee1a3 HalfFloat_SSE2 for Visual C
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows
R=hubbe@chromium.org, wangcheng@google.com

Review URL: https://codereview.chromium.org/2387713002 .
2016-10-03 10:33:38 -07:00
Frank Barchard
4a14cb2e81 HalfFloat_SSE2 port from C algorithm to SSE2
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2381493006 .
2016-09-30 09:47:16 -07:00
Frank Barchard
7fc932ddd3 Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560,chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2371293002 .
2016-09-29 15:06:30 -07:00
Frank Barchard
fd3e676e91 android_full_debug x86 fix - use +rm for width count
Work around for android full debug build runnign out of registers.
5 functions were running out of registers causing the compiler error
error: 'asm' operand has impossible constraints
These functions mostly have 4 pointers, a counter (width) and a tempory
eax register.  With fpic and debug using stackframes, 2 registers are
unavailable.  So a total of 8 registers are used.
Although fpic and stack frame dont apply to assembly, the compiler
reserves 2 registers.  The optimized version builds, so its likely
freeing up the registers once it knows they are not used.
These functions used to build, so compile options and/or compiler may
have updated.. likely fpic was turned on.
An attribute can be done to disable each, and will avoid using the
2 GPR registers, but they are still reserved and unavailable in debug
builds on current compilers (gcc 4.9 and clang 3.8).

R=dhrosa@google.com
BUG=libyuv:602

Review URL: https://codereview.chromium.org/2066933002 .
2016-06-14 15:25:28 -07:00
Magnus Jedvert
942db3016a Add ARGBExtractAlpha function
BUG=libyuv:572
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/1995293002 .
2016-05-26 10:30:57 +02:00
Frank Barchard
cf101116c9 Remove initialize to zero on output variables for inline.
Inline that uses temporary variables is currently initializing them
to 0 and passing in as output "+r".
This CL replaces the output constraint to "=&r" for most meaning an
output with early write (before inputs).  This allows the initialize
to zero step to be removed, saving 1 instruction.

BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1895743008 .
2016-04-18 16:24:26 -07:00
Frank Barchard
22e062a448 Port ARGBToJ420 to AVX2
ARGBToJ420 had an SSSE3 version, but not AVX2.
ARGBToI420 had an AVX2, so adapt that code to J420.

TBR=harryjin@google.com
BUG=libyuv:553

Review URL: https://codereview.chromium.org/1702373004 .
2016-02-17 23:16:39 -08:00
Frank Barchard
cc33dc68c7 Port I411ToARGBRow to AVX2.
An SSSE3 version already exists, and an AVX2 version is available for
Visual C.  This ports the function to AVX2 completing the AVX2 ports of
all YUV to RGB functions for AVX2 on gcc.

TBR=harryjin@google.com
BUG=libyuv:555

Review URL: https://codereview.chromium.org/1687253002 .
2016-02-12 10:26:10 -08:00
Frank Barchard
9e39c1f271 ubsan overflow fix for multiply by 0x01010101
This is an UBSan error reported by libjingle

[ RUN      ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride
[000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73
../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int'
[8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms)
[9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms)
Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted

The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes.
Changing the constant to 0x01010101u should avoid overflow.

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:563

Review URL: https://codereview.chromium.org/1657533005 .
2016-02-01 12:29:04 -08:00
Frank Barchard
081475b3c8 refactor ARGBToI422 using ARGBToI420 internally
R=harryjin@google.com
BUG=libyuv:546

Review URL: https://codereview.chromium.org/1574253004 .
2016-01-12 17:05:49 -08:00
Frank Barchard
23c6a83561 Fix ifdef mismatch for mirroruv
Macro define and macro ifdef didnt match, leading to C code
being used.  Make macro match function name.

TBR=harryjin@google.com
BUG=libyuv:543

Review URL: https://codereview.chromium.org/1579023002 .
2016-01-11 16:33:36 -08:00
Frank Barchard
71deb7ba3a bug fix - remove shift from InterpolateRow_AVX2
TBR=harryjin@google.com
BUG=libyuv:537

Review URL: https://codereview.chromium.org/1547703002 .
2015-12-22 10:28:48 -08:00
Frank Barchard
2cb2e9e1ad fix for InterpolateRow_AVX2
TBR=harryjin@google.com
BUG=libyuv:535

Review URL: https://codereview.chromium.org/1543773002 .
2015-12-21 18:35:12 -08:00
Frank Barchard
3f4d86053e avx2 interpolate use 8 bit
BUG=libyuv:535
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1535833003 .
2015-12-21 10:57:32 -08:00
Frank Barchard
f4447745ae Add rounding to InterpolateRow for improved quality and consistency.
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly.  Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.

BUG=libyuv:535
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
1ccbf8fb7b use memory for loop counter to work around nearly out of registers
TBR=harryjin@google.com
BUG=libyuv:533

Review URL: https://codereview.chromium.org/1535433003 .
2015-12-16 17:13:37 -08:00
Frank Barchard
cb44936403 fix typo in avx2 gcc blend.
was using wrong register on 32 pixel version.

R=harryjin@google.com, dhrosa@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1511433006 .
2015-12-09 10:38:46 -08:00
Frank Barchard
dee77a4ebe Optimize yuv alpha blend AVX2 code to do 32 pixels at time.
out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt

Was LibYUVPlanarTest.I420Blend_Opt (2335 ms)
Now LibYUVPlanarTest.I420Blend_Opt (1937 ms)

vs SSSE3
LibYUVPlanarTest.I420Blend_Opt (2599 ms)

BUG=libyuv:527
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1505673003 .
2015-12-08 18:20:30 -08:00
Frank Barchard
bea690b3e0 AVX2 YUV alpha blender and improved unittests
AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.

unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1505433002 .
2015-12-05 22:23:29 -08:00
Frank Barchard
fa2618ee26 Port BlendPlaneRow_SSSE3 to GCC
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1490273006 .
2015-12-04 11:19:41 -08:00
Frank Barchard
526558b2d8 disable debug build of 411 to work around compiler bug
TBR=harryjin@google.com
BUG=libyuv:524

Review URL: https://codereview.chromium.org/1461013002 .
2015-11-19 02:25:00 -08:00
Frank Barchard
b7dfb72559 fix for I411 build error on 32 bit x86
TBR=harrjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1461693004 .
2015-11-19 01:45:14 -08:00
Frank Barchard
528356a128 syntax fix for gcc movzwl
TBR=harryjin@google.com
BUG=libtyv:525

Review URL: https://codereview.chromium.org/1460723003 .
2015-11-18 13:14:15 -08:00
Frank Barchard
50f8cb2db3 port I411 movzx 2 byte reader to gcc
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1457783004 .
2015-11-18 13:05:39 -08:00