Manojkumar Bhosale
a899dea251
Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shade/Gray/Sepia row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBAttenuateRow_MSA - ~1.1x
ARGBAttenuateRow_Any_MSA - ~1.1x
ARGBToRGB565DitherRow_MSA - ~6.4x
ARGBToRGB565DitherRow_Any_MSA - ~6.2x
ARGBShuffleRow_MSA - ~5.1x
ARGBShuffleRow_Any_MSA - ~1.9x
ARGBShadeRow_MSA - ~1.1x
ARGBGrayRow_MSA - ~2.6x
ARGBSepiaRow_MSA - ~11.6x
Performance Gain (vs C non-vectorized)
ARGBAttenuateRow_MSA - ~2.46x
ARGBAttenuateRow_Any_MSA - ~2.45x
ARGBToRGB565DitherRow_MSA - ~9.4x
ARGBToRGB565DitherRow_Any_MSA - ~12.5x
ARGBShuffleRow_MSA - ~5.2x
ARGBShuffleRow_Any_MSA - ~1.9x
ARGBShadeRow_MSA - ~4.3x
ARGBGrayRow_MSA - ~10.5x
ARGBSepiaRow_MSA - ~12.2x
Review-Url: https://codereview.chromium.org/2559693002 .
2016-12-15 12:06:02 +05:30
Manojkumar Bhosale
6fa5e4eb78
Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
TransposeWx8_MSA - ~2.7x
TransposeWx8_Any_MSA - ~2.1x
TransposeUVWx8_MSA - ~2.5x
TransposeUVWx8_Any_MSA - ~2.7x
Performance Gain (vs C non-vectorized)
TransposeWx8_MSA - ~4.6x
TransposeWx8_Any_MSA - ~2.9x
TransposeUVWx8_MSA - ~4.4x
TransposeUVWx8_Any_MSA - ~3.7x
Review URL: https://codereview.chromium.org/2553403002 .
2016-12-15 10:06:01 +05:30
Frank Barchard
b18fd21d3c
Android420ToI420 - use ptrdiff_t for difference of u and v pointers
...
The difference was assigned to an int, causing a warning on Visual C.
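A minimal sketch of the kind of fix described above, not the actual libyuv change. The helper name `uv_plane_offset` and its parameters are hypothetical. Subtracting two pointers yields a `ptrdiff_t`, which is wider than `int` on 64-bit Windows, so storing the result in an `int` narrows it and draws a warning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: distance in bytes from the U plane to the V plane.
   Pointer subtraction has type ptrdiff_t; returning that type avoids the
   implicit narrowing to int. Both pointers are assumed to point into the
   same allocation, as pointer subtraction requires. */
static ptrdiff_t uv_plane_offset(const uint8_t* src_u, const uint8_t* src_v) {
  assert(src_u != NULL && src_v != NULL);
  return src_v - src_u; /* negative when V is stored before U */
}
```

A caller can test the sign or magnitude of the returned offset (for example, to detect interleaved versus planar chroma) without any cast.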
BUG=662
TEST=tested with try bots.
R=devangelakos@google.com
Review-Url: https://codereview.chromium.org/2574373002 .
2016-12-14 11:53:55 -08:00
Frank Barchard
dde8ba7009
ConvertFromI420: use halfstride instead of halfwidth
...
BUG=libyuv:660
TEST=try bots
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2554213003 .
2016-12-07 10:16:16 -08:00
Manojkumar Bhosale
56b5bbb0be
Add MSA optimized ARGB scaling functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA - ~2.6x
ScaleARGBRowDown2Linear_MSA - ~7.9x
ScaleARGBRowDown2Box_MSA - ~3.7x
ScaleARGBRowDownEven_MSA - ~1.2x
ScaleARGBRowDownEvenBox_MSA - ~3.5x
ScaleARGBRowDown2_Any_MSA - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA - ~3.6x
ScaleARGBRowDownEven_Any_MSA - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x
Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA - 2.6x
ScaleARGBRowDown2Linear_MSA - 13.5x
ScaleARGBRowDown2Box_MSA - 5.8x
ScaleARGBRowDownEven_MSA - 1.2x
ScaleARGBRowDownEvenBox_MSA - 3.7x
ScaleARGBRowDown2_Any_MSA - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA - 5.3x
ScaleARGBRowDownEven_Any_MSA - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x
Review URL: https://codereview.chromium.org/2527983002 .
2016-12-07 11:47:15 +05:30
Manojkumar Bhosale
83f460be33
Add MSA optimized ARGB Multiply/Add/Subtract row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA - 1.4x
ARGBAddRow_MSA - 8.6x
ARGBSubtractRow_MSA - 8.6x
ARGBMultiplyRow_Any_MSA - 1.35x
ARGBAddRow_Any_MSA - 7.3x
ARGBSubtractRow_Any_MSA - 7.2x
Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA - 4.4x
ARGBAddRow_MSA - 27x
ARGBSubtractRow_MSA - 22x
ARGBMultiplyRow_Any_MSA - 3.5x
ARGBAddRow_Any_MSA - 23x
ARGBSubtractRow_Any_MSA - 18x
Review URL: https://codereview.chromium.org/2529983002 .
2016-12-02 15:21:10 +05:30
Frank Barchard
da0c29dada
Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA - ~1.6x
ARGBToRGB565Row_Any_MSA - ~1.6x
ARGBToARGB1555Row_MSA - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA - ~2.4x
ARGBToUV444Row_Any_MSA - ~2.4x
Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA - ~2.8x
ARGBToRGB565Row_Any_MSA - ~2.8x
ARGBToARGB1555Row_MSA - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA - ~6.7x
ARGBToUV444Row_Any_MSA - ~6.7x
Review URL: https://codereview.chromium.org/2520003004 .
2016-11-22 10:47:55 -08:00
Frank Barchard
b1504a8e48
Add MSA optimized ARGBToRGB24Row_MSA and ARGBToRAWRow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2487913004 .
2016-11-18 15:05:10 -08:00
Frank Barchard
97fb18b846
disable I422AlphaToARGBRow_SSSE3 for 32 bit fpic
...
BUG=libyuv:658
TEST=g++ -I include -fPIC -m32 -msse2 -Os -fno-omit-frame-pointer -c source/row_gcc.cc -o row_gcc.o
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2482263003 .
2016-11-08 16:09:09 -08:00
Frank Barchard
3028e1bd97
clang-format row_gcc.cc with some functions disabled
...
BUG=libyuv:654
TEST=try bots build
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2484083003 .
2016-11-07 18:37:29 -08:00
Frank Barchard
c2bc1561ce
Remove unused time variables
...
BUG=None
TEST=None
Review URL: https://codereview.chromium.org/2487603002 .
2016-11-07 17:46:51 -08:00
Frank Barchard
e62309f259
clang-format libyuv
...
BUG=libyuv:654
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
f2c27dafa2
HalfFloat neon armv7 fix for destination pointer.
...
Improved unittests detect a difference in arm64 rounding.
TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360"
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2478313004 .
2016-11-07 12:13:04 -08:00
Frank Barchard
eca08525cb
HalfFloat Neon for ARMv7.
...
The 64 bit version is made similar to the 32 bit one: register 1 holds the loaded values and the results to store, and registers 2 and 3 hold the expanded float temporary values.
TEST=out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2467723002 .
2016-11-01 11:36:51 -07:00
Frank Barchard
10ce829bad
Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA : ~1.5x
I422ToRGB565Row_Any_MSA : ~1.5x
I422ToARGB4444Row_MSA : ~1.4x
I422ToARGB4444Row_Any_MSA : ~1.4x
I422ToARGB1555Row_MSA : ~1.4x
I422ToARGB1555Row_Any_MSA : ~1.4x
Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA : ~6.8x
I422ToRGB565Row_Any_MSA : ~6.8x
I422ToARGB4444Row_MSA : ~6.6x
I422ToARGB4444Row_Any_MSA : ~6.6x
I422ToARGB1555Row_MSA : ~6.6x
I422ToARGB1555Row_Any_MSA : ~6.6x
Review URL: https://codereview.chromium.org/2445343007 .
2016-10-27 10:47:35 -07:00
Frank Barchard
532f5708a9
Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA : ~1.4x
I422AlphaToARGBRow_Any_MSA : ~1.4x
I422ToRGB24Row_MSA : ~4.8x
I422ToRGB24Row_Any_MSA : ~4.8x
Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA : ~7.0x
I422AlphaToARGBRow_Any_MSA : ~7.0x
I422ToRGB24Row_MSA : ~7.9x
I422ToRGB24Row_Any_MSA : ~7.7x
Review URL: https://codereview.chromium.org/2454433003 .
2016-10-26 11:12:17 -07:00
Frank Barchard
02ae8b60c5
Line continuation at end of line with NOLINT before that.
...
BUG=libyuv:634
TEST=git cl lint
TBR=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2453013003 .
2016-10-26 10:42:52 -07:00
Frank Barchard
2c94d6bd5a
document GN for ios
...
BUG=libyuv:643
TEST=gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"arm64\"" && ninja -v -C out/Release libyuv_unittest
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2450853003 .
2016-10-25 17:13:59 -07:00
Frank Barchard
7c309c459f
cherry picking changes needed for deps roll.
...
DEPS roll is needed for mips builds. These additional changes are also
needed for that DEPS roll. These can be done separately.
TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=try bots
Review URL: https://codereview.chromium.org/2446043003 .
2016-10-25 15:54:59 -07:00
Frank Barchard
2488b3105b
White spaces, comments and lint fixes for msa.
...
no functional changes.
TBR=kjellander@chromium.org
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2446313002 .
2016-10-25 11:36:54 -07:00
Frank Barchard
c2073823b4
use __OPTIMIZE__ macro to determine debug vs release.
...
Debug builds of x86 gcc/clang can run out of registers.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.
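A minimal sketch of the detection described above. The macro `SKETCH_IS_OPTIMIZED_BUILD` and the function `sketch_use_simd_row` are hypothetical names, not libyuv identifiers. GCC and Clang predefine `__OPTIMIZE__` whenever optimization is enabled, independent of `NDEBUG` or `_DEBUG`.

```c
#include <assert.h>

/* __OPTIMIZE__ is predefined by GCC and Clang when optimizing,
   so it reflects the build mode even if NDEBUG/_DEBUG are never set. */
#if defined(__OPTIMIZE__)
#define SKETCH_IS_OPTIMIZED_BUILD 1
#else
#define SKETCH_IS_OPTIMIZED_BUILD 0
#endif

/* Hypothetical selector: returns 1 when a register-heavy SIMD row function
   may be used, 0 when the plain C path should be used instead. */
static int sketch_use_simd_row(int cpu_has_simd) {
  assert(cpu_has_simd == 0 || cpu_has_simd == 1);
  return cpu_has_simd && SKETCH_IS_OPTIMIZED_BUILD;
}
```

In an unoptimized build the selector always returns 0, which matches the goal of keeping the register-hungry code out of debug builds.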
BUG=libyuv:602
TEST=untested
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2451503002 .
2016-10-24 18:02:48 -07:00
Frank Barchard
f5d5bd88d6
Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gains :- (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x
Performance Gains :- (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x
Regarding performance measurement: we created standalone tests that pass row data from a filled 1920x1080 buffer to both the C and MSA functions. Each test runs N iterations to get more accurate C vs MSA timings.
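The measurement approach described above could be sketched roughly as follows. This is not the authors' test code: `RowFunc`, `CopyRow_Sketch` and `time_row_func` are hypothetical names, the stand-in row function only copies pixels, and `clock()` is assumed to be precise enough for the chosen iteration count.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

enum { kWidth = 1920, kHeight = 1080, kBpp = 4 }; /* ARGB frame */

typedef void (*RowFunc)(const uint8_t* src, uint8_t* dst, int width);

/* Stand-in row function (hypothetical): copies ARGB pixels unchanged. */
static void CopyRow_Sketch(const uint8_t* src, uint8_t* dst, int width) {
  memcpy(dst, src, (size_t)width * kBpp);
}

/* Runs fn over every row of a filled 1920x1080 buffer, `iterations` times.
   Returns elapsed CPU seconds, or -1.0 if allocation fails. */
static double time_row_func(RowFunc fn, int iterations) {
  assert(fn != NULL && iterations > 0);
  size_t row_bytes = (size_t)kWidth * kBpp;
  size_t size = row_bytes * kHeight;
  uint8_t* src = malloc(size);
  uint8_t* dst = malloc(size);
  if (src == NULL || dst == NULL) {
    free(src);
    free(dst);
    return -1.0;
  }
  memset(src, 0x80, size); /* fill the source frame */
  clock_t start = clock();
  for (int i = 0; i < iterations; ++i) {
    for (int y = 0; y < kHeight; ++y) {
      fn(src + y * row_bytes, dst + y * row_bytes, kWidth);
    }
  }
  double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
  free(src);
  free(dst);
  return seconds;
}
```

Timing the C and SIMD variants with the same iteration count and dividing the two results gives a speedup figure of the kind quoted in these commit messages.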
Review URL: https://codereview.chromium.org/2430313005 .
2016-10-24 15:37:08 -07:00
Frank Barchard
451af5e922
HalfFloat scale by 1 implemented for NEON
...
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1: \n"
    MEMACCESS(0)
    "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
    "subs %w2, %w2, #8 \n" // 8 pixels per loop
    "uxtl v2.4s, v1.4h \n" // 8 int's
    "uxtl2 v1.4s, v1.8h \n"
    "scvtf v2.4s, v2.4s \n" // 8 floats
    "scvtf v1.4s, v1.4s \n"
    "fcvtn v4.4h, v2.4s \n" // 8 half floats
    "fcvtn2 v4.8h, v1.4s \n"
    MEMACCESS(1)
    "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
    "b.gt 1b \n"
  : "+r"(src), // %0
    "+r"(dst), // %1
    "+r"(width) // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}
void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1: \n"
    MEMACCESS(0)
    "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
    "subs %w2, %w2, #8 \n" // 8 pixels per loop
    "uxtl v2.4s, v1.4h \n" // 8 int's
    "uxtl2 v1.4s, v1.8h \n"
    "scvtf v2.4s, v2.4s \n" // 8 floats
    "scvtf v1.4s, v1.4s \n"
    "fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent
    "fmul v1.4s, v1.4s, %3.s[0] \n"
    "uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat
    "uqshrn2 v4.8h, v1.4s, #13 \n"
    MEMACCESS(1)
    "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
    "b.gt 1b \n"
  : "+r"(src), // %0
    "+r"(dst), // %1
    "+r"(width) // %2
  : "w"(scale * 1.9259299444e-34f) // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}
TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
550cf829fb
HalfFloat avx2 unpack bug fix.
...
AVX unpack parameters were reverse ordered causing incorrect results
on AVX2 hardware.
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2438893002 .
2016-10-20 15:49:00 -07:00
Frank Barchard
f553db2d30
HalfFloatPlane unittest for denormal half floats
...
Half floats have a limited range. It shouldn't normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flush to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
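To make the range limit concrete, here is a small sketch assuming the IEEE 754 binary16 layout (1 sign bit, 5 exponent bits, 10 mantissa bits). Both function names are hypothetical and are not part of the unittest. The smallest normal half float is 2^-14 and the smallest denormal is 2^-24.

```c
#include <assert.h>
#include <stdint.h>

/* Returns 1 if h is a denormal binary16 pattern:
   exponent field all zeros and a nonzero mantissa. */
static int half_is_denormal_sketch(uint16_t h) {
  unsigned exponent = (h >> 10) & 0x1Fu;
  unsigned mantissa = h & 0x3FFu;
  return exponent == 0 && mantissa != 0;
}

/* Returns 1 if value * scale lands in the half-float denormal range,
   that is, at least 2^-24 but below 2^-14. */
static int scaled_value_is_half_denormal(uint16_t value, float scale) {
  assert(scale > 0.0f);
  float scaled = (float)value * scale;
  return scaled >= 5.9604644775390625e-08f && /* 2^-24 */
         scaled < 6.103515625e-05f;           /* 2^-14 */
}
```

For example, a pixel value of 1 with a scale of 1/65536 gives 2^-16, which falls in the denormal range.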
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
2016-10-19 18:13:01 -07:00
Frank Barchard
78c58ab8aa
Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains : (Auto-vectorized C vs MSA SIMD)
ARGB4444ToYRow_MSA : ~3.0x
ARGB4444ToUVRow_MSA : ~1.8x
ARGB4444ToARGBRow_MSA : ~3.4x
ARGB4444ToYRow_Any_MSA : ~2.8x
ARGB4444ToUVRow_Any_MSA : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x
Review URL: https://codereview.chromium.org/2421843002 .
2016-10-19 11:10:51 -07:00
Frank Barchard
e16e3a629f
cpu_id cleanup. no functional change.
...
remove old comment about initialize to zero.
remove ifdef and replace with macro defined to zero.
BUG=None
TEST=try bots
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2425623004 .
2016-10-18 12:26:02 -07:00
Henrik Kjellander
93f47948b1
landmine to clobber old GYP build artifacts to enable moving to GN.
...
BUG=chromium:652188
TBR=fbarchard@chromium.org
Review URL: https://codereview.chromium.org/2427643003 .
2016-10-18 08:56:41 +02:00
Henrik Kjellander
e005669332
PRESUBMIT: Remove GYP trybots
...
As they're being removed from the try server.
BUG=chromium:652188
TBR=fbarchard@chromium.org
Review URL: https://codereview.chromium.org/2426693003 .
2016-10-18 07:54:36 +02:00
Henrik Kjellander
a0a549c5b3
landmine to clobber old GYP build artifacts to enable moving to GN.
...
BUG=chromium:652188
TBR=ehmaldonado@chromium.org
Review URL: https://codereview.chromium.org/2421343002 .
2016-10-17 15:58:38 +02:00
Henrik Kjellander
3d047196a8
Add landmine support
...
After switching bots from GYP to GN, leftover build artifacts cause the
next builds to fail. Since it's infeasible to clean out all bot machines by hand,
it's better to have an automated system for this, which is what landmines are.
Adding a line to tools/get_landmines.py clobbers each bot
that syncs past that "landmine CL".
BUG=chromium:652188
TBR=ehmaldonado@chromium.org
Review URL: https://codereview.chromium.org/2427633003 .
2016-10-17 15:37:47 +02:00
Henrik Kjellander
fcbb30f593
PRESUBMIT: rename trybots from gn to gyp.
...
After switching the default bots from GYP to GN,
we now only have a few GYP bots left, so rename the trybots
accordingly.
BUG=chromium:652188
TBR=fbarchard@chromium.org
Review URL: https://codereview.chromium.org/2425693002 .
2016-10-17 15:09:55 +02:00
Frank Barchard
2d80fc3133
Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
...
R=wangcheng@google.com , hubbe@chromium.org
BUG=libyuv:560
Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00
Frank Barchard
fdcf524aac
Add f16c (halffloat) cpuid
...
R=wangcheng@google.com , hubbe@chromium.org
BUG=libyuv:560
Review URL: https://codereview.chromium.org/2418763006 .
2016-10-14 16:34:08 -07:00
Frank Barchard
5333e94e70
Port ARGBExtractAlpha_AVX2 function to windows.
...
BUG=libyuv:572
TEST=try bots
R=wangcheng@google.com , magjed@chromium.org
Review URL: https://codereview.chromium.org/2416783004 .
2016-10-13 23:20:57 -07:00
Frank Barchard
a5e93766a2
Add ARGBExtractAlpha_AVX2 function
...
Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com , magjed@chromium.org
Review URL: https://codereview.chromium.org/2420553002 .
2016-10-13 16:03:43 -07:00
Frank Barchard
9fb3c31b06
Add linux_use_bundled_binutils_override = true to build_overrides.
...
This variable was introduced in https://codereview.chromium.org/2293853002
and causes builds to fail, since it is not defined in WebRTC.
BUG=webrtc:6281
TBR=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2418643003 .
2016-10-12 18:15:21 -07:00
Frank Barchard
198bce3959
Cast for clang-cl 64 bit build warnings in unittests
...
R=kjellander@chromium.org
BUG=libyuv:649
Review URL: https://codereview.chromium.org/2414763002 .
2016-10-12 13:09:57 -07:00
Frank Barchard
a7166c3375
Add GN files that need exec_script to list for win64 clang-cl
...
TBR=kjellander@chromium.org
BUG=libyuv:649
TEST=call gn gen out\Release "--args=is_debug=false is_clang=true"
Review URL: https://codereview.chromium.org/2414783002 .
2016-10-12 12:46:22 -07:00
Frank Barchard
d363ea6527
Remove I411 support.
...
YUV 411 is a very uncommon format. Remove support.
Update documentation to reflect that 411 is deprecated.
Simplify the YUV tests to use only the new side by side YUV layout, but keep the old 3 plane test around behind a macro for now.
BUG=libyuv:645
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2406123002 .
2016-10-11 11:14:16 -07:00
Frank Barchard
0071f46a1f
Side by side 420 test
...
I420 output can be slow due to multi channel write.
Putting the U and V into a single side by side buffer can improve performance.
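One possible layout for such a side by side buffer is sketched below. This is an assumption for illustration, not the layout the test necessarily uses: each row of a single allocation holds a half-width run of U followed by a half-width run of V, so both planes share one stride. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical descriptor: U and V live in one allocation, side by side. */
typedef struct {
  uint8_t* base; /* owning pointer, pass to free() */
  uint8_t* u;    /* start of the U half of each row */
  uint8_t* v;    /* start of the V half of each row */
  int stride;    /* bytes per row, shared by U and V */
} SideBySideUV;

/* Allocates chroma planes for a width x height 4:2:0 image.
   Returns 0 on success, -1 on allocation failure. */
static int side_by_side_uv_alloc(SideBySideUV* out, int width, int height) {
  assert(out != NULL && width > 0 && height > 0);
  int halfwidth = (width + 1) / 2;
  int halfheight = (height + 1) / 2;
  out->stride = halfwidth * 2;
  out->base = calloc((size_t)out->stride * (size_t)halfheight, 1);
  if (out->base == NULL) return -1;
  out->u = out->base;             /* left half of every row */
  out->v = out->base + halfwidth; /* right half of every row */
  return 0;
}
```

A conversion function would then receive `u` and `v` with the same `stride`, and its two chroma writes per row land in adjacent memory.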
TBR=wangcheng@google.com
BUG=None
Review URL: https://codereview.chromium.org/2403223003 .
2016-10-10 19:28:33 -07:00
Frank Barchard
af87c11c9a
YUY2ToI422 coalesce rows for small images
...
TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToI422_Opt
Review URL: https://codereview.chromium.org/2393393006 .
2016-10-07 18:35:42 -07:00
Frank Barchard
edd3a84d05
libyuv::YUY2ToY for isolating Y channel of YUY2.
...
This function is the first step of YUY2 To I420.
Provided primarily for diagnostics.
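A scalar sketch of the Y extraction, for illustration only. The function name is hypothetical and this is not the libyuv implementation. YUY2 packs two pixels into a 4 byte macro pixel ordered Y0 U Y1 V, so the Y samples are the even-indexed bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copies the Y samples of one YUY2 row into dst_y.
   src_yuy2 must hold at least width * 2 bytes (Y0 U Y1 V ...). */
static void yuy2_to_y_row_sketch(const uint8_t* src_yuy2, uint8_t* dst_y,
                                 int width) {
  assert(src_yuy2 != NULL && dst_y != NULL && width >= 0);
  for (int x = 0; x < width; ++x) {
    dst_y[x] = src_yuy2[x * 2]; /* every even byte is a Y sample */
  }
}
```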
TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToY_Opt
Review URL: https://codereview.chromium.org/2399153004 .
2016-10-07 17:20:30 -07:00
Frank Barchard
a2891ec77c
Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains as below,
YUY2ToI422, YUY2ToI420 :-
YUY2ToYRow_MSA : ~10x
YUY2ToUVRow_MSA : ~11x
YUY2ToUV422Row_MSA : ~9x
YUY2ToYRow_Any_MSA : ~6x
YUY2ToUVRow_Any_MSA : ~5x
YUY2ToUV422Row_Any_MSA : ~4x
UYVYToI422, UYVYToI420 :-
UYVYToYRow_MSA : ~10x
UYVYToUVRow_MSA : ~11x
UYVYToUV422Row_MSA : ~9x
UYVYToYRow_Any_MSA : ~6x
UYVYToUVRow_Any_MSA : ~5x
UYVYToUV422Row_Any_MSA : ~4x
Review URL: https://codereview.chromium.org/2397693002 .
2016-10-07 10:37:22 -07:00
Frank Barchard
3b88a19ab1
YUY2ToI422_Any_Neon clean up to not require 16 pixels
...
YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
the last pixel. The replication is no longer necessary after a previous
change to treat YUY2 as 4 byte macro pixels.
TBR=harryjin@google.com
BUG=libyuv:648
TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"
Review URL: https://codereview.chromium.org/2399143002 .
2016-10-06 12:11:40 -07:00
Frank Barchard
1cd384140d
GN: Add default target
...
This reduces the number of objects when not specifying a
build target during compile. This is especially significant for Android
where the number of objects decreases from 3322 to 1761.
BUG=libyuv:644
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/2395743002 .
2016-10-05 11:17:28 -07:00
Frank Barchard
4b3b310e66
Enable optimize max for GN builds + update docs
...
Optimize max enables O2 for official builds. Normally release builds
are O2 but the official build is Os, affecting performance.
The GYP file was previously updated to enable optimize max,
which enables ltcg and O2.
Documentation updated to show GN builds in docs/getting_started.md
BUG=libyuv:642
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2386093003 .
2016-10-04 11:50:19 -07:00
Frank Barchard
7018f5be0f
Add MSA optimized I422ToYUY2Row, I422ToUYVYRow functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains :-
I422ToYUY2Row_MSA - ~12x
I422ToYUY2Row_Any_MSA - ~7x
I422ToUYVYRow_MSA - ~12x
I422ToUYVYRow_Any_MSA - ~7x
Review URL: https://codereview.chromium.org/2378753004 .
2016-10-03 18:21:31 -07:00
Frank Barchard
aa197ee1a3
HalfFloat_SSE2 for Visual C
...
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows
R=hubbe@chromium.org , wangcheng@google.com
Review URL: https://codereview.chromium.org/2387713002 .
2016-10-03 10:33:38 -07:00
Frank Barchard
4a14cb2e81
HalfFloat_SSE2 port from C algorithm to SSE2
...
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2381493006 .
2016-09-30 09:47:16 -07:00