Frank Barchard
081475b3c8
refactor ARGBToI422 using ARGBToI420 internally
...
R=harryjin@google.com
BUG=libyuv:546
Review URL: https://codereview.chromium.org/1574253004 .
2016-01-12 17:05:49 -08:00
Frank Barchard
8030a711aa
Rename rotate tests to include _Opt and disable _Odd tests
...
TBR=harryjin@google.com
BUG=libyuv:543
Review URL: https://codereview.chromium.org/1577723003 .
2016-01-11 17:30:27 -08:00
Frank Barchard
fc52d8ded2
Odd width variation of scale down by 2 for subsampling
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:538
Review URL: https://codereview.chromium.org/1558093003 .
2016-01-06 15:12:17 -08:00
Frank Barchard
2560df9513
add clang variable for other apps to use
...
R=dhrosa@google.com
BUG=libyuv:539
Review URL: https://codereview.chromium.org/1557923005 .
2016-01-05 11:47:55 -08:00
Frank Barchard
36615d62a0
fix for InterpolateRow_AVX2
...
port scaledownby4_avx2 to gcc
TBR=harryjin@google.com
BUG=libyuv:492
Review URL: https://codereview.chromium.org/1546763002 .
2015-12-22 12:29:54 -08:00
Frank Barchard
71deb7ba3a
bug fix - remove shift from InterpolateRow_AVX2
...
TBR=harryjin@google.com
BUG=libyuv:537
Review URL: https://codereview.chromium.org/1547703002 .
2015-12-22 10:28:48 -08:00
Frank Barchard
2cb2e9e1ad
fix for InterpolateRow_AVX2
...
TBR=harryjin@google.com
BUG=libyuv:535
Review URL: https://codereview.chromium.org/1543773002 .
2015-12-21 18:35:12 -08:00
Frank Barchard
3f4d86053e
avx2 interpolate use 8 bit
...
BUG=libyuv:535
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1535833003 .
2015-12-21 10:57:32 -08:00
Frank Barchard
029f926a14
add NDEBUG for release chromium buids
...
BUG=libyuv:533
TBR=harryjin@google.com
Review URL: https://codereview.chromium.org/1531143002 .
2015-12-16 16:23:09 -08:00
Frank Barchard
216e93b4e8
Fix MIPS DSPR2 build failure.
...
Fixing the failure:
'TransposeWx8_Fast_MIPS_DSPR2' was not declared in this scope
BUG=none
R=fbarchard@chromium.org
Review URL: https://codereview.chromium.org/1527243002 .
2015-12-16 10:37:42 -08:00
Frank Barchard
70445ef2ef
avx2 scale down by 2 for gcc
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1520423003 .
2015-12-15 10:59:20 -08:00
Frank Barchard
ae55e41851
use rounding in scaledown by 2
...
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
2015-12-14 17:25:36 -08:00
Frank Barchard
8bca9fc178
remove unused var in a test
...
remove include from unittest.cc that is already done by unittest.h
TBR=harryjin@google.com
BUG=libyuv:530
Review URL: https://codereview.chromium.org/1513263004 .
2015-12-10 18:39:36 -08:00
Frank Barchard
44373d8fbb
Add check for DEBUG to functions disabled on 386
...
Some functions run out of registers when compiled for debug,
fpic, with stack frames on 32 bit x86 with clang.
Previously they were enabled based on _DEBUG but that macro
is not set in some build systems. This CL adds DEBUG macro as
well to cover those environments.
R=harryjin@google.com
BUG=libyuv:532
Review URL: https://codereview.chromium.org/1517693005 .
2015-12-10 15:42:46 -08:00
Frank Barchard
a2ea905679
BlendPlane any width.
...
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)
Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)
which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
fae1a10545
Work around bug in xgetbv for Visual Studio.
...
xgetbv is generating bad code, falsely disabling AVX2 and AVX512.
disable optimization for the function affected on older versions of Visual C 32 bit.
R=brucedawson@chromium.org , dhrosa@google.com , harryjin@google.com
BUG=libyuv:529
Review URL: https://codereview.chromium.org/1503393004 .
2015-12-08 18:13:32 -08:00
Frank Barchard
2657688e70
Add support for odd height YUVA alpha blending.
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1507683003 .
2015-12-07 12:03:20 -08:00
Frank Barchard
bea690b3e0
AVX2 YUV alpha blender and improved unittests
...
AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.
unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1505433002 .
2015-12-05 22:23:29 -08:00
Frank Barchard
8af0ebf816
planar blend use signed images
...
R=dhrosa@google.com , harryjin@google.com , jzern@chromium.org
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1491533002 .
2015-12-02 14:20:17 -08:00
Frank Barchard
b6f37bd8ec
Interpolate plane initial implementation.
...
YUV version of interpolation between two images.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:526
Review URL: https://codereview.chromium.org/1479593002 .
2015-11-25 16:11:42 -08:00
Frank Barchard
88552486f1
disable 411 on x86 due to compile error
...
TBR=harryjin@google.com
BUG=libyuv:524
Review URL: https://codereview.chromium.org/1468523002 .
2015-11-20 11:21:39 -08:00
Frank Barchard
526558b2d8
disable debug build of 411 to work around compiler bug
...
TBR=harryjin@google.com
BUG=libyuv:524
Review URL: https://codereview.chromium.org/1461013002 .
2015-11-19 02:25:00 -08:00
Frank Barchard
b7dfb72559
fix for I411 build error on 32 bit x86
...
TBR=harrjin@google.com
BUG=libyuv:525
Review URL: https://codereview.chromium.org/1461693004 .
2015-11-19 01:45:14 -08:00
Frank Barchard
528356a128
syntax fix for gcc movzwl
...
TBR=harryjin@google.com
BUG=libtyv:525
Review URL: https://codereview.chromium.org/1460723003 .
2015-11-18 13:14:15 -08:00
Frank Barchard
50f8cb2db3
port I411 movzx 2 byte reader to gcc
...
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)
TBR=harryjin@google.com
BUG=libyuv:525
Review URL: https://codereview.chromium.org/1457783004 .
2015-11-18 13:05:39 -08:00
Frank Barchard
5eefbe2330
Fix for drmemory failure on I411ToARGB
...
Before
I420ToARGB_Opt (594 ms)
I422ToARGB_Opt (483 ms)
I411ToARGB_Opt (748 ms) ***
I444ToARGB_Opt (452 ms)
I400ToARGB_Opt (218 ms)
After
I420ToARGB_Opt (591 ms)
I422ToARGB_Opt (454 ms)
I411ToARGB_Opt (502 ms) ***
I444ToARGB_Opt (441 ms)
I400ToARGB_Opt (216 ms)
TBR=harryjin@google.com
BUG=libyuv:525
Review URL: https://codereview.chromium.org/1459513002 .
2015-11-17 18:00:52 -08:00
Frank Barchard
ec4b258d4e
free src_a in unittest to fix leak
...
TBR=harryjin@google.com
BUG=libyuv:524
Review URL: https://codereview.chromium.org/1452083002 .
2015-11-17 00:29:53 -08:00
Frank Barchard
0815568a50
test for unaligned vs aligned for CopyRow_SSE2
...
improves performance on older CPUs where movdqa is faster.
TBR=harryjin@google.com
BUG=libyuv:492
Review URL: https://codereview.chromium.org/1455463002 .
2015-11-17 00:04:03 -08:00
Frank Barchard
60adcbaf32
scale with conversion using 2 steps with unittest
...
a prototype function to implement the yuv to rgb with conversion and scale.
replace with 1 step function in future version, using same API.
R=harryjin@google.com
BUG=libyuv:471
Review URL: https://codereview.chromium.org/1421553016 .
2015-11-13 11:25:56 -08:00
Frank Barchard
6100f50f13
fix yvu constants for avx2 yuv to rgb
...
the yvu matrix for yuv to rgb had an incorrect entry, affecting yuv to bgra,
yuv to abgr and yuv to raw.
fix the matrix and reenable avx2 functions.
R=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1411763004 .
2015-11-10 10:45:44 -08:00
Frank Barchard
72a9e282ec
disable more avx2 functions that dont link in chrome
...
libyuv builds/runs, but when integrated into chromium, produces link errors. unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.
TBR=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1427683004 .
2015-11-09 17:20:02 -08:00
Frank Barchard
fb5ed1f4c5
disable 4 AVX2 YUV to RGB conversions which fails tests.
...
disable I422ALPHATOARGBROW_AVX2 I422TOARGBROW_AVX2 I422TORGB24ROW_AVX2 I422TORGBAROW_AVX2 in row.h.
SSSE3 versions will be used instead.
Short term fix until issue can be resolved.
R=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1419513009 .
2015-11-09 14:40:08 -08:00
Frank Barchard
98eb102bea
set d19 alpha on inner loop
...
TBR=harryjin@google.com
BUG=libyuv:521
Review URL: https://codereview.chromium.org/1429263004 .
2015-11-06 11:38:21 -08:00
Frank Barchard
431cb3667a
YUV to RGB for x64 use registers instead of memory.
...
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided.
x64 builds using memory:
AVX2 I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)
Now using registers
AVX2 I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)
TBR=harryjin@google.com
BUG=libyuv:520
Review URL: https://codereview.chromium.org/1407353010 .
2015-11-04 16:16:18 -08:00
Frank Barchard
c2bff1a1af
add .gn file for gn builds
...
using a stripped down gn file from webrtc.
BUG=libyuv:411,libyuv:519
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/1417613007 .
2015-11-04 11:09:00 -08:00
Frank Barchard
d95d2169d9
rename yuv matrix constants to be more clear about what they are
...
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1429693006 .
2015-11-03 17:09:53 -08:00
Frank Barchard
87926cec8b
remove store bgra, abgr, raw unused macros
...
TBR=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1420033004 .
2015-11-02 10:40:03 -08:00
Frank Barchard
2c7aa0070a
remove I422ToBGRA and use I422ToRGBA internally
...
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
811a5ec446
pass clangcl compile options to ignore warnings in gflags.cc
...
R=ajm@chromium.org , ajm@google.com
BUG=libyuv:513,webrtc:760
Review URL: https://codereview.chromium.org/1427643003 .
2015-10-28 10:58:19 -07:00
Frank Barchard
b86dbf24d3
refactor I420AlphaToABGR to use I420AlphaToARGB internally
...
swap U and V and transpose conversion matrix, so I420AlphaToARGB and
I420AlphaToABGR share low level code.
Having less code with same performance allows more focused
optimization for future ARM versions.
R=harryjin@google.com
TBR=harryjin@chromium.org
BUG=libyuv:473,libyuv:516
Review URL: https://codereview.chromium.org/1422263002 .
2015-10-27 14:17:21 -07:00
Frank Barchard
cf160cdbaa
implement I444ToABGR by swapping uv and transpose matrix
...
U contributes to B and G. V contributes to R and G.
By swapping U and V, they contribute to the opposite channels. Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.
As a result the existing I444ToABGRRow functions are no longer needed and are removed.
Previously this function was only Intel AVX2 optimized for Windwos. Now it is also optimized for Arm and GCC.
ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.
R=xhwang@chromium.org
BUG=libyuv:515
Review URL: https://codereview.chromium.org/1414133006 .
2015-10-27 10:21:21 -07:00
Frank Barchard
e8ee175549
add unittest that compares ABGR to ARGB
...
TBR=harryjin@google.com
BUG=libyuv:515
Review URL: https://codereview.chromium.org/1423663007 .
2015-10-26 17:51:03 -07:00
Frank Barchard
2844662e1c
Add avx512bw detection code
...
R=harryjin@google.com
BUG=libyuv:514
Review URL: https://codereview.chromium.org/1413463004 .
2015-10-26 14:42:49 -07:00
Frank Barchard
1502832a70
switch cpu flags to 0 for unitialized to avoid compare
...
R=harryjin@google.com
BUG=libyuv:512
Review URL: https://codereview.chromium.org/1418253002 .
2015-10-23 10:57:42 -07:00
Frank Barchard
ad36ba5c48
initialize cpu flags to fix compile error on windows
...
R=harryjin@google.com
BUG=libyuv:512
Review URL: https://codereview.chromium.org/1422733003 .
2015-10-22 15:16:31 -07:00
Frank Barchard
00f15e3c6c
color unittest allow j420 error of 5 for arm
...
R=harryjin@google.com
BUG=libyuv:511
Review URL: https://codereview.chromium.org/1412683005 .
2015-10-22 11:25:04 -07:00
Frank Barchard
430bb0a0f0
odd width 444 fix
...
TBR=harryjin@google.com
BUG=libyuv:510
Review URL: https://codereview.chromium.org/1415583003 .
2015-10-21 20:03:19 -07:00
Frank Barchard
ba4b409d51
Fix ARGBToI411 odd width bug.
...
The any function for handling ARGBToI411 was not handling the pixel
replication correctly. On 422 and odd width was handled by duplicating
a pixel of source. 411 needs replication for remainders of 1, 2 or 3
pixels.
The C version was handling odd width but with an average of the remainder
pixels, which does not match the SIMD 'any' handling off remainder.
This changes the odd width handling to mimic the any version.
TBR=harryjin@google.com
BUG=libyuv:491
Review URL: https://codereview.chromium.org/1411733004 .
2015-10-21 12:22:24 -07:00
Frank Barchard
9daa550a2e
Move cpu_info variable outside ifdef
...
Fix compile error on arm, mips etc due to undefined variable.
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1403373008 .
2015-10-20 16:32:44 -07:00
Frank Barchard
9be6d21ae7
write to cpu_flags once
...
To make init cpu flags thread safe, there can only be one write to the variable.
R=richard.winterton@intel.com , harryjin@google.com
BUG=libyuv:508
Review URL: https://codereview.chromium.org/1412793006 .
2015-10-20 16:24:01 -07:00