Frank Barchard
6546096269
ARGBExtractAlpha 16 pixels at a time for ARM
...
arm64 8 TestARGBExtractAlpha (10019 ms) <-original 64 bit code
arm64 8 x2 TestARGBExtractAlpha (7639 ms)
arm64 16 TestARGBExtractAlpha (7369 ms) <- new 64 bit code
thumb32 8 TestARGBExtractAlpha (9505 ms) <- original 32 bit code
thumb32 8 x2 TestARGBExtractAlpha (7400 ms)
thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
arm32 8 TestARGBExtractAlpha (10002 ms)
BUG=libyuv:572
TESTED=local test on nexus 9
R=harryjin@google.com , wangcheng@google.com
Review URL: https://codereview.chromium.org/2035573002 .
2016-06-07 10:44:28 -07:00
Frank Barchard
b00d40160a
make unittest allocator align to 64 bytes.
...
blur requires memory be aligned. change the unittest allocator to guarantee 64 byte alignment.
re-enable blur any test that fails if memory is unaligned.
TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.
Review URL: https://codereview.chromium.org/2019753002 .
2016-05-27 18:02:47 -07:00
Frank Barchard
ade85fb55c
remove row.h from unittests
...
add SIMD_ALIGNED to unittest header.
BUG=libyuv:594
TESTED=local build passes with row.h removed from tests.
R=harryjin@google.com
Review URL: https://codereview.chromium.org/2001373002 .
2016-05-27 10:57:49 -07:00
Magnus Jedvert
942db3016a
Add ARGBExtractAlpha function
...
BUG=libyuv:572
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/1995293002 .
2016-05-26 10:30:57 +02:00
Frank Barchard
3c862e3d29
Fix stride bug for msan on I420Interpolate.
...
When using C version of I420Interpolate for msan, a 50% interpolation
would cause stride to be cast to int, which could cause erroneous
memory reads on 64 bit build.
This CL makes the stride use ptrdiff_t for HalfRow_C
BUG=libyuv:582
TESTED=try bots tests
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1872953002 .
2016-04-08 15:58:53 -07:00
Frank Barchard
0d880e5bc0
rename MIPS_DSPR2 to DSPR2 for consistency
...
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .
2016-02-05 14:49:54 -08:00
Frank Barchard
58cb534962
Fix memory overwrite in YUY2ToNV12 odd wdiths
...
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination. Performance
is slower.
Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)
Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545
Review URL: https://codereview.chromium.org/1593833002 .
2016-01-19 11:28:09 -08:00
Frank Barchard
fc52d8ded2
Odd width variation of scale down by 2 for subsampling
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:538
Review URL: https://codereview.chromium.org/1558093003 .
2016-01-06 15:12:17 -08:00
Frank Barchard
f4447745ae
Add rounding to InterpolateRow for improved quality and consistency.
...
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly. Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.
BUG=libyuv:535
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
ae55e41851
use rounding in scaledown by 2
...
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
2015-12-14 17:25:36 -08:00
Frank Barchard
a2ea905679
BlendPlane any width.
...
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)
Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)
which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
dee77a4ebe
Optimize yuv alpha blend AVX2 code to do 32 pixels at time.
...
out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt
Was LibYUVPlanarTest.I420Blend_Opt (2335 ms)
Now LibYUVPlanarTest.I420Blend_Opt (1937 ms)
vs SSSE3
LibYUVPlanarTest.I420Blend_Opt (2599 ms)
BUG=libyuv:527
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1505673003 .
2015-12-08 18:20:30 -08:00
Frank Barchard
2657688e70
Add support for odd height YUVA alpha blending.
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1507683003 .
2015-12-07 12:03:20 -08:00
Frank Barchard
48a919d86e
Bug fix for UYVYToNV12 odd height
...
TBR=harryjin@google.com
BUG=libyuv:528
Review URL: https://codereview.chromium.org/1506973002 .
2015-12-07 11:39:48 -08:00
Frank Barchard
bea690b3e0
AVX2 YUV alpha blender and improved unittests
...
AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.
unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1505433002 .
2015-12-05 22:23:29 -08:00
Frank Barchard
b6f37bd8ec
Interpolate plane initial implementation.
...
YUV version of interpolation between two images.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:526
Review URL: https://codereview.chromium.org/1479593002 .
2015-11-25 16:11:42 -08:00
Frank Barchard
d95d2169d9
rename yuv matrix constants to be more clear about what they are
...
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1429693006 .
2015-11-03 17:09:53 -08:00
Frank Barchard
ce4c2fad1d
Raw 24 bit RGB to RGB24 (bgr)
...
Add unittests that do 1 step conversion vs 2 step conversion.
Tests end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
2c7aa0070a
remove I422ToBGRA and use I422ToRGBA internally
...
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369
refactor I420ToABGR to use I420ToARGBRow
...
Using a transposed conversion matrix, I420ToARGB can output ABGR.
R=harryjin@google.com , xhwang@chromium.org
BUG=libyuv:473
Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00
Frank Barchard
76a599ec3b
fix jpeg and bt.709 yuvconstants for neon64.
...
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces. But the jpeg and bt.709
colour conversion constants were still in armv7 form. This changes
the constants for aarch64 builds to be compatible with the code.
yuv constants are now passed as const *
Remove Yvu constants which were used for older version on nv21 but not new code.
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1398623002 .
2015-10-07 19:46:56 -07:00
Frank Barchard
914a9856c7
Reimplement NV21ToARGB to allow different color matrix.
...
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V. But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.
TBR=harryjin@google.com
BUG=libyuv:500
Review URL: https://codereview.chromium.org/1388273002 .
2015-10-06 20:34:44 -07:00
Frank Barchard
2cc1a2b233
Remove sse2 functions that also have ssse3
...
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1377053003 .
2015-09-30 14:24:44 -07:00
Frank Barchard
f96890a0be
yuvconstants for all YUV to RGB conversion functions.
...
R=harryjin@google.com
BUG=libyuv:488
Review URL: https://codereview.chromium.org/1363503002 .
2015-09-22 10:26:03 -07:00
Frank Barchard
278d88f872
Copy Alpha odd width support
...
R=harryjin@google.com
BUG=none
Review URL: https://webrtc-codereview.appspot.com/59369004 .
2015-08-13 15:05:14 -07:00
Frank Barchard
ce98129951
yuy2tonv12
...
R=bcornell@google.com
BUG=libyuv:466
Review URL: https://webrtc-codereview.appspot.com/51309004 .
2015-07-17 16:22:59 -07:00
Frank Barchard
faa4b14f85
uyvy to nv12
...
R=harryjin@google.com
BUG=libyuv:466
Review URL: https://webrtc-codereview.appspot.com/50339004 .
2015-07-17 14:43:19 -07:00
fbarchard@google.com
bd2d903e1b
odd width support for ARGBSobel functions. Improves performance for images that are not a multiple of 8 pixels.
...
BUG=444
TESTED=libyuvTest.ARGBSobel_Opt
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/54589004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1415 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-28 22:22:28 +00:00
fbarchard@google.com
cfce47efc8
Change Sobel to use JPeg Luma calculation instead of extracting G channel. Using luma produces a better sobel that respects all 3 channels of RGB. Historically the G channel was used to improve performance, and because the luma of I420 is a constrained range, hurting quality. Using the JPeg variation of YUV, the luma is more accurate, including cross platform, better optimized for AVX2 and odd widths, and full range.
...
BUG=444
TESTED=ARGBSobelXY_Opt
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/57479004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1414 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-27 22:32:26 +00:00
fbarchard@google.com
8f0b32773c
ARGBToUV AVX2 functions hooked up.
...
BUG=none
TESTED=RGB565ToI420
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/46829004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1359 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-07 00:10:52 +00:00
fbarchard@google.com
e6ca9cc2a2
Scale down by 2 AVX2 port. Processes twice as many pixels as SSE2 and takes advantage of 3 argument instructions to reduce register usage and number of instructions.
...
BUG=314
TESTED=libyuvTest.ScaleDownBy2_Box
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/42959004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1347 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-26 23:21:08 +00:00
fbarchard@google.com
d28cd77f99
Enable assembly for clangcl build on Windows. Previously assembly was disabled so clangcl would work, but only with C code. As clangcl mimics both Visual C and GCC, ifdefs need to pick one or the other or often you'll end up with both. In this CL we disable most Visual C code and use the GCC versions which allow assembly for both 32 and 64 bit intel.
...
BUG=412
TESTED=clang=1 build on windows
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/51389004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1341 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-19 20:36:31 +00:00
fbarchard@google.com
446fa95587
I422ToRGB565, ARGB4444 and ARGB1555 for AVX2
...
BUG=403
TESTED=avx2 emulator
Review URL: https://webrtc-codereview.appspot.com/34359004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1293 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-24 23:14:46 +00:00
fbarchard@google.com
fd8054791c
build fixe for InterpolateRow_MIPS_DSPR2
...
BUG=398
TESTED=untested
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/37999004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1268 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-07 01:01:30 +00:00
fbarchard@google.com
b2a6af1be6
Change rectangle low level functions to use more conventional row functions including 'any' variations. Previously the yuv function SetPlane stored 32 bit values. Now a more conventional memset() style function is used for YUV that stores bytes. On Haswell a rep stosb is used for YUV. Overall benefit of this CL is improved performance for 'any' width, and simpler row assembly instead of full image assembly. Previously ARGBRect used a low level function that supported a rectangle in assembly. Now it uses a row function, and relies on row coalesce to combine into a single low level call.
...
BUG=371
TESTED=untested
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35689004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1222 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-12 03:58:24 +00:00
fbarchard@google.com
8e3db2dc73
Support invert for ARGBRect and SetPlane
...
BUG=387
TESTED=ARGBRect_Invert
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/37539004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1219 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-07 19:02:01 +00:00
fbarchard@google.com
992c3b089a
Use HAS_ARGBSETROWS_X86 to detect presence of function.
...
BUG=none
TESTED=rectangle unittests
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35639004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1218 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-07 00:11:51 +00:00
fbarchard@google.com
ef67597b48
ARGBMirror use SSE2 pshufd instruction instead of SSSE3 pshufb.
...
BUG=269
TESTED=local benchmark for ARGBMirror
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/32509004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1176 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-21 19:25:14 +00:00
fbarchard@google.com
91f240c5db
Move sub before branch for loops.
...
Remove CopyRow_x86
Add CopyRow_Any versions for AVX, SSE2 and Neon.
BUG=269
TESTED=local build
R=harryjin@google.com , tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/26209004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1175 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-20 21:14:27 +00:00
fbarchard@google.com
9dd083a512
ARGBMirror Any
...
BUG=none
TESTED=mirror and rotate unittests
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30159004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1172 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-19 00:46:51 +00:00
fbarchard@google.com
59ed448685
MirrorAny functions so assembly can always be used.
...
BUG=none
TESTED=untested
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29069004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1170 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-18 01:03:47 +00:00
fbarchard@google.com
9ed836b154
The 'Any' versions of functions can handle any width now, so remove the check from the calling code. This has 2 advantages - less code, and less overhead in calling function when any function is NOT used. Downside is more code for case where any is used.
...
BUG=373
TESTED=libyuv_unittest still passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/24129004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1143 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-24 23:29:31 +00:00
fbarchard@google.com
0a6dab42c0
Add check for minimum of 8 pixels for any functions and multiple of 8 not 16 for neon functions.
...
BUG=373
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/23189004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1139 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-23 23:05:12 +00:00
fbarchard@google.com
f2fa453b94
Port I422ToABGR to AVX2.
...
BUG=269
TESTED=intelsde on I422ToABGR
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/23149004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1138 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-23 17:20:22 +00:00
fbarchard@google.com
c000955bc0
Port I422ToRGBA to AVX.
...
BUG=269
TESTED=intelsde on I422ToRGBA
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/28769004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1136 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-22 22:41:39 +00:00
fbarchard@google.com
d81dddd3d0
port I420ToBGRA to AVX2.
...
BUG=269
TESTED=c:\intelsde\sde -ast -hsw -- out\release\libyuv_unittest.exe --gtest_filter=*I420ToBGRA*
R=brucedawson@google.com , harryjin@google.com , magjed@chromium.org
Review URL: https://webrtc-codereview.appspot.com/26869004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1127 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-20 19:35:55 +00:00
fbarchard@google.com
055725b8fe
Neon does 8 at a time, so a check is added for any function of I422ToBGRA that width is >= 8 and for fast path that it is a multiple of 8 not 16.
...
BUG=373
TESTED=untested
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/24059004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1126 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-20 18:39:21 +00:00
fbarchard@google.com
f713691a6f
Change elif to endif and if to allow AVX2 as well as SSE2 in future changes instead of one or the other.
...
BUG=none
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30719004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1122 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-16 20:47:22 +00:00
fbarchard@google.com
b720049a54
Make row functions used for planarfunctions and convert use movdqu to relax alignment constraint. Step 1 - make functions unaligned.
...
BUG=365
TESTED=libyuv_unittest passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/26709004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1111 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-03 21:11:37 +00:00
fbarchard@google.com
044f914c29
Change scale to unaligned movdqu.
...
BUG=365
TESTED=scale unittests
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/22879004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1101 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-01 01:16:04 +00:00