156 Commits

Author SHA1 Message Date
Frank Barchard
fbdc43a03c fix wrong HAS_ARGBCOPYALPHAROW_SSE2 ifdef
TBR=kjellander@chromium.org
BUG=libyuv:593
TESTED=try bots pass.

Review URL: https://codereview.chromium.org/2000393002 .
2016-05-23 16:26:02 -07:00
Frank Barchard
127ff512b3 add perf data files to ignores
document play services update

R=jkellander@chromium.org
BUG=none

Review URL: https://codereview.chromium.org/1712463002 .
2016-02-17 21:37:09 -08:00
Frank Barchard
0d880e5bc0 rename MIPS_DSPR2 to DSPR2 for consistency
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor.  This should be named consistently.

Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.

TBR=harryjin@google.com
BUG=libyuv:562

Review URL: https://codereview.chromium.org/1677633002 .
2016-02-05 14:49:54 -08:00
Frank Barchard
081475b3c8 refactor ARGBToI422 using ARGBToI420 internally
R=harryjin@google.com
BUG=libyuv:546

Review URL: https://codereview.chromium.org/1574253004 .
2016-01-12 17:05:49 -08:00
Frank Barchard
f4447745ae Add rounding to InterpolateRow for improved quality and consistency.
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly.  Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.

BUG=libyuv:535
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
a2ea905679 BlendPlane any width.
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms

Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)

Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)

which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
526558b2d8 disable debug build of 411 to work around compiler bug
TBR=harryjin@google.com
BUG=libyuv:524

Review URL: https://codereview.chromium.org/1461013002 .
2015-11-19 02:25:00 -08:00
Frank Barchard
72a9e282ec disable more avx2 functions that dont link in chrome
libyuv builds/runs, but when integrated into chromium, produces link errors.  unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.

TBR=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1427683004 .
2015-11-09 17:20:02 -08:00
Frank Barchard
860cc0357a Neon versions of I420AlphaToARGB
Add alpha version of YUV to RGB to neon code for ARMv7 and aarch64.
For other YUV to RGB conversions, hoist alpha set to 255 out of loop.

TBR=harryjin@google.com
BUG=libyuv:516

Review URL: https://codereview.chromium.org/1413763017 .
2015-11-03 19:21:36 -08:00
Frank Barchard
ce4c2fad1d Raw 24 bit RGB to RGB24 (bgr)
Add unittests that do 1 step conversion vs 2 step conversion.

Tests end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
2c7aa0070a remove I422ToBGRA and use I422ToRGBA internally
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.

Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369 refactor I420ToABGR to use I420ToARGBRow
Using a transposed conversion matrix, I420ToARGB can output ABGR.

R=harryjin@google.com, xhwang@chromium.org
BUG=libyuv:473

Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00
Frank Barchard
b86dbf24d3 refactor I420AlphaToABGR to use I420AlphaToARGB internally
swap U and V and transpose conversion matrix, so I420AlphaToARGB and
I420AlphaToABGR share low level code.

Having less code with same performance allows more focused
optimization for future ARM versions.

R=harryjin@google.com
TBR=harryjin@chromium.org
BUG=libyuv:473,libyuv:516

Review URL: https://codereview.chromium.org/1422263002 .
2015-10-27 14:17:21 -07:00
Frank Barchard
cf160cdbaa implement I444ToABGR by swapping uv and transpose matrix
U contributes to B and G.  V contributes to R and G.
By swapping U and V, they contribute to the opposite channels.  Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.

As a result the existing I444ToABGRRow functions are no longer needed and are removed.

Previously this function was only Intel AVX2 optimized for Windwos.  Now it is also optimized for Arm and GCC.

ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.

R=xhwang@chromium.org
BUG=libyuv:515

Review URL: https://codereview.chromium.org/1414133006 .
2015-10-27 10:21:21 -07:00
Frank Barchard
430bb0a0f0 odd width 444 fix
TBR=harryjin@google.com
BUG=libyuv:510

Review URL: https://codereview.chromium.org/1415583003 .
2015-10-21 20:03:19 -07:00
Frank Barchard
90335f6043 bug fix for odd width 16/24 bit to i420
A bug was introduced on arm when the code for 'any' width switch to
a temporary stack buffer and simd.
The C version handles odd width by doing 1 pixel, instead of averaging 2.
But the SIMD any version is supposed to replicate the last pixel, then
the subsampling in Neon will average the pixel with itself, producing
the same result.
The previous version did this, but only for ARGB 32 bit, which was to
avoid introducing issues with subsampled YUY2 source.  This CL adds
replication for RGB 16 bit values.

TBR=harryjin@google.com
BUG=libyuv:510

Review URL: https://codereview.chromium.org/1418983003 .
2015-10-21 18:23:02 -07:00
Frank Barchard
5bf4de0806 width and 3 bug fix in odd width support of ARGBToI411
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1415213002 .
2015-10-21 12:45:08 -07:00
Frank Barchard
ba4b409d51 Fix ARGBToI411 odd width bug.
The any function for handling ARGBToI411 was not handling the pixel
replication correctly.  On 422 and odd width was handled by duplicating
a pixel of source.  411 needs replication for remainders of 1, 2 or 3
pixels.

The C version was handling odd width but with an average of the remainder
pixels, which does not match the SIMD 'any' handling off remainder.
This changes the odd width handling to mimic the any version.

TBR=harryjin@google.com
BUG=libyuv:491

Review URL: https://codereview.chromium.org/1411733004 .
2015-10-21 12:22:24 -07:00
Frank Barchard
cf19a0c9a2 nv21 any fix
R=harryjin@google.com
BUG=libyuv:507

Review URL: https://codereview.chromium.org/1410643002 .
2015-10-15 16:24:51 -07:00
Frank Barchard
76a599ec3b fix jpeg and bt.709 yuvconstants for neon64.
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces.  But the jpeg and bt.709
colour conversion constants were still in armv7 form.  This changes
the constants for aarch64 builds to be compatible with the code.

yuv constants are now passed as const *

Remove Yvu constants which were used for older version on nv21 but not new code.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1398623002 .
2015-10-07 19:46:56 -07:00
Frank Barchard
914a9856c7 Reimplement NV21ToARGB to allow different color matrix.
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V.  But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.

TBR=harryjin@google.com
BUG=libyuv:500

Review URL: https://codereview.chromium.org/1388273002 .
2015-10-06 20:34:44 -07:00
Frank Barchard
2cc1a2b233 Remove sse2 functions that also have ssse3
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.

R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1377053003 .
2015-09-30 14:24:44 -07:00
Frank Barchard
9a0e12f5f1 AVX2 1 step I422AlphaToARGB for gcc and win.
C     I420AlphaToARGB_Opt (5169 ms)
SSSE3 I420AlphaToARGB_Opt (432 ms)
AVX2  I420AlphaToARGB_Opt (358 ms)

and with premultiplication as 2 step process:
I420AlphaToARGB_Premult (7029 ms)
I420AlphaToARGB_Premult (757 ms)
I420AlphaToARGB_Premult (508 ms)

R=harryjin@google.com
BUG=libyuv:496,libyuv:473

Review URL: https://codereview.chromium.org/1372653003 .
2015-09-25 13:37:42 -07:00
Frank Barchard
e365cdde3b I420Alpha row function in 1 pass.
API change - I420AlphaToARGB takes flag indicating if RGB should be
premultiplied by alpha.

This version implements an efficient SSSE3 version for Windows.
C version done in 2 steps.

Was
libyuvTest.I420AlphaToARGB_Any (1136 ms)
libyuvTest.I420AlphaToARGB_Unaligned (1210 ms)
libyuvTest.I420AlphaToARGB_Invert (966 ms)
libyuvTest.I420AlphaToARGB_Opt (1031 ms)
libyuvTest.I420AlphaToABGR_Any (1020 ms)
libyuvTest.I420AlphaToABGR_Unaligned (1359 ms)
libyuvTest.I420AlphaToABGR_Invert (1082 ms)
libyuvTest.I420AlphaToABGR_Opt (986 ms)

R=harryjin@google.com
BUG=libyuv:496

Review URL: https://codereview.chromium.org/1367093002 .
2015-09-25 10:29:20 -07:00
Frank Barchard
f96890a0be yuvconstants for all YUV to RGB conversion functions.
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1363503002 .
2015-09-22 10:26:03 -07:00
Frank Barchard
28427a53e2 I444ToABGR for android
Reimplements I444ToARGB as a matrix function.
new I444ToABGR as matrix functions with wrappers and any functions.
Allows for future J444 and H444 versions.
I444ToABGR user level function added.

BUG=libyuv:490, libyuv:449
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1355733002 .
2015-09-18 11:20:58 -07:00
Frank Barchard
28ce7d94f5 j422toabgr neon port using i422toabgr matrix function.
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1353923003 .
2015-09-17 15:20:55 -07:00
Frank Barchard
6fcbae1409 J422ToARGB Neon but not aarch64
TBR=harryjin@google.com
BUG=libyuv:493

Review URL: https://codereview.chromium.org/1348203004 .
2015-09-17 12:43:05 -07:00
Frank Barchard
6a6b67e7a9 Add H422ToARGB armv7 neon version.
Patch provided by zhongwei.yao@linaro.org

R=fbarchard@chromium.org, fbarchard@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1344393002 .
2015-09-17 10:38:15 -07:00
Frank Barchard
509c644245 Add J422ToARGB armv7 neon version.
R=fbarchard@chromium.org, fbarchard@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1334173005 .
2015-09-15 15:01:48 -07:00
Frank Barchard
ed55d24d9f H420 functionality
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://webrtc-codereview.appspot.com/54869004 .
2015-09-06 11:01:40 -07:00
Frank Barchard
67b06e66cb I422ToABGR for win64. Moves any functions to accomidate win64 subset of formats.
TBR=harryjin@google.com
BUG=libyuv:488

Review URL: https://webrtc-codereview.appspot.com/57679004 .
2015-09-03 11:00:18 -07:00
Frank Barchard
7060e0d826 I420ToABGRMatrix functions with J420ToABGR wrapper.
Allows direct conversion from JPeg to ABGR for android.

BUG=libyuv:488
R=harryjin@google.com

Review URL: https://webrtc-codereview.appspot.com/55719004 .
2015-09-03 10:42:36 -07:00
Frank Barchard
cda9d38a4e xmmword cast for clang
clangcl use compare_win for 32 bit, allowing fallback and enabling avx2 code for clang.
move defines/protos to compare_row.h
fix issue with odd width ARGBCopyAlpha functions by copying destination to temp buffer, then doing alpha copy, then copy back to destination.

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:484

Review URL: https://webrtc-codereview.appspot.com/59379004.
2015-08-18 11:13:12 -07:00
Frank Barchard
278d88f872 Copy Alpha odd width support
R=harryjin@google.com
BUG=none

Review URL: https://webrtc-codereview.appspot.com/59369004.
2015-08-13 15:05:14 -07:00
Frank Barchard
93464b926c Add rotate any support. Fix for sobel for neon which does 16 at a time, not 8. Disable scaling color test that fails on arm. Test is not complete.
R=harryjin@google.com
BUG=libyuv:479

Review URL: https://webrtc-codereview.appspot.com/52229004.
2015-07-28 15:06:20 -07:00
Frank Barchard
97b35daf75 disable faulty avx2 in argb conversions and box filter. and extend temporary buffer to 128 for an avx2 any function.
R=harryjin@google.com
BUG=libyuv:462
TESTED=libyuv_unittest run on haswell laptop

Review URL: https://webrtc-codereview.appspot.com/53759004.
2015-07-07 15:40:24 -07:00
Frank Barchard
0737ff5bd0 128 for avx2
R=harryjin@google.com
BUG=libyuv:461

Review URL: https://webrtc-codereview.appspot.com/55649004.
2015-07-04 09:13:20 -07:00
Frank Barchard
9487b9d6d8 any allow for avx2 32 pixels at a time of argb
R=harryjin@google.com
BUG=libyuv:461

Review URL: https://webrtc-codereview.appspot.com/54779004.
2015-07-01 17:50:48 -07:00
Frank Barchard
553c7f85f1 mirror odd width with simd
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/54769004.
2015-06-23 17:53:02 -07:00
Frank Barchard
6a9ef1ea36 any 1 to 2 with stride use SIMD
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/54759004.
2015-06-23 17:08:08 -07:00
Frank Barchard
6dde4f14bd argb to uv read 4 not 8
R=harryjin@google.com
BUG=libyuv:457

Review URL: https://webrtc-codereview.appspot.com/52139004.
2015-06-23 14:48:37 -07:00
Frank Barchard
54100b91c1 copy 2 rows for interpolate and use SIMD.
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/50279004.
2015-06-23 10:41:46 -07:00
Frank Barchard
3b5d726a4f 1 to 1 any functions with a parameter use memcpy.
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/57619004.
2015-06-22 15:08:20 -07:00
Frank Barchard
a0fca88b1d remove fmemcpy and bump version
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/50269004.
2015-06-19 17:58:17 -07:00
Frank Barchard
722e87f19f string.h for memcpy
R=harryjin
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/57609004.
2015-06-19 16:40:22 -07:00
Frank Barchard
dfb2120a42 set us simd
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/55629004.
2015-06-19 14:18:48 -07:00
Frank Barchard
6608c100e2 copy last 4
R=harryjin@google.com
BUG=libyuv:448

Review URL: https://webrtc-codereview.appspot.com/54749004.
2015-06-18 17:40:19 -07:00
Frank Barchard
a209d7314b simd for 1 to 1
R=harryjin@google.com, harryjin
BUG=448

Review URL: https://webrtc-codereview.appspot.com/55619004.
2015-06-17 18:22:11 -07:00
Frank Barchard
72a235af9f repeat y for yuy2 so that unittests that check the 2nd y on odd widths will match the C and SIMD. The C code duplicates the last Y.
R=harryjin@google.com
BUG=libyuv:455

Review URL: https://webrtc-codereview.appspot.com/50249004.
2015-06-16 16:27:15 -07:00