1182 Commits

Author SHA1 Message Date
Frank Barchard
528356a128 syntax fix for gcc movzwl
TBR=harryjin@google.com
BUG=libtyv:525

Review URL: https://codereview.chromium.org/1460723003 .
2015-11-18 13:14:15 -08:00
Frank Barchard
50f8cb2db3 port I411 movzx 2 byte reader to gcc
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1457783004 .
2015-11-18 13:05:39 -08:00
Frank Barchard
5eefbe2330 Fix for drmemory failure on I411ToARGB
Before
I420ToARGB_Opt (594 ms)
I422ToARGB_Opt (483 ms)
I411ToARGB_Opt (748 ms) ***
I444ToARGB_Opt (452 ms)
I400ToARGB_Opt (218 ms)

After
I420ToARGB_Opt (591 ms)
I422ToARGB_Opt (454 ms)
I411ToARGB_Opt (502 ms)  ***
I444ToARGB_Opt (441 ms)
I400ToARGB_Opt (216 ms)

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1459513002 .
2015-11-17 18:00:52 -08:00
Frank Barchard
0815568a50 test for unaligned vs aligned for CopyRow_SSE2
improves performance on older CPUs where movdqa is faster.
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1455463002 .
2015-11-17 00:04:03 -08:00
Frank Barchard
1019e4537f port I444ToARGB avx2 code from Visual C to GCC.
SSSE3
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms)
[----------] 8 tests from LibYUVConvertTest (3389 ms total)

AVX2
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms)
[----------] 8 tests from LibYUVConvertTest (2615 ms total)

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1445893002 .
2015-11-13 18:31:22 -08:00
Frank Barchard
60adcbaf32 scale with conversion using 2 steps with unittest
a prototype function to implement the yuv to rgb with conversion and scale.
replace with 1 step function in future version, using same API.

R=harryjin@google.com
BUG=libyuv:471

Review URL: https://codereview.chromium.org/1421553016 .
2015-11-13 11:25:56 -08:00
Frank Barchard
6100f50f13 fix yvu constants for avx2 yuv to rgb
the yvu matrix for yuv to rgb had an incorrect entry, affecting yuv to bgra,
yuv to abgr and yuv to raw.
fix the matrix and reenable avx2 functions.

R=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1411763004 .
2015-11-10 10:45:44 -08:00
Frank Barchard
72a9e282ec disable more avx2 functions that dont link in chrome
libyuv builds/runs, but when integrated into chromium, produces link errors.  unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.

TBR=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1427683004 .
2015-11-09 17:20:02 -08:00
Frank Barchard
98eb102bea set d19 alpha on inner loop
TBR=harryjin@google.com
BUG=libyuv:521

Review URL: https://codereview.chromium.org/1429263004 .
2015-11-06 11:38:21 -08:00
Frank Barchard
431cb3667a YUV to RGB for x64 use registers instead of memory.
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided.

x64 builds using memory:
AVX2  I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)

Now using registers
AVX2  I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)

TBR=harryjin@google.com
BUG=libyuv:520

Review URL: https://codereview.chromium.org/1407353010 .
2015-11-04 16:16:18 -08:00
Frank Barchard
860cc0357a Neon versions of I420AlphaToARGB
Add alpha version of YUV to RGB to neon code for ARMv7 and aarch64.
For other YUV to RGB conversions, hoist alpha set to 255 out of loop.

TBR=harryjin@google.com
BUG=libyuv:516

Review URL: https://codereview.chromium.org/1413763017 .
2015-11-03 19:21:36 -08:00
Frank Barchard
d95d2169d9 rename yuv matrix constants to be more clear about what they are
R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1429693006 .
2015-11-03 17:09:53 -08:00
Frank Barchard
1f1d140bb6 remove mips dsp detect
DSP code is not actually used, only DSPR2.  Remove the detect.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1405043008 .
2015-11-03 16:57:40 -08:00
Frank Barchard
ce4c2fad1d Raw 24 bit RGB to RGB24 (bgr)
Add unittests that do 1 step conversion vs 2 step conversion.

Tests end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
87926cec8b remove store bgra, abgr, raw unused macros
TBR=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1420033004 .
2015-11-02 10:40:03 -08:00
Frank Barchard
2c7aa0070a remove I422ToBGRA and use I422ToRGBA internally
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.

Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369 refactor I420ToABGR to use I420ToARGBRow
Using a transposed conversion matrix, I420ToARGB can output ABGR.

R=harryjin@google.com, xhwang@chromium.org
BUG=libyuv:473

Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00
Frank Barchard
cdbdf5b723 Fix debug compilation problems for gcc and 32 bit x86.
In some methods with 7 arguments gcc fails to find enough registers
to compile the assembler code when compiling debug. Simplest solution
is to skip the assembler version in debug of those particular functions

(I422Alpha -> ARBG/ABGR)

R=harryjin@google.com,bratell@opera.com
BUG=libyuv:517

Review URL: https://codereview.chromium.org/1423283002 .
2015-10-28 14:27:29 -07:00
Frank Barchard
b86dbf24d3 refactor I420AlphaToABGR to use I420AlphaToARGB internally
swap U and V and transpose conversion matrix, so I420AlphaToARGB and
I420AlphaToABGR share low level code.

Having less code with same performance allows more focused
optimization for future ARM versions.

R=harryjin@google.com
TBR=harryjin@chromium.org
BUG=libyuv:473,libyuv:516

Review URL: https://codereview.chromium.org/1422263002 .
2015-10-27 14:17:21 -07:00
Frank Barchard
cf160cdbaa implement I444ToABGR by swapping uv and transpose matrix
U contributes to B and G.  V contributes to R and G.
By swapping U and V, they contribute to the opposite channels.  Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.

As a result the existing I444ToABGRRow functions are no longer needed and are removed.

Previously this function was only Intel AVX2 optimized for Windwos.  Now it is also optimized for Arm and GCC.

ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.

R=xhwang@chromium.org
BUG=libyuv:515

Review URL: https://codereview.chromium.org/1414133006 .
2015-10-27 10:21:21 -07:00
Frank Barchard
2844662e1c Add avx512bw detection code
R=harryjin@google.com
BUG=libyuv:514

Review URL: https://codereview.chromium.org/1413463004 .
2015-10-26 14:42:49 -07:00
Frank Barchard
1502832a70 switch cpu flags to 0 for unitialized to avoid compare
R=harryjin@google.com
BUG=libyuv:512

Review URL: https://codereview.chromium.org/1418253002 .
2015-10-23 10:57:42 -07:00
Frank Barchard
ad36ba5c48 initialize cpu flags to fix compile error on windows
R=harryjin@google.com
BUG=libyuv:512

Review URL: https://codereview.chromium.org/1422733003 .
2015-10-22 15:16:31 -07:00
Frank Barchard
430bb0a0f0 odd width 444 fix
TBR=harryjin@google.com
BUG=libyuv:510

Review URL: https://codereview.chromium.org/1415583003 .
2015-10-21 20:03:19 -07:00
Frank Barchard
90335f6043 bug fix for odd width 16/24 bit to i420
A bug was introduced on arm when the code for 'any' width switch to
a temporary stack buffer and simd.
The C version handles odd width by doing 1 pixel, instead of averaging 2.
But the SIMD any version is supposed to replicate the last pixel, then
the subsampling in Neon will average the pixel with itself, producing
the same result.
The previous version did this, but only for ARGB 32 bit, which was to
avoid introducing issues with subsampled YUY2 source.  This CL adds
replication for RGB 16 bit values.

TBR=harryjin@google.com
BUG=libyuv:510

Review URL: https://codereview.chromium.org/1418983003 .
2015-10-21 18:23:02 -07:00
Frank Barchard
5bf4de0806 width and 3 bug fix in odd width support of ARGBToI411
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1415213002 .
2015-10-21 12:45:08 -07:00
Frank Barchard
ba4b409d51 Fix ARGBToI411 odd width bug.
The any function for handling ARGBToI411 was not handling the pixel
replication correctly.  On 422 and odd width was handled by duplicating
a pixel of source.  411 needs replication for remainders of 1, 2 or 3
pixels.

The C version was handling odd width but with an average of the remainder
pixels, which does not match the SIMD 'any' handling off remainder.
This changes the odd width handling to mimic the any version.

TBR=harryjin@google.com
BUG=libyuv:491

Review URL: https://codereview.chromium.org/1411733004 .
2015-10-21 12:22:24 -07:00
Frank Barchard
9daa550a2e Move cpu_info variable outside ifdef
Fix compile error on arm, mips etc due to undefined variable.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1403373008 .
2015-10-20 16:32:44 -07:00
Frank Barchard
9be6d21ae7 write to cpu_flags once
To make init cpu flags thread safe, there can only be one write to the variable.

R=richard.winterton@intel.com, harryjin@google.com
BUG=libyuv:508

Review URL: https://codereview.chromium.org/1412793006 .
2015-10-20 16:24:01 -07:00
Frank Barchard
cf19a0c9a2 nv21 any fix
R=harryjin@google.com
BUG=libyuv:507

Review URL: https://codereview.chromium.org/1410643002 .
2015-10-15 16:24:51 -07:00
Frank Barchard
52a5504950 fix for C version of YUV to RGB for Arm
YuvPixel for arm was miscomputing YG.

TBR=harryjin@google.com
BUG=libyuv:506

Review URL: https://codereview.chromium.org/1402333002 .
2015-10-15 12:43:37 -07:00
Frank Barchard
4abd096548 fix for yuv to rgb on arm64.
fill in aarch64 yuv constants to match how the code expects them.

TBR=harryjin@google.com
BUG=libyuv:502

Review URL: https://codereview.chromium.org/1396253004 .
2015-10-12 12:02:54 -07:00
Frank Barchard
2e4466e282 change all pix parameters to width for consistency
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1398633002 .
2015-10-07 22:30:36 -07:00
Frank Barchard
76a599ec3b fix jpeg and bt.709 yuvconstants for neon64.
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces.  But the jpeg and bt.709
colour conversion constants were still in armv7 form.  This changes
the constants for aarch64 builds to be compatible with the code.

yuv constants are now passed as const *

Remove Yvu constants which were used for older version on nv21 but not new code.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1398623002 .
2015-10-07 19:46:56 -07:00
Frank Barchard
fae8e66d43 Fix for AVX2 dither function.
Fix for 64 bit gcc parameter in dither function which requires m not r,
when ABI uses register.

BUG=none

Review URL: https://codereview.chromium.org/1399463002 .
2015-10-07 19:17:56 -07:00
Frank Barchard
8f0cadede4 port ARGB to 565 dithering AVX2 code to GCC.
Previously the assembly code was only available to Windows.
This CL ports the AVX2 code to GCC syntax.

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1391273003 .
2015-10-07 19:13:59 -07:00
Frank Barchard
cc89e3a77b port ARGB to 565 dithering SSE2 code to GCC.
Previously the assembly code was only available to Windows.
This CL ports the SSE2 code to GCC syntax.

When running a profiler on all the unittests, this function
was the slowest of all functions that still ran in C code.
   3.71%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C

Was
ARGBToRGB565Dither_Opt (2894 ms)
Now
ARGBToRGB565Dither_Opt (432 ms)

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1397673002 .
2015-10-07 18:24:50 -07:00
Frank Barchard
3e38762d6b fix avx2 box filter bug for yuv down sampling.
offset to second group of pixels was off by 16.
should have been 32, not 16.
requires avx2 hardware and wide image for test.

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:492,libyuv:501

Review URL: https://codereview.chromium.org/1395603002 .
2015-10-07 11:02:33 -07:00
Frank Barchard
013080f2d2 Pass yuvconstants to YUV conversions for neon 64 bit
SETUP provided by zhongwei.yao@linaro.org

Previously the 64 bit Neon code had hard coded constants in the setup macro
for YUV conversion, while 32 bit Neon code supported the yuvconstants
parameter.

This change accepts the constants passed to the YUV conversion row function,
allowing different color spaces to be respected - naming JPEG and BT.709.
As well as the existing BT.601.

TBR=harryjin@google.com
BUG=libyuv:472

Review URL: https://codereview.chromium.org/1384323002 .
2015-10-06 22:19:14 -07:00
Frank Barchard
914a9856c7 Reimplement NV21ToARGB to allow different color matrix.
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V.  But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.

TBR=harryjin@google.com
BUG=libyuv:500

Review URL: https://codereview.chromium.org/1388273002 .
2015-10-06 20:34:44 -07:00
Frank Barchard
68fa59c873 add box scaling avx2 optimization for gcc
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1392803002 .
2015-10-06 20:01:02 -07:00
Frank Barchard
f00bc9ef46 Add J444ToARGB conversion function.
J444 is JPeg YUV color space with 444 subsampling.
This implementation uses the existing I444ToARGB conversion, which is
BT.601 color space with 444 subsampling, but passing in the jpeg
color matrix constants.

TBR=harryjin@google.com
BUG=449

Review URL: https://codereview.chromium.org/1387313002 .
2015-10-06 18:46:53 -07:00
Frank Barchard
d70293993f port scale box filter sse2 to gcc
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1393653002 .
2015-10-06 16:54:26 -07:00
Frank Barchard
3eefeaeb69 test xsave before calling xgetbv.
R=agl@chromium.org, harryjin@google.com
BUG=libyuv:497

Review URL: https://codereview.chromium.org/1382803002 .
2015-09-30 17:25:41 -07:00
Frank Barchard
2cc1a2b233 Remove sse2 functions that also have ssse3
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.

R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1377053003 .
2015-09-30 14:24:44 -07:00
Frank Barchard
d039ad6e9b Width use memory instead of register for 32 bit fpic.
Code runs out of registers on 32 bit fpic builts.

TBR=harryjin@google.com
BUG=libyuv:496

Review URL: https://codereview.chromium.org/1369053002 .
2015-09-25 15:36:04 -07:00
Frank Barchard
febc26a2c9 win64 version of I422AlphaToARGB.
Was
I420AlphaToARGB_Premult (8861 ms)
I420AlphaToARGB_Opt (7119 ms)
Now
I420AlphaToABGR_Premult (2840 ms)
I420AlphaToARGB_Opt (484 ms)

C function switched to 1 step.
Was
I420AlphaToARGB_Premult (8862 ms)
I420AlphaToABGR_Opt (6718 ms)

Now
I420AlphaToARGB_Premult (8706 ms)
I420AlphaToARGB_Opt (6541 ms)

R=harryjin@google.com
BUG=libyuv:496, libyuv:473

Review URL: https://codereview.chromium.org/1359183003 .
2015-09-25 15:06:41 -07:00
Frank Barchard
9a0e12f5f1 AVX2 1 step I422AlphaToARGB for gcc and win.
C     I420AlphaToARGB_Opt (5169 ms)
SSSE3 I420AlphaToARGB_Opt (432 ms)
AVX2  I420AlphaToARGB_Opt (358 ms)

and with premultiplication as 2 step process:
I420AlphaToARGB_Premult (7029 ms)
I420AlphaToARGB_Premult (757 ms)
I420AlphaToARGB_Premult (508 ms)

R=harryjin@google.com
BUG=libyuv:496,libyuv:473

Review URL: https://codereview.chromium.org/1372653003 .
2015-09-25 13:37:42 -07:00
Frank Barchard
e365cdde3b I420Alpha row function in 1 pass.
API change - I420AlphaToARGB takes flag indicating if RGB should be
premultiplied by alpha.

This version implements an efficient SSSE3 version for Windows.
C version done in 2 steps.

Was
libyuvTest.I420AlphaToARGB_Any (1136 ms)
libyuvTest.I420AlphaToARGB_Unaligned (1210 ms)
libyuvTest.I420AlphaToARGB_Invert (966 ms)
libyuvTest.I420AlphaToARGB_Opt (1031 ms)
libyuvTest.I420AlphaToABGR_Any (1020 ms)
libyuvTest.I420AlphaToABGR_Unaligned (1359 ms)
libyuvTest.I420AlphaToABGR_Invert (1082 ms)
libyuvTest.I420AlphaToABGR_Opt (986 ms)

R=harryjin@google.com
BUG=libyuv:496

Review URL: https://codereview.chromium.org/1367093002 .
2015-09-25 10:29:20 -07:00
Frank Barchard
d4594beefc switch from ebp to ebx.
ebx encodes more efficiently (1 byte less) for most address modes, than ebp.
previously it was used for 411 format, but the reader uses pinsrw now avoiding
gpr register.

BUG=libyuv:488
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1365003003 .
2015-09-24 17:25:11 -07:00