When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination. Performance
is slower.
Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)
Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545
Review URL: https://codereview.chromium.org/1593833002 .
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly. Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.
BUG=libyuv:535
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1533643005 .
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
cpu flags of 1 disables SIMD and uses C. This used to be 0, but the change
in auto init behavior means that 0 now means uninitialized, and will cause
auto detect to reinit the cpu info. A value of 1 disables the auto init.
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1408753004 .
in order to compare C and Neon code, a new command line flag is added.
historically environment variables controlled cpu features, but on
android apk it is easier to pass a command line option to disable cpu
optimizations.
R=harryjin@google.com
BUG=libyuv:516
Review URL: https://codereview.chromium.org/1407193009 .
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
Allows us to ignore flags passed on to us by Chromium build bots
without having to explicitly disable them. (Thanks pbos!)
TESTED=webrtc ran modules_unittests with a bogus flag did not result in an
error.
R=kjellander@chromium.org
BUG=libyuv:507
Review URL: https://codereview.chromium.org/1417573002 .
A hang in color conversion on arm occurs somewhere in yuv to rgb.
Breaking the color test into its own category of test will help
run selective tests to narrow down the issue.
R=harryjin@google.com
BUG=libyuv:506
Review URL: https://codereview.chromium.org/1405543003 .
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V. But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.
TBR=harryjin@google.com
BUG=libyuv:500
Review URL: https://codereview.chromium.org/1388273002 .
J444 is JPeg YUV color space with 444 subsampling.
This implementation uses the existing I444ToARGB conversion, which is
BT.601 color space with 444 subsampling, but passing in the jpeg
color matrix constants.
TBR=harryjin@google.com
BUG=449
Review URL: https://codereview.chromium.org/1387313002 .
API change - I420AlphaToARGB takes flag indicating if RGB should be
premultiplied by alpha.
This version implements an efficient SSSE3 version for Windows.
C version done in 2 steps.
Was
libyuvTest.I420AlphaToARGB_Any (1136 ms)
libyuvTest.I420AlphaToARGB_Unaligned (1210 ms)
libyuvTest.I420AlphaToARGB_Invert (966 ms)
libyuvTest.I420AlphaToARGB_Opt (1031 ms)
libyuvTest.I420AlphaToABGR_Any (1020 ms)
libyuvTest.I420AlphaToABGR_Unaligned (1359 ms)
libyuvTest.I420AlphaToABGR_Invert (1082 ms)
libyuvTest.I420AlphaToABGR_Opt (986 ms)
R=harryjin@google.com
BUG=libyuv:496
Review URL: https://codereview.chromium.org/1367093002 .
random / rand is slow and impacts performance testing.
Although its only called to clear a frame once, a typical profile shows
it high in the overall profile, when doing 1000 frames for a benchmark.
95.10% libyuv_unittest libyuv_unittest [.] YUY2ToARGBRow_SSSE3
2.01% libyuv_unittest libc-2.19.so [.] __random_r
1.13% libyuv_unittest libc-2.19.so [.] __random
Replace random is a faster version for unittests.
set LIBYUV_WIDTH=1280
set LIBYUV_HEIGHT=720
set LIBYUV_REPEAT=999
set LIBYUV_FLAGS=-1
out\release\libyuv_unittest --gtest_filter=*YUY2ToARGB* | findms
Was
libyuvTest.YUY2ToARGB_Opt (497 ms)
Now
libyuvTest.YUY2ToARGB_Opt (454 ms)
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1361813002 .
Reimplements I444ToARGB as a matrix function.
new I444ToABGR as matrix functions with wrappers and any functions.
Allows for future J444 and H444 versions.
I444ToABGR user level function added.
BUG=libyuv:490, libyuv:449
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1355733002 .
clangcl use compare_win for 32 bit, allowing fallback and enabling avx2 code for clang.
move defines/protos to compare_row.h
fix issue with odd width ARGBCopyAlpha functions by copying destination to temp buffer, then doing alpha copy, then copy back to destination.
R=harryjin@google.comTBR=harryjin@google.com
BUG=libyuv:484
Review URL: https://webrtc-codereview.appspot.com/59379004.