When the image height for unittests was set to an
odd height, the TestI420 unittest would not fill
the complete source buffer. This change handles
the odd height test case.
No change to library code.
TBR=harryjin@google.com
BUG=libyuv:549
Review URL: https://codereview.chromium.org/1609103002 .
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination. Performance
is slower.
Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)
Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545
Review URL: https://codereview.chromium.org/1593833002 .
I420ToNV21 passes the wrong dst_stride_y when it calls I420ToNV12; parameter 8 (convert_from.cc:448) is src_stride_y but should be dst_stride_y. This causes image corruption when converting I420 -> NV21 with mismatched luminance strides.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:547
Review URL: https://codereview.chromium.org/1582793008 .
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
Some functions run out of registers when compiled for debug,
fpic, with stack frames on 32 bit x86 with clang.
Previously they were enabled based on _DEBUG but that macro
is not set in some build systems. This CL adds DEBUG macro as
well to cover those environments.
R=harryjin@google.com
BUG=libyuv:532
Review URL: https://codereview.chromium.org/1517693005 .
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)
TBR=harryjin@google.com
BUG=libyuv:525
Review URL: https://codereview.chromium.org/1457783004 .
libyuv builds/runs, but when integrated into chromium, produces link errors. unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.
TBR=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1427683004 .
disable I422ALPHATOARGBROW_AVX2 I422TOARGBROW_AVX2 I422TORGB24ROW_AVX2 I422TORGBAROW_AVX2 in row.h.
SSSE3 versions will be used instead.
Short term fix until issue can be resolved.
R=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1419513009 .
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided.
x64 builds using memory:
AVX2 I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)
Now using registers
AVX2 I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)
TBR=harryjin@google.com
BUG=libyuv:520
Review URL: https://codereview.chromium.org/1407353010 .
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
U contributes to B and G. V contributes to R and G.
By swapping U and V, they contribute to the opposite channels. Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.
As a result the existing I444ToABGRRow functions are no longer needed and are removed.
Previously this function was only Intel AVX2 optimized for Windwos. Now it is also optimized for Arm and GCC.
ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.
R=xhwang@chromium.org
BUG=libyuv:515
Review URL: https://codereview.chromium.org/1414133006 .