NV12ToRGB565Row for Intel is implemented as a 2 step conversion:
NV12ToARGBRow_SSSE3 and ARGBToRGB565Row_SSE2
NV12ToARGBRow has an AVX2 version, so this CL implements
NV12ToRGB565Row_AVX2 with call to NV12ToARGBRow_AVX2 and
ARGBToRGB565Row_SSE2.
R=harryjin@google.com
BUG=libyuv:554
Review URL: https://codereview.chromium.org/1687953002 .
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .
This is an UBSan error reported by libjingle
[ RUN ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride
[000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73
../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int'
[8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms)
[9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms)
Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted
The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes.
Changing the constant to 0x01010101u should avoid overflow.
R=harryjin@google.comTBR=harryjin@google.com
BUG=libyuv:563
Review URL: https://codereview.chromium.org/1657533005 .
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination. Performance
is slower.
Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)
Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545
Review URL: https://codereview.chromium.org/1593833002 .
I420ToNV21 passes the wrong dst_stride_y when it calls I420ToNV12; parameter 8 (convert_from.cc:448) is src_stride_y but should be dst_stride_y. This causes image corruption when converting I420 -> NV21 with mismatched luminance strides.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:547
Review URL: https://codereview.chromium.org/1582793008 .
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly. Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.
BUG=libyuv:535
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1533643005 .
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)
TBR=harryjin@google.com
BUG=libyuv:525
Review URL: https://codereview.chromium.org/1457783004 .
SSSE3
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN ] LibYUVConvertTest.I444ToARGB_Any
[ OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_Unaligned
[ OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_Invert
[ OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_Opt
[ OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms)
[----------] 8 tests from LibYUVConvertTest (3389 ms total)
AVX2
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN ] LibYUVConvertTest.I444ToARGB_Any
[ OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_Unaligned
[ OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_Invert
[ OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_Opt
[ OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms)
[ RUN ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[ OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms)
[----------] 8 tests from LibYUVConvertTest (2615 ms total)
TBR=harryjin@google.com
BUG=libyuv:492
Review URL: https://codereview.chromium.org/1445893002 .
libyuv builds/runs, but when integrated into chromium, produces link errors. unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.
TBR=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1427683004 .
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided.
x64 builds using memory:
AVX2 I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)
Now using registers
AVX2 I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)
TBR=harryjin@google.com
BUG=libyuv:520
Review URL: https://codereview.chromium.org/1407353010 .