ARGB rotation using scaling code. Previously it had forward
declarations of the low level row functions used. This CL
uses the header and hooks up Any and MSA versions of the code.
Bug: libyuv:779
Test: perf record out/Release/libyuv_unittest --gtest_filter=*ARGBRotate90_Opt --libyuv_width=640 --libyuv_height=359 --libyuv_repeat=999
Change-Id: Ifacd58b26bb17a236181a404fad589fd2543b911
Reviewed-on: https://chromium-review.googlesource.com/927530
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
BGR variation of 10 bit conversion using swapped U and V
and mirrored matrix to produce AB30 format instead of AR30.
Bug: libyuv:777
Test: LibYUVConvertTest.H010ToAB30_Opt
Change-Id: I96d115a5d1e12138f40cb548871e03aa3ab210eb
Reviewed-on: https://chromium-review.googlesource.com/922284
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu (%1),%%xmm2
vmovdqu 0x00(%1,%2,1),%%xmm3
vpermq $0xd8,%%ymm2,%%ymm2
vpermq $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2
..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd (%1),%%ymm2
vpmovzxbd 0x00(%1,%2,1),%%ymm3
vpslld $0x10,%%ymm3,%%ymm3
vpor %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Use C99 types internally but define old types for compatibility
with older API. (r1690 and earlier)
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: I06f89537da3875f74e65189897e67b69af2c2ec2
Reviewed-on: https://chromium-review.googlesource.com/882501
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.
[ RUN ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[ OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[ OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[ OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
ABGR output is implemented using the same source code as ARGB, by swapping
the u and v and supplying the mirrored conversion matrix.
ABGR format (RGBA in memory) is popular on Android.
Bug: libyuv:751
Test: H010ToABGR, I010ToABGR and I010ToARGB unittests
Change-Id: I0b5103628c58dcb22a6442c03814d4d5972e0339
Reviewed-on: https://chromium-review.googlesource.com/852985
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.
Bug: libyuv:751
Test: I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Convert planar 8 bit formats to planar 16 bit formats.
Includes msan fix for Convert8To16Row_Opt unittest.
I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601. The function does no color space
conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling. The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.
For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN ] LibYUVConvertTest.I420ToI010_Opt
[ OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned
Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Initial AR30ToARGB function to allow converion
from AR30 to other formats if necessary and/or
for testing.
Not optimized at this point.
Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: I38ef192315240f3caa7aee0218b38d5e88a2849f
Reviewed-on: https://chromium-review.googlesource.com/833025
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reduce allocations of row buffers to 1 alloc/free.
Do 2 rows at a time to avoid converting U and V planes twice.
Bug: libyuv:715
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I2f3a03b4875df5e3b969112a78a1a0b28399fa2f
Reviewed-on: https://chromium-review.googlesource.com/816021
Reviewed-by: Cheng Wang <wangcheng@google.com>
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2
Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.
Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Key instruction sets added for each microarchitecture:
AVX512BW, AVX512VL, AVX512DQ - skylake server or later
AVX512_VBMI, AVX512_IFMA - cannon lake or later
AVX512_BITALG, AVX512_VBMI2, AVX512_VPOPCNTDQ, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ - ice lake or later
Bug: libyuv:752
Test: ~/intelsde/sde -icl -- out/Release/libyuv_unittest --gtest_filter=*Cpu*
Change-Id: I9ee28904c90009d66721b9f805a440c5fc2da122
Reviewed-on: https://chromium-review.googlesource.com/755617
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
mingw gcc sets the macro _M_IX86 which is normally only set
by Visual C and clangcl which are Visual C style source code
style for assembly, but gcc is not Visual C compatible.
Add _MSC_VER to most ifdefs to detect that its really Visual C
or clangcl and not mingw gcc so the gcc source code will be used.
Bug: libyuv:744
Test: CXXFLAGS=-m32 CXX=~/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ make -f linux.mk
Change-Id: I3431aa486eb769b145faa8d5eb75ed639f9d6f5e
Reviewed-on: https://chromium-review.googlesource.com/722319
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The low level hamming distance functions have size limitations
based on counter sizes. The higher level calls the low level
in blocks that avoid overflow and then accumulators in int64.
This test compares the results of the low levels to the high
level and against a known value (all ones) to ensure the
count is correct for any specified size.
The the size is very large, the result is expected to be
different.
Bug: libyuv:701
Test: TestHammingDistance_Opt
Change-Id: I6716af7cd09ac4d88a8afa25bc845a1b62af7c93
Reviewed-on: https://chromium-review.googlesource.com/710800
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8.
Reason for revert: <INSERT REASONING HERE>
Original change's description:
> ComputeHammingDistance reduce SIMD loop to 1 call when possible.
>
> 32 bit x86 has high overhead due to -fpic. So this reduces the
> number of calls by 1.
>
> TBR=kjellander@chromium.org
> Bug: libyuv:701
> Test: BenchmarkHammingDistance
> Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
> Reviewed-on: https://chromium-review.googlesource.com/701755
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: Cheng Wang <wangcheng@google.com>
TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com
Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: libyuv:701
Reviewed-on: https://chromium-review.googlesource.com/707823
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
32 bit x86 has high overhead due to -fpic. So this reduces the
number of calls by 1.
TBR=kjellander@chromium.org
Bug: libyuv:701
Test: BenchmarkHammingDistance
Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
Reviewed-on: https://chromium-review.googlesource.com/701755
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
If length of HammingDistance was not a multiple of 4,
the result was incorrect. The old tests did not catch this
so a new test is done to count 1s.
Bug: libyuv:740
Test: LibYUVCompareTest.TestHammingDistance
Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f
Reviewed-on: https://chromium-review.googlesource.com/699614
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Full color test is the slowest of the unittests, and not catching any
additional bugs at the moment. Step thru range of 0 to 255 in steps of
5 to speed up the test. 255 is 3 * 5 * 17, so any of those primes would
hit 0 and 255 exactly.
Was LibYUVColorTest.TestFullYUV (896 ms)
Now LibYUVColorTest.TestFullYUV (212 ms)
TBR=kjellander@chromium.org
Bug: libyuv:736
Test: LibYUVColorTest.TestFullYUV
Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804
Reviewed-on: https://chromium-review.googlesource.com/672064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
add ScaleMaxSamples_NEON function with max
done on original values.
TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt
Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e
Reviewed-on: https://chromium-review.googlesource.com/614685
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction. The outer loop can sum the
values.
32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
R=wangcheng@google.comTBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16
new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd
Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The CpuId function is a wrapper for the intrinsic, or
implemented with inline if unavailable. It had been
using uint32, but the intrinsics use int, so it was causing
casting and lint warnings. This change makes the internal
implementation use int.
Casting was also done for xgetbv, and the cast is simply
removed, and is not causing a build error.
MipCpuCaps was doing strlen to check for white space after the
instruction set. Arm also does this but with a hard coded offset.
This was causing a cast from size_t to int, which produced a lint
warning. The change removes the white space detect.
In theory the code could be used to detect SSE vs SSE2, and it would
need to check SSE is followed by a space or end of line. But this
code is only used on Arm and Mips, where there there is one form
of SIMD detected. e.g. MSA for mips. If a new instruction set is
added with a similar name, the write space check could be reintroduced.
But its more likely the code can be rewritten to use a better form
of detection by then. Or remove detection and require the instructions
BUG=libyuv:641
TEST=try bots build on all platforms without error and lint is clean
Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809
Reviewed-on: https://chromium-review.googlesource.com/514524
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Wan-Teh Chang <wtc@chromium.org>
Detect the compiler's support of C11 atomics, and use C11 atomics when
available.
Note that libyuv::MaskCpuFlags() is still not thread-safe.
BUG=libyuv:641
TEST= cpu_thread_test.cc adds a pthread based test
R=wangcheng@google.com
Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd
Reviewed-on: https://chromium-review.googlesource.com/510079
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
The verion of clang in ndk r14 (3.9) has a built in llvm assembler
that does not have the sgtu pseudo instruction.
sltu is the actual instruction, so switch the 2 operands and use
the instruction instead of the pseudo op.
BUG=libyuv:700
TEST=try bots build mips without error.
Change-Id: I2d5f94f81acbd56cdedea011e7d9308979e19079
Reviewed-on: https://chromium-review.googlesource.com/494026
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Revert the workaround and fix it properly by passing the
additional necessary flag to the compiler.
BUG=libyuv:700
Change-Id: I1c893a8acb5079decbee6963b689424bf2f99f4f
Reviewed-on: https://chromium-review.googlesource.com/487881
Reviewed-by: Frank Barchard <fbarchard@google.com>
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.
Bug: libyuv: 657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
RGB24, RAW, RGB565, ARGB1555 and ARGB4444 have conditional
2 pass versus direct path. 2 pass method requires a buffer that
is conditionally allocated. ifdef's were confusing lint.
simplifed ifdefs to clean up lint warning
BUG=libyuv:692
TEST=lint source/convert.cc
Change-Id: If868718af30b48824a5e3d28f0d7d01d4609ad55
Reviewed-on: https://chromium-review.googlesource.com/451552
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Previously if MipsCpuCaps were called with something other than
dspr2 or msa, the file was closed but still used.
This change assumed the function is only called internally twice:
once for msa and once for dspr2. If msa is not being detected,
the function assumed dspr2 was being tested and returns dspr2 was
true.
BUG=libyuv:687
TEST=try bots
Change-Id: I80b328eb5ffc7baf5f1ee5a79c16d75c45ff26cc
Reviewed-on: https://chromium-review.googlesource.com/447831
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
BUG=libyuv:680
TEST=builds and runs with no warnings
Change-Id: I7d60ef44292fa6ad4f7c4e2e2657359b864d2dab
Reviewed-on: https://chromium-review.googlesource.com/442670
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
add macros to enable/disable code analyst around blocks of code.
Normally these macros should not be used, but if performance
details are wanted for intel code, enable them around the code
and then run via the iaca tool, available on the intel website.
BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com
Review-Url: https://codereview.chromium.org/2626193002 .
64 bit version made similar to 32 bit with registers 1 for load and store results, and 2 and 3 as expanded float temporary values.
TEST=out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2467723002 .
Debug builds of x86 gcc/clang can run out of register.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.
BUG=libyuv:602
TEST=untested
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2451503002 .
Halffloats have a limited range. It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
YUV 411 is very uncommon format. Remove support.
Update documentation to reflect that 411 is deprecated.
Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.
BUG=libyuv:645
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2406123002 .
Original bt709 color space coefficients were full range yuv for higher
quality. This change makes the coefficients use the video constrained
color space the same as bt601 which is 16 to 240 for Y and 16 to 235 for
chroma channels.
BUG=libyuv:639
TEST=libyuv unittests run locally
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2367253003 .
On visual c 2013 and earlier a warning is generated if externs
are not declared with the same alignment as the declaration, when
using /ltcg
BUG=libyuv:633
TEST=standalong test built with cl /Bv /GL /Ox /nologo a.cc b.cc /link /ltcg
R=skal@google.com
Review URL: https://codereview.chromium.org/2291533004 .
Add test for SplitUVPlane and MergeUVPlane
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exists.
TEST=SplitUVPlane unittest
BUG=libyuv:629
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2279603002 .
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exists. Also, de-duplicate the
CPU dispatching code for these functions by moving them to helper
functions.
BUG=libyuv:629
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2277603004 .
The conversion from NV12 and other Bi or Tri planar formats, differs only in the UV handling. The helper function supports passing a NULL for the dst_y channel indicating you only want to do the UV conversion.
TBR=harryjin@google.com
TEST=LibYUVConvertTest.NV12ToI420_NullY (601 ms)
BUG=libyuv:626
Review URL: https://codereview.chromium.org/2276703002 .
Fix for duplicate define
../../third_party/libyuv/include/libyuv/scale_row.h:29:9: error: 'LIBYUV_DISABLE_X86' macro redefined [-Werror,-Wmacro-redefined]
^
GYP version relys on headers disabling the optimization.
This CL does the same for BUILD.gn
TBR=kjellander@chromium.org
BUG=libyuv:625
Review URL: https://codereview.chromium.org/2149823003 .
to Y,U,V and a pixel stride for U and V. The pixel stride is expected to be 1 or 2.
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Any
[ OK ] LibYUVConvertTest.Android420ToI420_1_Any (253 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Unaligned
[ OK ] LibYUVConvertTest.Android420ToI420_1_Unaligned (250 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Invert
[ OK ] LibYUVConvertTest.Android420ToI420_1_Invert (254 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Opt
[ OK ] LibYUVConvertTest.Android420ToI420_1_Opt (247 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Any
[ OK ] LibYUVConvertTest.Android420ToI420_2_Any (132 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Unaligned
[ OK ] LibYUVConvertTest.Android420ToI420_2_Unaligned (122 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Invert
[ OK ] LibYUVConvertTest.Android420ToI420_2_Invert (124 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Opt
[ OK ] LibYUVConvertTest.Android420ToI420_2_Opt (119 ms)
TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2146733002 .
webrtc doesnt include the headers that the functions are prototyped in.
This CL makes the convert.h include those headers to allow webrtc to
update to the head libyuv.
TBR=harryjin@google.com
BUG=libyuv:620,webrtc:6091,webrtc:6094
TESTED=local build and try bots
Review URL: https://codereview.chromium.org/2141683002 .
Mking color conversion use simple arrays within structure, which will be referenced via register pointer.
R=harryjin@google.com
BUG=libyuv:616
TEST=CC=gcc-4.4 CXX=g++-4.4 LD=ld-4.4 make -f linux.mk
Review URL: https://codereview.chromium.org/2127863003 .
The old guard only checked for defined(_M_X64) which is defined by mingw64. Add a test for defined(_MSC_VER) which is defined for clangcl and visual c but not mingw. mingw should use row_gcc.cc for both 32 and 64 bit.
R=harryjin@google.com
BUG=webm:1252,libyuv:613
TEST=local gcc/clang builds on linux tested and try bots for others.
Review URL: https://codereview.chromium.org/2105603002 .
upscale a YUV image. observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.
was ScaleFrom1280x720_Bilinear (1620 ms)
now ScaleFrom1280x720_Bilinear (1907 ms)
BUG=libyuv:605
TEST=try bots
R=harryjin@google.com, wangcheng@google.com
Review URL: https://codereview.chromium.org/2084533006 .
upscale a YUV image. observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.
disable HAS_SCALEFILTERCOLS_SSSE3
R=harryjin@google.com
BUG=libyuv:605
Review URL: https://codereview.chromium.org/2080663003 .
Work around for android full debug build runnign out of registers.
5 functions were running out of registers causing the compiler error
error: 'asm' operand has impossible constraints
These functions mostly have 4 pointers, a counter (width) and a tempory
eax register. With fpic and debug using stackframes, 2 registers are
unavailable. So a total of 8 registers are used.
Although fpic and stack frame dont apply to assembly, the compiler
reserves 2 registers. The optimized version builds, so its likely
freeing up the registers once it knows they are not used.
These functions used to build, so compile options and/or compiler may
have updated.. likely fpic was turned on.
An attribute can be done to disable each, and will avoid using the
2 GPR registers, but they are still reserved and unavailable in debug
builds on current compilers (gcc 4.9 and clang 3.8).
R=dhrosa@google.com
BUG=libyuv:602
Review URL: https://codereview.chromium.org/2066933002 .
cpu_info_ is zero for uninitialized state and all bits are off, disabling all cpu optimizations.
the 1 bit indicates cpu_info_ is initialized avoiding calling the detection code again for performance.
MaskCpuFlags initializes the cpu ignoring existing flags, then masks with the supplied flags and stores to cpu_info_.
As a mask, -1 has no effect, enabling all cpu features that were detected, but nothing that wasnt detected.
Setting to 0 will cause the next call to re-initialize the cpu, which is same as enabling all features.
Setting mask to 1 will turn off all cpu features but keep the initialized bit on, so the next detection call wont reinitialize and the cpu features are all disabled.
So normal behavior for command line and programatic masking is:
1 = C
-1 = SIMD
TBR=harryjin@google.com
BUG=libyuv:600
TESTED=out64/Release/bin/run_libyuv_unittest -s libyuv_unittest --verbose --release --gtest_filter=*ARGBExtractAlpha* -a "--libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=1 --libyuv_cpu_info=1"
Review URL: https://codereview.chromium.org/2042933002 .
width %w size modifier the int width can be passed directly to arm assembly.
For functions that take input constants, the outputs are declared as early
write using &, meaning the outputs use used before all inputs are consumed.
R=harryjin@google.com
BUG=libyuv:598
Review URL: https://codereview.chromium.org/2043073003 .
blur requires memory be aligned. change the unittest allocator to guarantee 64 byte alignment.
re-enable blur any test that fails if memory is unaligned.
TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.
Review URL: https://codereview.chromium.org/2019753002 .
Inline that uses temporary variables is currently initializing them
to 0 and passing in as output "+r".
This CL replaces the output constraint to "=&r" for most meaning an
output with early write (before inputs). This allows the initialize
to zero step to be removed, saving 1 instruction.
BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1895743008 .
When using C version of I420Interpolate for msan, a 50% interpolation
would cause stride to be cast to int, which could cause erroneous
memory reads on 64 bit build.
This CL makes the stride use ptrdiff_t for HalfRow_C
BUG=libyuv:582
TESTED=try bots tests
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1872953002 .
An SSSE3 version already exists, and an AVX2 version is available for
Visual C. This ports the function to AVX2 completing the AVX2 ports of
all YUV to RGB functions for AVX2 on gcc.
TBR=harryjin@google.com
BUG=libyuv:555
Review URL: https://codereview.chromium.org/1687253002 .
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .
internal math of the fastrand function uses a multiply
and add that overflows a signed int. This triggers a
ubsan failure:
../../unit_test/../unit_test/unit_test.h:60:33: runtime error: signed integer overflow: 56248274 * 214013 cannot be represented in type 'int'
This change casts the intermediate math to unsigned
int to avoid the overflow.
For more info on ubsan, see
http://dev.chromium.org/developers/testing/undefinedbehaviorsanitizer
TESTED=Passing compilation using:
GYP_DEFINES="ubsan=1"
GYP_DEFINES="ubsan_vptr=1"
R=harryjin@google.com, pbos@webrtc.org
BUG=libyuv:563
Review URL: https://codereview.chromium.org/1662453003 .
This is an UBSan error reported by libjingle
[ RUN ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride
[000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73
../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int'
[8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms)
[9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms)
Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted
The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes.
Changing the constant to 0x01010101u should avoid overflow.
R=harryjin@google.comTBR=harryjin@google.com
BUG=libyuv:563
Review URL: https://codereview.chromium.org/1657533005 .
When the image height for unittests was set to an
odd height, the TestI420 unittest would not fill
the complete source buffer. This change handles
the odd height test case.
No change to library code.
TBR=harryjin@google.com
BUG=libyuv:549
Review URL: https://codereview.chromium.org/1609103002 .
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination. Performance
is slower.
Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)
Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545
Review URL: https://codereview.chromium.org/1593833002 .
I420ToNV21 passes the wrong dst_stride_y when it calls I420ToNV12; parameter 8 (convert_from.cc:448) is src_stride_y but should be dst_stride_y. This causes image corruption when converting I420 -> NV21 with mismatched luminance strides.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:547
Review URL: https://codereview.chromium.org/1582793008 .
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
Some functions run out of registers when compiled for debug,
fpic, with stack frames on 32 bit x86 with clang.
Previously they were enabled based on _DEBUG but that macro
is not set in some build systems. This CL adds DEBUG macro as
well to cover those environments.
R=harryjin@google.com
BUG=libyuv:532
Review URL: https://codereview.chromium.org/1517693005 .
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)
TBR=harryjin@google.com
BUG=libyuv:525
Review URL: https://codereview.chromium.org/1457783004 .
libyuv builds/runs, but when integrated into chromium, produces link errors. unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.
TBR=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1427683004 .
disable I422ALPHATOARGBROW_AVX2 I422TOARGBROW_AVX2 I422TORGB24ROW_AVX2 I422TORGBAROW_AVX2 in row.h.
SSSE3 versions will be used instead.
Short term fix until issue can be resolved.
R=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1419513009 .
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided.
x64 builds using memory:
AVX2 I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)
Now using registers
AVX2 I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)
TBR=harryjin@google.com
BUG=libyuv:520
Review URL: https://codereview.chromium.org/1407353010 .
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
U contributes to B and G. V contributes to R and G.
By swapping U and V, they contribute to the opposite channels. Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.
As a result the existing I444ToABGRRow functions are no longer needed and are removed.
Previously this function was only Intel AVX2 optimized for Windwos. Now it is also optimized for Arm and GCC.
ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.
R=xhwang@chromium.org
BUG=libyuv:515
Review URL: https://codereview.chromium.org/1414133006 .
The any function for handling ARGBToI411 was not handling the pixel
replication correctly. On 422 and odd width was handled by duplicating
a pixel of source. 411 needs replication for remainders of 1, 2 or 3
pixels.
The C version was handling odd width but with an average of the remainder
pixels, which does not match the SIMD 'any' handling off remainder.
This changes the odd width handling to mimic the any version.
TBR=harryjin@google.com
BUG=libyuv:491
Review URL: https://codereview.chromium.org/1411733004 .
Allows us to ignore flags passed on to us by Chromium build bots
without having to explicitly disable them. (Thanks pbos!)
TESTED=webrtc ran modules_unittests with a bogus flag did not result in an
error.
R=kjellander@chromium.org
BUG=libyuv:507
Review URL: https://codereview.chromium.org/1417573002 .
This reverts commit 2dd3d9230ee663e71ed4ad9164033ed672e571de.
Reason: chromium_git is a missing variable, and to properly
add gflags, we need to check in GYP files in third_party/gflags
first, then add the DEPS entry.
BUG=libyuv:507
TBR=fbarchard@chromium.org
Review URL: https://codereview.chromium.org/1406323002 .
A hang in color conversion on arm occurs somewhere in yuv to rgb.
Breaking the color test into its own category of test will help
run selective tests to narrow down the issue.
R=harryjin@google.com
BUG=libyuv:506
Review URL: https://codereview.chromium.org/1405543003 .
previously the neon source code was broken into a separate
library built with -mfpu=neon for the neon assembly, while
the C code was built without neon.
In this change, the neon code is added to the main library
and all code built with neon.
TBR=harryjin@google.com
BUG=libyuv:371
Review URL: https://codereview.chromium.org/1392043003 .
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces. But the jpeg and bt.709
colour conversion constants were still in armv7 form. This changes
the constants for aarch64 builds to be compatible with the code.
yuv constants are now passed as const *
Remove Yvu constants which were used for older version on nv21 but not new code.
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1398623002 .
SETUP provided by zhongwei.yao@linaro.org
Previously the 64 bit Neon code had hard coded constants in the setup macro
for YUV conversion, while 32 bit Neon code supported the yuvconstants
parameter.
This change accepts the constants passed to the YUV conversion row function,
allowing different color spaces to be respected - naming JPEG and BT.709.
As well as the existing BT.601.
TBR=harryjin@google.com
BUG=libyuv:472
Review URL: https://codereview.chromium.org/1384323002 .
J444 is JPeg YUV color space with 444 subsampling.
This implementation uses the existing I444ToARGB conversion, which is
BT.601 color space with 444 subsampling, but passing in the jpeg
color matrix constants.
TBR=harryjin@google.com
BUG=449
Review URL: https://codereview.chromium.org/1387313002 .
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1377053003 .
API change - I420AlphaToARGB takes flag indicating if RGB should be
premultiplied by alpha.
This version implements an efficient SSSE3 version for Windows.
C version done in 2 steps.
Was
libyuvTest.I420AlphaToARGB_Any (1136 ms)
libyuvTest.I420AlphaToARGB_Unaligned (1210 ms)
libyuvTest.I420AlphaToARGB_Invert (966 ms)
libyuvTest.I420AlphaToARGB_Opt (1031 ms)
libyuvTest.I420AlphaToABGR_Any (1020 ms)
libyuvTest.I420AlphaToABGR_Unaligned (1359 ms)
libyuvTest.I420AlphaToABGR_Invert (1082 ms)
libyuvTest.I420AlphaToABGR_Opt (986 ms)
R=harryjin@google.com
BUG=libyuv:496
Review URL: https://codereview.chromium.org/1367093002 .
Reimplements I444ToARGB as a matrix function.
new I444ToABGR as matrix functions with wrappers and any functions.
Allows for future J444 and H444 versions.
I444ToABGR user level function added.
BUG=libyuv:490, libyuv:449
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1355733002 .
clangcl use compare_win for 32 bit, allowing fallback and enabling avx2 code for clang.
move defines/protos to compare_row.h
fix issue with odd width ARGBCopyAlpha functions by copying destination to temp buffer, then doing alpha copy, then copy back to destination.
R=harryjin@google.comTBR=harryjin@google.com
BUG=libyuv:484
Review URL: https://webrtc-codereview.appspot.com/59379004.