Full color test is the slowest of the unittests, and not catching any
additional bugs at the moment. Step thru range of 0 to 255 in steps of
5 to speed up the test. 255 is 3 * 5 * 17, so any of those primes would
hit 0 and 255 exactly.
Was LibYUVColorTest.TestFullYUV (896 ms)
Now LibYUVColorTest.TestFullYUV (212 ms)
TBR=kjellander@chromium.org
Bug: libyuv:736
Test: LibYUVColorTest.TestFullYUV
Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804
Reviewed-on: https://chromium-review.googlesource.com/672064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
add ScaleMaxSamples_NEON function with max
done on original values.
TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt
Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e
Reviewed-on: https://chromium-review.googlesource.com/614685
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction. The outer loop can sum the
values.
32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)
R=wangcheng@google.comTBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance
Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16
new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd
Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The CpuId function is a wrapper for the intrinsic, or
implemented with inline if unavailable. It had been
using uint32, but the intrinsics use int, so it was causing
casting and lint warnings. This change makes the internal
implementation use int.
Casting was also done for xgetbv, and the cast is simply
removed, and is not causing a build error.
MipCpuCaps was doing strlen to check for white space after the
instruction set. Arm also does this but with a hard coded offset.
This was causing a cast from size_t to int, which produced a lint
warning. The change removes the white space detect.
In theory the code could be used to detect SSE vs SSE2, and it would
need to check SSE is followed by a space or end of line. But this
code is only used on Arm and Mips, where there there is one form
of SIMD detected. e.g. MSA for mips. If a new instruction set is
added with a similar name, the write space check could be reintroduced.
But its more likely the code can be rewritten to use a better form
of detection by then. Or remove detection and require the instructions
BUG=libyuv:641
TEST=try bots build on all platforms without error and lint is clean
Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809
Reviewed-on: https://chromium-review.googlesource.com/514524
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Wan-Teh Chang <wtc@chromium.org>
Detect the compiler's support of C11 atomics, and use C11 atomics when
available.
Note that libyuv::MaskCpuFlags() is still not thread-safe.
BUG=libyuv:641
TEST= cpu_thread_test.cc adds a pthread based test
R=wangcheng@google.com
Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd
Reviewed-on: https://chromium-review.googlesource.com/510079
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
The verion of clang in ndk r14 (3.9) has a built in llvm assembler
that does not have the sgtu pseudo instruction.
sltu is the actual instruction, so switch the 2 operands and use
the instruction instead of the pseudo op.
BUG=libyuv:700
TEST=try bots build mips without error.
Change-Id: I2d5f94f81acbd56cdedea011e7d9308979e19079
Reviewed-on: https://chromium-review.googlesource.com/494026
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Revert the workaround and fix it properly by passing the
additional necessary flag to the compiler.
BUG=libyuv:700
Change-Id: I1c893a8acb5079decbee6963b689424bf2f99f4f
Reviewed-on: https://chromium-review.googlesource.com/487881
Reviewed-by: Frank Barchard <fbarchard@google.com>
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.
Bug: libyuv: 657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
RGB24, RAW, RGB565, ARGB1555 and ARGB4444 have conditional
2 pass versus direct path. 2 pass method requires a buffer that
is conditionally allocated. ifdef's were confusing lint.
simplifed ifdefs to clean up lint warning
BUG=libyuv:692
TEST=lint source/convert.cc
Change-Id: If868718af30b48824a5e3d28f0d7d01d4609ad55
Reviewed-on: https://chromium-review.googlesource.com/451552
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Previously if MipsCpuCaps were called with something other than
dspr2 or msa, the file was closed but still used.
This change assumed the function is only called internally twice:
once for msa and once for dspr2. If msa is not being detected,
the function assumed dspr2 was being tested and returns dspr2 was
true.
BUG=libyuv:687
TEST=try bots
Change-Id: I80b328eb5ffc7baf5f1ee5a79c16d75c45ff26cc
Reviewed-on: https://chromium-review.googlesource.com/447831
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
BUG=libyuv:680
TEST=builds and runs with no warnings
Change-Id: I7d60ef44292fa6ad4f7c4e2e2657359b864d2dab
Reviewed-on: https://chromium-review.googlesource.com/442670
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
add macros to enable/disable code analyst around blocks of code.
Normally these macros should not be used, but if performance
details are wanted for intel code, enable them around the code
and then run via the iaca tool, available on the intel website.
BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com
Review-Url: https://codereview.chromium.org/2626193002 .
64 bit version made similar to 32 bit with registers 1 for load and store results, and 2 and 3 as expanded float temporary values.
TEST=out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2467723002 .
Debug builds of x86 gcc/clang can run out of register.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.
BUG=libyuv:602
TEST=untested
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2451503002 .
Halffloats have a limited range. It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
YUV 411 is very uncommon format. Remove support.
Update documentation to reflect that 411 is deprecated.
Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.
BUG=libyuv:645
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2406123002 .
Original bt709 color space coefficients were full range yuv for higher
quality. This change makes the coefficients use the video constrained
color space the same as bt601 which is 16 to 240 for Y and 16 to 235 for
chroma channels.
BUG=libyuv:639
TEST=libyuv unittests run locally
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2367253003 .
On visual c 2013 and earlier a warning is generated if externs
are not declared with the same alignment as the declaration, when
using /ltcg
BUG=libyuv:633
TEST=standalong test built with cl /Bv /GL /Ox /nologo a.cc b.cc /link /ltcg
R=skal@google.com
Review URL: https://codereview.chromium.org/2291533004 .
Add test for SplitUVPlane and MergeUVPlane
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exists.
TEST=SplitUVPlane unittest
BUG=libyuv:629
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2279603002 .
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exists. Also, de-duplicate the
CPU dispatching code for these functions by moving them to helper
functions.
BUG=libyuv:629
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2277603004 .
The conversion from NV12 and other Bi or Tri planar formats, differs only in the UV handling. The helper function supports passing a NULL for the dst_y channel indicating you only want to do the UV conversion.
TBR=harryjin@google.com
TEST=LibYUVConvertTest.NV12ToI420_NullY (601 ms)
BUG=libyuv:626
Review URL: https://codereview.chromium.org/2276703002 .
Fix for duplicate define
../../third_party/libyuv/include/libyuv/scale_row.h:29:9: error: 'LIBYUV_DISABLE_X86' macro redefined [-Werror,-Wmacro-redefined]
^
GYP version relys on headers disabling the optimization.
This CL does the same for BUILD.gn
TBR=kjellander@chromium.org
BUG=libyuv:625
Review URL: https://codereview.chromium.org/2149823003 .
to Y,U,V and a pixel stride for U and V. The pixel stride is expected to be 1 or 2.
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Any
[ OK ] LibYUVConvertTest.Android420ToI420_1_Any (253 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Unaligned
[ OK ] LibYUVConvertTest.Android420ToI420_1_Unaligned (250 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Invert
[ OK ] LibYUVConvertTest.Android420ToI420_1_Invert (254 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Opt
[ OK ] LibYUVConvertTest.Android420ToI420_1_Opt (247 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Any
[ OK ] LibYUVConvertTest.Android420ToI420_2_Any (132 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Unaligned
[ OK ] LibYUVConvertTest.Android420ToI420_2_Unaligned (122 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Invert
[ OK ] LibYUVConvertTest.Android420ToI420_2_Invert (124 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Opt
[ OK ] LibYUVConvertTest.Android420ToI420_2_Opt (119 ms)
TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2146733002 .
webrtc doesnt include the headers that the functions are prototyped in.
This CL makes the convert.h include those headers to allow webrtc to
update to the head libyuv.
TBR=harryjin@google.com
BUG=libyuv:620,webrtc:6091,webrtc:6094
TESTED=local build and try bots
Review URL: https://codereview.chromium.org/2141683002 .
Mking color conversion use simple arrays within structure, which will be referenced via register pointer.
R=harryjin@google.com
BUG=libyuv:616
TEST=CC=gcc-4.4 CXX=g++-4.4 LD=ld-4.4 make -f linux.mk
Review URL: https://codereview.chromium.org/2127863003 .
The old guard only checked for defined(_M_X64) which is defined by mingw64. Add a test for defined(_MSC_VER) which is defined for clangcl and visual c but not mingw. mingw should use row_gcc.cc for both 32 and 64 bit.
R=harryjin@google.com
BUG=webm:1252,libyuv:613
TEST=local gcc/clang builds on linux tested and try bots for others.
Review URL: https://codereview.chromium.org/2105603002 .
upscale a YUV image. observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.
was ScaleFrom1280x720_Bilinear (1620 ms)
now ScaleFrom1280x720_Bilinear (1907 ms)
BUG=libyuv:605
TEST=try bots
R=harryjin@google.com, wangcheng@google.com
Review URL: https://codereview.chromium.org/2084533006 .
upscale a YUV image. observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.
disable HAS_SCALEFILTERCOLS_SSSE3
R=harryjin@google.com
BUG=libyuv:605
Review URL: https://codereview.chromium.org/2080663003 .
Work around for android full debug build runnign out of registers.
5 functions were running out of registers causing the compiler error
error: 'asm' operand has impossible constraints
These functions mostly have 4 pointers, a counter (width) and a tempory
eax register. With fpic and debug using stackframes, 2 registers are
unavailable. So a total of 8 registers are used.
Although fpic and stack frame dont apply to assembly, the compiler
reserves 2 registers. The optimized version builds, so its likely
freeing up the registers once it knows they are not used.
These functions used to build, so compile options and/or compiler may
have updated.. likely fpic was turned on.
An attribute can be done to disable each, and will avoid using the
2 GPR registers, but they are still reserved and unavailable in debug
builds on current compilers (gcc 4.9 and clang 3.8).
R=dhrosa@google.com
BUG=libyuv:602
Review URL: https://codereview.chromium.org/2066933002 .
cpu_info_ is zero for uninitialized state and all bits are off, disabling all cpu optimizations.
the 1 bit indicates cpu_info_ is initialized avoiding calling the detection code again for performance.
MaskCpuFlags initializes the cpu ignoring existing flags, then masks with the supplied flags and stores to cpu_info_.
As a mask, -1 has no effect, enabling all cpu features that were detected, but nothing that wasnt detected.
Setting to 0 will cause the next call to re-initialize the cpu, which is same as enabling all features.
Setting mask to 1 will turn off all cpu features but keep the initialized bit on, so the next detection call wont reinitialize and the cpu features are all disabled.
So normal behavior for command line and programatic masking is:
1 = C
-1 = SIMD
TBR=harryjin@google.com
BUG=libyuv:600
TESTED=out64/Release/bin/run_libyuv_unittest -s libyuv_unittest --verbose --release --gtest_filter=*ARGBExtractAlpha* -a "--libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=1 --libyuv_cpu_info=1"
Review URL: https://codereview.chromium.org/2042933002 .
width %w size modifier the int width can be passed directly to arm assembly.
For functions that take input constants, the outputs are declared as early
write using &, meaning the outputs use used before all inputs are consumed.
R=harryjin@google.com
BUG=libyuv:598
Review URL: https://codereview.chromium.org/2043073003 .
blur requires memory be aligned. change the unittest allocator to guarantee 64 byte alignment.
re-enable blur any test that fails if memory is unaligned.
TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.
Review URL: https://codereview.chromium.org/2019753002 .
Inline that uses temporary variables is currently initializing them
to 0 and passing in as output "+r".
This CL replaces the output constraint to "=&r" for most meaning an
output with early write (before inputs). This allows the initialize
to zero step to be removed, saving 1 instruction.
BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1895743008 .
When using C version of I420Interpolate for msan, a 50% interpolation
would cause stride to be cast to int, which could cause erroneous
memory reads on 64 bit build.
This CL makes the stride use ptrdiff_t for HalfRow_C
BUG=libyuv:582
TESTED=try bots tests
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1872953002 .
An SSSE3 version already exists, and an AVX2 version is available for
Visual C. This ports the function to AVX2 completing the AVX2 ports of
all YUV to RGB functions for AVX2 on gcc.
TBR=harryjin@google.com
BUG=libyuv:555
Review URL: https://codereview.chromium.org/1687253002 .
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .
internal math of the fastrand function uses a multiply
and add that overflows a signed int. This triggers a
ubsan failure:
../../unit_test/../unit_test/unit_test.h:60:33: runtime error: signed integer overflow: 56248274 * 214013 cannot be represented in type 'int'
This change casts the intermediate math to unsigned
int to avoid the overflow.
For more info on ubsan, see
http://dev.chromium.org/developers/testing/undefinedbehaviorsanitizer
TESTED=Passing compilation using:
GYP_DEFINES="ubsan=1"
GYP_DEFINES="ubsan_vptr=1"
R=harryjin@google.com, pbos@webrtc.org
BUG=libyuv:563
Review URL: https://codereview.chromium.org/1662453003 .