Revert the workaround and fix it properly by passing the
additional necessary flag to the compiler.
BUG=libyuv:700
Change-Id: I1c893a8acb5079decbee6963b689424bf2f99f4f
Reviewed-on: https://chromium-review.googlesource.com/487881
Reviewed-by: Frank Barchard <fbarchard@google.com>
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.
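
A minimal sketch of the "Any" wrapper pattern described above, assuming a row kernel that is only valid for multiples of 16 pixels. `Row16` and `RowAny` are hypothetical stand-ins written in plain C++ (a byte invert), not the libyuv functions.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for a SIMD row kernel: only valid when width is a
// multiple of 16. Here it just inverts each byte.
void Row16(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) {
    dst[i] = static_cast<uint8_t>(255 - src[i]);
  }
}

// Handles any width: the kernel runs on the largest multiple of 16, then the
// remainder is copied into a zero padded 16 pixel buffer and run once more.
void RowAny(const uint8_t* src, uint8_t* dst, int width) {
  const int n = width & ~15;  // Largest multiple of 16 <= width.
  const int r = width & 15;   // Leftover pixels.
  if (n > 0) {
    Row16(src, dst, n);
  }
  if (r > 0) {
    uint8_t in[16] = {0};
    uint8_t out[16];
    std::memcpy(in, src + n, r);
    Row16(in, out, 16);
    std::memcpy(dst + n, out, r);  // Only the valid pixels are written back.
  }
}
```

The temporary buffer keeps the kernel from reading or writing past the end of the caller's row.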
Bug: libyuv:657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
RGB24, RAW, RGB565, ARGB1555 and ARGB4444 conditionally use a
2 pass or a direct path. The 2 pass method requires a buffer that
is conditionally allocated. The ifdefs were confusing lint.
Simplified the ifdefs to clean up the lint warning.
BUG=libyuv:692
TEST=lint source/convert.cc
Change-Id: If868718af30b48824a5e3d28f0d7d01d4609ad55
Reviewed-on: https://chromium-review.googlesource.com/451552
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
BUG=None
TEST=try bots and lint test
Change-Id: I1ab462adf2d309117862c5eb4b244a61ae202951
Reviewed-on: https://chromium-review.googlesource.com/450658
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Previously, if MipsCpuCaps was called with something other than
dspr2 or msa, the file was closed but still used.
This change assumes the function is only called internally twice:
once for msa and once for dspr2. If msa is not being detected,
the function assumes dspr2 is being tested and reports dspr2 as
present.
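
A rough sketch of a different way to avoid the same class of bug, not what this change does: let a scoped file object own the handle so no code path can use it after closing. `CpuInfoHasAse` is a hypothetical helper, not the MipsCpuCaps signature, and the "ASEs implemented" line format is an assumption about the cpuinfo file.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper. Returns true if the "ASEs implemented" line of a
// cpuinfo-style file contains `ase` (for example " msa").
// The std::ifstream closes itself when the function returns, so the file
// cannot be touched after it has been closed.
bool CpuInfoHasAse(const std::string& cpuinfo_path, const std::string& ase) {
  std::ifstream file(cpuinfo_path);
  if (!file) {
    return false;  // Assumption: an unreadable file means "not present".
  }
  const std::string key = "ASEs implemented";
  std::string line;
  while (std::getline(file, line)) {
    if (line.compare(0, key.size(), key) == 0) {
      return line.find(ase) != std::string::npos;
    }
  }
  return false;
}
```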
BUG=libyuv:687
TEST=try bots
Change-Id: I80b328eb5ffc7baf5f1ee5a79c16d75c45ff26cc
Reviewed-on: https://chromium-review.googlesource.com/447831
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
BUG=libyuv:680
TEST=builds and runs with no warnings
Change-Id: I7d60ef44292fa6ad4f7c4e2e2657359b864d2dab
Reviewed-on: https://chromium-review.googlesource.com/442670
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
android.mk builds have unused parameter warning on by default.
This change for GN makes libyuv build the same way.
BUG=libyuv:681
TEST=build on linux with clang and ninja.
Change-Id: I76c627d446b96653f147725bca915d94a42ce9a6
Reviewed-on: https://chromium-review.googlesource.com/441194
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Uses 1 add instead of 2 leas to reduce port pressure on ports 1 and 5
used for SIMD instructions.
BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -arch HSW out/Release/obj/libyuv/row_gcc.o
Change-Id: I3965ee5dcb49941a535efa611b5988d977f5b65c
Reviewed-on: https://chromium-review.googlesource.com/433391
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
ARGBToUV_C and ARGBToUVJ_C are generated functions with a subtle
difference in rounding. Add a comment to make them easier to find.
TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=untested
Change-Id: I9912d256a1e04c58475d33bdb472c37484f6cab9
Reviewed-on: https://chromium-review.googlesource.com/434980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Add macros to enable/disable code analysis around blocks of code.
Normally these macros should not be used, but if performance
details are wanted for Intel code, enable them around the code
and then run the iaca tool, available on the Intel website.
BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com
Review-Url: https://codereview.chromium.org/2626193002 .
The 64 bit version is made similar to the 32 bit version, with register 1 for load and store results, and registers 2 and 3 as expanded float temporary values.
TEST=out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2467723002 .
Debug builds of x86 gcc/clang can run out of registers.
Previously NDEBUG or _DEBUG was used to detect a debug build,
but those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__, which is
built into clang and gcc.
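
A small sketch of the detection described above. `IsOptimizedBuild` is a hypothetical helper name; the only fact relied on is that gcc and clang predefine `__OPTIMIZE__` when optimization is enabled, independent of NDEBUG or _DEBUG.

```cpp
// Hypothetical helper: reports whether the compiler is optimizing.
// __OPTIMIZE__ is predefined by gcc and clang in optimizing builds, so no
// build system cooperation (NDEBUG, _DEBUG) is needed.
constexpr bool IsOptimizedBuild() {
#if defined(__OPTIMIZE__)
  return true;
#else
  return false;
#endif
}
```

Code that needs many registers can then be compiled only when this condition holds, with a C fallback otherwise.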
BUG=libyuv:602
TEST=untested
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2451503002 .
R=fbarchard@google.com
BUG=libyuv:634
Performance Gains :- (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x
Performance Gains :- (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x
Regarding performance measurement, we created standalone tests which pass row data from a 1920x1080 filled buffer to both the C and MSA functions. N such iterations are executed to get more accurate timings of C vs MSA.
Review URL: https://codereview.chromium.org/2430313005 .
Half floats have a limited range. It shouldn't normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flushed to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
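
To illustrate where the denormals come from, here is a minimal sketch of a truncating float to half conversion that rebiases the exponent with a multiply. `HalfFromFloat` is a hypothetical name, and the sketch assumes the scaled value is non-negative and below 65536; it is not claimed to match the code under test exactly.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

// Converts value * scale to IEEE half float bits by truncation.
// Multiplying by 2^-112 rebiases the float exponent (bias 127) to the half
// exponent (bias 15); shifting right by 13 drops the extra mantissa bits.
// Assumes 0 <= value * scale < 65536 (no sign, infinity or NaN handling).
uint16_t HalfFromFloat(float value, float scale) {
  const float biased = value * (scale * std::ldexp(1.0f, -112));
  uint32_t bits;
  std::memcpy(&bits, &biased, sizeof(bits));
  return static_cast<uint16_t>(bits >> 13);
}
```

When value * scale falls below 2^-14, the intermediate float is itself a denormal, so the result has a zero exponent field (a half denormal) or is flushed to zero if the CPU is in flush to zero mode.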
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
YUV 411 is a very uncommon format. Remove support.
Update documentation to reflect that 411 is deprecated.
Simplify tests for YUV to only test with the new side by side YUV, but keep the old 3 plane test around with a macro for now.
BUG=libyuv:645
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2406123002 .
YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
the last pixel. The replication was not necessary after a previous
change to treat YUY2 as 4 byte macro pixels.
TBR=harryjin@google.com
BUG=libyuv:648
TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"
Review URL: https://codereview.chromium.org/2399143002 .
The original bt709 color space coefficients were full range yuv for higher
quality. This change makes the coefficients use the video constrained
range, the same as bt601, which is 16 to 235 for Y and 16 to 240 for
the chroma channels.
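
As a floating point reference for the constrained range, a minimal sketch follows. `Rgb`, `ClampToByte` and `Bt709LimitedToRgb` are hypothetical names, and the coefficients are the commonly quoted BT.709 limited range values rounded to three decimals, not the fixed point constants used by the library.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

struct Rgb {
  uint8_t r, g, b;
};

static uint8_t ClampToByte(double v) {
  return static_cast<uint8_t>(std::lround(std::min(255.0, std::max(0.0, v))));
}

// BT.709 limited ("video") range YUV to RGB.
// Nominal ranges: Y 16..235, U and V 16..240 centered on 128.
Rgb Bt709LimitedToRgb(uint8_t y, uint8_t u, uint8_t v) {
  const double yf = (y - 16) * (255.0 / 219.0);  // About 1.164.
  const double uf = u - 128;
  const double vf = v - 128;
  return {ClampToByte(yf + 1.793 * vf),
          ClampToByte(yf - 0.213 * uf - 0.533 * vf),
          ClampToByte(yf + 2.112 * uf)};
}
```

With these constants, Y=16 maps to black and Y=235 maps to white when U and V are 128; values outside the nominal range are clamped.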
BUG=libyuv:639
TEST=libyuv unittests run locally
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2367253003 .
Follow up warning fixes:
cpu_id.cc(167): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
lint warning: cpu_id.cc:171: Missing space before ( in if( [whitespace/parens] [5]
TBR=manojkumar.bhosale@imgtec.com
BUG=libyuv:634
TEST=try bots for windows.
Review URL: https://codereview.chromium.org/2365813002 .
Following the preparation patch added to the Chromium sources,
2150943003: Add MIPS SIMD Arch (MSA) build flags for GYP/GN builds,
this patch adds the first MSA optimized function to the libyuv project.
BUG=libyuv:634
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/2285683002 .
On Visual C 2013 and earlier, a warning is generated when using /ltcg
if externs are not declared with the same alignment as the
definition.
BUG=libyuv:633
TEST=standalone test built with cl /Bv /GL /Ox /nologo a.cc b.cc /link /ltcg
R=skal@google.com
Review URL: https://codereview.chromium.org/2291533004 .
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exist. Also, de-duplicate the
CPU dispatching code for these functions by moving it to helper
functions.
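
For orientation, a scalar sketch of what splitting and merging UV planes means. `SplitUVPlaneSketch` and `MergeUVPlaneSketch` are hypothetical reference functions with assumed parameter order, not the public libyuv API; strides are in bytes and width is in chroma samples.

```cpp
#include <cstdint>

// Deinterleaves a UVUV... plane into separate U and V planes.
void SplitUVPlaneSketch(const uint8_t* src_uv, int src_stride_uv,
                        uint8_t* dst_u, int dst_stride_u,
                        uint8_t* dst_v, int dst_stride_v,
                        int width, int height) {
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      dst_u[x] = src_uv[2 * x];
      dst_v[x] = src_uv[2 * x + 1];
    }
    src_uv += src_stride_uv;
    dst_u += dst_stride_u;
    dst_v += dst_stride_v;
  }
}

// Inverse operation: interleaves separate U and V planes into UVUV...
void MergeUVPlaneSketch(const uint8_t* src_u, int src_stride_u,
                        const uint8_t* src_v, int src_stride_v,
                        uint8_t* dst_uv, int dst_stride_uv,
                        int width, int height) {
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      dst_uv[2 * x] = src_u[x];
      dst_uv[2 * x + 1] = src_v[x];
    }
    src_u += src_stride_u;
    src_v += src_stride_v;
    dst_uv += dst_stride_uv;
  }
}
```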
BUG=libyuv:629
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2277603004 .
The conversions from NV12 and other bi or tri planar formats differ only in the UV handling. The helper function supports passing NULL for the dst_y channel, indicating that only the UV conversion is wanted.
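
A minimal sketch of the optional Y plane idea, assuming tightly packed planes and even dimensions. `Nv12ToPlanarSketch` is a hypothetical function, not the helper added by this change.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Converts an NV12-style image (Y plane plus interleaved UV plane) to three
// planes. Passing nullptr for dst_y skips the Y copy, so only the chroma
// planes are written.
void Nv12ToPlanarSketch(const uint8_t* src_y, const uint8_t* src_uv,
                        uint8_t* dst_y, uint8_t* dst_u, uint8_t* dst_v,
                        int width, int height) {
  if (dst_y != nullptr) {
    std::memcpy(dst_y, src_y, static_cast<std::size_t>(width) * height);
  }
  const int chroma_count = (width / 2) * (height / 2);
  for (int i = 0; i < chroma_count; ++i) {
    dst_u[i] = src_uv[2 * i];
    dst_v[i] = src_uv[2 * i + 1];
  }
}
```

A caller that already has the Y plane in place can pass nullptr and avoid a redundant copy.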
TBR=harryjin@google.com
TEST=LibYUVConvertTest.NV12ToI420_NullY (601 ms)
BUG=libyuv:626
Review URL: https://codereview.chromium.org/2276703002 .
to Y,U,V and a pixel stride for U and V. The pixel stride is expected to be 1 or 2.
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Any
[ OK ] LibYUVConvertTest.Android420ToI420_1_Any (253 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Unaligned
[ OK ] LibYUVConvertTest.Android420ToI420_1_Unaligned (250 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Invert
[ OK ] LibYUVConvertTest.Android420ToI420_1_Invert (254 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_1_Opt
[ OK ] LibYUVConvertTest.Android420ToI420_1_Opt (247 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Any
[ OK ] LibYUVConvertTest.Android420ToI420_2_Any (132 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Unaligned
[ OK ] LibYUVConvertTest.Android420ToI420_2_Unaligned (122 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Invert
[ OK ] LibYUVConvertTest.Android420ToI420_2_Invert (124 ms)
[ RUN ] LibYUVConvertTest.Android420ToI420_2_Opt
[ OK ] LibYUVConvertTest.Android420ToI420_2_Opt (119 ms)
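
The pixel stride for U and V mentioned above can be sketched as follows. `CopyChromaPlaneSketch` is a hypothetical helper with an assumed signature: a pixel stride of 1 means the samples are contiguous (planar), and 2 means they are interleaved with the other chroma channel.

```cpp
#include <cstdint>

// Copies one chroma plane whose samples are `pixel_stride` bytes apart into
// a tightly indexed destination plane. Strides are in bytes; width and
// height are in chroma samples.
void CopyChromaPlaneSketch(const uint8_t* src, int src_stride,
                           int pixel_stride, uint8_t* dst, int dst_stride,
                           int width, int height) {
  for (int y = 0; y < height; ++y) {
    for (int x = 0; x < width; ++x) {
      dst[x] = src[x * pixel_stride];
    }
    src += src_stride;
    dst += dst_stride;
  }
}
```

For interleaved UV data, U can be read from the start of the plane and V from one byte later, both with a pixel stride of 2.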
TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2146733002 .
The old guard only checked for defined(_M_X64) which is defined by mingw64. Add a test for defined(_MSC_VER) which is defined for clangcl and visual c but not mingw. mingw should use row_gcc.cc for both 32 and 64 bit.
R=harryjin@google.com
BUG=webm:1252,libyuv:613
TEST=local gcc/clang builds on linux tested and try bots for others.
Review URL: https://codereview.chromium.org/2105603002 .
Upscale a YUV image and observe a change in hue, green especially.
Disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C,
and observe that the hue, green especially, is better.
was ScaleFrom1280x720_Bilinear (1620 ms)
now ScaleFrom1280x720_Bilinear (1907 ms)
BUG=libyuv:605
TEST=try bots
R=harryjin@google.com, wangcheng@google.com
Review URL: https://codereview.chromium.org/2084533006 .
Work around for android full debug build running out of registers.
5 functions were running out of registers, causing the compiler error
error: 'asm' operand has impossible constraints
These functions mostly have 4 pointers, a counter (width) and a temporary
eax register. With fpic and debug using stack frames, 2 registers are
unavailable. So a total of 8 registers are used.
Although fpic and stack frames don't apply to assembly, the compiler
reserves 2 registers. The optimized version builds, so it's likely
freeing up the registers once it knows they are not used.
These functions used to build, so the compile options and/or compiler may
have been updated; likely fpic was turned on.
An attribute can be used to disable each, and will avoid using the
2 GPR registers, but they are still reserved and unavailable in debug
builds on current compilers (gcc 4.9 and clang 3.8).
R=dhrosa@google.com
BUG=libyuv:602
Review URL: https://codereview.chromium.org/2066933002 .
With the %w size modifier, the int width can be passed directly to arm assembly.
For functions that take input constants, the outputs are declared as early
write using &, meaning the outputs are written before all inputs are consumed.
R=harryjin@google.com
BUG=libyuv:598
Review URL: https://codereview.chromium.org/2043073003 .
ifdefs on a function level are not needed for neon functions, unless
they are conditionally enabled in row.h. No functions are conditionally
enabled at this time, so all ifdefs can be removed from row_neon.cc and
row_neon64.cc
TBR=kjellander@chromium.org
BUG=libyuv:599
Review URL: https://codereview.chromium.org/2044223002 .
Blur requires that memory be aligned. Change the unittest allocator to guarantee 64 byte alignment.
Re-enable the blur any test that fails if memory is unaligned.
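
One common way to guarantee 64 byte alignment is to over-allocate and round the pointer up. The sketch below shows that idea; `AlignedBuffer64` is a hypothetical class, not the allocator used by the unittests.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Owns a heap block and exposes a pointer aligned to 64 bytes.
// Over-allocates by 63 bytes so rounding up never runs past the block.
class AlignedBuffer64 {
 public:
  explicit AlignedBuffer64(std::size_t size)
      : raw_(static_cast<uint8_t*>(std::malloc(size + 63))),
        data_(raw_ == nullptr
                  ? nullptr
                  : reinterpret_cast<uint8_t*>(
                        (reinterpret_cast<std::uintptr_t>(raw_) + 63) &
                        ~static_cast<std::uintptr_t>(63))) {}
  ~AlignedBuffer64() { std::free(raw_); }
  AlignedBuffer64(const AlignedBuffer64&) = delete;
  AlignedBuffer64& operator=(const AlignedBuffer64&) = delete;

  // Aligned pointer with at least `size` usable bytes, or nullptr on failure.
  uint8_t* data() const { return data_; }

 private:
  uint8_t* raw_;   // Pointer returned by malloc; the one passed to free.
  uint8_t* data_;  // raw_ rounded up to a multiple of 64.
};
```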
TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.
Review URL: https://codereview.chromium.org/2019753002 .
Inline assembly that uses temporary variables currently initializes them
to 0 and passes them in as output "+r".
This CL changes the output constraint to "=&r" for most, meaning an
output with early write (before inputs). This allows the initialize
to zero step to be removed, saving 1 instruction.
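
A small sketch of the constraint change, assuming gcc or clang on x86-64 with AT&T syntax (other targets take the plain C++ fallback). `SumBytes` is a hypothetical function written for illustration; it is not one of the functions touched by this CL.

```cpp
#include <cstdint>

// Sums `width` bytes. `temp` is pure scratch inside the asm block, so it is
// declared "=&r" (write only, early clobber) and needs no initial value.
// With "+r" it would have to be initialized before the asm statement.
uint32_t SumBytes(const uint8_t* src, int width) {
  uint32_t sum = 0;
  if (width <= 0) {
    return sum;
  }
#if defined(__GNUC__) && defined(__x86_64__)
  uint32_t temp;  // Deliberately uninitialized: the asm writes it first.
  asm volatile(
      "1:                            \n"
      "movzbl  (%[src]), %[temp]     \n"
      "add     %[temp], %[sum]       \n"
      "add     $1, %[src]            \n"
      "sub     $1, %[width]          \n"
      "jg      1b                    \n"
      : [src] "+r"(src), [width] "+r"(width), [sum] "+r"(sum),
        [temp] "=&r"(temp)
      :
      : "memory", "cc");
#else
  for (int i = 0; i < width; ++i) {
    sum += src[i];
  }
#endif
  return sum;
}
```

The early clobber marker also tells the compiler not to place `temp` in a register that holds an input, which matters when a block has input only operands.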
BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1895743008 .