Frank Barchard 000cf89ca8 YUY2ToARGB avx2 in 1 step conversion.
Includes UYVYToARGB ssse3 fix.

Was
YUY2ToARGB_Opt (433 ms)
69.79%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
20.73%  libyuv_unittest  libyuv_unittest      [.] YUY2ToUV422Row_AVX2
 6.04%  libyuv_unittest  libyuv_unittest      [.] YUY2ToYRow_AVX2
 0.77%  libyuv_unittest  libyuv_unittest      [.] YUY2ToARGBRow_AVX2

Now
YUY2ToARGB_Opt (280 ms)
95.66%  libyuv_unittest  libyuv_unittest      [.] YUY2ToARGBRow_AVX2

BUG=libyuv:494
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1364813002 .
2015-09-23 11:15:18 -07:00
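The profile above shows the old path splitting YUY2 into Y and UV rows and then running I422ToARGBRow, while the new path converts each row in a single pass. A minimal scalar sketch of that one-step idea follows. It is not the AVX2 code from this commit: the function names are hypothetical, the fixed-point BT.601 coefficients are a common textbook approximation rather than libyuv's yuvconstants tables, and the source row is assumed to hold whole 4-byte macropixels even when width is odd.

```cpp
#include <algorithm>
#include <cstdint>

// Clamp an intermediate value to the 0..255 byte range.
static inline uint8_t Clamp255(int v) {
  return static_cast<uint8_t>(std::min(255, std::max(0, v)));
}

// Convert one Y,U,V triple to ARGB (stored B,G,R,A in memory).
// Coefficients are an assumed BT.601 fixed-point approximation.
static inline void YuvPixelSketch(uint8_t y, uint8_t u, uint8_t v,
                                  uint8_t* argb) {
  const int c = 298 * (y - 16);
  const int d = u - 128;
  const int e = v - 128;
  argb[0] = Clamp255((c + 516 * d + 128) / 256);            // B
  argb[1] = Clamp255((c - 100 * d - 208 * e + 128) / 256);  // G
  argb[2] = Clamp255((c + 409 * e + 128) / 256);            // R
  argb[3] = 255;                                            // A
}

// Hypothetical one-pass row function: reads packed YUY2 (Y0 U Y1 V)
// and writes ARGB directly, with no intermediate Y or UV rows.
void YUY2ToARGBRowSketch(const uint8_t* src_yuy2, uint8_t* dst_argb,
                         int width) {
  int x = 0;
  for (; x + 1 < width; x += 2) {
    // Both pixels of a macropixel share the same U and V samples.
    YuvPixelSketch(src_yuy2[0], src_yuy2[1], src_yuy2[3], dst_argb);
    YuvPixelSketch(src_yuy2[2], src_yuy2[1], src_yuy2[3], dst_argb + 4);
    src_yuy2 += 4;
    dst_argb += 8;
  }
  if (x < width) {
    // Odd width: convert the first pixel of the last macropixel only.
    YuvPixelSketch(src_yuy2[0], src_yuy2[1], src_yuy2[3], dst_argb);
  }
}
```

Reading each packed macropixel once and writing ARGB directly avoids the temporary Y and UV rows, which is consistent with the drop from 433 ms to 280 ms reported above.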
..
compare_common.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare_gcc.cc nolint removed 2015-08-31 10:52:13 -07:00
compare_neon64.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare_neon.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare_win.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
convert_argb.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
convert_from_argb.cc ARGBToYJRow_AVX2 hooked up for ARGBToJ422 2015-04-07 00:39:25 +00:00
convert_from.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
convert_jpeg.cc libyuv::MJPGToI420() and libyuv::MJPGToARGB() return failure if callback to JPEG fails. 2014-01-28 03:08:59 +00:00
convert_to_argb.cc Remove Q420 fourcc support. 2015-02-11 18:20:54 +00:00
convert_to_i420.cc Remove Q420 fourcc support. 2015-02-11 18:20:54 +00:00
convert.cc disable faulty avx2 in argb conversions and box filter, and extend temporary buffer to 128 for an avx2 any function. 2015-07-07 15:40:24 -07:00
cpu_id.cc nolint removed 2015-08-31 10:52:13 -07:00
mjpeg_decoder.cc nolint removed 2015-08-31 10:52:13 -07:00
mjpeg_validate.cc validate scan EOI from end for better coverage 2015-09-14 10:58:51 -07:00
planar_functions.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
rotate_any.cc rotate nv12 any width 2015-08-07 23:48:38 -07:00
rotate_argb.cc rotate include and proto cleanup 2015-07-22 18:09:04 -07:00
rotate_common.cc rotate include and proto cleanup 2015-07-22 18:09:04 -07:00
rotate_gcc.cc use visual c 32 bit code for clangcl 2015-08-11 10:10:45 -07:00
rotate_mips.cc rename rotate macros and functions to match 2015-07-27 17:00:41 -07:00
rotate_neon64.cc rotate include and proto cleanup 2015-07-22 18:09:04 -07:00
rotate_neon.cc remove align directives 2015-08-04 17:00:03 -07:00
rotate_win.cc use visual c 32 bit code for clangcl 2015-08-11 10:10:45 -07:00
rotate.cc rotate nv12 any width 2015-08-07 23:48:38 -07:00
row_any.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_common.cc YUY2ToARGB avx2 in 1 step conversion. 2015-09-23 11:15:18 -07:00
row_gcc.cc YUY2ToARGB avx2 in 1 step conversion. 2015-09-23 11:15:18 -07:00
row_mips.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_neon64.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_neon.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_win.cc YUY2ToARGB avx2 in 1 step conversion. 2015-09-23 11:15:18 -07:00
scale_any.cc Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. 2015-06-09 01:05:18 +00:00
scale_argb.cc odd width support for scale by even scale factor and box scale down by 4. scale down by 4 uses scale down by 2 internally. 2015-05-26 17:56:51 +00:00
scale_common.cc Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. 2015-06-09 01:05:18 +00:00
scale_gcc.cc clang use scalewin 2015-08-18 14:50:27 -07:00
scale_mips.cc remove align directives 2015-08-04 17:00:03 -07:00
scale_neon64.cc workarounds for ios 64 bit compiler where an int passed into assembly needs to be explicitly cast to a 'w' register. 2015-05-05 22:46:16 +00:00
scale_neon.cc remove align directives 2015-08-04 17:00:03 -07:00
scale_win.cc clang use scalewin 2015-08-18 14:50:27 -07:00
scale.cc Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. 2015-06-09 01:05:18 +00:00
video_common.cc Remove bayer format support from libyuv. This format is very rare and used on legacy hardware. It's not well optimized and has bugs related to odd widths. Removing the format will allow tests to pass under more circumstances, run faster and allow focus on higher priority quality and performance issues. 2015-02-09 19:58:19 +00:00
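
The box filter commit message on scale.cc, scale_any.cc and scale_common.cc describes processing one source row at a time into a small accumulation buffer that is reset before each group of N rows. A rough scalar sketch of that scheme is below. The function name and signature are hypothetical and not libyuv's API, and the sketch assumes integer box sizes with box_h small enough (at most 257 rows) that the 16-bit accumulators cannot overflow.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical box downscale of one 8-bit plane by box_w x box_h.
// Source columns and rows that do not fill a whole box are ignored.
void BoxDownscaleSketch(const uint8_t* src, int src_stride,
                        int src_width, int src_height,
                        int box_w, int box_h,
                        uint8_t* dst, int dst_stride) {
  const int dst_width = src_width / box_w;
  const int dst_height = src_height / box_h;
  const int used_width = dst_width * box_w;
  const int area = box_w * box_h;
  // One accumulator per used source column; small enough to stay in cache.
  std::vector<uint16_t> acc(used_width);
  for (int dy = 0; dy < dst_height; ++dy) {
    // Reset the buffer before accumulating the next box_h rows.
    std::fill(acc.begin(), acc.end(), 0);
    for (int r = 0; r < box_h; ++r) {
      // Walk one source row sequentially, adding into the buffer.
      const uint8_t* row = src + (dy * box_h + r) * src_stride;
      for (int x = 0; x < used_width; ++x) acc[x] += row[x];
    }
    // Sum box_w adjacent column totals and divide by the box area,
    // rounding to nearest.
    uint8_t* out = dst + dy * dst_stride;
    for (int dx = 0; dx < dst_width; ++dx) {
      int sum = 0;
      for (int c = 0; c < box_w; ++c) sum += acc[dx * box_w + c];
      out[dx] = static_cast<uint8_t>((sum + area / 2) / area);
    }
  }
}
```

The variant mentioned in the commit message, where the first row is stored without an add so the reset can be skipped, would replace the fill and the first accumulation with a plain copy into the buffer.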