Frank Barchard 9a0e12f5f1 AVX2 1 step I422AlphaToARGB for gcc and win.
C     I420AlphaToARGB_Opt (5169 ms)
SSSE3 I420AlphaToARGB_Opt (432 ms)
AVX2  I420AlphaToARGB_Opt (358 ms)

and with premultiplication as a 2 step process:
C     I420AlphaToARGB_Premult (7029 ms)
SSSE3 I420AlphaToARGB_Premult (757 ms)
AVX2  I420AlphaToARGB_Premult (508 ms)

R=harryjin@google.com
BUG=libyuv:496,libyuv:473

Review URL: https://codereview.chromium.org/1372653003 .
2015-09-25 13:37:42 -07:00
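The commit above distinguishes a 1 step conversion (Y, U, V and alpha rows to ARGB in a single pass) from premultiplied output, which is done as a second pass. The scalar C sketch below shows that split. It is not libyuv's code: the function names `i422alpha_to_argb_row` and `attenuate_row` are hypothetical, the YUV to RGB math uses the common BT.601 limited-range 8-bit integer approximation rather than libyuv's `YuvConstants`, and the premultiply rounding `(c * a + 127) / 255` is an assumption that may differ from libyuv's attenuate functions. Output follows the B, G, R, A in-memory byte order libyuv uses for ARGB.

```c
#include <stdint.h>

/* Clamp an int to 0..255. */
static uint8_t clamp255(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* BT.601 limited-range YUV -> RGB, 8-bit integer approximation (assumed).
   Dividing by 256 instead of shifting avoids implementation-defined right
   shifts of negative ints; negative results clamp to 0 either way. */
static void yuv_pixel(uint8_t y, uint8_t u, uint8_t v,
                      uint8_t* b, uint8_t* g, uint8_t* r) {
  int c = y - 16, d = u - 128, e = v - 128;
  *r = clamp255((298 * c + 409 * e + 128) / 256);
  *g = clamp255((298 * c - 100 * d - 208 * e + 128) / 256);
  *b = clamp255((298 * c + 516 * d + 128) / 256);
}

/* Pass 1 (the "1 step" conversion): an I422 row, where U and V are at half
   horizontal resolution, plus an alpha row -> ARGB stored as B, G, R, A. */
static void i422alpha_to_argb_row(const uint8_t* src_y, const uint8_t* src_u,
                                  const uint8_t* src_v, const uint8_t* src_a,
                                  uint8_t* dst_argb, int width) {
  for (int x = 0; x < width; ++x) {
    yuv_pixel(src_y[x], src_u[x / 2], src_v[x / 2], &dst_argb[4 * x + 0],
              &dst_argb[4 * x + 1], &dst_argb[4 * x + 2]);
    dst_argb[4 * x + 3] = src_a[x];
  }
}

/* Pass 2, only for premultiplied output: scale B, G, R by A / 255 in place.
   Alpha itself is left unchanged. */
static void attenuate_row(uint8_t* argb, int width) {
  for (int x = 0; x < width; ++x) {
    unsigned a = argb[4 * x + 3];
    for (int i = 0; i < 3; ++i) {
      argb[4 * x + i] = (uint8_t)((argb[4 * x + i] * a + 127) / 255);
    }
  }
}
```

For mid-gray input (Y = U = V = 128) the first pass yields B = G = R = 130; with alpha 128 the second pass reduces those channels to 65. Keeping premultiplication as a separate pass costs an extra read and write of the ARGB row, which is consistent with the slower `_Premult` timings reported above.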
compare_common.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare_gcc.cc nolint removed 2015-08-31 10:52:13 -07:00
compare_neon64.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare_neon.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare_win.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
compare.cc xmmword cast for clang 2015-08-18 11:13:12 -07:00
convert_argb.cc I420Alpha row function in 1 pass. 2015-09-25 10:29:20 -07:00
convert_from_argb.cc ARGBToYJRow_AVX2 hooked up for ARGBToJ422 2015-04-07 00:39:25 +00:00
convert_from.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
convert_jpeg.cc libyuv::MJPGToI420() and libyuv::MJPGToARGB() return failure if callback to JPeg fails. 2014-01-28 03:08:59 +00:00
convert_to_argb.cc Remove Q420 fourcc support. 2015-02-11 18:20:54 +00:00
convert_to_i420.cc Remove Q420 fourcc support. 2015-02-11 18:20:54 +00:00
convert.cc disable faulty avx2 in argb conversions and box filter. and extend temporary buffer to 128 for an avx2 any function. 2015-07-07 15:40:24 -07:00
cpu_id.cc nolint removed 2015-08-31 10:52:13 -07:00
mjpeg_decoder.cc nolint removed 2015-08-31 10:52:13 -07:00
mjpeg_validate.cc validate scan EOI from end for better coverage 2015-09-14 10:58:51 -07:00
planar_functions.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
rotate_any.cc rotate nv12 any width 2015-08-07 23:48:38 -07:00
rotate_argb.cc rotate include and proto cleanup 2015-07-22 18:09:04 -07:00
rotate_common.cc rotate include and proto cleanup 2015-07-22 18:09:04 -07:00
rotate_gcc.cc use visual c 32 bit code for clangcl 2015-08-11 10:10:45 -07:00
rotate_mips.cc rename rotate macros and functions to match 2015-07-27 17:00:41 -07:00
rotate_neon64.cc rotate include and proto cleanup 2015-07-22 18:09:04 -07:00
rotate_neon.cc remove align directives 2015-08-04 17:00:03 -07:00
rotate_win.cc use visual c 32 bit code for clangcl 2015-08-11 10:10:45 -07:00
rotate.cc rotate nv12 any width 2015-08-07 23:48:38 -07:00
row_any.cc AVX2 1 step I422AlphaToARGB for gcc and win. 2015-09-25 13:37:42 -07:00
row_common.cc I420Alpha row function in 1 pass. 2015-09-25 10:29:20 -07:00
row_gcc.cc AVX2 1 step I422AlphaToARGB for gcc and win. 2015-09-25 13:37:42 -07:00
row_mips.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_neon64.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_neon.cc yuvconstants for all YUV to RGB conversion functions. 2015-09-22 10:26:03 -07:00
row_win.cc AVX2 1 step I422AlphaToARGB for gcc and win. 2015-09-25 13:37:42 -07:00
scale_any.cc Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. 2015-06-09 01:05:18 +00:00
scale_argb.cc odd width support for scale by even scale factor and box scale down by 4. scale down by 4 uses scale down by 2 internally. 2015-05-26 17:56:51 +00:00
scale_common.cc Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. 2015-06-09 01:05:18 +00:00
scale_gcc.cc clang use scalewin 2015-08-18 14:50:27 -07:00
scale_mips.cc remove align directives 2015-08-04 17:00:03 -07:00
scale_neon64.cc workarounds for ios 64 bit compiler where int passed into assembly needs to be explicitly cast to 'w' register. 2015-05-05 22:46:16 +00:00
scale_neon.cc remove align directives 2015-08-04 17:00:03 -07:00
scale_win.cc clang use scalewin 2015-08-18 14:50:27 -07:00
scale.cc Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. 2015-06-09 01:05:18 +00:00
video_common.cc Remove bayer format support from libyuv. This format is very rare and used on legacy hardware. It's not well optimized and has bugs related to odd widths. Removing the format will allow tests to pass under more circumstances, run faster and allow focus on higher priority quality and performance issues. 2015-02-09 19:58:19 +00:00
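The box filter commit message (scale.cc, scale_common.cc, scale_any.cc) describes accumulating one source row at a time into a small buffer that is zeroed before each group of N rows. The C sketch below illustrates that row-wise scheme for a single 8-bit plane and integer box factors. It is not libyuv's implementation: `box_downscale` and `kMaxSrcWidth` are hypothetical, negative strides and partial boxes at the right and bottom edges are not handled, and the `uint16_t` accumulator limits `box_h` to 257 rows (255 * 257 = 65535).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { kMaxSrcWidth = 4096 }; /* hypothetical cap on the accumulation row */

/* Downscale by box_w x box_h, reading source rows sequentially into a
   one-row accumulation buffer instead of summing columns in registers. */
static void box_downscale(const uint8_t* src, int src_stride, int src_width,
                          int src_height, int box_w, int box_h,
                          uint8_t* dst, int dst_stride) {
  uint16_t acc[kMaxSrcWidth];
  int dst_width = src_width / box_w;
  int dst_height = src_height / box_h;
  int area = box_w * box_h;
  assert(src_width <= kMaxSrcWidth);
  assert(box_h <= 257); /* keeps each acc[x] within uint16_t */
  for (int dy = 0; dy < dst_height; ++dy) {
    /* Reset the buffer before accumulating the next box_h source rows. */
    memset(acc, 0, sizeof(acc[0]) * (size_t)src_width);
    for (int r = 0; r < box_h; ++r) {
      const uint8_t* row = src + (size_t)(dy * box_h + r) * (size_t)src_stride;
      for (int x = 0; x < src_width; ++x) acc[x] += row[x];
    }
    /* Sum box_w adjacent accumulated columns, divide by the area, round. */
    uint8_t* out = dst + (size_t)dy * (size_t)dst_stride;
    for (int dx = 0; dx < dst_width; ++dx) {
      uint32_t sum = 0;
      for (int i = 0; i < box_w; ++i) sum += acc[dx * box_w + i];
      out[dx] = (uint8_t)((sum + (uint32_t)area / 2) / (uint32_t)area);
    }
  }
}
```

The message also notes that if the `memset` becomes a bottleneck, the first row of each group could be stored into the buffer directly and only the remaining rows added; in this sketch that would mean replacing the `memset` and the `r = 0` iteration with a plain copy of the first row.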
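The mjpeg_validate.cc entry ("validate scan EOI from end") suggests checking for the JPEG end-of-image marker by scanning backward from the end of the buffer. The sketch below is one plausible reading, not libyuv's code: `jpeg_looks_valid` is a hypothetical helper that requires the SOI marker (0xFF 0xD8) at the start and then searches backward for EOI (0xFF 0xD9), so trailing padding after EOI is still accepted.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if buf starts with SOI (FF D8) and an EOI marker (FF D9) is
   found when scanning backward from the end, else 0. */
static int jpeg_looks_valid(const uint8_t* buf, size_t len) {
  if (buf == NULL || len < 4) return 0;
  if (buf[0] != 0xFF || buf[1] != 0xD8) return 0;
  /* Stop at i == 3 so the EOI candidate never overlaps the SOI bytes. */
  for (size_t i = len - 1; i >= 3; --i) {
    if (buf[i - 1] == 0xFF && buf[i] == 0xD9) return 1;
  }
  return 0;
}
```

For a well-formed file the marker sits at or near the end, so the backward scan usually terminates after a few bytes.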