Frank Barchard 3db22ebc4b RAWToJ400 and RGBToJ400 use 2 step row function for Intel. RAWToJ400 Was 3996 ms, now 3309. 20.7% faster.
Call a row function for each row, based on the ARGBToI400 code,
but implement the row functions as a 2 step conversion (RAW/RGB24
to ARGB into a small row buffer, then ARGB to YJ). Adds the row
functions RAWToYJ and RGBToYJ, with SSSE3 and AVX2 versions and
Any versions.
The smaller row buffer is more cache friendly on large images.

The maximum temporary width can be configured, and is currently:
// Maximum temporary width for wrappers to process at a time, in pixels.
#define MAXTWIDTH 2048
And the row buffer is
  SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4]);
So 8192 bytes are used for the row buffer, leaving the rest of the cache
for the source and destination rows.

blaze-bin/third_party/libyuv/libyuv_test '--gunit_filter=*R*To?400_Opt' --libyuv_width=3600 --libyuv_height=2500 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 | sortms

Was
RAWToJ400_Opt (3996 ms)
ARGBToI400_Opt (3964 ms)
RGB24ToJ400_Opt (3960 ms)
ARGBToJ400_Opt (3909 ms)
RGBAToJ400_Opt (3885 ms)

Now
ARGBToJ400_Opt (4091 ms)
ARGBToI400_Opt (3936 ms)
RGBAToJ400_Opt (3428 ms)
RGB24ToJ400_Opt (3324 ms)
RAWToJ400_Opt (3309 ms)

Bug: libyuv:854, b/147753855
Change-Id: Ieb65fbda94e812c737f4c3c74107354b73c4bcd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2016203
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-01-23 03:23:38 +00:00
compare_common.cc add const to casts 2018-04-13 22:52:52 +00:00
compare_gcc.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_mmi.cc MMI ifdef guards and add source to various build files. 2018-08-03 18:37:23 +00:00
compare_msa.cc use unix line endings 2018-06-20 23:19:59 +00:00
compare_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_neon.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_win.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare.cc Fix arm unittest failure by removing unused FloatDivToByteRow. 2019-07-02 20:00:30 +00:00
convert_argb.cc I210ToAR30 support for 422 10 bit to 10 bit RGB 2019-11-06 19:37:22 +00:00
convert_from_argb.cc ARGBToY use 8 bit precision instead of 7 bit. 2019-10-07 23:01:10 +00:00
convert_from.cc ARGBToY use 8 bit precision instead of 7 bit. 2019-10-07 23:01:10 +00:00
convert_jpeg.cc Add comment for jpeg parameters. 2018-11-01 18:18:50 +00:00
convert_to_argb.cc Add U444ToABGR, J444ToABGR, H444ToABGR, H444ToARGB and ConvertToARGB support 2019-11-05 22:11:20 +00:00
convert_to_i420.cc Fix ConvertToI420() for odd crop_y 2018-10-03 19:14:01 +00:00
convert.cc RAWToJ400 and RGBToJ400 use 2 step row function for Intel. RAWToJ400 Was 3996 ms, now 3309. 20.7% faster. 2020-01-23 03:23:38 +00:00
cpu_id.cc Fix ConvertToI420() for odd crop_y 2018-10-03 19:14:01 +00:00
mjpeg_decoder.cc Fix for jpeg to allow fuzz 2019-10-28 23:35:13 +00:00
mjpeg_validate.cc Update to r1732 for more robust jpeg 2019-07-01 22:32:36 +00:00
planar_functions.cc Floating point Gaussian kernels 2019-12-09 04:45:59 +00:00
rotate_any.cc Restore the file mode for source files 2018-08-06 18:53:32 +00:00
rotate_argb.cc Restore the file mode for source files 2018-08-06 18:53:32 +00:00
rotate_common.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_gcc.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_mmi.cc MMI ifdef guards and add source to various build files. 2018-08-03 18:37:23 +00:00
rotate_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon64.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_win.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate.cc Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror 2019-11-13 20:02:40 +00:00
row_any.cc RAWToJ400 and RGBToJ400 use 2 step row function for Intel. RAWToJ400 Was 3996 ms, now 3309. 20.7% faster. 2020-01-23 03:23:38 +00:00
row_common.cc RAWToJ400 and RGBToJ400 use 2 step row function for Intel. RAWToJ400 Was 3996 ms, now 3309. 20.7% faster. 2020-01-23 03:23:38 +00:00
row_gcc.cc Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror 2019-11-13 20:02:40 +00:00
row_mmi.cc RAWToJ400 for big endian RGB to grey scale. 2020-01-16 00:29:11 +00:00
row_msa.cc Fix ConvertToI420() for odd crop_y 2018-10-03 19:14:01 +00:00
row_neon64.cc RAWToJ400 for big endian RGB to grey scale. 2020-01-16 00:29:11 +00:00
row_neon.cc RAWToJ400 for big endian RGB to grey scale. 2020-01-16 00:29:11 +00:00
row_win.cc Add ABGRToNV21 and ABGRToNV12 2019-08-07 01:29:13 +00:00
scale_any.cc MMI Optimized functions I422ToARGB for 1080p video 2019-09-11 21:06:21 +00:00
scale_argb.cc MMI Optimized functions I422ToARGB for 1080p video 2019-09-11 21:06:21 +00:00
scale_common.cc Fix ConvertToI420() for odd crop_y 2018-10-03 19:14:01 +00:00
scale_gcc.cc Add AYUVToNV12 and NV21ToNV12 2019-04-12 17:48:45 +00:00
scale_mmi.cc MMI Optimized functions I422ToARGB for 1080p video 2019-09-11 21:06:21 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc Add AYUVToNV12 and NV21ToNV12 2019-04-12 17:48:45 +00:00
scale_neon.cc Add AYUVToNV12 and NV21ToNV12 2019-04-12 17:48:45 +00:00
scale_win.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale.cc MMI Optimized functions I422ToARGB for 1080p video 2019-09-11 21:06:21 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00