Roughly: instead of 4 loads and 8 multiplies, use 1 load and 2 multiplies,
4 times over. The original code, like the C code from clang and gcc,
did all the loads, then all the math, then the store. The new code
does a load, then the math, then the next load, and so on.
This schedules better on current ARM64 CPUs.
The number of registers used is also reduced, by reusing the same registers.
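
For illustration only (this is not the actual libyuv NEON assembly): a
minimal C sketch of the interleaved schedule, assuming a GaussCol-style
vertical 1-4-6-4-1 pass over five uint16_t rows accumulating into uint32_t,
with width a multiple of 4. The function name and parameters are
hypothetical.

// Sketch of the interleaved load/multiply schedule, using NEON intrinsics.
// Assumes width is a multiple of 4; not libyuv's exact signature.
#include <arm_neon.h>
#include <stdint.h>

void GaussColSketch(const uint16_t* s0, const uint16_t* s1,
                    const uint16_t* s2, const uint16_t* s3,
                    const uint16_t* s4, uint32_t* dst, int width) {
  for (int x = 0; x < width; x += 4) {
    // Load, multiply-accumulate, then the next load: each accumulate
    // depends only on the load just before it, not on all five loads.
    uint16x4_t r0 = vld1_u16(s0 + x);
    uint32x4_t sum = vmull_n_u16(r0, 1);   // 1 * row0
    uint16x4_t r1 = vld1_u16(s1 + x);
    sum = vmlal_n_u16(sum, r1, 4);         // + 4 * row1
    uint16x4_t r2 = vld1_u16(s2 + x);
    sum = vmlal_n_u16(sum, r2, 6);         // + 6 * row2
    uint16x4_t r3 = vld1_u16(s3 + x);
    sum = vmlal_n_u16(sum, r3, 4);         // + 4 * row3
    uint16x4_t r4 = vld1_u16(s4 + x);
    sum = vmlal_n_u16(sum, r4, 1);         // + 1 * row4
    vst1q_u32(dst + x, sum);
  }
}

Because each multiply-accumulate needs only the most recent load, the CPU
can overlap the following load with the current math rather than stalling
on a burst of loads up front.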
HiSilicon ARM A73:
Now
TestGaussRow_Opt (890 ms)
TestGaussCol_Opt (571 ms)
Was
TestGaussRow_Opt (1061 ms)
TestGaussCol_Opt (595 ms)
Qualcomm 821 (Pixel):
Now
TestGaussRow_Opt (571 ms)
TestGaussCol_Opt (474 ms)
Was
TestGaussRow_Opt (751 ms)
TestGaussCol_Opt (520 ms)
TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt
Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de
Reviewed-on: https://chromium-review.googlesource.com/634448
Commit-Queue: Frank Barchard <fbarchard@google.com>