Mirror of https://chromium.googlesource.com/libyuv/libyuv (synced 2025-12-09 10:16:46 +08:00)
Roughly, instead of 4 loads and 8 multiplies, use 1 load and 2 multiplies, 4 times over.

The original code, like the C code from clang and gcc, did all the loads, then all the math, then the store. The new code does a load, then the math, then the next load, and so on. This schedules better on current ARM64 CPUs. The number of registers is also reduced, because the same registers are reused.

| CPU | Test | Was (ms) | Now (ms) |
|---|---|---|---|
| HiSilicon ARM A73 | TestGaussRow_Opt | 1061 | 890 |
| HiSilicon ARM A73 | TestGaussCol_Opt | 595 | 571 |
| Qualcomm 821 (Pixel) | TestGaussRow_Opt | 751 | 571 |
| Qualcomm 821 (Pixel) | TestGaussCol_Opt | 520 | 474 |

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt
Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de
Reviewed-on: https://chromium-review.googlesource.com/634448
Commit-Queue: Frank Barchard <fbarchard@google.com>
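The scheduling change can be modeled in plain C. The sketch below is a scalar illustration only, not libyuv's NEON code. It assumes a vertical 1-4-6-4-1 filter over five `uint16_t` rows, as in libyuv's C reference `GaussCol_C`. It also assumes a hypothetical 8-lane "load" split into two 4-lane `uint32_t` accumulators, which is how a widening NEON multiply-accumulate is typically organized. The helpers `load8`, `mla_half`, `gauss_col_ref`, and `gauss_col_interleaved` are hypothetical names. Each of the 4 rounds does 1 load followed by 2 multiply-accumulates (low half, high half). This adds up to the commit's "4 loads and 8 multiplies" without keeping all 4 loads live at once.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vector shape: one load = 8 x uint16_t, two 4-lane accumulators. */
enum { kLanes = 8, kHalf = 4 };

/* Reference: assumed 1-4-6-4-1 vertical filter, written as one expression. */
static void gauss_col_ref(const uint16_t* src0, const uint16_t* src1,
                          const uint16_t* src2, const uint16_t* src3,
                          const uint16_t* src4, uint32_t* dst, int width) {
  for (int i = 0; i < width; ++i)
    dst[i] = src0[i] + src1[i] * 4u + src2[i] * 6u + src3[i] * 4u + src4[i];
}

/* Models one vector load: copy 8 lanes of a row into a register-like temp. */
static void load8(const uint16_t* row, uint16_t v[kLanes]) {
  memcpy(v, row, kLanes * sizeof(uint16_t));
}

/* Models one widening multiply-accumulate on half of the lanes. */
static void mla_half(uint32_t acc[kHalf], const uint16_t* v_half,
                     uint32_t coeff) {
  for (int j = 0; j < kHalf; ++j) acc[j] += (uint32_t)v_half[j] * coeff;
}

/* Same filter, structured as load -> math -> next load. Width must be a
   multiple of 8 in this sketch (no tail handling). */
static void gauss_col_interleaved(const uint16_t* src0, const uint16_t* src1,
                                  const uint16_t* src2, const uint16_t* src3,
                                  const uint16_t* src4, uint32_t* dst,
                                  int width) {
  const uint16_t* rows[4] = {src1, src2, src3, src4};
  const uint32_t coeffs[4] = {4, 6, 4, 1};
  assert(width % kLanes == 0);
  for (int x = 0; x < width; x += kLanes) {
    uint16_t v[kLanes]; /* single reused "register" for every load */
    uint32_t lo[kHalf], hi[kHalf];
    load8(src0 + x, v); /* row 0 (coefficient 1) seeds both accumulators */
    for (int j = 0; j < kHalf; ++j) {
      lo[j] = v[j];
      hi[j] = v[kHalf + j];
    }
    /* 4 rounds of: 1 load, then 2 multiplies (low half, high half). */
    for (int k = 0; k < 4; ++k) {
      load8(rows[k] + x, v);
      mla_half(lo, v, coeffs[k]);
      mla_half(hi, v + kHalf, coeffs[k]);
    }
    memcpy(dst + x, lo, sizeof lo);
    memcpy(dst + x + kHalf, hi, sizeof hi);
  }
}
```

A C compiler may still reorder this scalar code freely. The sketch only shows the dataflow and register reuse. The measured gains above come from the hand-scheduled ARM64 code, and the ordering's effect on real hardware depends on the core's pipeline.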
| Name |
|---|
| testdata |
| basictypes_test.cc |
| color_test.cc |
| compare_test.cc |
| convert_test.cc |
| cpu_test.cc |
| cpu_thread_test.cc |
| math_test.cc |
| planar_test.cc |
| rotate_argb_test.cc |
| rotate_test.cc |
| scale_argb_test.cc |
| scale_test.cc |
| unit_test.cc |
| unit_test.h |
| video_common_test.cc |