Frank Barchard 7ff53f324c I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq
I422ToYUY2Row_AVX2 optimized from 7 cycles per 32 pixels to 6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
lea        0x10(%1),%1
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

use vpmovzxbd to zero-extend the bytes to dwords, then vpslld and vpor:
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
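
For readers who do not read AVX2 mnemonics, the four-instruction replacement quoted above can be modelled in scalar C. This is only a sketch of what those four lines compute per element, not the libyuv implementation: the function name `pack_uv_dwords` and its parameters are hypothetical, and the rest of the row function (loading Y, storing the result, the loop counter) is not shown in the commit message and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar model of the quoted sequence (hypothetical helper, not libyuv API).
   Each iteration mirrors one 32-bit element of the ymm registers. */
static void pack_uv_dwords(const uint8_t* src_u, const uint8_t* src_v,
                           uint32_t* dst, size_t count) {
  assert(src_u != NULL && src_v != NULL && dst != NULL);
  for (size_t i = 0; i < count; ++i) {
    uint32_t u = src_u[i]; /* vpmovzxbd (%1): zero-extend U byte to dword */
    uint32_t v = src_v[i]; /* vpmovzxbd 0x00(%1,%2,1): same for V */
    v <<= 16;              /* vpslld $0x10: move V into bits 16..23 */
    dst[i] = u | v;        /* vpor: merge U and V into one dword */
  }
}
```

On little-endian x86 each resulting dword is laid out in memory as the bytes U, 0, V, 0. The zero-extension is folded into the loads, which is where the message's claim of less port 5 pressure comes from: the separate vpermq shuffles are no longer needed.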

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

I422ToYUY2Row_AVX2 optimization

Improve performance of AVX2 code by avoiding vpermq

Test: /usr/local/google/home/fbarchard/iaca-lin64/bin/iaca.sh -reduceout -arch BDW out/Release/obj/libyuv_internal/row_gcc.o
Change-Id: Ie36732da23ecea1ffcc6b297bacc962780b59ef1
Reviewed-on: https://chromium-review.googlesource.com/898067
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-02 18:57:49 +00:00
..
compare_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_gcc.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_msa.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_neon.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_win.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_argb.cc AR30ToABGR for 10 to 8 bit RGB on Android 2018-01-29 22:21:42 +00:00
convert_from_argb.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
convert_from.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
convert_jpeg.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_to_argb.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
convert_to_i420.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
convert.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
cpu_id.cc basic_types.h - remove unused macros 2018-01-23 02:24:58 +00:00
mjpeg_decoder.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
mjpeg_validate.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
planar_functions.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
rotate_any.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
rotate_argb.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_common.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_gcc.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon64.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_win.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_any.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
row_common.cc ABGRToAR30 used AVX2 with reversed shuffler 2018-01-29 22:31:31 +00:00
row_gcc.cc I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq 2018-02-02 18:57:49 +00:00
row_msa.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_neon.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_win.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_any.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_argb.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_gcc.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_neon.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_win.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00