Frank Barchard 5790a765b9 I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to zero-extend the bytes to 32-bit lanes, then vpslld and vpor:
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
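
The lane arithmetic of the new sequence can be sketched in scalar C. This is only an illustrative model of the four instructions as listed above, not the libyuv row function: vpmovzxbd zero-extends each byte into a 32-bit lane, vpslld $0x10 shifts the second operand's lanes left by 16 bits, and vpor merges them. The names merge_uv_lane and merge_uv_row are hypothetical, and the handling of the Y plane and the final store are not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of one 32-bit lane of the listed sequence. */
static uint32_t merge_uv_lane(uint8_t u, uint8_t v) {
  uint32_t u32 = (uint32_t)u; /* vpmovzxbd on the first source byte  */
  uint32_t v32 = (uint32_t)v; /* vpmovzxbd on the second source byte */
  v32 <<= 16;                 /* vpslld $0x10 */
  return u32 | v32;           /* vpor */
}

/* Apply the lane model to n byte pairs. Each lane is written out in
   little-endian byte order, which is how x86 stores it, so dst must
   hold 4 * n bytes. */
static void merge_uv_row(const uint8_t* src_u, const uint8_t* src_v,
                         uint8_t* dst, int n) {
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    uint32_t lane = merge_uv_lane(src_u[i], src_v[i]);
    dst[4 * i + 0] = (uint8_t)(lane & 0xff);         /* first source byte  */
    dst[4 * i + 1] = (uint8_t)((lane >> 8) & 0xff);  /* zero               */
    dst[4 * i + 2] = (uint8_t)((lane >> 16) & 0xff); /* second source byte */
    dst[4 * i + 3] = (uint8_t)((lane >> 24) & 0xff); /* zero               */
  }
}
```

For source bytes 0x11, 0x22 and 0xaa, 0xbb, this model produces the byte sequence 11 00 aa 00 22 00 bb 00: each pair lands in bytes 0 and 2 of its lane, with bytes 1 and 3 left zero.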

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-02 23:57:35 +00:00
compare_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_gcc.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_msa.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_neon.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare_win.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_argb.cc AR30ToABGR for 10 to 8 bit RGB on Android 2018-01-29 22:21:42 +00:00
convert_from_argb.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
convert_from.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
convert_jpeg.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_to_argb.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
convert_to_i420.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
convert.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
cpu_id.cc basic_types.h - remove unused macros 2018-01-23 02:24:58 +00:00
mjpeg_decoder.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
mjpeg_validate.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
planar_functions.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
rotate_any.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
rotate_argb.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_common.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_gcc.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon64.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_neon.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_win.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_any.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
row_common.cc ABGRToAR30 used AVX2 with reversed shuffler 2018-01-29 22:31:31 +00:00
row_gcc.cc I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq 2018-02-02 23:57:35 +00:00
row_msa.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_neon.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
row_win.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_any.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_argb.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_gcc.cc I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
scale_msa.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_neon64.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale_neon.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_win.cc Switch to C99 types 2018-01-23 19:16:05 +00:00
scale.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
video_common.cc Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00