Frank Barchard 5790a765b9 I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

...use vpmovzxbd to zero-extend each byte to a 32-bit dword, then vpslld and vpor:
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
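
As a rough illustration of the replacement sequence, the scalar C sketch below models what vpmovzxbd, vpslld $0x10 and vpor do to each 32-bit lane. It is not libyuv code: the names pack_uv_lanes and lane_byte are hypothetical, and the sketch assumes only the lane layout implied by the instructions quoted above. It does not model how the Y bytes are merged afterwards.

```c
#include <stdint.h>

/* Hypothetical helper: scalar model of vpmovzxbd + vpslld $0x10 + vpor.
 * Each U byte and each V byte is zero-extended to 32 bits, the V lane is
 * shifted left by 16 bits, and the two lanes are OR-ed together. */
static void pack_uv_lanes(const uint8_t* u, const uint8_t* v,
                          uint32_t* out, int n) {
  for (int i = 0; i < n; ++i) {
    uint32_t u_lane = (uint32_t)u[i];       /* vpmovzxbd on the U row */
    uint32_t v_lane = (uint32_t)v[i];       /* vpmovzxbd on the V row */
    v_lane <<= 16;                          /* vpslld $0x10 */
    out[i] = u_lane | v_lane;               /* vpor */
  }
}

/* Hypothetical helper: byte k (0 = lowest) of a lane. On little-endian x86
 * this is also the byte at offset k in memory. */
static uint8_t lane_byte(uint32_t lane, int k) {
  return (uint8_t)(lane >> (8 * k));
}
```

For u = 0x11 and v = 0xAA the lane value is 0x00AA0011, so U sits in byte 0 and V in byte 2 of each lane, with bytes 1 and 3 left zero.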

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-02 23:57:35 +00:00
basic_types.h Define basic_types backward compatible layer 2018-01-24 00:26:07 +00:00
compare_row.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
compare.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
convert_argb.h AR30ToABGR for 10 to 8 bit RGB on Android 2018-01-29 22:21:42 +00:00
convert_from_argb.h ABGRToAR30 used AVX2 with reversed shuffler 2018-01-29 22:31:31 +00:00
convert_from.h Add H420ToAR30 and a test that does a histogram 2018-01-25 00:36:40 +00:00
convert.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
cpu_id.h Remove Mips DSPR2 code 2017-12-14 18:22:16 +00:00
macros_msa.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
mjpeg_decoder.h Switch to C99 types 2018-01-23 19:16:05 +00:00
planar_functions.h Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate_row.h Switch to C99 types 2018-01-23 19:16:05 +00:00
rotate.h Switch to C99 types 2018-01-23 19:16:05 +00:00
row.h I420ToYUY2_AVX2 port 2018-02-01 00:33:25 +00:00
scale_argb.h Switch to C99 types 2018-01-23 19:16:05 +00:00
scale_row.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00
scale.h Switch to C99 types 2018-01-23 19:16:05 +00:00
version.h I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq 2018-02-02 23:57:35 +00:00
video_common.h Lint cleanup after C99 change CL 2018-01-24 19:16:03 +00:00