114 Commits

Author SHA1 Message Date
Frank Barchard
e1f6c1c0b5 tidy applied with readability-inconsistent-declaration-parameter-name
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2018-02-07 00:24:25 +00:00
Frank Barchard
5790a765b9 I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to zero-extend the bytes to dwords, then vpslld and vpor to interleave U and V as shorts
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
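
A scalar sketch of what the replacement sequence computes, for illustration only. It models vpmovzxbd as zero-extending each byte into a 32-bit lane, vpslld $0x10 as moving V into the upper half, and vpor as combining the two. The `pack_uyvy` helper is an assumption about how the Y bytes are merged afterwards (the commit message does not show that part); the function names are hypothetical.

```python
def zext_shift_or(u_bytes, v_bytes):
    """Model vpmovzxbd + vpslld $16 + vpor on 32-bit lanes."""
    lanes = []
    for u, v in zip(u_bytes, v_bytes):
        u32 = u                   # vpmovzxbd: byte zero-extended to a 32-bit lane
        v32 = v << 16             # vpslld $0x10: V moved to the upper 16 bits
        lanes.append(u32 | v32)   # vpor: U in bits 0-7, V in bits 16-23
    return lanes


def pack_uyvy(y_bytes, u_bytes, v_bytes):
    """Assumed final step: OR two Y bytes into the empty slots (U Y0 V Y1)."""
    out = bytearray()
    for i, lane in enumerate(zext_shift_or(u_bytes, v_bytes)):
        y0, y1 = y_bytes[2 * i], y_bytes[2 * i + 1]
        out += (lane | (y0 << 8) | (y1 << 24)).to_bytes(4, "little")
    return bytes(out)
```

Each lane holds U and V as two zero-extended 16-bit values, which leaves the odd byte positions free for luma.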

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-02 23:57:35 +00:00
Frank Barchard
7ff53f324c I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq
I422ToYUY2Row_AVX2 optimized from 7 cycles per 32 pixels to 6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
lea        0x10(%1),%1
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to zero-extend the bytes to dwords, then vpslld and vpor to interleave U and V as shorts
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

I422ToYUY2Row_AVX2 optimization

Improve performance of AVX2 code by avoiding vpermq

Bug: libyuv:556
Test: /usr/local/google/home/fbarchard/iaca-lin64/bin/iaca.sh -reduceout -arch BDW out/Release/obj/libyuv_internal/row_gcc.o
Change-Id: Ie36732da23ecea1ffcc6b297bacc962780b59ef1
Reviewed-on: https://chromium-review.googlesource.com/898067
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-02 18:57:49 +00:00
Frank Barchard
664c735677 I420ToYUY2_AVX2 port
I420 and I422 To YUY2 and UYVY ported from SSE2 to AVX2.

Was SSE2
I420ToYUY2_Opt (135 ms)
I420ToUYVY_Opt (148 ms)
I422ToYUY2_Opt (145 ms)
I422ToUYVY_Opt (142 ms)

Now AVX2
I420ToYUY2_Opt (133 ms)
I420ToUYVY_Opt (130 ms)
I422ToYUY2_Opt (127 ms)
I422ToUYVY_Opt (137 ms)

Bug: libyuv:556
Test: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*I42?To*UY*Opt
Change-Id: Ic35f97cee02dc009fd98785589ba17c7cf50bb35
Reviewed-on: https://chromium-review.googlesource.com/892493
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-01 00:33:25 +00:00
Frank Barchard
ffec313dbe ABGRToAR30 used AVX2 with reversed shuffler
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.
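
A scalar model of a byte shuffle that swaps R and B, as a rough illustration. The mask below is hypothetical: it swaps bytes 0 and 2 of every 4-byte pixel in a 16-byte register, whereas the real kernel may fold the reversal into a shuffle mask it already uses.

```python
def pshufb(src, mask):
    """Scalar model of a 16-byte pshufb.

    out[i] = src[mask[i] & 15], or 0 when bit 7 of mask[i] is set.
    """
    return bytes(0 if m & 0x80 else src[m & 0x0F] for m in mask)


# Hypothetical mask: swap bytes 0 and 2 of each 4-byte pixel, keep bytes 1 and 3.
SWAP_R_B = bytes([2, 1, 0, 3, 6, 5, 4, 7, 10, 9, 8, 11, 14, 13, 12, 15])


def swap_red_blue(pixels16):
    """Swap the first and third channel of four packed 32-bit pixels."""
    return pshufb(pixels16, SWAP_R_B)
```

Applying `swap_red_blue` twice returns the original bytes, which is a quick sanity check for the mask.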

Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-29 22:31:31 +00:00
Frank Barchard
ed96b7b2c7 AVX2 port of H010ToAR30_AVX2
Was SSSE3 H010ToAR30_Opt (635 ms)
Now AVX2  H010ToAR30_Opt (448 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I17b1a0e3268c4a9836e09683dd3377fb1ce60932
Reviewed-on: https://chromium-review.googlesource.com/889906
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-27 00:14:27 +00:00
Frank Barchard
c95fd57993 AVX2 port of I010ToAR30_AVX2
Was SSSE3 I420ToAR30_Opt (635 ms)
Now AVX2  I420ToAR30_Opt (446 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I261be19ec981136a8f453ae0d3211532a790e5c5
Reviewed-on: https://chromium-review.googlesource.com/887750
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-26 02:12:07 +00:00
Frank Barchard
92e22cf5b6 Lint cleanup after C99 change CL
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
7e389884a1 Switch to C99 types
Append _t to all sized types.
uint64 becomes uint64_t etc

Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
8af6ea4100 I420ToAR30 in 1 step SSSE3 assembly
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToAR30_Opt
Change-Id: Ie89c3eb2526354cf11175746bc8af72be83a1e00
Reviewed-on: https://chromium-review.googlesource.com/877541
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-23 01:33:10 +00:00
Frank Barchard
09db0c4ce2 H010ToAR30 in 1 step with SSSE3 assembly
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.

[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-19 19:46:58 +00:00
Frank Barchard
ecab5430c2 Remove MEMOPREG x64 NaCL macros
MEMOPREG macros are deprecated in row.h

Regular expressions to remove MEMOPREG macros:

MEMOPREG(movd, 0x00, [u_buf], [v_buf], 1, xmm1)                            \
MEMOPREG\((.*), (.*), (.*), (.*), (.*), (.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6           \\n"

TBR=braveyao@chromium.org
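
The MEMOPREG substitution above can be tried out with a short script. This is a sketch using Python's `re` module rather than whatever editor or tool was actually used for the change; the pattern is the comma-separated form from the message, the padding before `\n` is shortened, and the function name is hypothetical.

```python
import re

# Pattern from the commit message (form without spaces after the commas).
MEMOPREG = re.compile(r'MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)')


def expand_memopreg(line):
    """Rewrite MEMOPREG(op,offset,base,index,scale,reg) as a plain asm string."""
    # In the template, \\ emits a literal backslash so the output ends in \n"
    return MEMOPREG.sub(r'"\1    \2(%\3,%\4,\5),%%\6    \\n"', line)


if __name__ == "__main__":
    print(expand_memopreg("MEMOPREG(movdqu,0x00,1,4,1,xmm2)"))
```

Because every group is a greedy `(.*)`, a line with extra commas or parentheses (for example in a trailing comment) would split differently, which is presumably why the message lists a separate pattern for lines ending in `//` comments.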

Bug: libyuv:702
Test: try bots pass
Change-Id: If8743abd9af2e8c549d0c7d3d49733a9b0f0ca86
Reviewed-on: https://chromium-review.googlesource.com/865964
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-16 19:10:44 +00:00
Frank Barchard
b33e0f97e7 Remove MEMOPMEM x64 NaCL macros
MEMOPMEM macros are deprecated in row.h

Usage examples
    MEMOPMEM(vmovdqu,ymm0,0x00,0,1,1)          //  vmovdqu %%ymm0,(%0,%1)
    MEMOPMEM(movdqu,xmm2,0x00,1,0,1)

Regular expressions to remove MEMOPMEM macros:

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    %%\2,\3(%\4,%\5,\6)\7 \\n"

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    %%\2,\3(%\4,%\5,\6)            \\n"

TBR=braveyao@chromium.org
Bug: libyuv:702
Test: try bots pass
Change-Id: Id8c6963d544d16e39bb6a9a0536babfb7f554b3a
Reviewed-on: https://chromium-review.googlesource.com/865934
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-13 01:33:21 +00:00
Frank Barchard
a875ed173d Remove VMEMOPREG x64 NaCL macros
VMEMOPREG macros are deprecated in row.h

Usage examples
    VMEMOPREG(vpavgb,0x00,0,4,1,ymm0,ymm0)     // vpavgb (%0,%4,1),%%ymm0,%%ymm0
    VMEMOPREG(vpavgb,0x20,0,4,1,ymm1,ymm1)

Regular expressions to remove VMEMOPREG macros:

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6,%%\7      \\n"

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6,%%\7            \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: I472446606f7fd568fdf33aaacc22d5ed78673dab
Reviewed-on: https://chromium-review.googlesource.com/865640
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 22:54:24 +00:00
Frank Barchard
030042a2ff Remove VEXTOPMEM x64 NaCL macros
VEXTOPMEM macros are deprecated in row.h

Usage examples
    VEXTOPMEM(vextractf128,1,ymm0,0x0,1,2,1) // vextractf128 $1,%%ymm0,(%1,%2,1)

Regular expressions to remove VEXTOPMEM macros:

VEXTOPMEM\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1 $\2,%\3,\4(%\5,%\6,\7)        \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I177edf9813128408e74816672dd25abb03a5e1ca
Reviewed-on: https://chromium-review.googlesource.com/865283
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 21:16:34 +00:00
Frank Barchard
5088f00165 Remove MEMACCESS x64 NaCL macros
MEMACCESS macros are deprecated in row.h

Usage examples
    "movdqu    " MEMACCESS(0) ",%%xmm0         \n"
    "movdqu    " MEMACCESS2(0x10,0) ",%%xmm1   \n"

Regular expressions to remove MEMACCESS macros:

" MEMACCESS2\((.*),(.*)\) "(.*)\\n"
\1(%\2)\3              \\n"

" MEMACCESS\((.*)\) "(.*)\\n"
(%\1)\2            \\n"
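
A minimal sketch of the two MEMACCESS substitutions, again using Python's `re` as a stand-in for the tool actually used. Each pattern consumes the closing quote before the macro and the opening quote after it, so the two string fragments merge into one. The column padding from the original replacement is omitted and the function name is hypothetical.

```python
import re

# Patterns from the commit message; \\n matches the literal two characters \n.
MEMACCESS2 = re.compile(r'" MEMACCESS2\((.*),(.*)\) "(.*)\\n"')
MEMACCESS = re.compile(r'" MEMACCESS\((.*)\) "(.*)\\n"')


def strip_memaccess(line):
    """Replace MEMACCESS2(offset,base) and MEMACCESS(base) with plain operands."""
    line = MEMACCESS2.sub(r'\1(%\2)\3\\n"', line)   # offset(%base)
    line = MEMACCESS.sub(r'(%\1)\2\\n"', line)      # (%base)
    return line


if __name__ == "__main__":
    print(strip_memaccess(r'"movdqu    " MEMACCESS(0) ",%%xmm0\n"'))
    print(strip_memaccess(r'"movdqu    " MEMACCESS2(0x10,0) ",%%xmm1\n"'))
```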

Bug: libyuv:702
Test: try bots pass
Change-Id: I42f62d5dede8ef2ea643e78c204371a7659d25e6
Reviewed-on: https://chromium-review.googlesource.com/862803
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 20:37:41 +00:00
Frank Barchard
e3797d1765 Remove MEMOPARG x64 NaCL macros
MEMOPARG macros are deprecated in row.h

  #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"

Usage examples
    MEMOPARG(movzwl,0x00,1,3,1,k2)             //  movzwl  (%1,%3,1),%k2

Regular expression to remove MEMOPARG macro:

MEMOPARG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1    \2(%\3,%\4,\5),%\6                \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I4a5ad2abf5017e651576f4c8c784be1c8dbf5a83
Reviewed-on: https://chromium-review.googlesource.com/863108
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-12 18:26:06 +00:00
Frank Barchard
3694891922 Remove MEMLEA x64 NaCL macros
Bug: libyuv:702
Test: try bots pass
Change-Id: I0ee094551734368f2179c298e7bf423ec80a929c
Reviewed-on: https://chromium-review.googlesource.com/857845
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 19:16:16 +00:00
Frank Barchard
a2142148e9 Remove x64 native_client macros.
Bug: libyuv:702
Test: try bots pass
Change-Id: I76d74b5f02fe9843418108b84742e2f714d1ab0a
Reviewed-on: https://chromium-review.googlesource.com/855656
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 01:27:22 +00:00
Frank Barchard
00d526d4ea H010ToARGB_AVX2 optimized conversion
AVX2 optimized 10 bit YUV to ARGB.

Bug: libyuv:751
Test: H010ToARGB unittest
Change-Id: I705630beb62714b52042c2a5dcdb8b7859e734ae
Reviewed-on: https://chromium-review.googlesource.com/852563
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-09 03:17:33 +00:00
Frank Barchard
55310f92bc Remove NACL_R14 macro
Bug: libyuv:702
Test: try bots still build
Change-Id: I05317e45c885955fcda233bdddbd11ce1d246d90
Reviewed-on: https://chromium-review.googlesource.com/854770
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-08 22:41:15 +00:00
Frank Barchard
9d2cd6a3ef H010ToAR30 optimized to 2 step conversion
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.

Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)

Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-07 08:36:57 +00:00
Frank Barchard
a64658593e I210ToARGB conversion from 10 bit YUV to RGB
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.

Bug: libyuv:751
Test:  I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-05 02:43:38 +00:00
Frank Barchard
140fc0a261 Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.

Bug: libyuv:769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
768f103b8b Convert8To16 for better H010 support
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.

Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-28 22:27:24 +00:00
Frank Barchard
c67db60534 HalfFloat_SSE2 use movd from memory
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned

Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-21 19:37:50 +00:00
Frank Barchard
5336217f11 H010Copy function to copy 16 bit planar formats
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToH010_Opt
Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b
Reviewed-on: https://chromium-review.googlesource.com/823445
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-15 03:34:34 +00:00
Frank Barchard
bb3180ae80 Add I420ToAR30 10 bit RGB
For more complete support of AR30 format, add I420ToAR30 allowing
the new RGB 10 bit format to be used from standard 8 bit I420 format.

Bug: libyuv:751
Test: I420ToAR30 unittest added
Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa
Reviewed-on: https://chromium-review.googlesource.com/823243
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 23:40:58 +00:00
Frank Barchard
c367751430 ARGBToAR30 SSSE3 use pmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 20:12:58 +00:00
Frank Barchard
11dd1b956f ARGBToAR30 use vpmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. vpmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. vpshufb is used to shift and mask 2 channels of R and B

Red Blue
With the 8 bit value in the upper bits, vpmulhuw by (1024+4) will produce a 10
bit value in the low 10 bits of each 16 bit value. This is what is wanted for the
blue channel. The red needs to be shifted 4 left, so multiply by (1024+4)*16 for
red.

Alpha Green
Alpha and Green are already in the high bits so vpand can zero out the other
bits, keeping just 2 upper bits of alpha and 8 bit green. The same multiplier
could be used for Green - (1024+4) putting the 10 bit green in the lsb.  Alpha
would be a simple multiplier to shift it into position.  It wants a gap of 10
above the green.  Green is 10 bits, so there are 6 bits in the low short.  4
more are needed, so a multiplier of 4 gets the 2 bits into the upper 16 bits,
and then a shift of 4 is a multiply of 16, so (4*16) = 64.  Then shift the
result left 10 to position the A and G channels.
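
The bit replication can be checked with a scalar model. This sketch models one 16-bit lane of pmulhuw/vpmulhuw and shows that multiplying by (1024+4) gives the same result as `(v << 2) | (v >> 6)`. It does not reproduce the two-channels-at-a-time packing or the shifted multipliers for red and alpha. The AR30 layout used in `argb_to_ar30` (B in bits 0-9, G in 10-19, R in 20-29, A in 30-31) is stated as an assumption, and the function names are hypothetical.

```python
def pmulhuw(a, b):
    """One 16-bit lane of pmulhuw: high 16 bits of the unsigned product."""
    return ((a & 0xFFFF) * (b & 0xFFFF)) >> 16


def replicate_8_to_10(v):
    """Expand 8 bits to 10 by placing v in the upper byte and multiplying by 1028."""
    return pmulhuw(v << 8, 1024 + 4)


def argb_to_ar30(b, g, r, a):
    """Pack 8-bit B, G, R and the top 2 bits of A into an assumed AR30 layout."""
    b10 = replicate_8_to_10(b)
    g10 = replicate_8_to_10(g)
    r10 = replicate_8_to_10(r)
    a2 = a >> 6                      # only the 2 upper alpha bits survive
    return b10 | (g10 << 10) | (r10 << 20) | (a2 << 30)
```

The multiply works because `(v << 8) * 1028 >> 16` equals `4*v + (v >> 6)`, so the top two bits of the 8-bit value are copied into the two new low bits and 255 maps to 1023.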

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: Ie4f20dce18203bae7b75acb1fd5232db8a8a4f11
Reviewed-on: https://chromium-review.googlesource.com/820046
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-12-12 02:57:54 +00:00
Frank Barchard
0f98c3c1df Add ARGBToAR30Row_SSE2 to speed up H010ToAR30
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.

Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-09 00:11:20 +00:00
Frank Barchard
324fa32739 Convert16To8Row_SSSE3 port from AVX2
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used.  This improves performance
on low end CPUs.
A future CL will bypass this conversion, allowing for a 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.

Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-28 19:22:39 +00:00
Lei Zhang
8445617191 Mark a bunch of kArray variables as const.
This allows the linker to move the variables from the .data section to
the .rodata section.

Bug: libyuv:254
Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1

Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0
Reviewed-on: https://chromium-review.googlesource.com/777033
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-27 23:38:44 +00:00
Frank Barchard
26173eb73e H010ToAR30 for 10 bit bt.709 YUV to 30 bit RGB
This version of H010ToAR30 provides a 3 step conversion:
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2

Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.
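
A scalar sketch of the multiply-based 16 bit to 8 bit conversion described above. The scale formula `1 << (24 - bits)` (16384 for 10 bit input) is an assumption chosen so that the upper 16 bits of the product land in the 8 bit range; the function name and signature are hypothetical, not the library's API.

```python
def convert_16_to_8(row, bits):
    """Scale `bits`-wide samples down to 8 bits with a multiply and a shift."""
    scale = 1 << (24 - bits)   # assumed: 16384 for 10 bits, 4096 for 12, 256 for 16
    # Keep the upper 16 bits of the 32-bit product, then clamp to a byte.
    return [min(255, (v * scale) >> 16) for v in row]
```

With a parameterized scale the same loop serves every bit depth, instead of one hardcoded shift per format.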

Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-22 23:58:30 +00:00
Frank Barchard
a98d6cdb17 ARGBToAR30 AVX2 conversion function
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0
Reviewed-on: https://chromium-review.googlesource.com/780437
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-21 20:37:01 +00:00
Frank Barchard
46594be758 add ScalePlane_16 unit tests
Tests ScalePlane vs ScalePlane_16 match.

Bug: libyuv:749
Test: LibYUVScaleTest.ScalePlaneDownBy4_Box_16
Change-Id: I3f71748da404982d5d48bfb11bbd3ae95a1d021c
Reviewed-on: https://chromium-review.googlesource.com/765045
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-16 01:40:48 +00:00
Frank Barchard
49d1e3b036 MultiplyRow_16_AVX2 for converting 10 bit YUV
When converting from lsb 10 bit formats to msb, the values
need to be shifted to the top 10 bits.  Using a multiply
allows the different numbers of bits to be copied:
// 128 = 9 bits
// 64 = 10 bits
// 16 = 12 bits
// 1 = 16 bits
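
The multiplier table follows from `scale = 1 << (16 - bits)`. A small sketch, with hypothetical function names, that derives the table and applies it to a row of 16 bit samples:

```python
def scale_for_bits(bits):
    """Multiplier that moves a `bits`-wide lsb-aligned value to the top of 16 bits."""
    return 1 << (16 - bits)


def multiply_row_16(row, scale):
    """Multiply each 16-bit sample by scale, keeping the low 16 bits of the product."""
    return [(v * scale) & 0xFFFF for v in row]


if __name__ == "__main__":
    for bits in (9, 10, 12, 16):
        print(bits, "bits ->", scale_for_bits(bits))
```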
Bug: libyuv:751
Test: LibYUVPlanarTest.MultiplyRow_16_Opt
Change-Id: I9cf226053a164baa14155215cb175065b1c4f169
Reviewed-on: https://chromium-review.googlesource.com/762951
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 22:02:32 +00:00
Frank Barchard
2f58d126b9 MergeUV10Row_AVX2 use multiply to handle different bit depths
Instead of hardcoded shift, use a multiply by a parameter.
128 = 9 bits
64 = 10 bits
16 = 12 bits
1 = 16 bits

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: Id925edfdbf91243370c90641b50eb8e7625ec329
Reviewed-on: https://chromium-review.googlesource.com/762523
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 03:38:07 +00:00
Frank Barchard
e26b0a7e0e casting for c89 compatibility and lint cleanup
Bug: libyuv:756
Test: CFLAGS="-m32 -static -std=gnu89 -mno-sse -O2" CXXFLAGS="-m32 -x c -static -std=gnu99 -mno-sse -O2" make -f linux.mk libyuv.a
Change-Id: Ic362f93e01ccbb0bea14f361a58585e79297e7d2
Reviewed-on: https://chromium-review.googlesource.com/759423
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Patrik Höglund <phoglund@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 18:22:17 +00:00
Frank Barchard
735ace2ed3 Re-enable x86 assembly without requiring -msse2
clang does not require -msse2 or -msse for inline assembly, except for
the "x" constraint.  So change this to "m" for 32 bit.  64 bit
requires sse2 so use "x" for 64 bit.

gcc requires -msse for xmm registers in clobber list.
Reduce compiler requirement from -msse2 to -msse for enabling
assembly.

Bug: libyuv:754, libyuv:757
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255
Reviewed-on: https://chromium-review.googlesource.com/759303
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 00:51:06 +00:00
Frank Barchard
d997ac287d Revert "Enable SSE2 code without -msse"
This reverts commit 01e994d74e4e3937ee1a3efdc048320a1e51f818.

Change-Id: Ie76710d0f4e641e071889c5125fd3be23cdcdb59
Reviewed-on: https://chromium-review.googlesource.com/758499
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 19:33:09 +00:00
Frank Barchard
01e994d74e Enable SSE2 code without -msse
Bug: libyuv:754
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I74bf8d032013694e65ea7637bc38d3253db53ff2
Reviewed-on: https://chromium-review.googlesource.com/758043
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 02:54:41 +00:00
Frank Barchard
a0c32b9e49 MergeUV10Row_AVX2 for converting H010 to P010
H010 is 10 bit planar format with 10 bits in lower bits.
P010 is 10 bit biplanar format with 10 bits in upper bits.
This function weaves the U and V channels and shifts the bits
into the upper bits.
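
A scalar sketch of the weave-and-shift step, assuming 10 bit samples stored in the low bits of 16 bit values. The function name and the `bits` parameter are hypothetical and only illustrate the data movement, not the AVX2 implementation.

```python
def merge_uv_row_10(u_row, v_row, bits=10):
    """Interleave U and V samples and shift each into the upper bits of 16."""
    shift = 16 - bits                 # 6 for 10 bit input
    out = []
    for u, v in zip(u_row, v_row):
        out.append((u << shift) & 0xFFFF)
        out.append((v << shift) & 0xFFFF)
    return out
```

For example, U samples `[1, 1023]` and V samples `[2, 512]` weave into `[64, 128, 0xFFC0, 0x8000]`.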

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: I4a0bac0ef1ff95aa1b8d68261ec8e8e86f2d1fbf
Reviewed-on: https://chromium-review.googlesource.com/752692
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-03 18:55:36 +00:00
Frank Barchard
1e16cb5c38 SplitRGBPlane and MergeRGBPlane functions added
Converts packed RGB to planar and back.

TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added

Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Frank Barchard
6825b161d7 HalfFloat SSE2/AVX2 optimized port scheduling.
Uses 1 add instead of 2 leas to reduce port pressure on ports 1 and 5
used for SIMD instructions.

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -arch HSW out/Release/obj/libyuv/row_gcc.o

Change-Id: I3965ee5dcb49941a535efa611b5988d977f5b65c
Reviewed-on: https://chromium-review.googlesource.com/433391
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-11 01:02:06 +00:00
Frank Barchard
76e7f104ae documentation updates
BUG=None
TEST=Untested

Change-Id: I8ab95654255d1aa9cf05a664ecf59ee6c0757e66
Reviewed-on: https://chromium-review.googlesource.com/434941
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-02 18:31:32 +00:00
Frank Barchard
749e316ed8 Remove commented out code
TEST=None
BUG=libyuv:672
Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169
Reviewed-on: https://chromium-review.googlesource.com/430321
Reviewed-by: Aaron Gable <agable@chromium.org>
2017-01-20 02:03:12 +00:00
Frank Barchard
a7c87e19f0 add Intel Code Analyst markers
Add macros to enable/disable Intel Architecture Code Analyzer (IACA) markers around blocks of code.

Normally these macros should not be used, but if performance
details are wanted for intel code, enable them around the code
and then run via the iaca tool, available on the intel website.

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com

Review-Url: https://codereview.chromium.org/2626193002 .
2017-01-13 15:50:24 -08:00
Frank Barchard
3028e1bd97 clang-format row_gcc.cc with some functions disabled
BUG=libyuv:654
TEST=try bots build
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2484083003 .
2016-11-07 18:37:29 -08:00
Frank Barchard
e62309f259 clang-format libyuv
BUG=libyuv:654
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00