1763 Commits

Author SHA1 Message Date
Frank Barchard
36ebec9d46 apply clang-tidy -fix-errors to arm
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I5a6654876bc2e79cfdbbe5c11d5aec2b10b05ef6
Reviewed-on: https://chromium-review.googlesource.com/899844
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-02-05 19:13:05 +00:00
Frank Barchard
5790a765b9 I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to zero-extend the bytes to 32 bit dwords, then vpslld and vpor
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
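
A scalar sketch of the same weave for one pair of pixels, assuming the usual UYVY byte order (U, Y0, V, Y1) read as a little-endian dword. The helper names are hypothetical and are not libyuv functions; how the real row function merges Y is not described in this message.

```c
#include <stdint.h>

/* Zero-extend U and V to 32 bits, shift V into byte 2, then OR them.
 * Mirrors vpmovzxbd + vpslld $0x10 + vpor for a single dword.
 * Hypothetical helper for illustration only. */
static uint32_t weave_uv(uint8_t u, uint8_t v) {
  uint32_t lo = (uint32_t)u;       /* U lands in byte 0 */
  uint32_t hi = (uint32_t)v << 16; /* V lands in byte 2 */
  return lo | hi;                  /* bytes 1 and 3 stay zero */
}

/* Fill the two free bytes with Y0 and Y1 to form one UYVY dword
 * (assumed merge step, not taken from the commit). */
static uint32_t pack_uyvy(uint8_t u, uint8_t y0, uint8_t v, uint8_t y1) {
  return weave_uv(u, v) | ((uint32_t)y0 << 8) | ((uint32_t)y1 << 24);
}
```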

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-02 23:57:35 +00:00
Frank Barchard
664c735677 I420ToYUY2_AVX2 port
I420 and I422 To YUY2 and UYVY ported from SSE2 to AVX2.

Was SSE2
I420ToYUY2_Opt (135 ms)
I420ToUYVY_Opt (148 ms)
I422ToYUY2_Opt (145 ms)
I422ToUYVY_Opt (142 ms)

Now AVX2
I420ToYUY2_Opt (133 ms)
I420ToUYVY_Opt (130 ms)
I422ToYUY2_Opt (127 ms)
I422ToUYVY_Opt (137 ms)

Bug: libyuv:556
Test: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*I42?To*UY*Opt
Change-Id: Ic35f97cee02dc009fd98785589ba17c7cf50bb35
Reviewed-on: https://chromium-review.googlesource.com/892493
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-01 00:33:25 +00:00
Frank Barchard
ffec313dbe ABGRToAR30 used AVX2 with reversed shuffler
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.

Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-29 22:31:31 +00:00
Frank Barchard
ff8ab9baf1 AR30ToABGR for 10 to 8 bit RGB on Android
ABGR is the more common format on Android.
This CL converts 10 bit AR30, to standard 8 bit ABGR.
Unoptimized but allows better testing and feature completeness.
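
A minimal scalar sketch of a 10 to 8 bit conversion for one pixel. It assumes AR30 is a little-endian 32 bit word with B in bits 0-9, G in bits 10-19, R in bits 20-29 and A in bits 30-31, and that ABGR is R, G, B, A in memory. The function name is hypothetical.

```c
#include <stdint.h>

/* Convert one AR30 pixel to ABGR by keeping the top 8 bits of each
 * 10 bit channel. Hypothetical helper, not the libyuv row function. */
static uint32_t ar30_to_abgr(uint32_t ar30) {
  uint32_t b = (ar30 >> 2) & 0xff;   /* top 8 of the 10 blue bits */
  uint32_t g = (ar30 >> 12) & 0xff;  /* top 8 of the 10 green bits */
  uint32_t r = (ar30 >> 22) & 0xff;  /* top 8 of the 10 red bits */
  uint32_t a = (ar30 >> 30) * 0x55;  /* replicate 2 alpha bits to 8 */
  /* R goes in the low byte, so memory order is R, G, B, A. */
  return r | (g << 8) | (b << 16) | (a << 24);
}
```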

Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToABGR_Opt
Change-Id: I0c7e7273158be215129e0a1d355587ae15942299
Reviewed-on: https://chromium-review.googlesource.com/891694
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-29 22:21:42 +00:00
Frank Barchard
ed96b7b2c7 AVX2 port of H010ToAR30_AVX2
Was SSSE3 H010ToAR30_Opt (635 ms)
Now AVX2  H010ToAR30_Opt (448 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I17b1a0e3268c4a9836e09683dd3377fb1ce60932
Reviewed-on: https://chromium-review.googlesource.com/889906
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-27 00:14:27 +00:00
Frank Barchard
c95fd57993 AVX2 port of I010ToAR30_AVX2
Was SSSE3 I420ToAR30_Opt (635 ms)
Now AVX2  I420ToAR30_Opt (446 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I261be19ec981136a8f453ae0d3211532a790e5c5
Reviewed-on: https://chromium-review.googlesource.com/887750
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-26 02:12:07 +00:00
Frank Barchard
3f43ecc029 Add H420ToAR30 and a test that does a histogram
[ RUN      ] LibYUVConvertTest.TestH420ToAR30
uniques: B 222, G, 222, R 222
[       OK ] LibYUVConvertTest.TestH420ToAR30 (0 ms)
[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.TestH420ToAR30
Change-Id: I9b75af286124c058c24799778a58c3feb9a1a1ab
Reviewed-on: https://chromium-review.googlesource.com/884845
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-25 00:36:40 +00:00
Frank Barchard
92e22cf5b6 Lint cleanup after C99 change CL
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
f1c5345046 Define basic_types backward compatible layer
Use C99 types internally but define old types for compatibility
with older API.  (r1690 and earlier)
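
A sketch of what such a compatibility layer could look like. DEMO_LEGACY_TYPES is a hypothetical switch invented for this example, and sum_row is not a libyuv function; only the idea of aliasing the old names to the C99 types comes from this message.

```c
#include <stdint.h>

/* Hypothetical switch for this sketch; not a libyuv macro. */
#define DEMO_LEGACY_TYPES 1

#if DEMO_LEGACY_TYPES
/* Old names kept as aliases so callers written against the older API
 * still compile. */
typedef uint8_t uint8;
typedef int8_t int8;
typedef uint16_t uint16;
typedef int16_t int16;
typedef uint32_t uint32;
typedef int32_t int32;
typedef uint64_t uint64;
typedef int64_t int64;
#endif

/* Library code uses the C99 names internally. */
static uint32_t sum_row(const uint8_t* src, int width) {
  uint32_t sum = 0;
  for (int x = 0; x < width; ++x) {
    sum += src[x];
  }
  return sum;
}
```

A caller that still declares its buffer as uint8 can pass it to sum_row unchanged, because both names refer to the same type.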

TBR=braveyao@chromium.org
Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: I06f89537da3875f74e65189897e67b69af2c2ec2
Reviewed-on: https://chromium-review.googlesource.com/882501
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 00:26:07 +00:00
Frank Barchard
7e389884a1 Switch to C99 types
Append _t to all sized types.
uint64 becomes uint64_t etc

Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
13771ffaad basic_types.h - remove unused macros
Removes macros that were part of standard basic_types
header but not used by libyuv itself.

TBR=braveyao@chromium.org
Bug: libyuv:774
Test: try bots still build
Change-Id: I8de6fad5a9277df0a50959881392ba212b1b5972
Reviewed-on: https://chromium-review.googlesource.com/879591
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-23 02:24:58 +00:00
Frank Barchard
8af6ea4100 I420ToAR30 in 1 step SSSE3 assembly
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToAR30_Opt
Change-Id: Ie89c3eb2526354cf11175746bc8af72be83a1e00
Reviewed-on: https://chromium-review.googlesource.com/877541
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-23 01:33:10 +00:00
Frank Barchard
09db0c4ce2 H010ToAR30 in 1 step with SSSE3 assembly
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.

[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-19 19:46:58 +00:00
Frank Barchard
ecab5430c2 Remove MEMOPREG x64 NaCL macros
MEMOPREG macros are deprecated in row.h

Regular expressions to remove MEMOPREG macros:

MEMOPREG(movd, 0x00, [u_buf], [v_buf], 1, xmm1)                            \
MEMOPREG\((.*), (.*), (.*), (.*), (.*), (.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6           \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: If8743abd9af2e8c549d0c7d3d49733a9b0f0ca86
Reviewed-on: https://chromium-review.googlesource.com/865964
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-16 19:10:44 +00:00
Frank Barchard
b33e0f97e7 Remove MEMOPMEM x64 NaCL macros
MEMOPMEM macros are deprecated in row.h

Usage examples
    MEMOPMEM(vmovdqu,ymm0,0x00,0,1,1)          //  vmovdqu %%ymm0,(%0,%1)
    MEMOPMEM(movdqu,xmm2,0x00,1,0,1)

Regular expressions to remove MEMOPMEM macros:

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    %%\2,\3(%\4,%\5,\6)\7 \\n"

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    %%\2,\3(%\4,%\5,\6)            \\n"

TBR=braveyao@chromium.org
Bug: libyuv:702
Test: try bots pass
Change-Id: Id8c6963d544d16e39bb6a9a0536babfb7f554b3a
Reviewed-on: https://chromium-review.googlesource.com/865934
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-13 01:33:21 +00:00
Frank Barchard
a875ed173d Remove VMEMOPREG x64 NaCL macros
VMEMOPREG macros are deprecated in row.h

Usage examples
    VMEMOPREG(vpavgb,0x00,0,4,1,ymm0,ymm0)     // vpavgb (%0,%4,1),%%ymm0,%%ymm0
    VMEMOPREG(vpavgb,0x20,0,4,1,ymm1,ymm1)

Regular expressions to remove VMEMOPREG macros:

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6,%%\7      \\n"

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6,%%\7            \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: I472446606f7fd568fdf33aaacc22d5ed78673dab
Reviewed-on: https://chromium-review.googlesource.com/865640
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 22:54:24 +00:00
Frank Barchard
030042a2ff Remove VEXTOPMEM x64 NaCL macros
VEXTOPMEM macros are deprecated in row.h

Usage examples
    VEXTOPMEM(vextractf128,1,ymm0,0x0,1,2,1) // vextractf128 $1,%%ymm0,(%1,%2,1)

Regular expressions to remove VEXTOPMEM macros:

VEXTOPMEM\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1 $\2,%\3,\4(%\5,%\6,\7)        \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I177edf9813128408e74816672dd25abb03a5e1ca
Reviewed-on: https://chromium-review.googlesource.com/865283
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 21:16:34 +00:00
Frank Barchard
5088f00165 Remove MEMACCESS x64 NaCL macros
MEMACCESS macros are deprecated in row.h

Usage examples
    "movdqu    " MEMACCESS(0) ",%%xmm0         \n"
    "movdqu    " MEMACCESS2(0x10,0) ",%%xmm1   \n"

Regular expressions to remove MEMACCESS macros:

" MEMACCESS2\((.*),(.*)\) "(.*)\\n"
\1(%\2)\3              \\n"

" MEMACCESS\((.*)\) "(.*)\\n"
(%\1)\2            \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I42f62d5dede8ef2ea643e78c204371a7659d25e6
Reviewed-on: https://chromium-review.googlesource.com/862803
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 20:37:41 +00:00
Frank Barchard
e3797d1765 Remove MEMOPARG x64 NaCL macros
MEMOPARG macros are deprecated in row.h

  #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"

Usage examples
    MEMOPARG(movzwl,0x00,1,3,1,k2)             //  movzwl  (%1,%3,1),%k2
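
A small check that the macro form and the plain string are the same text, using the macro body quoted above. The DEMO_ prefix and the helper names are hypothetical and only mark this copy as illustrative.

```c
#include <string.h>

/* Same body as the macro quoted in this message: each argument is
 * stringified and the pieces are concatenated by the compiler. */
#define DEMO_MEMOPARG(opcode, offset, base, index, scale, arg) \
  #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"

/* The usage example, written once with the macro and once by hand. */
static const char kWithMacro[] = DEMO_MEMOPARG(movzwl, 0x00, 1, 3, 1, k2);
static const char kPlain[] = "movzwl 0x00(%1,%3,1),%k2\n";

/* Returns 1 when both spellings produce identical assembly text. */
static int same_text(void) {
  return strcmp(kWithMacro, kPlain) == 0;
}
```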

Regular expression to remove MEMOPARG macro:

MEMOPARG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1    \2(%\3,%\4,\5),%\6                \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I4a5ad2abf5017e651576f4c8c784be1c8dbf5a83
Reviewed-on: https://chromium-review.googlesource.com/863108
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-12 18:26:06 +00:00
Frank Barchard
3694891922 Remove MEMLEA x64 NaCL macros
Bug: libyuv:702
Test: try bots pass
Change-Id: I0ee094551734368f2179c298e7bf423ec80a929c
Reviewed-on: https://chromium-review.googlesource.com/857845
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 19:16:16 +00:00
Frank Barchard
a2142148e9 Remove x64 native_client macros.
Bug: libyuv:702
Test: try bots pass
Change-Id: I76d74b5f02fe9843418108b84742e2f714d1ab0a
Reviewed-on: https://chromium-review.googlesource.com/855656
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 01:27:22 +00:00
Frank Barchard
00d526d4ea H010ToARGB_AVX2 optimized conversion
AVX2 optimized 10 bit YUV to ARGB.

Bug: libyuv:751
Test: H010ToARGB unittest
Change-Id: I705630beb62714b52042c2a5dcdb8b7859e734ae
Reviewed-on: https://chromium-review.googlesource.com/852563
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-09 03:17:33 +00:00
Frank Barchard
55310f92bc Remove NACL_R14 macro
Bug: libyuv:702
Test: try bots still build
Change-Id: I05317e45c885955fcda233bdddbd11ce1d246d90
Reviewed-on: https://chromium-review.googlesource.com/854770
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-08 22:41:15 +00:00
Frank Barchard
50f9e618fa Add H010ToABGR, I010ToABGR and I010ToARGB functions
ABGR output is implemented using the same source code as ARGB, by swapping
the u and v and supplying the mirrored conversion matrix.
ABGR format (RGBA in memory) is popular on Android.

Bug: libyuv:751
Test: H010ToABGR, I010ToABGR and I010ToARGB unittests

Change-Id: I0b5103628c58dcb22a6442c03814d4d5972e0339
Reviewed-on: https://chromium-review.googlesource.com/852985
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-08 17:40:33 +00:00
Frank Barchard
a64658593e I210ToARGB conversion from 10 bit YUV to RGB
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.

Bug: libyuv:751
Test:  I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-05 02:43:38 +00:00
Frank Barchard
1e4600be3b Remove unused ARGBAttenuateRow_Any_SSE2 prototype
Bug: libyuv:769
Test: try bots build
Change-Id: I9633637cee1dc17bc62dd0598b1ea1edc15cf646
Reviewed-on: https://chromium-review.googlesource.com/847702
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-02 23:28:25 +00:00
Frank Barchard
2ed2402fa0 I420ToI010 for 8 to 10 bit YUV conversion.
Convert planar 8 bit formats to planar 16 bit formats.

Includes msan fix for Convert8To16Row_Opt unittest.

I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601.  The function does no color space
 conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling.  The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.

For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN      ] LibYUVConvertTest.I420ToI010_Opt
[       OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-02 21:09:39 +00:00
Frank Barchard
140fc0a261 Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.

Bug: libyuv:769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
768f103b8b Convert8To16 for better H010 support
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.
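
A scalar sketch of an 8 to 16 bit row conversion with a depth parameter. The scale convention (1024 for 10 bits, 4096 for 12 bits) and the function name are assumptions made for this example.

```c
#include <stdint.h>

/* Widen 8 bit samples to a target depth stored in 16 bit samples.
 * scale = 1 << bits, e.g. 1024 for 10 bits (assumed convention).
 * Hypothetical helper, not the libyuv row function. */
static void convert_8_to_16(const uint8_t* src, uint16_t* dst, int scale,
                            int width) {
  for (int x = 0; x < width; ++x) {
    /* 0x0101 replicates the byte into 16 bits; the upper 16 bits of the
     * scaled product are the value at the requested depth. */
    uint32_t wide = (uint32_t)src[x] * 0x0101u;
    dst[x] = (uint16_t)((wide * (uint32_t)scale) >> 16);
  }
}
```

With scale 1024, full-range 255 maps to full-range 1023 rather than 1020, because the byte is replicated instead of just shifted.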

Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-28 22:27:24 +00:00
Frank Barchard
c67db60534 HalfFloat_SSE2 use movd from memory
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned

Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-21 19:37:50 +00:00
Frank Barchard
790054ff03 Add AR30ToARGB function
Initial AR30ToARGB function to allow conversion
from AR30 to other formats if necessary and/or
for testing.
Not optimized at this point.

Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: I38ef192315240f3caa7aee0218b38d5e88a2849f
Reviewed-on: https://chromium-review.googlesource.com/833025
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-19 01:54:42 +00:00
Frank Barchard
5336217f11 H010Copy function to copy 16 bit planar formats
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToH010_Opt
Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b
Reviewed-on: https://chromium-review.googlesource.com/823445
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-15 03:34:34 +00:00
Frank Barchard
3b81288ece Remove Mips DSPR2 code
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-14 18:22:16 +00:00
Frank Barchard
bb3180ae80 Add I420ToAR30 10 bit RGB
For more complete support of AR30 format, add I420ToAR30 allowing
the new RGB 10 bit format to be used from standard 8 bit I420 format.

Bug: libyuv:751
Test: I420ToAR30 unittest added
Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa
Reviewed-on: https://chromium-review.googlesource.com/823243
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 23:40:58 +00:00
Frank Barchard
c367751430 ARGBToAR30 SSSE3 use pmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B
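
A scalar sketch of technique 1 only: a high multiply that replicates 8 bits to 10 bits. The multiplier 1024 and the byte-doubling step are assumptions chosen for this example and may differ from the constants in the assembly.

```c
#include <stdint.h>

/* Scalar stand-in for pmulhuw: high 16 bits of an unsigned 16x16 product. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b) {
  return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Duplicate the byte into both halves of a 16 bit word, then keep the
 * top 10 bits. The result equals (v << 2) | (v >> 6).
 * Hypothetical helper for illustration. */
static uint16_t replicate_8_to_10(uint8_t v) {
  uint16_t doubled = (uint16_t)(v * 0x0101);
  return mulhi_u16(doubled, 1024);
}
```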

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 20:12:58 +00:00
Frank Barchard
0f98c3c1df Add ARGBToAR30Row_SSE2 to speed up H010ToAR30
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.

Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-09 00:11:20 +00:00
Frank Barchard
aabe380890 H010ToAR30 and H010ToARGB optimized YUV buffering
Reduce allocations of row buffers to 1 alloc/free.
Do 2 rows at a time to avoid converting U and V planes twice.

Bug: libyuv:715
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I2f3a03b4875df5e3b969112a78a1a0b28399fa2f
Reviewed-on: https://chromium-review.googlesource.com/816021
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-12-08 18:55:03 +00:00
Frank Barchard
3541e46a7e Add H010ToARGB for 10 bit YUV to ARGB
Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToARGB_Opt
Change-Id: I668d3f3810e59a4fb6611503aae1c8edc7d596e7
Reviewed-on: https://chromium-review.googlesource.com/815015
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-07 20:17:50 +00:00
Frank Barchard
49d9b1039b NV21ToABGR for Android camera conversions
Bug: libyuv:762
Test: NV21ToABGR unittest
Change-Id: I71448ab83930339083f07eeafccf240c6cb41c48
Reviewed-on: https://chromium-review.googlesource.com/795212
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-11-30 20:29:28 +00:00
Frank Barchard
324fa32739 Convert16To8Row_SSSE3 port from AVX2
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used.  This improves performance
on low end CPUs.
Future CL will bypass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.

Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-28 19:22:39 +00:00
Frank Barchard
26173eb73e H010ToAR30 for 10 bit bt.709 YUV to 30 bit RGB
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2

Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.
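
A scalar sketch of the multiply-based 16 to 8 bit step. The scale convention (16384 for 10 bit input, 4096 for 12 bit input) and the function name are assumptions for this example.

```c
#include <stdint.h>

/* Narrow 16 bit samples of a given depth to 8 bits.
 * scale = 1 << (24 - bits), e.g. 16384 for 10 bits (assumed convention).
 * Hypothetical helper, not the libyuv row function. */
static void convert_16_to_8(const uint16_t* src, uint8_t* dst, int scale,
                            int width) {
  for (int x = 0; x < width; ++x) {
    /* Keep the upper 16 bits of the product, then clamp to 8 bits. */
    uint32_t v = ((uint32_t)src[x] * (uint32_t)scale) >> 16;
    dst[x] = (uint8_t)(v > 255 ? 255 : v);
  }
}
```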

Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-22 23:58:30 +00:00
Frank Barchard
a98d6cdb17 ARGBToAR30 AVX2 conversion function
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0
Reviewed-on: https://chromium-review.googlesource.com/780437
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-21 20:37:01 +00:00
Frank Barchard
19a126ddfa Add AR30 fourcc unittest
Bug: libyuv:749
Test: LibYUVBaseTest.TestFourCC
Change-Id: Iec378947248840c7e2cd87b1198503f39e7c7258
Reviewed-on: https://chromium-review.googlesource.com/780619
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-11-20 23:52:01 +00:00
Frank Barchard
a37fe16557 Add AR30 fourcc
Bug: libyuv:749
Test: none
Change-Id: Icdfb0ff7bb5886d73498f4d88ca4629b2dc3425c
Reviewed-on: https://chromium-review.googlesource.com/780443
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-20 23:09:50 +00:00
Frank Barchard
f2978400d5 Document AR30 format
Bug: libyuv:751
Test: none
Change-Id: If6d5e7b9c5e6e8d2a272e03ce5a1cc199ef364ca
Reviewed-on: https://chromium-review.googlesource.com/779980
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-20 22:05:45 +00:00
Frank Barchard
12c904a97c H420ToRAW and H420ToRGB24 added for bt.709 support.
Bug: libyuv:760
Test: LibYUVConvertTest.H420ToRAW_Opt
Change-Id: I050385f477309d5db02bb2218088f224c83392ed
Reviewed-on: https://chromium-review.googlesource.com/775785
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-17 01:20:05 +00:00
Frank Barchard
46594be758 add ScalePlane_16 unit tests
Tests ScalePlane vs ScalePlane_16 match.

Bug: libyuv:749
Test: LibYUVScaleTest.ScalePlaneDownBy4_Box_16
Change-Id: I3f71748da404982d5d48bfb11bbd3ae95a1d021c
Reviewed-on: https://chromium-review.googlesource.com/765045
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-16 01:40:48 +00:00
Frank Barchard
49d1e3b036 MultiplyRow_16_AVX2 for converting 10 bit YUV
When converting from lsb 10 bit formats to msb, the values
need to be shifted to the top 10 bits.  Using a multiply
allows the different numbers of bits to be copied:
// 128 = 9 bits
// 64 = 10 bits
// 16 = 12 bits
// 1 = 16 bits
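
A scalar sketch of the multiply, using the scale values listed above. The function name is hypothetical.

```c
#include <stdint.h>

/* Move lsb-aligned samples to the top of a 16 bit word by multiplying.
 * scale = 1 << (16 - bits): 64 for 10 bits, 16 for 12 bits.
 * Hypothetical helper, not the libyuv row function. */
static void multiply_row_16(const uint16_t* src, uint16_t* dst, int scale,
                            int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = (uint16_t)(src[x] * scale);
  }
}
```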
Bug: libyuv:751
Test: LibYUVPlanarTest.MultiplyRow_16_Opt
Change-Id: I9cf226053a164baa14155215cb175065b1c4f169
Reviewed-on: https://chromium-review.googlesource.com/762951
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 22:02:32 +00:00
Frank Barchard
2f58d126b9 MergeUV10Row_AVX2 use multiply to handle different bit depths
Instead of hardcoded shift, use a multiply by a parameter.
128 = 9 bits
64 = 10 bits
16 = 12 bits
1 = 16 bits

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: Id925edfdbf91243370c90641b50eb8e7625ec329
Reviewed-on: https://chromium-review.googlesource.com/762523
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 03:38:07 +00:00
Frank Barchard
e26b0a7e0e casting for c89 compatibility and lint cleanup
Bug: libyuv:756
Test: CFLAGS="-m32 -static -std=gnu89 -mno-sse -O2" CXXFLAGS="-m32 -x c -static -std=gnu99 -mno-sse -O2" make -f linux.mk libyuv.a
Change-Id: Ic362f93e01ccbb0bea14f361a58585e79297e7d2
Reviewed-on: https://chromium-review.googlesource.com/759423
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Patrik Höglund <phoglund@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 18:22:17 +00:00
Frank Barchard
735ace2ed3 Re-enable x86 assembly without requiring -msse2
clang does not require -msse2 or -msse for inline assembly, except
for operands that use the "x" constraint.  So change this to "m" for
32 bit.  64 bit always has sse2, so keep "x" for 64 bit.

gcc requires -msse for xmm registers in clobber list.
Reduce compiler requirement from -msse2 to -msse for enabling
assembly.

Bug: libyuv:754, libyuv:757
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255
Reviewed-on: https://chromium-review.googlesource.com/759303
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 00:51:06 +00:00
Frank Barchard
68f852d835 Remove DISABLE_CLANG_MSA
cleanup to remove ifdefs around functions affected by
a clang bug.

gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest

Bug: libyuv:634
Test: build for mips with clang
Change-Id: I278b368dbb2fe89082240e280267d0a27a214c78
Reviewed-on: https://chromium-review.googlesource.com/757980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-08 19:55:14 +00:00
Frank Barchard
d997ac287d Revert "Enable SSE2 code without -msse"
This reverts commit 01e994d74e4e3937ee1a3efdc048320a1e51f818.

Change-Id: Ie76710d0f4e641e071889c5125fd3be23cdcdb59
Reviewed-on: https://chromium-review.googlesource.com/758499
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 19:33:09 +00:00
Frank Barchard
01e994d74e Enable SSE2 code without -msse
Bug: libyuv:754
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I74bf8d032013694e65ea7637bc38d3253db53ff2
Reviewed-on: https://chromium-review.googlesource.com/758043
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 02:54:41 +00:00
Frank Barchard
522fd699e6 AVX512 feature detects for cnl and icl
Key instruction sets added for each microarchitecture:

AVX512BW, AVX512VL, AVX512DQ - skylake server or later
AVX512_VBMI, AVX512_IFMA - cannon lake or later
AVX512_BITALG, AVX512_VBMI2, AVX512_VPOPCNTDQ, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ - ice lake or later
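
A sketch of decoding these features from CPUID leaf 7, subleaf 0, given the EBX and ECX register values. The flag names and values are hypothetical and are not libyuv constants. The sketch leaves out the AVX512F bit and the OS register-state check that a real detector also needs.

```c
#include <stdint.h>

/* Hypothetical flag values for this sketch. */
enum {
  kDemoAVX512BW = 1 << 0,
  kDemoAVX512VL = 1 << 1,
  kDemoAVX512DQ = 1 << 2,
  kDemoAVX512VBMI = 1 << 3,
  kDemoAVX512IFMA = 1 << 4,
  kDemoAVX512VBMI2 = 1 << 5,
  kDemoAVX512BITALG = 1 << 6,
  kDemoAVX512VPOPCNTDQ = 1 << 7,
  kDemoAVX512VNNI = 1 << 8,
  kDemoGFNI = 1 << 9,
  kDemoVAES = 1 << 10,
  kDemoVPCLMULQDQ = 1 << 11
};

/* Map CPUID.(EAX=7,ECX=0) EBX/ECX bits to the demo flags. */
static int decode_leaf7(uint32_t ebx, uint32_t ecx) {
  int flags = 0;
  /* EBX feature bits. */
  if (ebx & (1u << 17)) flags |= kDemoAVX512DQ;
  if (ebx & (1u << 21)) flags |= kDemoAVX512IFMA;
  if (ebx & (1u << 30)) flags |= kDemoAVX512BW;
  if (ebx & (1u << 31)) flags |= kDemoAVX512VL;
  /* ECX feature bits. */
  if (ecx & (1u << 1)) flags |= kDemoAVX512VBMI;
  if (ecx & (1u << 6)) flags |= kDemoAVX512VBMI2;
  if (ecx & (1u << 8)) flags |= kDemoGFNI;
  if (ecx & (1u << 9)) flags |= kDemoVAES;
  if (ecx & (1u << 10)) flags |= kDemoVPCLMULQDQ;
  if (ecx & (1u << 11)) flags |= kDemoAVX512VNNI;
  if (ecx & (1u << 12)) flags |= kDemoAVX512BITALG;
  if (ecx & (1u << 14)) flags |= kDemoAVX512VPOPCNTDQ;
  return flags;
}
```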

Bug: libyuv:752
Test: ~/intelsde/sde -icl -- out/Release/libyuv_unittest --gtest_filter=*Cpu*
Change-Id: I9ee28904c90009d66721b9f805a440c5fc2da122
Reviewed-on: https://chromium-review.googlesource.com/755617
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-07 00:56:37 +00:00
Frank Barchard
a0c32b9e49 MergeUV10Row_AVX2 for converting H010 to P010
H010 is 10 bit planar format with 10 bits in lower bits.
P010 is 10 bit biplanar format with 10 bits in upper bits.
This function weaves the U and V channels and shifts the bits
into the upper bits.
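
A scalar sketch of the weave and shift, assuming U precedes V in each output pair and a fixed shift of 6 for 10 bit data. The function name is hypothetical.

```c
#include <stdint.h>

/* Interleave planar U and V into one UV plane and move each 10 bit
 * sample from the low bits to the high bits of its 16 bit word.
 * Hypothetical helper, not the libyuv row function. */
static void merge_uv_10(const uint16_t* src_u, const uint16_t* src_v,
                        uint16_t* dst_uv, int width) {
  for (int x = 0; x < width; ++x) {
    dst_uv[2 * x + 0] = (uint16_t)(src_u[x] << 6);
    dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << 6);
  }
}
```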

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: I4a0bac0ef1ff95aa1b8d68261ec8e8e86f2d1fbf
Reviewed-on: https://chromium-review.googlesource.com/752692
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-03 18:55:36 +00:00
Frank Barchard
80077a80c2 HammingDistance_X86 using popcnt assembly
popcnt has a fake dependency on the destination.
This assembly avoids the dependency by using a different
register for each popcnt.

Bug: libyuv:701
Test: LIBYUV_DISABLE_SSSE3=1 out/Release/libyuv_unittest --gtest_filter=*Ham*Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: Ie1d202e2613b7fa8a3c02acd433940e92c80eafa
Reviewed-on: https://chromium-review.googlesource.com/731826
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-23 21:15:12 +00:00
Frank Barchard
8fa02df3c0 mingw fix ifdefs to use gcc source
mingw gcc sets the macro _M_IX86, which is normally set only by
Visual C and clangcl.  Those compilers use Visual C style source code
for assembly, but gcc is not Visual C compatible.
Add _MSC_VER to most ifdefs to detect that it is really Visual C
or clangcl and not mingw gcc so the gcc source code will be used.
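
A reduced sketch of the kind of check described here, covering only the 32 bit x86 case. The function and the returned labels are hypothetical; the real conditions in libyuv contain more terms.

```c
/* Report which assembly dialect the preprocessor checks would select.
 * Requiring _MSC_VER keeps mingw gcc, which also defines _M_IX86,
 * on the gcc path. Hypothetical helper for illustration. */
static const char* asm_flavor(void) {
#if defined(_M_IX86) && defined(_MSC_VER)
  return "visualc"; /* Visual C or clangcl */
#elif defined(__i386__) || defined(__x86_64__)
  return "gcc"; /* gcc, clang, and mingw gcc */
#else
  return "none"; /* no x86 assembly selected in this sketch */
#endif
}
```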

Bug: libyuv:744
Test: CXXFLAGS=-m32 CXX=~/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ make -f linux.mk
Change-Id: I3431aa486eb769b145faa8d5eb75ed639f9d6f5e
Reviewed-on: https://chromium-review.googlesource.com/722319
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-17 17:36:35 +00:00
Frank Barchard
1cebe2c622 TestHammingDistance_Opt to test low level matches C reference.
The low level hamming distance functions have size limitations
based on counter sizes.  The higher level calls the low level
in blocks that avoid overflow and then accumulates in int64.
This test compares the results of the low levels to the high
level and against a known value (all ones) to ensure the
count is correct for any specified size.
If the size is very large, the result is expected to be
different.
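
A sketch of the two-level structure described above: a low level counter with a 32 bit result, and a high level loop that feeds it bounded blocks and accumulates in 64 bits. The block size kDemoBlock and both function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Low level: counts differing bits with a 32 bit counter, so the
 * caller must keep count small enough to avoid overflow. */
static uint32_t hamming_block(const uint8_t* a, const uint8_t* b, int count) {
  uint32_t diff = 0;
  for (int i = 0; i < count; ++i) {
    uint8_t x = (uint8_t)(a[i] ^ b[i]);
    while (x) {
      diff += x & 1u;
      x >>= 1;
    }
  }
  return diff;
}

/* High level: split the input into bounded blocks and accumulate the
 * per-block results in 64 bits. */
static uint64_t hamming_total(const uint8_t* a, const uint8_t* b,
                              size_t count) {
  const size_t kDemoBlock = 65536; /* hypothetical block size */
  uint64_t total = 0;
  while (count > 0) {
    size_t n = count < kDemoBlock ? count : kDemoBlock;
    total += hamming_block(a, b, (int)n);
    a += n;
    b += n;
    count -= n;
  }
  return total;
}
```

Comparing an all-ones buffer against zeros gives a known answer of 8 bits per byte, which is the kind of fixed value the test checks against.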

Bug: libyuv:701
Test: TestHammingDistance_Opt
Change-Id: I6716af7cd09ac4d88a8afa25bc845a1b62af7c93
Reviewed-on: https://chromium-review.googlesource.com/710800
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-11 20:21:31 +00:00
Frank Barchard
60f433fbd9 Revert "ComputeHammingDistance reduce SIMD loop to 1 call when possible."
This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> ComputeHammingDistance reduce SIMD loop to 1 call when possible.
> 
> 32 bit x86 has high overhead due to -fpic.  So this reduces the
> number of calls by 1.
> 
> TBR=kjellander@chromium.org
> Bug: libyuv:701
> Test: BenchmarkHammingDistance
> Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
> Reviewed-on: https://chromium-review.googlesource.com/701755
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: Cheng Wang <wangcheng@google.com>

TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com

Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: libyuv:701
Reviewed-on: https://chromium-review.googlesource.com/707823
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-10 01:16:15 +00:00
Frank Barchard
ec75df5894 ComputeHammingDistance reduce SIMD loop to 1 call when possible.
32 bit x86 has high overhead due to -fpic.  So this reduces the
number of calls by 1.

TBR=kjellander@chromium.org
Bug: libyuv:701
Test: BenchmarkHammingDistance
Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
Reviewed-on: https://chromium-review.googlesource.com/701755
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-10-09 22:51:23 +00:00
Frank Barchard
1734712a6f Fix odd length HammingDistance
If length of HammingDistance was not a multiple of 4,
the result was incorrect.  The old tests did not catch this
so a new test is done to count 1s.
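
One way to handle lengths that are not a multiple of 4, shown as a hedged sketch rather than the actual fix: process 4 bytes at a time, then count the 0 to 3 tail bytes individually. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Count set bits in a 32 bit word with a portable bit trick. */
uint32_t SketchPopcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  return (x * 0x01010101u) >> 24;
}

/* Hamming distance for any length: whole 32 bit words, then the tail. */
uint32_t SketchHammingAnyLength(const uint8_t* a, const uint8_t* b,
                                int count) {
  uint32_t diff = 0;
  int i = 0;
  for (; i + 4 <= count; i += 4) {
    uint32_t wa, wb;
    memcpy(&wa, a + i, 4); /* memcpy avoids unaligned access issues */
    memcpy(&wb, b + i, 4);
    diff += SketchPopcount32(wa ^ wb);
  }
  for (; i < count; ++i) { /* remaining 0..3 bytes */
    diff += SketchPopcount32((uint32_t)(a[i] ^ b[i]));
  }
  return diff;
}
```

Counting 1s, as the new test does, makes a dropped tail visible: 7 bytes of 0xff against zeros must give 56, not 32.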

Bug: libyuv:740
Test: LibYUVCompareTest.TestHammingDistance
Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f
Reviewed-on: https://chromium-review.googlesource.com/699614
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-04 22:21:36 +00:00
Frank Barchard
fecd741794 Port HammingDistance to SSSE3
Bug: libyuv:701
Test: BenchmarkHammingDistance_Opt
Change-Id: Ibdd5d382677ebef4f82a62e0d5c3b88614a3b6e4
Reviewed-on: https://chromium-review.googlesource.com/696290
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-10-03 19:11:05 +00:00
Frank Barchard
bde789b176 Hamming Distance SSE2 and AVX2 optimized
Bug: None
Test: None
Change-Id: Id52663f9c957aac3172fba92d888ad1b041d5cf0
Reviewed-on: https://chromium-review.googlesource.com/692981
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-02 22:32:54 +00:00
Frank Barchard
efbf15754a Step thru full color test by increments of 5 for better test speed.
Full color test is the slowest of the unittests, and not catching any
additional bugs at the moment.  Step thru range of 0 to 255 in steps of
5 to speed up the test.  255 is 3 * 5 * 17, so any of those primes would
hit 0 and 255 exactly.
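
The arithmetic above can be checked with a small sketch. The helper name is hypothetical; it counts how many values a loop from 0 to 255 visits for a given step and whether 255 is hit exactly.

```c
/* Counts the values visited when stepping from 0 to 255 by step.
   Sets *hits_end to 1 if 255 itself is visited, else 0. */
int SketchCountSteps(int step, int* hits_end) {
  int n = 0;
  *hits_end = 0;
  for (int v = 0; v <= 255; v += step) {
    if (v == 255) {
      *hits_end = 1;
    }
    ++n;
  }
  return n;
}
```

With a step of 5 the loop visits 52 values instead of 256 and still ends on 255; a step such as 7, which does not divide 255, would stop at 252 and miss the upper extreme.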

Was LibYUVColorTest.TestFullYUV (896 ms)
Now LibYUVColorTest.TestFullYUV (212 ms)

TBR=kjellander@chromium.org
Bug: libyuv:736
Test: LibYUVColorTest.TestFullYUV
Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804
Reviewed-on: https://chromium-review.googlesource.com/672064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-19 02:01:53 +00:00
Frank Barchard
00c501fe43 Cast xgetbv from int64 to int to avoid Visual C warning.
TBR=kjellander@chromium.org
Bug: libyuv:735
Test: try bots
Change-Id: I00dc06689cd0a23847865c0c8edeb538b0cc81ac
Reviewed-on: https://chromium-review.googlesource.com/669142
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-15 22:00:52 +00:00
Frank Barchard
753a91cbcb fix fmov build error on gcc 4.7 for neon64
TBR=kjellander@chromium.org
BUG=libyuv:732
TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt

Change-Id: If80e9510ad5668b080b9384e656c0bd73cf5b4a6
Reviewed-on: https://chromium-review.googlesource.com/663764
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-12 22:46:33 +00:00
Frank Barchard
1e16cb5c38 SplitRGBPlane and MergeRGBPlane functions added
Converts packed RGB to planar and back.
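
As a rough illustration of what packed-to-planar conversion means, here is a plain C sketch. The function names, the single-row signatures and the R, G, B byte order are assumptions for the example and do not reproduce the library's API.

```c
#include <stdint.h>

/* Packed RGB (3 bytes per pixel) to three separate planes. */
void SketchSplitRGB(const uint8_t* src_rgb, uint8_t* dst_r, uint8_t* dst_g,
                    uint8_t* dst_b, int width) {
  for (int x = 0; x < width; ++x) {
    dst_r[x] = src_rgb[3 * x + 0];
    dst_g[x] = src_rgb[3 * x + 1];
    dst_b[x] = src_rgb[3 * x + 2];
  }
}

/* Three planes back to packed RGB; the inverse of SketchSplitRGB. */
void SketchMergeRGB(const uint8_t* src_r, const uint8_t* src_g,
                    const uint8_t* src_b, uint8_t* dst_rgb, int width) {
  for (int x = 0; x < width; ++x) {
    dst_rgb[3 * x + 0] = src_r[x];
    dst_rgb[3 * x + 1] = src_g[x];
    dst_rgb[3 * x + 2] = src_b[x];
  }
}
```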

TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added

Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Frank Barchard
367c0d8f81 enable MSA for clang
clang version 6.0.0 (trunk 310694) is able to compile MSA code.
Previous versions had an issue with _msa_fill_w(v32)
In this CL the macro DISABLE_CLANG_MSA is not set, allowing clang
to build the full MSA source.

TBR=kjellander@chromium.org
BUG=libyuv:715
TEST=gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest

Change-Id: I47401e3b1a3e4c57d9626ec2d3cd131c3ccf613c
Reviewed-on: https://chromium-review.googlesource.com/656501
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-07 23:50:12 +00:00
Manojkumar Bhosale
2621c91bf1 Add MSA optimized HammingDistance and SumSquareError functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Id0126ba5aff38817525b1efa6044f1dc2cfa1a36
Reviewed-on: https://chromium-review.googlesource.com/625739
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-05 21:32:33 +00:00
Frank Barchard
0acc67712f clang format / lint cleanup for arm scale functions
TBR=kjellander@chromium.org
BUG=libyuv:725
TEST=lint

Change-Id: I76f777427f9b1458faba12796fb0011d8e3228d5
Reviewed-on: https://chromium-review.googlesource.com/646586
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-31 22:41:08 +00:00
Manojkumar Bhosale
b6e8e9aa97 Add MSA optimized HalfFloatRow function
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a
Reviewed-on: https://chromium-review.googlesource.com/628557
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-29 18:40:08 +00:00
Frank Barchard
8cd3e4f3f2 Add MSA optimized ScaleFilterCols, ScaleARGBCols, ScaleARGBFilterCols and ScaleRowDown34 functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ib139b9701fc67e24d27a6886377c0cb8b2773fda
Reviewed-on: https://chromium-review.googlesource.com/620791
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-18 17:23:27 +00:00
Frank Barchard
78e44628c6 Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions.
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e
Reviewed-on: https://chromium-review.googlesource.com/616765
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-17 18:39:22 +00:00
Frank Barchard
56bbcdf422 Reintroduce the max version of scale
add ScaleMaxSamples_NEON function with max
done on original values.
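
A scalar sketch of "max done on original values", under stated assumptions: the name `SketchScaleMaxSamples` is hypothetical, and the running maximum starts at 0, which assumes the caller is interested in non-negative peaks. The maximum is taken from the source sample before it is scaled.

```c
/* Scales each sample into dst and returns the largest unscaled source value.
   The maximum starts at 0.0f (assumption), so all-negative input returns 0. */
float SketchScaleMaxSamples(const float* src, float* dst, float scale,
                            int width) {
  float fmax = 0.0f;
  for (int i = 0; i < width; ++i) {
    float v = src[i];
    if (v > fmax) {
      fmax = v; /* max is taken before scaling */
    }
    dst[i] = v * scale;
  }
  return fmax;
}
```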

TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt

Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e
Reviewed-on: https://chromium-review.googlesource.com/614685
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-14 23:33:56 +00:00
Manojkumar Bhosale
dbd7c1a9c5 Add MSA optimized ARGBExtractAlpha, ARGBBlend, ARGBQuantize and ARGBColorMatrix row functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I17bd3f87336f613ad363af7d7b9d7af49d725e56
Reviewed-on: https://chromium-review.googlesource.com/613100
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-14 17:38:31 +00:00
Frank Barchard
83ca1abe09 Change ScaleSumSamples to return Sum of Squares
TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt

Change-Id: I5208666f3968c5c4b0f1b0c951f24216d78ee3fe
Reviewed-on: https://chromium-review.googlesource.com/607184
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-09 22:19:45 +00:00
Frank Barchard
8676ad7004 scale float samples and return max value
BUG=libyuv:717
TEST=ScaleSum unittest to compare C vs Arm implementation
TBR=kjellander@chromium.org

Change-Id: Iaa7af5547d979aad4722f868d31b405340115748
Reviewed-on: https://chromium-review.googlesource.com/600534
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-04 23:34:30 +00:00
Frank Barchard
6d083e2d12 clang 6 build disable some msa functions
R=kjellander@chromium.org

Bug: libyuv:715
Test: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
Change-Id: Ia3943b0afc02e05a8bc32350719b296b0b9d5479
Reviewed-on: https://chromium-review.googlesource.com/592720
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-03 17:44:35 +00:00
Frank Barchard
d8136924bd Rename convert to yuvconvert for linux.mk
TBR=kjellander@chromium.org
BUG=None
TEST=make -f linux.mk

Change-Id: I747c2eb6ed03cacddf3265e65088472507f3436c
Reviewed-on: https://chromium-review.googlesource.com/581874
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-07-21 19:05:11 +00:00
Frank Barchard
db25485ee2 Move compare functions into a unittest class
BUG=None
TEST=LibYUVCompareTest.*
R=jkellander@chromium.org

Change-Id: I3131ca73020f855ead08255d09aa7a846bf0d556
Reviewed-on: https://chromium-review.googlesource.com/540064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-06-19 19:39:10 +00:00
Frank Barchard
6c94ad13b5 Remove ARM NaCL macros from source
NaCL has been disabled for a while, so the code
will still build, but only with C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.

BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org

Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-09 22:22:07 +00:00
Frank Barchard
d981495b42 Hamming Distance using 16 bit accumulators
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction.  The outer loop can sum the
values.

32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)

64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)

R=wangcheng@google.com
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance

Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-07 23:23:24 +00:00
Frank Barchard
baf5248242 HammingDistance_NEON ported to 32 bit
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance

Change-Id: I252efd8a27aa11a0fe7d8030d7c8b57f20f04760
Reviewed-on: https://chromium-review.googlesource.com/525232
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-06 17:58:29 +00:00
Frank Barchard
44abf70187 ScaleDown odd functions adjust math so last pixel is half width source.
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16

new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd

Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-06 01:37:26 +00:00
Frank Barchard
7bffe5e1c5 lint warning fixes for CpuID
The CpuId function is a wrapper for the intrinsic, or is
implemented with inline assembly if the intrinsic is unavailable.
It had been using uint32, but the intrinsics use int, so it was
causing casting and lint warnings.  This change makes the internal
implementation use int.

Casting was also done for xgetbv; that cast is simply
removed, and removing it does not cause a build error.

MipsCpuCaps was doing strlen to check for white space after the
instruction set name.  Arm also does this, but with a hard coded offset.
This was causing a cast from size_t to int, which produced a lint
warning.  The change removes the white space detection.
In theory the code could be used to detect SSE vs SSE2, and it would
need to check that SSE is followed by a space or end of line.  But this
code is only used on Arm and Mips, where there is one form
of SIMD detected, e.g. MSA for mips.  If a new instruction set is
added with a similar name, the white space check could be reintroduced.
But it's more likely the code can be rewritten to use a better form
of detection by then. Or remove detection and require the instructions.
BUG=libyuv:641
TEST=try bots build on all platforms without error and lint is clean

Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809
Reviewed-on: https://chromium-review.googlesource.com/514524
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Wan-Teh Chang <wtc@chromium.org>
2017-05-25 22:00:17 +00:00
Frank Barchard
8edd2286fd MaskCpuFlags return cpuinfo so InitCpuFlags can call it
Reduce number of atomic references to cpu_info by making
InitCpuFlags call MaskCpuFlags and return the same value.

BUG=libyuv:641
TEST=libyuv_unittests pass

Change-Id: I5dfff8f7a10671bc8ef3ec0ed6f302791e752faa
Reviewed-on: https://chromium-review.googlesource.com/514145
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-24 22:27:03 +00:00
Frank Barchard
651ccc0c3a Fix data races in libyuv::TestCpuFlag().
Detect the compiler's support of C11 atomics, and use C11 atomics when
available.

Note that libyuv::MaskCpuFlags() is still not thread-safe.
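
A minimal sketch of lazily initialized CPU flags using C11 atomics, assuming a compiler that provides `<stdatomic.h>`. All names and flag values here are hypothetical stand-ins, and the detection function is a stub. Two threads may both run detection, but they store the same value, so the race is benign.

```c
#include <stdatomic.h>

enum { kSketchInitialized = 0x1, kSketchHasFeatureA = 0x2 };

/* Zero-initialized; 0 means "not yet detected". */
static atomic_int sketch_cpu_info;

/* Stand-in for real CPU detection; always reports feature A. */
int SketchDetectCpu(void) {
  return kSketchInitialized | kSketchHasFeatureA;
}

/* Returns the requested flag bits, detecting on first use. */
int SketchTestCpuFlag(int flag) {
  int info = atomic_load_explicit(&sketch_cpu_info, memory_order_relaxed);
  if (info == 0) {
    info = SketchDetectCpu();
    atomic_store_explicit(&sketch_cpu_info, info, memory_order_relaxed);
  }
  return info & flag;
}
```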

BUG=libyuv:641
TEST= cpu_thread_test.cc adds a pthread based test
R=wangcheng@google.com

Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd
Reviewed-on: https://chromium-review.googlesource.com/510079
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-24 02:09:03 +00:00
Frank Barchard
77f6916da2 use __popcnt for visual c HammingDistance_X86
BUG=libyuv:701
TEST=HammingDistance unittest performance is comparable to x64
R=wangcheng@google.com

Change-Id: I8abe861e086e0162ba4c7ba6f1ef7d1c006cd9d4
Reviewed-on: https://chromium-review.googlesource.com/505454
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-05-12 22:59:00 +00:00
Frank Barchard
e0615c0e69 Optimize Hamming Distance C code to do 64 bits at a time.
BUG=libyuv:701
TEST=LibYUVBaseTest.BenchmarkHammingDistance_C
R=wangcheng@google.com

Change-Id: I243003b098bea8ef3809298bbec349ed52a43d8c
Reviewed-on: https://chromium-review.googlesource.com/499487
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-12 17:53:52 +00:00
Frank Barchard
2136e349da Hamming code difference of 2 memory blocks
BUG=libyuv:701
TEST=built and disassembled for aarch64
R=kjellander@chromium.org

Change-Id: I7712b1c7934e5dfb55fda1fa7c8405c32d6964ce
Reviewed-on: https://chromium-review.googlesource.com/495327
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-08 21:37:51 +00:00
Frank Barchard
945ea1b746 mips switch sgtu to sltu for clang in ndk r14
The version of clang in ndk r14 (3.9) has a built-in llvm assembler
that does not have the sgtu pseudo instruction.
sltu is the actual instruction, so switch the 2 operands and use
the instruction instead of the pseudo op.
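
The operand swap can be modelled in C as a sanity check. The function names are hypothetical models of the instruction semantics, not code from the change: "set on greater than unsigned" with operands (rs, rt) gives the same result as "set on less than unsigned" with operands (rt, rs).

```c
#include <stdint.h>

/* Model of sltu rd, rs, rt: rd = 1 if rs < rt (unsigned), else 0. */
uint32_t SketchSltu(uint32_t rs, uint32_t rt) {
  return rs < rt ? 1u : 0u;
}

/* Model of the sgtu pseudo op, expressed as sltu with swapped operands. */
uint32_t SketchSgtu(uint32_t rs, uint32_t rt) {
  return SketchSltu(rt, rs);
}
```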

BUG=libyuv:700
TEST=try bots build mips without error.

Change-Id: I2d5f94f81acbd56cdedea011e7d9308979e19079
Reviewed-on: https://chromium-review.googlesource.com/494026
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-05-02 21:34:13 +00:00
Vignesh Venkatasubramanian
54289f1bb0 Fix mips build on android ndk r14+
Revert the workaround and fix it properly by passing the
additional necessary flag to the compiler.

BUG=libyuv:700

Change-Id: I1c893a8acb5079decbee6963b689424bf2f99f4f
Reviewed-on: https://chromium-review.googlesource.com/487881
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-04-26 08:39:55 +00:00
Frank Barchard
3b583396bf Disable CopyRow_MIPS
CopyRow_MIPS produces a compile error on some compilers.

TBR=kjellander@chromium.org
BUG=libyuv:700
TEST=try bots

Change-Id: Ie88f2006ef5cf14bffaf80fd4c0dd1caa409c569
Reviewed-on: https://chromium-review.googlesource.com/486127
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-04-25 01:31:13 +00:00
Frank Barchard
fc02cc3806 Add I422ToRGB565
BUG=libyuv:699
TESTED=LibYUVConvertTest.I420ToARGB_RGB565_Opt

Change-Id: I87943bcad056fbbe051301f45c7dc0ae0620c837
Reviewed-on: https://chromium-review.googlesource.com/478578
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-04-17 17:51:17 +00:00
Frank Barchard
bd0faedbd2 add libyuv_unittest to Android.mk
BUG=libyuv:698
TESTED=mm libyuv_unittest within android external/libyuv builds unittests

Change-Id: I4b5fed9f5af86c8a910f73b14053ef83f38431cc
Reviewed-on: https://chromium-review.googlesource.com/478572
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-04-14 23:46:27 +00:00
Frank Barchard
8cab2e31d7 I422ToRGB565 fix for odd widths
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.
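
The "multiple of 16, then remainder in a buffer" pattern can be sketched as below. This is an illustration only: a simple byte-invert kernel stands in for the AVX2 row function, and all names are hypothetical. The wrapper runs the kernel on the largest multiple of 16, copies the leftover pixels into a 16 element temporary, runs the kernel on that, and copies back only the valid results.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a SIMD kernel that requires width to be a multiple of 16. */
void SketchInvertRow16(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = (uint8_t)(255 - src[x]);
  }
}

/* Wrapper that accepts any width. */
void SketchInvertRowAny(const uint8_t* src, uint8_t* dst, int width) {
  int n = width & ~15; /* largest multiple of 16 not exceeding width */
  int r = width & 15;  /* leftover pixels */
  if (n > 0) {
    SketchInvertRow16(src, dst, n);
  }
  if (r > 0) {
    uint8_t in[16] = {0}; /* zero padded so the kernel reads defined data */
    uint8_t out[16];
    memcpy(in, src + n, (size_t)r);
    SketchInvertRow16(in, out, 16);
    memcpy(dst + n, out, (size_t)r); /* never writes past dst[width - 1] */
  }
}
```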

Bug: libyuv: 657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-04-11 17:24:05 +00:00
Frank Barchard
2adb84e39e make gflags command line parser optional
BUG=libyuv:691
TEST=gn gen out/Release "--args=is_debug=false target_cpu=\"x64\" libyuv_include_tests=true"

Change-Id: Ib481189be884c34d9bbc30bfcf71c7969c6f4dae
Reviewed-on: https://chromium-review.googlesource.com/452736
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-14 01:52:52 +00:00
Frank Barchard
e6fec061cf lint cleanup for convert RGB24ToI420
RGB24, RAW, RGB565, ARGB1555 and ARGB4444 have conditional
2 pass versus direct path.  2 pass method requires a buffer that
is conditionally allocated.  The ifdefs were confusing lint.
Simplified the ifdefs to clean up the lint warning.

BUG=libyuv:692
TEST=lint source/convert.cc

Change-Id: If868718af30b48824a5e3d28f0d7d01d4609ad55
Reviewed-on: https://chromium-review.googlesource.com/451552
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-09 10:32:23 +00:00
Frank Barchard
27acadbf9d Roll chromium_revision c793ec77b2..7950721f08 (454713:454907)
Change log: c793ec77b2..7950721f08
Full diff: c793ec77b2..7950721f08

Changed dependencies:
* src/base: 8fe126945c..d75864a2c5
* src/build: 8a0a5a27d4..bf8911f59b
* src/ios: 2c58c1ed6b..8b8111f841
* src/testing: 9cacf531de..c2c74bc1d1
* src/third_party: 0ea751c2fe..4c0908d22e
* src/third_party/catapult: 3c626eaf72..353ee60a45
* src/tools: 41a0ccf0e1..14318cc69b
DEPS diff: c793ec77b2..7950721f08/DEPS

No update to Clang.

TBR=kjellander@chromium.org
BUG=libyuv:689

Change-Id: Ife134b4af1c8c1e63aae2b811342d325abe0b600
Reviewed-on: https://chromium-review.googlesource.com/450317
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-03-06 21:49:04 +00:00
Frank Barchard
136aa9d37c any11p fix for buffer overrun
BUG=libyuv:686
TESTED=untested

Change-Id: Idfae93349dd78b1b633a596631e5397e11b77d0b
Reviewed-on: https://chromium-review.googlesource.com/448320
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-03 19:57:35 +00:00
Frank Barchard
91ee9b729e Fix missing return in MipsCpuCaps.
Previously if MipsCpuCaps were called with something other than
dspr2 or msa, the file was closed but still used.

This change assumes the function is only called internally twice:
once for msa and once for dspr2.  If msa is not being detected,
the function assumes dspr2 is being tested and returns that dspr2
is present.

BUG=libyuv:687
TEST=try bots

Change-Id: I80b328eb5ffc7baf5f1ee5a79c16d75c45ff26cc
Reviewed-on: https://chromium-review.googlesource.com/447831
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-01 23:07:03 +00:00
Manojkumar Bhosale
45b176d153 Add MSA optimized Interpolate/MergeUV/Misc functions
BUG=libyuv:634

Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126

Performance Gain (vs C auto-vectorized)
InterpolateRow_MSA      - ~3.3x
InterpolateRow_Any_MSA  - ~2.5x
ARGBSetRow_MSA          - ~1.0x
ARGBSetRow_Any_MSA      - ~1.0x
ARGBToRGB24Row_MSA      - ~1.9x
ARGBToRGB24Row_Any_MSA  - ~1.6x
MergeUVRow_MSA          - ~1.6x
MergeUVRow_Any_MSA      - ~1.2x

Performance Gain (vs C non-vectorized)
InterpolateRow_MSA      - ~11.3x
InterpolateRow_Any_MSA  - ~ 7.9x
ARGBSetRow_MSA          - ~ 6.2x
ARGBSetRow_Any_MSA      - ~ 4.0x
ARGBToRGB24Row_MSA      - ~ 9.9x
ARGBToRGB24Row_Any_MSA  - ~ 8.4x
MergeUVRow_MSA          - ~12.7x
MergeUVRow_Any_MSA      - ~ 8.0x

Reviewed-on: https://chromium-review.googlesource.com/445817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-23 01:42:22 +00:00
Frank Barchard
a041b0ae03 Android.mk for libyuv - unused parameters warning enabled
BUG=libyuv:682
TEST=mm from android tree.

Change-Id: I13be3eaa6a33741797360d57bc5cf5fed91678ef
Reviewed-on: https://chromium-review.googlesource.com/445935
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-22 19:45:12 +00:00
Manojkumar Bhosale
eed66b2028 Add MSA optimized I444/I400/J400/YUY2/UYVY to ARGB row functions
BUG=libyuv:634

Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be

Performance Gain (vs C auto-vectorized)
I444ToARGBRow_MSA       - ~1.6x
I444ToARGBRow_Any_MSA   - ~1.6x
I400ToARGBRow_MSA       - ~5.5x
I400ToARGBRow_Any_MSA   - ~5.3x
J400ToARGBRow_MSA       - ~1.0x
J400ToARGBRow_Any_MSA   - ~1.0x
YUY2ToARGBRow_MSA       - ~1.6x
YUY2ToARGBRow_Any_MSA   - ~1.6x
UYVYToARGBRow_MSA       - ~1.6x
UYVYToARGBRow_Any_MSA   - ~1.6x

Performance Gain (vs C non-vectorized)
I444ToARGBRow_MSA       - ~7.3x
I444ToARGBRow_Any_MSA   - ~7.1x
I400ToARGBRow_MSA       - ~5.5x
I400ToARGBRow_Any_MSA   - ~5.2x
J400ToARGBRow_MSA       - ~6.8x
J400ToARGBRow_Any_MSA   - ~5.7x
YUY2ToARGBRow_MSA       - ~7.2x
YUY2ToARGBRow_Any_MSA   - ~7.0x
UYVYToARGBRow_MSA       - ~7.1x
UYVYToARGBRow_Any_MSA   - ~6.9x

Reviewed-on: https://chromium-review.googlesource.com/439246
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-21 23:22:07 +00:00
Frank Barchard
bbe8c233f2 scale warning fixes for unused parameters
BUG=libyuv:680
TEST=builds and runs with no warnings

Change-Id: I7d60ef44292fa6ad4f7c4e2e2657359b864d2dab
Reviewed-on: https://chromium-review.googlesource.com/442670
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-02-15 21:38:59 +00:00
Frank Barchard
0fb5675902 Fix dspr2 rename changes. Fix unused variables
TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=try bots

Review-Url: https://codereview.chromium.org/2675583002 .
2017-02-01 18:51:06 -08:00
Manojkumar Bhosale
54ce8f23d6 Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C auto-vectorized)
ARGBToYJRow_MSA       - ~3.2x
ARGBToYJRow_Any_MSA   - ~2.7x
BGRAToYRow_MSA        - ~3.2x
BGRAToYRow_Any_MSA    - ~2.7x
ABGRToYRow_MSA        - ~3.2x
ABGRToYRow_Any_MSA    - ~2.6x
RGBAToYRow_MSA        - ~3.1x
RGBAToYRow_Any_MSA    - ~2.7x
ARGBToUVJRow_MSA      - ~5.5x
ARGBToUVJRow_Any_MSA  - ~4.5x
BGRAToUVRow_MSA       - ~2.1x
BGRAToUVRow_Any_MSA   - ~2.0x
ABGRToUVRow_MSA       - ~2.1x
ABGRToUVRow_Any_MSA   - ~1.9x
RGBAToUVRow_MSA       - ~2.2x
RGBAToUVRow_Any_MSA   - ~1.9x

Performance Gain (vs C non-vectorized)
ARGBToYJRow_MSA       - ~10.9x
ARGBToYJRow_Any_MSA   -  ~9.2x
BGRAToYRow_MSA        - ~10.9x
BGRAToYRow_Any_MSA    -  ~9.3x
ABGRToYRow_MSA        - ~11.0x
ABGRToYRow_Any_MSA    -  ~9.3x
RGBAToYRow_MSA        - ~10.9x
RGBAToYRow_Any_MSA    -  ~9.1x
ARGBToUVJRow_MSA      - ~12.4x
ARGBToUVJRow_Any_MSA  - ~10.5x
BGRAToUVRow_MSA       -  ~4.7x
BGRAToUVRow_Any_MSA   -  ~4.4x
ABGRToUVRow_MSA       -  ~4.7x
ABGRToUVRow_Any_MSA   -  ~4.5x
RGBAToUVRow_MSA       -  ~4.8x
RGBAToUVRow_Any_MSA   -  ~4.4x

Review-Url: https://codereview.chromium.org/2641153003 .
2017-02-01 10:31:28 +05:30
Frank Barchard
54f2094a5e Rename mips source files to dspr2.
Add MSA detect to unittest.
Change macro to disable DSPR2 code to LIBYUV_DISABLE_DSPR2

BUG=libyuv:634
TEST=try bots

Change-Id: I9e0aa2452204fc529bb6f9e6fd93c4e1c379bba6
Reviewed-on: https://chromium-review.googlesource.com/433463
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-01-27 23:11:43 +00:00
Frank Barchard
33f52bdac9 Add installer builds to cmake for linux
cd ~/my_projects/libyuv
git pull
mkdir cbuild  # (for out-of-source builds)
cd cbuild
cmake -DCMAKE_BUILD_TYPE=Release ..
make -j4
make package

BUG=libyuv:673
TEST=make package

Change-Id: Ia449cbfd0bc118cc90c8648f8199a0526b7ae2a2
Reviewed-on: https://chromium-review.googlesource.com/433440
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-01-26 23:05:17 +00:00
Frank Barchard
749e316ed8 Remove commented out code
TEST=None
BUG=libyuv:672
Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169
Reviewed-on: https://chromium-review.googlesource.com/430321
Reviewed-by: Aaron Gable <agable@chromium.org>
2017-01-20 02:03:12 +00:00
Manojkumar Bhosale
09b8c971b3 Add MSA optimized NV12/21 To RGB row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C auto-vectorized)
NV12ToARGBRow_MSA       - ~1.5x
NV12ToARGBRow_Any_MSA   - ~1.4x
NV12ToRGB565Row_MSA     - ~1.4x
NV12ToRGB565Row_Any_MSA - ~1.4x
NV21ToARGBRow_MSA       - ~1.5x
NV21ToARGBRow_Any_MSA   - ~1.5x
SobelRow_MSA            - ~4.3x
SobelRow_Any_MSA        - ~3.4x
SobelToPlaneRow_MSA     - ~8.0x
SobelToPlaneRow_Any_MSA - ~4.7x
SobelXYRow_MSA          - ~3.0x
SobelXYRow_Any_MSA      - ~2.5x

Performance Gain (vs C non-vectorized)
NV12ToARGBRow_MSA       - ~6.5x
NV12ToARGBRow_Any_MSA   - ~6.5x
NV12ToRGB565Row_MSA     - ~6.2x
NV12ToRGB565Row_Any_MSA - ~6.1x
NV21ToARGBRow_MSA       - ~6.5x
NV21ToARGBRow_Any_MSA   - ~6.5x
SobelRow_MSA            - ~14.5x
SobelRow_Any_MSA        - ~11.3x
SobelToPlaneRow_MSA     - ~34.2x
SobelToPlaneRow_Any_MSA - ~19.4x
SobelXYRow_MSA          - ~11.1x
SobelXYRow_Any_MSA      - ~9.1x

Review-Url: https://codereview.chromium.org/2636483002 .
2017-01-18 09:24:39 +05:30
Frank Barchard
a7c87e19f0 add Intel Code Analyst markers
Add macros to enable/disable code analyzer markers around blocks of code.

Normally these macros should not be used, but if performance
details are wanted for Intel code, enable them around the code
and then run the binary through the iaca tool (Intel Architecture
Code Analyzer), available on the Intel website.
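
A hedged sketch of how such opt-in markers could be wired up. The `SKETCH_` macro names and the example function are hypothetical; the enabled branch assumes the `iacaMarks.h` header shipped with the IACA tool, which provides `IACA_START` and `IACA_END`. With the switch undefined, the macros expand to nothing and the code builds normally.

```c
/* Hypothetical switch: define SKETCH_ENABLE_IACA to emit markers. */
#ifdef SKETCH_ENABLE_IACA
#include "iacaMarks.h" /* assumed to come from the IACA distribution */
#define SKETCH_ANALYZE_START IACA_START
#define SKETCH_ANALYZE_END IACA_END
#else
#define SKETCH_ANALYZE_START
#define SKETCH_ANALYZE_END
#endif

/* Example loop: the start marker sits at the top of the loop body and the
   end marker follows the loop. */
int SketchSumRow(const unsigned char* src, int width) {
  int sum = 0;
  for (int x = 0; x < width; ++x) {
    SKETCH_ANALYZE_START
    sum += src[x];
  }
  SKETCH_ANALYZE_END
  return sum;
}
```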

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com

Review-Url: https://codereview.chromium.org/2626193002 .
2017-01-13 15:50:24 -08:00
Manojkumar Bhosale
73a6f100a9 Add MSA optimized rotate functions (used 16x16 transpose)
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
TransposeWx16_MSA        - ~6.0x
TransposeWx16_Any_MSA    - ~4.7x
TransposeUVWx16_MSA      - ~6.3x
TransposeUVWx16_Any_MSA  - ~5.4x

Performance Gain (vs C non-vectorized)
TransposeWx16_MSA        - ~6.0x
TransposeWx16_Any_MSA    - ~4.8x
TransposeUVWx16_MSA      - ~6.3x
TransposeUVWx16_Any_MSA  - ~5.4x

Review-Url: https://codereview.chromium.org/2617703002 .
2017-01-13 15:50:02 +05:30
Manojkumar Bhosale
7c64163ff4 Add MSA optimized RAW/RGB/ARGB to ARGB/Y/UV row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGB1555ToARGBRow_MSA     - 1.85
ARGB1555ToARGBRow_Any_MSA - 1.82
RGB565ToARGBRow_MSA       - 2.14
RGB565ToARGBRow_Any_MSA   - 2.08
RGB24ToARGBRow_MSA        - 8.57
RGB24ToARGBRow_Any_MSA    - 7.42
RAWToARGBRow_MSA          - 8.57
RAWToARGBRow_Any_MSA      - 7.42
ARGB1555ToYRow_MSA        - 2.60
ARGB1555ToYRow_Any_MSA    - 2.47
RGB565ToYRow_MSA          - 2.45
RGB565ToYRow_Any_MSA      - 2.33
RGB24ToYRow_MSA           - 2.23
RGB24ToYRow_Any_MSA       - 2.01
RAWToYRow_MSA             - 2.25
RAWToYRow_Any_MSA         - 2.02
ARGB1555ToUVRow_MSA       - 1.40
ARGB1555ToUVRow_Any_MSA   - 1.37
RGB565ToUVRow_MSA         - 1.68
RGB565ToUVRow_Any_MSA     - 1.63
RGB24ToUVRow_MSA          - 3.02
RGB24ToUVRow_Any_MSA      - 2.87
RAWToUVRow_MSA            - 3.04
RAWToUVRow_Any_MSA        - 2.85

Performance Gain (vs C non-vectorized)
ARGB1555ToARGBRow_MSA     - 4.66
ARGB1555ToARGBRow_Any_MSA - 4.45
RGB565ToARGBRow_MSA       - 5.58
RGB565ToARGBRow_Any_MSA   - 5.34
RGB24ToARGBRow_MSA        - 8.57
RGB24ToARGBRow_Any_MSA    - 7.42
RAWToARGBRow_MSA          - 8.57
RAWToARGBRow_Any_MSA      - 7.42
ARGB1555ToYRow_MSA        - 6.38
ARGB1555ToYRow_Any_MSA    - 5.98
RGB565ToYRow_MSA          - 6.42
RGB565ToYRow_Any_MSA      - 6.05
RGB24ToYRow_MSA           - 7.87
RGB24ToYRow_Any_MSA       - 7.01
RAWToYRow_MSA             - 7.98
RAWToYRow_Any_MSA         - 7.01
ARGB1555ToUVRow_MSA       - 5.39
ARGB1555ToUVRow_Any_MSA   - 5.06
RGB565ToUVRow_MSA         - 6.39
RGB565ToUVRow_Any_MSA     - 5.90
RGB24ToUVRow_MSA          - 3.04
RGB24ToUVRow_Any_MSA      - 2.87
RAWToUVRow_MSA            - 3.04
RAWToUVRow_Any_MSA        - 2.88

Review-Url: https://codereview.chromium.org/2600713002 .
2017-01-13 15:43:37 +05:30
Frank Barchard
000d2fa91a Libyuv MIPS DSPR2 optimizations.
Optimized functions:

I444ToARGBRow_DSPR2
I422ToARGB4444Row_DSPR2
I422ToARGB1555Row_DSPR2
NV12ToARGBRow_DSPR2
BGRAToUVRow_DSPR2
BGRAToYRow_DSPR2
ABGRToUVRow_DSPR2
ARGBToYRow_DSPR2
ABGRToYRow_DSPR2
RGBAToUVRow_DSPR2
RGBAToYRow_DSPR2
ARGBToUVRow_DSPR2
RGB24ToARGBRow_DSPR2
RAWToARGBRow_DSPR2
RGB565ToARGBRow_DSPR2
ARGB1555ToARGBRow_DSPR2
ARGB4444ToARGBRow_DSPR2
ScaleAddRow_DSPR2

Bug-fixes in functions:

ScaleRowDown2_DSPR2
ScaleRowDown4_DSPR2

BUG=

Review-Url: https://codereview.chromium.org/2626123003 .
2017-01-11 12:19:13 -08:00
Manojkumar Bhosale
288bfbefb5 Add MSA optimized remaining scale row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ScaleRowDown2_MSA            - ~22.3x
ScaleRowDown2_Any_MSA        - ~19.9x
ScaleRowDown2Linear_MSA      - ~31.2x
ScaleRowDown2Linear_Any_MSA  - ~29.4x
ScaleRowDown2Box_MSA         - ~20.1x
ScaleRowDown2Box_Any_MSA     - ~19.6x
ScaleRowDown4_MSA            - ~11.7x
ScaleRowDown4_Any_MSA        - ~11.2x
ScaleRowDown4Box_MSA         - ~15.1x
ScaleRowDown4Box_Any_MSA     - ~15.1x
ScaleRowDown38_MSA           - ~1x
ScaleRowDown38_Any_MSA       - ~1x
ScaleRowDown38_2_Box_MSA     - ~1.7x
ScaleRowDown38_2_Box_Any_MSA - ~1.7x
ScaleRowDown38_3_Box_MSA     - ~1.7x
ScaleRowDown38_3_Box_Any_MSA - ~1.7x
ScaleAddRow_MSA              - ~1.2x
ScaleAddRow_Any_MSA          - ~1.15x

Performance Gain (vs C non-vectorized)
ScaleRowDown2_MSA            - ~22.4x
ScaleRowDown2_Any_MSA        - ~19.8x
ScaleRowDown2Linear_MSA      - ~31.6x
ScaleRowDown2Linear_Any_MSA  - ~29.4x
ScaleRowDown2Box_MSA         - ~20.1x
ScaleRowDown2Box_Any_MSA     - ~19.6x
ScaleRowDown4_MSA            - ~11.7x
ScaleRowDown4_Any_MSA        - ~11.2x
ScaleRowDown4Box_MSA         - ~15.1x
ScaleRowDown4Box_Any_MSA     - ~15.1x
ScaleRowDown38_MSA           - ~3.2x
ScaleRowDown38_Any_MSA       - ~3.2x
ScaleRowDown38_2_Box_MSA     - ~2.4x
ScaleRowDown38_2_Box_Any_MSA - ~2.3x
ScaleRowDown38_3_Box_MSA     - ~2.9x
ScaleRowDown38_3_Box_Any_MSA - ~2.8x
ScaleAddRow_MSA              - ~8x
ScaleAddRow_Any_MSA          - ~7.46x

Review-Url: https://codereview.chromium.org/2559683002 .
2016-12-21 13:39:44 +05:30
Frank Barchard
bd10875846 modified libyuv.gyp so that it no longer depends on libjpeg.gyp, which does not exist anymore.
BUG=libyuv:666
TESTED= unittests built and passed with jpeg disabled.
R=kjellander@chromium.org

Review-Url: https://codereview.chromium.org/2585373002 .
2016-12-19 11:57:49 -08:00
Manojkumar Bhosale
a899dea251 Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBAttenuateRow_MSA          - ~1.1x
ARGBAttenuateRow_Any_MSA      - ~1.1x
ARGBToRGB565DitherRow_MSA     - ~6.4x
ARGBToRGB565DitherRow_Any_MSA - ~6.2x
ARGBShuffleRow_MSA            - ~5.1x
ARGBShuffleRow_Any_MSA        - ~1.9x
ARGBShadeRow_MSA              - ~1.1x
ARGBGrayRow_MSA               - ~2.6x
ARGBSepiaRow_MSA              - ~11.6x

Performance Gain (vs C non-vectorized)
ARGBAttenuateRow_MSA          - ~2.46x
ARGBAttenuateRow_Any_MSA      - ~2.45x
ARGBToRGB565DitherRow_MSA     - ~9.4x
ARGBToRGB565DitherRow_Any_MSA - ~12.5x
ARGBShuffleRow_MSA            - ~5.2x
ARGBShuffleRow_Any_MSA        - ~1.9x
ARGBShadeRow_MSA              - ~4.3x
ARGBGrayRow_MSA               - ~10.5x
ARGBSepiaRow_MSA              - ~12.2x

Review-Url: https://codereview.chromium.org/2559693002 .
2016-12-15 12:06:02 +05:30
Manojkumar Bhosale
6fa5e4eb78 Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
TransposeWx8_MSA          - ~2.7x
TransposeWx8_Any_MSA      - ~2.1x
TransposeUVWx8_MSA        - ~2.5x
TransposeUVWx8_Any_MSA    - ~2.7x

Performance Gain (vs C non-vectorized)
TransposeWx8_MSA          - ~4.6x
TransposeWx8_Any_MSA      - ~2.9x
TransposeUVWx8_MSA        - ~4.4x
TransposeUVWx8_Any_MSA    - ~3.7x

Review URL: https://codereview.chromium.org/2553403002 .
2016-12-15 10:06:01 +05:30
Frank Barchard
b18fd21d3c Android420ToI420 - use ptrdiff_t for difference of u and v pointers
The difference was assigned to an int, causing a warning on Visual C.
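A minimal sketch of the pattern, assuming both chroma pointers point into the same buffer. The helper name ChromaPointerOffset is hypothetical and is not part of libyuv:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: signed distance from the U pointer to the V pointer.
// Subtracting two pointers yields std::ptrdiff_t, which is 64 bits wide on
// 64-bit targets; storing the result in an int narrows it, which is the
// kind of assignment the commit message says Visual C warned about.
inline std::ptrdiff_t ChromaPointerOffset(const uint8_t* src_u,
                                          const uint8_t* src_v) {
  return src_v - src_u;
}
```

An offset of +1 or -1 would indicate interleaved chroma (NV12 or NV21 style), which is one reason the sign has to be preserved.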

BUG=662
TEST=tested with try bots.
R=devangelakos@google.com

Review-Url: https://codereview.chromium.org/2574373002 .
2016-12-14 11:53:55 -08:00
Frank Barchard
dde8ba7009 ConvertFromI420: use halfstride instead of halfwidth
BUG=libyuv:660
TEST=try bots
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2554213003 .
2016-12-07 10:16:16 -08:00
Manojkumar Bhosale
56b5bbb0be Add MSA optimized ARGB scaling functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA           - ~2.6x
ScaleARGBRowDown2Linear_MSA     - ~7.9x
ScaleARGBRowDown2Box_MSA        - ~3.7x
ScaleARGBRowDownEven_MSA        - ~1.2x
ScaleARGBRowDownEvenBox_MSA     - ~3.5x

ScaleARGBRowDown2_Any_MSA       - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA    - ~3.6x
ScaleARGBRowDownEven_Any_MSA    - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x

Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA           - 2.6x
ScaleARGBRowDown2Linear_MSA     - 13.5x
ScaleARGBRowDown2Box_MSA        - 5.8x
ScaleARGBRowDownEven_MSA        - 1.2x
ScaleARGBRowDownEvenBox_MSA     - 3.7x

ScaleARGBRowDown2_Any_MSA       - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA    - 5.3x
ScaleARGBRowDownEven_Any_MSA    - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x

Review URL: https://codereview.chromium.org/2527983002 .
2016-12-07 11:47:15 +05:30
Manojkumar Bhosale
83f460be33 Add MSA optimized ARGB Multiply/Add/Subtract row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA       - 1.4x
ARGBAddRow_MSA            - 8.6x
ARGBSubtractRow_MSA       - 8.6x

ARGBMultiplyRow_Any_MSA   - 1.35x
ARGBAddRow_Any_MSA        - 7.3x
ARGBSubtractRow_Any_MSA   - 7.2x

Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA       - 4.4x
ARGBAddRow_MSA            - 27x
ARGBSubtractRow_MSA       - 22x

ARGBMultiplyRow_Any_MSA   - 3.5x
ARGBAddRow_Any_MSA        - 23x
ARGBSubtractRow_Any_MSA   - 18x

Review URL: https://codereview.chromium.org/2529983002 .
2016-12-02 15:21:10 +05:30
Frank Barchard
da0c29dada Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA       - ~1.6x
ARGBToRGB565Row_Any_MSA   - ~1.6x
ARGBToARGB1555Row_MSA     - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA     - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA        - ~2.4x
ARGBToUV444Row_Any_MSA    - ~2.4x

Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA       - ~2.8x
ARGBToRGB565Row_Any_MSA   - ~2.8x
ARGBToARGB1555Row_MSA     - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA     - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA        - ~6.7x
ARGBToUV444Row_Any_MSA    - ~6.7x

Review URL: https://codereview.chromium.org/2520003004 .
2016-11-22 10:47:55 -08:00
Frank Barchard
b1504a8e48 Add MSA optimized ARGBToRGB24Row_MSA and ARGBToRAWRow_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2487913004 .
2016-11-18 15:05:10 -08:00
Frank Barchard
97fb18b846 disable I422AlphaToARGBRow_SSSE3 for 32 bit fpic
BUG=libyuv:658
TEST=g++ -I include  -fPIC -m32 -msse2 -Os -fno-omit-frame-pointer -c source/row_gcc.cc -o row_gcc.o
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2482263003 .
2016-11-08 16:09:09 -08:00
Frank Barchard
e62309f259 clang-format libyuv
BUG=libyuv:654
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
f2c27dafa2 HalfFloat neon armv7 fix for destination pointer.
Improved unittests detect differences in arm64 rounding.

TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360"
BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2478313004 .
2016-11-07 12:13:04 -08:00
Frank Barchard
eca08525cb HalfFloat Neon for ARMv7.
64 bit version made similar to 32 bit with registers 1 for load and store results, and 2 and 3 as expanded float temporary values.

TEST=out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2467723002 .
2016-11-01 11:36:51 -07:00
Frank Barchard
10ce829bad Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA             : ~1.5x
I422ToRGB565Row_Any_MSA         : ~1.5x
I422ToARGB4444Row_MSA           : ~1.4x
I422ToARGB4444Row_Any_MSA       : ~1.4x
I422ToARGB1555Row_MSA           : ~1.4x
I422ToARGB1555Row_Any_MSA       : ~1.4x

Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA             : ~6.8x
I422ToRGB565Row_Any_MSA         : ~6.8x
I422ToARGB4444Row_MSA           : ~6.6x
I422ToARGB4444Row_Any_MSA       : ~6.6x
I422ToARGB1555Row_MSA           : ~6.6x
I422ToARGB1555Row_Any_MSA       : ~6.6x

Review URL: https://codereview.chromium.org/2445343007 .
2016-10-27 10:47:35 -07:00
Frank Barchard
532f5708a9 Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA      : ~1.4x
I422AlphaToARGBRow_Any_MSA  : ~1.4x
I422ToRGB24Row_MSA          : ~4.8x
I422ToRGB24Row_Any_MSA      : ~4.8x

Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA      : ~7.0x
I422AlphaToARGBRow_Any_MSA  : ~7.0x
I422ToRGB24Row_MSA          : ~7.9x
I422ToRGB24Row_Any_MSA      : ~7.7x

Review URL: https://codereview.chromium.org/2454433003 .
2016-10-26 11:12:17 -07:00
Frank Barchard
02ae8b60c5 Line continuation at end of line with NOLINT before that.
BUG=libyuv:634
TEST=git cl lint
TBR=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2453013003 .
2016-10-26 10:42:52 -07:00
Frank Barchard
2488b3105b White spaces, comments and lint fixes for msa.
no functional changes.

TBR=kjellander@chromium.org
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2446313002 .
2016-10-25 11:36:54 -07:00
Frank Barchard
c2073823b4 use __OPTIMIZE__ macro to determine debug vs release.
Debug builds of x86 gcc/clang can run out of registers.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.
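A minimal sketch of the detection, assuming gcc or clang. The macro name SKETCH_RELEASE_BUILD and the helper are hypothetical and are not the names used in libyuv:

```cpp
// __OPTIMIZE__ is predefined by gcc and clang in optimizing compilations,
// independent of whether the build system sets NDEBUG or _DEBUG.
#if defined(__OPTIMIZE__)
#define SKETCH_RELEASE_BUILD 1
#else
#define SKETCH_RELEASE_BUILD 0
#endif

// Hypothetical helper: true when the compiler was asked to optimize.
inline bool IsOptimizedBuild() { return SKETCH_RELEASE_BUILD != 0; }
```

Code that needs many registers in inline assembly could then be enabled only when SKETCH_RELEASE_BUILD is 1.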

BUG=libyuv:602
TEST=untested
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2451503002 .
2016-10-24 18:02:48 -07:00
Frank Barchard
f5d5bd88d6 Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gains :- (vs C vectorized)

I422ToARGBRow_MSA     : ~1.6x
I422ToRGBARow_MSA     : ~1.6x

I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x

Performance Gains :- (vs C non-vectorized)

I422ToARGBRow_MSA     : ~7x
I422ToRGBARow_MSA     : ~7x

I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x

Regarding performance measurement, we created standalone tests which pass row data from a filled 1920x1080 buffer to both the C and MSA functions. N iterations are executed to get more accurate timings of C vs MSA.

Review URL: https://codereview.chromium.org/2430313005 .
2016-10-24 15:37:08 -07:00
Frank Barchard
451af5e922 scale by 1 for neon implemented
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fcvtn      v4.4h, v2.4s                   \n"  // 8 half floats
    "fcvtn2     v4.8h, v1.4s                   \n"
   MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
    "fmul       v1.4s, v1.4s, %3.s[0]          \n"
    "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
    "uqshrn2    v4.8h, v1.4s, #13              \n"
   MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  : "w"(scale * 1.9259299444e-34f)    // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}
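
The scaled path above can be sketched in scalar C++. The constant 1.9259299444e-34f is 2^-112, the gap between the float exponent bias (127) and the half-float bias (15). After the multiply, shifting the float bit pattern right by 13 (23 minus 10 mantissa bits) leaves a half-float bit pattern. The function name is hypothetical, and the sketch truncates rather than saturates, so it only mirrors the NEON code for results that fit in a half float:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical scalar model of the scale-then-shift half-float conversion.
inline uint16_t HalfFloatFromUint16Sketch(uint16_t v, float scale) {
  // Rebias the exponent: 2^-112 moves it from float bias to half bias.
  float f = static_cast<float>(v) * (scale * 1.9259299444e-34f);
  uint32_t bits;
  std::memcpy(&bits, &f, sizeof(bits));  // read the float's bit pattern
  // Drop the 13 low mantissa bits; the remainder is the half float.
  return static_cast<uint16_t>(bits >> 13);
}
```

With scale 1.0f, an input of 1 gives 0x3C00, which is 1.0 as a half float.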

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
550cf829fb HalfFloat avx2 unpack bug fix.
AVX unpack parameters were reverse ordered causing incorrect results
on AVX2 hardware.

TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2438893002 .
2016-10-20 15:49:00 -07:00
Frank Barchard
f553db2d30 HalfFloatPlane unittest for denormal half floats
Half floats have a limited range.  It shouldn't normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flushed to zero.  This test ensures they behave the same in C and SIMD and tests the performance of denormals.

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2424233004 .
2016-10-19 18:13:01 -07:00
Frank Barchard
78c58ab8aa Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions
R=fbarchard@google.com
BUG=libyuv:634

Performance gains : (Auto-vectorized C vs MSA SIMD)

ARGB4444ToYRow_MSA        : ~3.0x
ARGB4444ToUVRow_MSA       : ~1.8x
ARGB4444ToARGBRow_MSA     : ~3.4x

ARGB4444ToYRow_Any_MSA    : ~2.8x
ARGB4444ToUVRow_Any_MSA   : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x

Review URL: https://codereview.chromium.org/2421843002 .
2016-10-19 11:10:51 -07:00
Frank Barchard
2d80fc3133 Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00
Frank Barchard
fdcf524aac Add f16c (halffloat) cpuid
R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2418763006 .
2016-10-14 16:34:08 -07:00
Frank Barchard
5333e94e70 Port ARGBExtractAlpha_AVX2 function to windows.
BUG=libyuv:572
TEST=try bots
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2416783004 .
2016-10-13 23:20:57 -07:00
Frank Barchard
a5e93766a2 Add ARGBExtractAlpha_AVX2 function
Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2420553002 .
2016-10-13 16:03:43 -07:00
Frank Barchard
198bce3959 Cast for clang-cl 64 bit build warnings in unittests
R=kjellander@chromium.org
BUG=libyuv:649

Review URL: https://codereview.chromium.org/2414763002 .
2016-10-12 13:09:57 -07:00
Frank Barchard
d363ea6527 Remove I411 support.
YUV 411 is a very uncommon format.  Remove support.

Update documentation to reflect that 411 is deprecated.

Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.

BUG=libyuv:645
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2406123002 .
2016-10-11 11:14:16 -07:00
Frank Barchard
0071f46a1f Side by side 420 test
I420 output can be slow due to multi channel write.
Putting the U and V into a single side by side buffer can improve performance.

TBR=wangcheng@google.com
BUG=None

Review URL: https://codereview.chromium.org/2403223003 .
2016-10-10 19:28:33 -07:00
Frank Barchard
edd3a84d05 libyuv::YUY2ToY for isolating Y channel of YUY2.
This function is the first step of YUY2 To I420.
Provided primarily for diagnostics.

TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToY_Opt

Review URL: https://codereview.chromium.org/2399153004 .
2016-10-07 17:20:30 -07:00
Frank Barchard
a2891ec77c Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions
R=fbarchard@google.com
BUG=libyuv:634

Performance gains as below,

YUY2ToI422, YUY2ToI420 :-

YUY2ToYRow_MSA          : ~10x
YUY2ToUVRow_MSA         : ~11x
YUY2ToUV422Row_MSA      : ~9x
YUY2ToYRow_Any_MSA      : ~6x
YUY2ToUVRow_Any_MSA     : ~5x
YUY2ToUV422Row_Any_MSA  : ~4x

UYVYToI422, UYVYToI420 :-

UYVYToYRow_MSA          : ~10x
UYVYToUVRow_MSA         : ~11x
UYVYToUV422Row_MSA      : ~9x
UYVYToYRow_Any_MSA      : ~6x
UYVYToUVRow_Any_MSA     : ~5x
UYVYToUV422Row_Any_MSA  : ~4x

Review URL: https://codereview.chromium.org/2397693002 .
2016-10-07 10:37:22 -07:00
Frank Barchard
7018f5be0f Add MSA optimized I422ToYUY2Row, I422ToUYVYRow functions
R=fbarchard@google.com
BUG=libyuv:634

Performance gains :-

I422ToYUY2Row_MSA     - ~12x
I422ToYUY2Row_Any_MSA - ~7x

I422ToUYVYRow_MSA     - ~12x
I422ToUYVYRow_Any_MSA - ~7x

Review URL: https://codereview.chromium.org/2378753004 .
2016-10-03 18:21:31 -07:00
Frank Barchard
aa197ee1a3 HalfFloat_SSE2 for Visual C
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows
R=hubbe@chromium.org, wangcheng@google.com

Review URL: https://codereview.chromium.org/2387713002 .
2016-10-03 10:33:38 -07:00
Frank Barchard
4a14cb2e81 HalfFloat_SSE2 port from C algorithm to SSE2
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2381493006 .
2016-09-30 09:47:16 -07:00
Frank Barchard
7fc932ddd3 Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560,chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2371293002 .
2016-09-29 15:06:30 -07:00
Frank Barchard
c11e9b7fb7 bt709 coefficients for video constrained space
Original bt709 color space coefficients were full range yuv for higher
quality.  This change makes the coefficients use the video constrained
color space the same as bt601 which is 16 to 235 for Y and 16 to 240 for
chroma channels.
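
A small sketch of the two ranges, assuming the conventional 8-bit video range (luma 16 to 235, chroma 16 to 240). It only illustrates the ranges; it is not the coefficient table this commit changes, and the function names are hypothetical:

```cpp
#include <cstdint>

// Map a full-range luma sample (0..255) into the constrained 16..235 range.
inline uint8_t FullToLimitedLumaSketch(uint8_t y) {
  return static_cast<uint8_t>(16 + (219 * y + 127) / 255);
}

// Map a full-range chroma sample (0..255) into the constrained 16..240 range.
inline uint8_t FullToLimitedChromaSketch(uint8_t c) {
  return static_cast<uint8_t>(16 + (224 * c + 127) / 255);
}
```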

BUG=libyuv:639
TEST=libyuv unittests run locally
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2367253003 .
2016-09-28 15:07:46 -07:00
Frank Barchard
6732bcbde9 ShortToHalfFloat_AVX2 function
BUG=libyuv:560
TEST=local compile for windows
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2364293002 .
2016-09-27 14:18:32 -07:00
Frank Barchard
bcd823805c remove guard nolints from all headers
Remove NOLINT from guards

TEST=git cl lint
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2374653002 .
2016-09-26 18:02:09 -07:00
Frank Barchard
51a3500c18 Remove unused macros for msa.
Signed vectors are rarely used in libyuv... remove macros for now.
Remove word shuffler, use byte shuffler in row_msa.
Bump version number.

TBR=manojkumar.bhosale@imgtec.com
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2375553002 .
2016-09-26 17:34:58 -07:00
Frank Barchard
618149084e Add MIPS SIMD Arch (MSA) optimized ARGBMirrorRow function
This patch adds MSA optimized ARGBMirrorRow function in libYUV project.

Performance gain ~3x

R=fbarchard@google.com
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2368313003 .
2016-09-26 16:28:01 -07:00
Frank Barchard
c5323b0fdc Add MIPS SIMD Arch (MSA) optimized MirrorRow function
As per the preparation patch added in Chromium sources at,
2150943003: Add MIPS SIMD Arch (MSA) build flags for GYP/GN builds

This patch adds first MSA optimized function in libYUV project.

BUG=libyuv:634
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/2285683002 .
2016-09-22 16:12:22 -07:00
Frank Barchard
5da918b48d Enable NEON for unittests on ios 64 bit.
TBR=kjellander@chromium.org
BUG=libyuv:637, chromium:646279

Review URL: https://codereview.chromium.org/2340933005 .
2016-09-16 16:46:46 -07:00
Frank Barchard
8279df963e Scale by 3/8 only if source is multiple of 8 tall.
BUG=libyuv:635
TEST=try bots
R=harryjin@google.com

Review URL: https://codereview.chromium.org/2347733002 .
2016-09-16 14:57:47 -07:00
Frank Barchard
742be44654 Remove references to svn version control.
BUG=libyuv:636
TESTED=try bots
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2339813002 .
2016-09-14 14:54:47 -07:00
Frank Barchard
de944ed8c7 YuvConstants declare alignment for externs as well as declarations
On visual c 2013 and earlier a warning is generated if externs
are not declared with the same alignment as the declaration, when
using /ltcg

BUG=libyuv:633
TEST=standalone test built with cl /Bv /GL /Ox /nologo a.cc b.cc /link /ltcg
R=skal@google.com

Review URL: https://codereview.chromium.org/2291533004 .
2016-08-30 11:06:46 -07:00
Frank Barchard
dc3a1295be add mergeuv test
Add test for SplitUVPlane and MergeUVPlane

Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exist.

TEST=SplitUVPlane unittest
BUG=libyuv:629
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2279603002 .
2016-08-25 10:29:16 -07:00
Frank Barchard
c244a3e9a0 Add SplitUVPlanes and MergeUVPlanes
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exist. Also, de-duplicate the
CPU dispatching code for these functions by moving them to helper
functions.
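
What splitting and merging a UV row means can be sketched in plain C++. This is a scalar illustration with hypothetical names, not the optimized row functions or the public signatures added by this commit:

```cpp
#include <cstdint>

// De-interleave one row of UVUV... bytes into separate U and V rows.
inline void SplitUVRowSketch(const uint8_t* src_uv, uint8_t* dst_u,
                             uint8_t* dst_v, int width) {
  for (int x = 0; x < width; ++x) {
    dst_u[x] = src_uv[2 * x];
    dst_v[x] = src_uv[2 * x + 1];
  }
}

// Interleave separate U and V rows back into one UVUV... row.
inline void MergeUVRowSketch(const uint8_t* src_u, const uint8_t* src_v,
                             uint8_t* dst_uv, int width) {
  for (int x = 0; x < width; ++x) {
    dst_uv[2 * x] = src_u[x];
    dst_uv[2 * x + 1] = src_v[x];
  }
}
```

A plane-level function would call a row function like these once per row, advancing each pointer by its stride.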

BUG=libyuv:629
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2277603004 .
2016-08-24 16:47:24 -07:00
Frank Barchard
17d31e6a4a NV12 allow NULL for Y
The conversion from NV12 and other bi- or tri-planar formats differs only in the UV handling.  The helper function supports passing NULL for the dst_y channel, indicating that only the UV conversion is wanted.

TBR=harryjin@google.com
TEST=LibYUVConvertTest.NV12ToI420_NullY (601 ms)
BUG=libyuv:626

Review URL: https://codereview.chromium.org/2276703002 .
2016-08-23 19:05:25 -07:00
Frank Barchard
d58297a2df NV12ToI420 use SplitPlane function
TBR=magjed@chromium.org
BUG=libyuv:629
TEST=LibYUVConvertTest.NV12ToI420_Opt

Review URL: https://codereview.chromium.org/2267303002 .
2016-08-22 18:35:55 -07:00
Frank Barchard
920151f2b5 Change basic_types.h for fixing build failure
BUG=libyuv:630

TBR=harryjin@google.com
TEST=android build locally tested.

Review URL: https://codereview.chromium.org/2225763003 .

Review URL: https://codereview.chromium.org/2269793002 .
2016-08-22 16:16:49 -07:00
Frank Barchard
74491ba0c5 add blank lines to getting started
BUG=libyuv:626

Review URL: https://codereview.chromium.org/2225763003 .
2016-08-08 15:23:38 -07:00
Frank Barchard
e74086bfe3 Remove DISABLE_X86 from build.gn
Fix for duplicate define
../../third_party/libyuv/include/libyuv/scale_row.h:29:9: error: 'LIBYUV_DISABLE_X86' macro redefined [-Werror,-Wmacro-redefined]
        ^

The GYP version relies on headers disabling the optimization.
This CL does the same for BUILD.gn
TBR=kjellander@chromium.org
BUG=libyuv:625

Review URL: https://codereview.chromium.org/2149823003 .
2016-07-14 12:14:22 -07:00
Frank Barchard
46a8eaaf0c fix typo in YUV
R=braveyao@chromium.org
BUG=None

Review URL: https://codereview.chromium.org/2152623002 .
2016-07-13 17:17:19 -07:00
Frank Barchard
1aa4ddd21c Attribute aligned 32 for YUV conversion structure on Intel
Fix for unaligned memory exception.

R=braveyao@chromium.org
BUG=libyuv:616

Review URL: https://codereview.chromium.org/2152553002 .
2016-07-13 12:19:26 -07:00
Frank Barchard
a7a6d8cc2e Duplicate prototype for I420ToABGR for remoting
Add alias prototype in convert_argb.h for remoting to build without the header convert_from.h

BUG=libyuv:622
TBR=harryjin@google.com

Review URL: https://codereview.chromium.org/2141923005 .
2016-07-12 19:12:28 -07:00
Frank Barchard
abcb70f183 Test nv21 layout of Android420ToI420 function.
Android420ToI420 takes 3 pointers to Y, U, V and a pixel stride for U and V.  The pixel stride is expected to be 1 or 2.

[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Any
[       OK ] LibYUVConvertTest.Android420ToI420_1_Any (253 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Unaligned
[       OK ] LibYUVConvertTest.Android420ToI420_1_Unaligned (250 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Invert
[       OK ] LibYUVConvertTest.Android420ToI420_1_Invert (254 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Opt
[       OK ] LibYUVConvertTest.Android420ToI420_1_Opt (247 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Any
[       OK ] LibYUVConvertTest.Android420ToI420_2_Any (132 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Unaligned
[       OK ] LibYUVConvertTest.Android420ToI420_2_Unaligned (122 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Invert
[       OK ] LibYUVConvertTest.Android420ToI420_2_Invert (124 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Opt
[       OK ] LibYUVConvertTest.Android420ToI420_2_Opt (119 ms)

TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2146733002 .
2016-07-12 18:34:04 -07:00
Frank Barchard
84e04699c2 Add libyuv:Android420ToI420 function which takes 3 pointers
to Y,U,V and a pixel stride for U and V.  The pixel stride is expected to be 1 or 2.
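
How the pixel stride might be used can be sketched as follows. A stride of 1 means the chroma plane is already planar (I420 style); a stride of 2 means U and V are interleaved (NV12 or NV21 style) and every other byte belongs to the channel being read. The helper name is hypothetical and this is not the libyuv implementation:

```cpp
#include <cstdint>

// Copy one chroma row, reading every pixel_stride-th byte from src.
// pixel_stride is expected to be 1 (planar) or 2 (interleaved).
inline void CopyStridedRowSketch(const uint8_t* src, int pixel_stride,
                                 uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = src[x * pixel_stride];
  }
}
```

For an NV21 buffer laid out as VUVU..., the V pointer would be the start of the buffer and the U pointer one byte later, both with a pixel stride of 2.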

TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2114843002 .
2016-07-12 16:23:51 -07:00
Frank Barchard
4d9146bbb1 include planar functions and convert_argb for webrtc
webrtc doesn't include the headers that the functions are prototyped in.
This CL makes the convert.h include those headers to allow webrtc to
update to the head libyuv.

TBR=harryjin@google.com
BUG=libyuv:620,webrtc:6091,webrtc:6094
TESTED=local build and try bots

Review URL: https://codereview.chromium.org/2141683002 .
2016-07-11 11:37:51 -07:00
Frank Barchard
8b55286ed5 duplicate I420Rect prototype into convert for webrtc
TBR=harryjin@google.com
BUG=libyuv:618

Review URL: https://codereview.chromium.org/2132993003 .
2016-07-08 16:03:38 -07:00
Frank Barchard
303b9f03c8 Avoid gcc 4.4 indexing a vector_size(32) array error.
Make color conversion use simple arrays within the structure, which will be referenced via a register pointer.

R=harryjin@google.com
BUG=libyuv:616
TEST=CC=gcc-4.4 CXX=g++-4.4 LD=ld-4.4 make -f linux.mk

Review URL: https://codereview.chromium.org/2127863003 .
2016-07-06 15:14:29 -07:00
Frank Barchard
2f101fdbda mingw64 fix - guard row_win.cc against mingw build.
The old guard only checked for defined(_M_X64) which is defined by mingw64.  Add a test for defined(_MSC_VER) which is defined for clangcl and visual c but not mingw.  mingw should use row_gcc.cc for both 32 and 64 bit.
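
The shape of such a guard can be sketched as below. The macro SKETCH_USE_ROW_WIN is hypothetical and the condition is simplified; the real guard in row_win.cc may test additional macros:

```cpp
// _M_X64 alone is not enough: per the commit message mingw64 defines it too.
// _MSC_VER is defined by Visual C and clang-cl but not by mingw, so
// requiring both keeps mingw on the gcc-style source file.
#if defined(_MSC_VER) && defined(_M_X64)
#define SKETCH_USE_ROW_WIN 1
#else
#define SKETCH_USE_ROW_WIN 0
#endif

// Hypothetical helper exposing the decision at runtime.
inline bool UsesVisualCRowCode() { return SKETCH_USE_ROW_WIN != 0; }
```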

R=harryjin@google.com
BUG=webm:1252,libyuv:613
TEST=local gcc/clang builds on linux tested and try bots for others.

Review URL: https://codereview.chromium.org/2105603002 .
2016-06-28 10:21:27 -07:00
Frank Barchard
b8ddb5a2a7 rounding for arm filter
R=wangcheng@google.com, harryjin@google.com
BUG=libyuv:607

Review URL: https://codereview.chromium.org/2093913004 .
2016-06-24 16:07:49 -07:00
Frank Barchard
cc88adc620 YUV scale filter columns improved filtering accuracy
upscale a YUV image.  observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.

was ScaleFrom1280x720_Bilinear (1620 ms)
now ScaleFrom1280x720_Bilinear (1907 ms)

BUG=libyuv:605
TEST=try bots
R=harryjin@google.com, wangcheng@google.com

Review URL: https://codereview.chromium.org/2084533006 .
2016-06-23 20:16:55 -07:00
Frank Barchard
24b9fa6671 use vectorsize on clangcl
the ScaleFilterCols_SSSE3 function fails at runtime if vectorsize is not used.

BUG=libyuv:610,libyuv:605
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2080223007 .
2016-06-23 20:14:22 -07:00
Frank Barchard
e376b06d6a Disable ScaleFilterCols_SSSE3 which produces color shift
upscale a YUV image.  observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.

disable HAS_SCALEFILTERCOLS_SSSE3

R=harryjin@google.com
BUG=libyuv:605

Review URL: https://codereview.chromium.org/2080663003 .
2016-06-20 10:43:09 -07:00
Frank Barchard
fd3e676e91 android_full_debug x86 fix - use +rm for width count
Work around for android full debug build running out of registers.
5 functions were running out of registers causing the compiler error
error: 'asm' operand has impossible constraints
These functions mostly have 4 pointers, a counter (width) and a temporary
eax register.  With fpic and debug using stackframes, 2 registers are
unavailable.  So a total of 8 registers are used.
Although fpic and stack frame don't apply to assembly, the compiler
reserves 2 registers.  The optimized version builds, so it's likely
freeing up the registers once it knows they are not used.
These functions used to build, so compile options and/or compiler may
have updated.. likely fpic was turned on.
An attribute can be used to disable each, and will avoid using the
2 GPR registers, but they are still reserved and unavailable in debug
builds on current compilers (gcc 4.9 and clang 3.8).

R=dhrosa@google.com
BUG=libyuv:602

Review URL: https://codereview.chromium.org/2066933002 .
2016-06-14 15:25:28 -07:00
Frank Barchard
e2611a7349 document cpuid command line behavior
cpu_info_ is zero for uninitialized state and all bits are off, disabling all cpu optimizations.
the 1 bit indicates cpu_info_ is initialized avoiding calling the detection code again for performance.

MaskCpuFlags initializes the cpu ignoring existing flags, then masks with the supplied flags and stores to cpu_info_.
As a mask, -1 has no effect, enabling all cpu features that were detected, but nothing that wasn't detected.
Setting to 0 will cause the next call to re-initialize the cpu, which is same as enabling all features.
Setting mask to 1 will turn off all cpu features but keep the initialized bit on, so the next detection call won't reinitialize and the cpu features are all disabled.

So normal behavior for command line and programmatic masking is:
1 = C
-1 = SIMD
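
The masking rules above can be modeled with a small self-contained sketch. All names, the feature bits, and the detection function are hypothetical stand-ins; only the behavior of masks 0, 1 and -1 follows the description:

```cpp
namespace cpu_sketch {

constexpr int kInitialized = 0x1;  // the 1 bit: detection has already run
constexpr int kFeatureA = 0x10;    // hypothetical SIMD feature bits
constexpr int kFeatureB = 0x20;

int cpu_info_ = 0;     // 0 means uninitialized
int detect_calls = 0;  // counts how often detection runs

// Stand-in for real cpu detection: pretend both features are present.
int Detect() {
  ++detect_calls;
  return kInitialized | kFeatureA | kFeatureB;
}

// Re-detect ignoring existing flags, then keep only the bits in mask.
int MaskFlags(int mask) {
  cpu_info_ = Detect() & mask;
  return cpu_info_;
}

// Detect lazily only when cpu_info_ is still zero.
int TestFlag(int flag) {
  if (cpu_info_ == 0) {
    cpu_info_ = Detect();
  }
  return cpu_info_ & flag;
}

}  // namespace cpu_sketch
```

In this model a mask of 1 leaves only the initialized bit, so every feature test fails and the C path is used, while a mask of 0 resets the state and the next test re-detects everything.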

TBR=harryjin@google.com
BUG=libyuv:600
TESTED=out64/Release/bin/run_libyuv_unittest -s libyuv_unittest --verbose --release --gtest_filter=*ARGBExtractAlpha* -a "--libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=1 --libyuv_cpu_info=1"

Review URL: https://codereview.chromium.org/2042933002 .
2016-06-08 10:38:09 -07:00
Frank Barchard
026be3cd85 neon64 use width int directly.
With the %w size modifier, the int width can be passed directly to arm assembly.
For functions that take input constants, the outputs are declared as early
write using &, meaning the outputs are written before all inputs are consumed.

R=harryjin@google.com
BUG=libyuv:598

Review URL: https://codereview.chromium.org/2043073003 .
2016-06-08 10:26:53 -07:00
Frank Barchard
6546096269 ARGBExtractAlpha 16 pixels at a time for ARM
arm64   8     TestARGBExtractAlpha (10019 ms) <-original 64 bit code
arm64   8 x2  TestARGBExtractAlpha (7639 ms)
arm64   16    TestARGBExtractAlpha (7369 ms) <- new 64 bit code
thumb32 8     TestARGBExtractAlpha (9505 ms) <- original 32 bit code
thumb32 8 x2  TestARGBExtractAlpha (7400 ms)
thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
arm32   8     TestARGBExtractAlpha (10002 ms)

BUG=libyuv:572
TESTED=local test on nexus 9
R=harryjin@google.com, wangcheng@google.com

Review URL: https://codereview.chromium.org/2035573002 .
2016-06-07 10:44:28 -07:00
Frank Barchard
462be27ec8 j422 now uses j420 source code so increase error threshold to match.
R=harryjin@google.com
BUG=libyuv:597

Review URL: https://codereview.chromium.org/2024213003 .
2016-05-31 19:45:34 -07:00
Frank Barchard
b00d40160a make unittest allocator align to 64 bytes.
blur requires memory be aligned.  change the unittest allocator to guarantee 64 byte alignment.
re-enable blur any test that fails if memory is unaligned.
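
One common way to guarantee 64 byte alignment is to over-allocate and round the pointer up. The helper below is a sketch of that technique; `AlignUp64Sketch` is a hypothetical name and is not claimed to be how the unittest allocator is written.

```c
#include <stdint.h>

/* Round a pointer up to the next 64 byte boundary. The caller is assumed
 * to have over-allocated by at least 63 bytes and to keep the original
 * pointer around for freeing. */
static uint8_t* AlignUp64Sketch(uint8_t* p) {
  uintptr_t addr = (uintptr_t)p;
  return (uint8_t*)((addr + 63) & ~(uintptr_t)63);
}
```

A buffer of `size + 63` bytes always contains an aligned region of `size` bytes starting at the returned pointer.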

TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.

Review URL: https://codereview.chromium.org/2019753002 .
2016-05-27 18:02:47 -07:00
Magnus Jedvert
942db3016a Add ARGBExtractAlpha function
BUG=libyuv:572
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/1995293002 .
2016-05-26 10:30:57 +02:00
Frank Barchard
74a69522da white space fixes for MIPS
TBR=kjellander@chromium.org
BUG=None

Review URL: https://codereview.chromium.org/2005053004 .
2016-05-24 14:17:18 -07:00
Frank Barchard
60abed3a47 add SIMD_ALIGNED to unit_test.h
avoids need for row.h for some unittests;

R=harryjin@google.com
BUG=libyuv:594
TESTED=try bots tested.

Review URL: https://codereview.chromium.org/2004313004 .
2016-05-24 13:56:25 -07:00
Frank Barchard
7edf572e28 remove includes for duplicate functions
R=harryjin@google.com
BUG=libyuv:592
TESTED=local builds work with fewer headers

Review URL: https://codereview.chromium.org/2006943002 .
2016-05-23 17:38:26 -07:00
Frank Barchard
fbdc43a03c fix wrong HAS_ARGBCOPYALPHAROW_SSE2 ifdef
TBR=kjellander@chromium.org
BUG=libyuv:593
TESTED=try bots pass.

Review URL: https://codereview.chromium.org/2000393002 .
2016-05-23 16:26:02 -07:00
Frank Barchard
07cb92272f If image sizes are greater than 32768, fixed point stepping will overflow an int. This CL changes the max size to 32768 and disables the test if larger.
BUG=libyuv:590
TESTED=LIBYUV_FLAGS=-1 LIBYUV_WIDTH=8192 LIBYUV_HEIGHT=16 out/Release/libyuv_unittest --gtest_filter=*
R=harryjin@google.com
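
The size limit can be checked with a little arithmetic. The sketch below assumes a 16.16 fixed point position held in a 32 bit int; the commit only says "fixed point", so that format and the name `LastPixelFitsSketch` are assumptions.

```c
#include <limits.h>
#include <stdint.h>

/* Returns nonzero when the 16.16 position of the last pixel, (size - 1)
 * scaled by 65536, still fits in an int. Computed in 64 bits so the check
 * itself cannot overflow. */
static int LastPixelFitsSketch(int size) {
  int64_t last = ((int64_t)size - 1) * 65536;
  return size > 0 && last <= INT_MAX;
}
```

Under these assumptions a size of 32768 is the largest that passes, and 32769 already needs a position of 2^31.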

Review URL: https://codereview.chromium.org/1947783002 .
2016-05-05 19:09:02 -07:00
Frank Barchard
6924590212 Add all library source files to linux.mk
Allows arm and mips linux builds.
Add psnr and cpuid utility targets.

BUG=libyuv:586
TESTED=make -f linux.mk
TBR=kjellander@chromium.org

Review URL: https://codereview.chromium.org/1906653003 .
2016-04-20 16:48:53 -07:00
Frank Barchard
cf101116c9 Remove initialize to zero on output variables for inline.
Inline that uses temporary variables is currently initializing them
to 0 and passing in as output "+r".
This CL changes the output constraint to "=&r" for most, meaning an
output with early write (before inputs).  This allows the initialize
to zero step to be removed, saving 1 instruction.
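
A small illustration of the constraint change, assuming GCC or clang extended asm on x86. `AddViaScratchSketch` is a hypothetical function written for this note, with a plain C fallback on other targets.

```c
/* Adds two ints through a scratch register. The scratch is an early-clobber
 * output ("=&r"), so it needs no "= 0" initializer and the compiler will not
 * place it in the same register as an input. */
static int AddViaScratchSketch(int a, int b) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__i386__) || defined(__x86_64__))
  int scratch; /* write-only for the asm, deliberately uninitialized */
  int result;
  __asm__(
      "movl %2, %0 \n" /* scratch = a, written before b has been read */
      "addl %3, %0 \n" /* scratch += b */
      "movl %0, %1 \n" /* result = scratch */
      : "=&r"(scratch), "=r"(result)
      : "r"(a), "r"(b)
      : "cc");
  return result;
#else
  return a + b; /* portable fallback on other compilers and CPUs */
#endif
}
```

With "+r" the temporary would count as an input as well, which is why it previously had to be initialized before the asm statement.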

BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1895743008 .
2016-04-18 16:24:26 -07:00
Frank Barchard
9c53ff2c57 Fix temporary stride for ConvertToARGB with rotation.
BUG=libyuv:578
TESTED=local unittests pass
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1879783002 .
2016-04-11 15:21:04 -07:00
Frank Barchard
3c862e3d29 Fix stride bug for msan on I420Interpolate.
When using C version of I420Interpolate for msan, a 50% interpolation
would cause stride to be cast to int, which could cause erroneous
memory reads on 64 bit build.
This CL makes the stride use ptrdiff_t for HalfRow_C
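
A sketch of a 50% row blend that keeps the stride as `ptrdiff_t` all the way to the pointer arithmetic. `HalfRowSketch` is a hypothetical stand-in and may differ from the library's HalfRow_C.

```c
#include <stddef.h>
#include <stdint.h>

/* Average each pixel with the pixel one row below it, rounding up.
 * src_stride stays ptrdiff_t so a negative or large stride is never
 * narrowed to int before indexing. */
static void HalfRowSketch(const uint8_t* src, ptrdiff_t src_stride,
                          uint8_t* dst, int width) {
  int x;
  for (x = 0; x < width; ++x) {
    dst[x] = (uint8_t)((src[x] + src[src_stride + x] + 1) >> 1);
  }
}
```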

BUG=libyuv:582
TESTED=try bots tests
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1872953002 .
2016-04-08 15:58:53 -07:00
Frank Barchard
ddbc63f7b9 Add //build/config/BUILD.gn to exec whitelist for GN.
Affected Linux GN build, not Windows.

R=kjellander@chromium.org
BUG=libyuv:583
TESTED=gn gen out/Debug --args=is_debug=true

Review URL: https://codereview.chromium.org/1866743002 .
2016-04-06 11:23:28 -07:00
Frank Barchard
ef79a9938b cmake move libyuv_unittest target into the if(TEST) condition
BUG=libyuv:579
TESTED=mkdir build && cd build && cmake .. && cmake --build . --config Release
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/1847233002 .
2016-04-01 16:15:34 -07:00
Frank Barchard
837aa1e2af disable assembly in header for msan=1
GYP_DEFINES="target_arch=x64 msan=1" ./gyp_libyuv
ninja -j7 -C out/Release

R=impjdi@google.com
BUG=libyuv:575

Review URL: https://codereview.chromium.org/1805683003 .
2016-03-15 18:45:38 -07:00
Frank Barchard
ee99b85126 Port ARGBToRGB565 from aarch64 neon to 32 bit
Port the 64 bit version of ARGBToRGB565 to 32 bit. 64 bit is using sri which shifts and inserts, saving some masking.  The instruction is available for neon 32 bit as well.

R=magjed@chromium.org, harryjin@google.com
BUG=libyuv:571

Review URL: https://codereview.chromium.org/1724393002 .
2016-02-29 12:22:25 -08:00
Frank Barchard
ab0dfdd4ff Documentation fix for android aarch64 disassembly.
Name of objdump tool updated.

TBR=kjellander@chromium.org
BUG=none

Review URL: https://codereview.chromium.org/1715743003 .

Review URL: https://codereview.chromium.org/1727993002 .
2016-02-23 18:30:35 -08:00
Frank Barchard
127ff512b3 add perf data files to ignores
document play services update

R=jkellander@chromium.org
BUG=none

Review URL: https://codereview.chromium.org/1712463002 .
2016-02-17 21:37:09 -08:00
Frank Barchard
cc33dc68c7 Port I411ToARGBRow to AVX2.
An SSSE3 version already exists, and an AVX2 version is available for
Visual C.  This ports the function to AVX2 completing the AVX2 ports of
all YUV to RGB functions for AVX2 on gcc.

TBR=harryjin@google.com
BUG=libyuv:555

Review URL: https://codereview.chromium.org/1687253002 .
2016-02-12 10:26:10 -08:00
Frank Barchard
0e554b18fe port NV12ToRGB565Row_AVX2 to gcc
NV12ToRGB565Row for Intel is implemented as a 2 step conversion:
NV12ToARGBRow_SSSE3 and ARGBToRGB565Row_SSE2

NV12ToARGBRow has an AVX2 version, so this CL implements
NV12ToRGB565Row_AVX2 with call to NV12ToARGBRow_AVX2 and
ARGBToRGB565Row_SSE2.

R=harryjin@google.com
BUG=libyuv:554

Review URL: https://codereview.chromium.org/1687953002 .
2016-02-10 11:13:41 -08:00
Frank Barchard
c39509c8e5 add avx2 wrappers for functions that can call I422ToARGBRow_AVX2
R=harryjin@google.com
BUG=libyuv:557

Review URL: https://codereview.chromium.org/1687713002 .
2016-02-09 17:14:29 -08:00
Frank Barchard
6ea3755330 add 'LIBYUV_DISABLE_X86' to msan for unittests
R=harryjin@google.com
BUG=libyuv:564

Review URL: https://codereview.chromium.org/1685723002 .
2016-02-09 11:57:03 -08:00
Frank Barchard
fc2adcfa42 fix for msan builds which set -DLIBYUV_DISABLE_X86=1
TBR=harryjin@google.com
BUG=libyuv:566

Review URL: https://codereview.chromium.org/1673313003 .
2016-02-09 10:51:20 -08:00
Frank Barchard
0d880e5bc0 rename MIPS_DSPR2 to DSPR2 for consistency
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor.  This should be named consistently.

Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.

TBR=harryjin@google.com
BUG=libyuv:562

Review URL: https://codereview.chromium.org/1677633002 .
2016-02-05 14:49:54 -08:00
Frank Barchard
903c91cc2e fix for ubsan on unittest.h fastrand()
internal math of the fastrand function uses a multiply
and add that overflows a signed int.  This triggers a
ubsan failure:

../../unit_test/../unit_test/unit_test.h:60:33: runtime error: signed integer overflow: 56248274 * 214013 cannot be represented in type 'int'

This change casts the intermediate math to unsigned
int to avoid the overflow.
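
A sketch of the kind of fix described, assuming a 32 bit unsigned int. The multiplier 214013 comes from the error message above; the additive constant 2531011, the 16 bit output range and the name `FastRandSketch` are assumptions made for illustration.

```c
/* Linear congruential step done entirely in unsigned arithmetic, where
 * wraparound is well defined, instead of signed int, where overflow is
 * undefined behavior. */
static int FastRandSketch(unsigned int* seed) {
  *seed = *seed * 214013u + 2531011u;
  return (int)((*seed >> 16) & 0xffff);
}
```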

For more info on ubsan, see
http://dev.chromium.org/developers/testing/undefinedbehaviorsanitizer

TESTED=Passing compilation using:
GYP_DEFINES="ubsan=1"
GYP_DEFINES="ubsan_vptr=1"

R=harryjin@google.com, pbos@webrtc.org
BUG=libyuv:563

Review URL: https://codereview.chromium.org/1662453003 .
2016-02-02 14:32:12 -08:00
Frank Barchard
9e39c1f271 ubsan overflow fix for multiply by 0x01010101
This is an UBSan error reported by libjingle

[ RUN      ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride
[000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73
../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int'
[8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms)
[9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms)
Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted

The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes.
Changing the constant to 0x01010101u should avoid overflow.
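
The effect of the suffix can be shown in isolation. `ReplicateByteSketch` is a hypothetical helper, not the code in row_gcc.cc.

```c
#include <stdint.h>

/* Replicate one byte into all four bytes of a 32 bit word. The u suffix
 * and the cast keep the multiply unsigned, so 128 * 0x01010101 wraps
 * within uint32_t instead of overflowing a signed int. */
static uint32_t ReplicateByteSketch(uint8_t v) {
  return (uint32_t)v * 0x01010101u;
}
```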

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:563

Review URL: https://codereview.chromium.org/1657533005 .
2016-02-01 12:29:04 -08:00
Frank Barchard
1cc0177669 Remove duplicate prototype for MJPGToARGB
MJPGToARGB prototype is in both convert_argb.h and planar_functions.h
Remove the duplicate prototype from planar_functions.h

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:561

Review URL: https://codereview.chromium.org/1638133002 .
2016-01-26 17:02:45 -08:00
Frank Barchard
ad71738f6a Remove svn version build and unittest.
R=harryjin@google.com
TBR=harryjin@google.com, kjellander@google.com
BUG=libyuv:551

Review URL: https://codereview.chromium.org/1612123002 .
2016-01-21 11:22:11 -08:00
Frank Barchard
8c196f4d4c Fix testi420 unittest for odd height
When the image height for unittests was set to an
odd height, the TestI420 unittest would not fill
the complete source buffer.  This change handles
the odd height test case.
No change to library code.

TBR=harryjin@google.com
BUG=libyuv:549

Review URL: https://codereview.chromium.org/1609103002 .
2016-01-19 16:16:39 -08:00
Frank Barchard
58cb534962 Fix memory overwrite in YUY2ToNV12 odd widths
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination.  Performance
is slower.

Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)

Now
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545

Review URL: https://codereview.chromium.org/1593833002 .
2016-01-19 11:28:09 -08:00
Frank Barchard
8377c798fb Fix I420ToNV21 for wrong dst_stride_y parameter.
I420ToNV21 passes the wrong dst_stride_y when it calls I420ToNV12; parameter 8 (convert_from.cc:448) is src_stride_y but should be dst_stride_y.  This causes image corruption when converting I420 -> NV21 with mismatched luminance strides.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:547

Review URL: https://codereview.chromium.org/1582793008 .
2016-01-14 17:38:54 -08:00
Frank Barchard
081475b3c8 refactor ARGBToI422 using ARGBToI420 internally
R=harryjin@google.com
BUG=libyuv:546

Review URL: https://codereview.chromium.org/1574253004 .
2016-01-12 17:05:49 -08:00
Frank Barchard
54bbea1701 Disable I420Blend_Any test that uses C
Also renames Inverted to Invert in test name for consistency.

TBR=harryjin@google.com
BUG=libyuv:543

Review URL: https://codereview.chromium.org/1577973004 .
2016-01-11 18:23:04 -08:00
Frank Barchard
8030a711aa Rename rotate tests to include _Opt and disable _Odd tests
TBR=harryjin@google.com
BUG=libyuv:543

Review URL: https://codereview.chromium.org/1577723003 .
2016-01-11 17:30:27 -08:00
Frank Barchard
23c6a83561 Fix ifdef mismatch for mirroruv
Macro define and macro ifdef didn't match, leading to C code
being used.  Make macro match function name.

TBR=harryjin@google.com
BUG=libyuv:543

Review URL: https://codereview.chromium.org/1579023002 .
2016-01-11 16:33:36 -08:00
Frank Barchard
fc52d8ded2 Odd width variation of scale down by 2 for subsampling
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:538

Review URL: https://codereview.chromium.org/1558093003 .
2016-01-06 15:12:17 -08:00
Frank Barchard
2560df9513 add clang variable for other apps to use
R=dhrosa@google.com
BUG=libyuv:539

Review URL: https://codereview.chromium.org/1557923005 .
2016-01-05 11:47:55 -08:00
Frank Barchard
36615d62a0 fix for InterpolateRow_AVX2
port scaledownby4_avx2 to gcc

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1546763002 .
2015-12-22 12:29:54 -08:00
Frank Barchard
71deb7ba3a bug fix - remove shift from InterpolateRow_AVX2
TBR=harryjin@google.com
BUG=libyuv:537

Review URL: https://codereview.chromium.org/1547703002 .
2015-12-22 10:28:48 -08:00
Frank Barchard
2cb2e9e1ad fix for InterpolateRow_AVX2
TBR=harryjin@google.com
BUG=libyuv:535

Review URL: https://codereview.chromium.org/1543773002 .
2015-12-21 18:35:12 -08:00
Frank Barchard
3f4d86053e avx2 interpolate use 8 bit
BUG=libyuv:535
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1535833003 .
2015-12-21 10:57:32 -08:00
Frank Barchard
f4447745ae Add rounding to InterpolateRow for improved quality and consistency.
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly.  Specializations for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.
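
One plausible form of a rounded interpolation is sketched below. The 8 bit fraction, the +128 rounding term and the name `InterpolateSketch` are assumptions; the commit does not spell out the exact formula.

```c
#include <stdint.h>

/* Blend a and b with an 8 bit fraction f in [0, 255], where f = 0 returns
 * a. Adding 128 before the shift rounds to nearest instead of truncating. */
static uint8_t InterpolateSketch(uint8_t a, uint8_t b, int f) {
  return (uint8_t)((a * (256 - f) + b * f + 128) >> 8);
}
```

With f = 128 this reduces to (a + b + 1) >> 1, the usual rounded average, so a 50% special case can agree exactly with the general path.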

BUG=libyuv:535
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
029f926a14 add NDEBUG for release chromium builds
BUG=libyuv:533

TBR=harryjin@google.com

Review URL: https://codereview.chromium.org/1531143002 .
2015-12-16 16:23:09 -08:00
Frank Barchard
216e93b4e8 Fix MIPS DSPR2 build failure.
Fixing the failure:
 'TransposeWx8_Fast_MIPS_DSPR2' was not declared in this scope

BUG=none
R=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/1527243002 .
2015-12-16 10:37:42 -08:00
Frank Barchard
80ca4514ef change scale down by 4 to use rounding.
TBR=harryjin@google.com
BUG=libyuv:447

Review URL: https://codereview.chromium.org/1525033005 .
2015-12-15 21:25:18 -08:00
Frank Barchard
70445ef2ef avx2 scale down by 2 for gcc
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1520423003 .
2015-12-15 10:59:20 -08:00
Frank Barchard
77346fcb4a disable I411ToARGB assembly if _DEBUG for chromium, as well as DEBUG for other builds.
TBR=harryjin@google.com
BUG=libyuv:533

Review URL: https://codereview.chromium.org/1527903002 .
2015-12-14 21:36:12 -08:00
Frank Barchard
ae55e41851 use rounding in scaledown by 2
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
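
The two formulas can disagree by one. The sketch below models a byte average as (a + b + 1) >> 1, which is how the SSE2 pavgb instruction rounds; the function names are hypothetical.

```c
#include <stdint.h>

/* Rounded byte average, as computed by pavgb. */
static uint8_t AvgRoundUpSketch(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

/* Old approach: average of two averages, which rounds up twice. */
static uint8_t TwoAveragesSketch(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return AvgRoundUpSketch(AvgRoundUpSketch(a, b), AvgRoundUpSketch(c, d));
}

/* New approach: sum all four pixels, then round once. */
static uint8_t SumThenRoundSketch(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}
```

For the pixels 0, 1, 0, 0 the double average gives 1 while the single rounded sum gives 0, which is the kind of mismatch the change removes.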

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527

Review URL: https://codereview.chromium.org/1513183004 .
2015-12-14 17:25:36 -08:00
Frank Barchard
8bca9fc178 remove unused var in a test
remove include from unittest.cc that is already done by unittest.h

TBR=harryjin@google.com
BUG=libyuv:530

Review URL: https://codereview.chromium.org/1513263004 .
2015-12-10 18:39:36 -08:00
Frank Barchard
44373d8fbb Add check for DEBUG to functions disabled on 386
Some functions run out of registers when compiled for debug,
fpic, with stack frames on 32 bit x86 with clang.
Previously they were enabled based on _DEBUG but that macro
is not set in some build systems.  This CL adds DEBUG macro as
well to cover those environments.

R=harryjin@google.com
BUG=libyuv:532

Review URL: https://codereview.chromium.org/1517693005 .
2015-12-10 15:42:46 -08:00
Frank Barchard
a2ea905679 BlendPlane any width.
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms

Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)

Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)

which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
fae1a10545 Work around bug in xgetbv for Visual Studio.
xgetbv is generating bad code, falsely disabling AVX2 and AVX512.
disable optimization for the function affected on older versions of Visual C 32 bit.

R=brucedawson@chromium.org, dhrosa@google.com, harryjin@google.com
BUG=libyuv:529

Review URL: https://codereview.chromium.org/1503393004 .
2015-12-08 18:13:32 -08:00
Frank Barchard
2657688e70 Add support for odd height YUVA alpha blending.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1507683003 .
2015-12-07 12:03:20 -08:00
Frank Barchard
bea690b3e0 AVX2 YUV alpha blender and improved unittests
AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.

unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1505433002 .
2015-12-05 22:23:29 -08:00
Frank Barchard
fa2618ee26 Port BlendPlaneRow_SSSE3 to GCC
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1490273006 .
2015-12-04 11:19:41 -08:00
Frank Barchard
8af0ebf816 planar blend use signed images
R=dhrosa@google.com, harryjin@google.com, jzern@chromium.org
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1491533002 .
2015-12-02 14:20:17 -08:00
Frank Barchard
b6f37bd8ec Interpolate plane initial implementation.
YUV version of interpolation between two images.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:526

Review URL: https://codereview.chromium.org/1479593002 .
2015-11-25 16:11:42 -08:00
Frank Barchard
88552486f1 disable 411 on x86 due to compile error
TBR=harryjin@google.com
BUG=libyuv:524

Review URL: https://codereview.chromium.org/1468523002 .
2015-11-20 11:21:39 -08:00
Frank Barchard
526558b2d8 disable debug build of 411 to work around compiler bug
TBR=harryjin@google.com
BUG=libyuv:524

Review URL: https://codereview.chromium.org/1461013002 .
2015-11-19 02:25:00 -08:00
Frank Barchard
b7dfb72559 fix for I411 build error on 32 bit x86
TBR=harrjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1461693004 .
2015-11-19 01:45:14 -08:00
Frank Barchard
528356a128 syntax fix for gcc movzwl
TBR=harryjin@google.com
BUG=libtyv:525

Review URL: https://codereview.chromium.org/1460723003 .
2015-11-18 13:14:15 -08:00
Frank Barchard
50f8cb2db3 port I411 movzx 2 byte reader to gcc
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)
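
In C terms the idea is to touch exactly two bytes and zero-extend the result. `ReadTwoBytesSketch` is a hypothetical helper for illustration and is not the assembly in the change.

```c
#include <stdint.h>
#include <string.h>

/* Read exactly 2 bytes and zero-extend to 32 bits, the way a 2 byte
 * movzx load does. A 4 byte load at the same address could run past
 * the end of the buffer. */
static uint32_t ReadTwoBytesSketch(const uint8_t* p) {
  uint16_t v;
  memcpy(&v, p, sizeof(v)); /* copies only 2 bytes */
  return v;                 /* upper 16 bits are zero */
}
```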

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1457783004 .
2015-11-18 13:05:39 -08:00
Frank Barchard
5eefbe2330 Fix for drmemory failure on I411ToARGB
Before
I420ToARGB_Opt (594 ms)
I422ToARGB_Opt (483 ms)
I411ToARGB_Opt (748 ms) ***
I444ToARGB_Opt (452 ms)
I400ToARGB_Opt (218 ms)

After
I420ToARGB_Opt (591 ms)
I422ToARGB_Opt (454 ms)
I411ToARGB_Opt (502 ms)  ***
I444ToARGB_Opt (441 ms)
I400ToARGB_Opt (216 ms)

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1459513002 .
2015-11-17 18:00:52 -08:00
Frank Barchard
ec4b258d4e free src_a in unittest to fix leak
TBR=harryjin@google.com
BUG=libyuv:524

Review URL: https://codereview.chromium.org/1452083002 .
2015-11-17 00:29:53 -08:00
Frank Barchard
0815568a50 test for unaligned vs aligned for CopyRow_SSE2
improves performance on older CPUs where movdqa is faster.
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1455463002 .
2015-11-17 00:04:03 -08:00
Frank Barchard
1019e4537f port I444ToARGB avx2 code from Visual C to GCC.
SSSE3
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms)
[----------] 8 tests from LibYUVConvertTest (3389 ms total)

AVX2
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms)
[----------] 8 tests from LibYUVConvertTest (2615 ms total)

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1445893002 .
2015-11-13 18:31:22 -08:00
Frank Barchard
60adcbaf32 scale with conversion using 2 steps with unittest
a prototype function to implement the yuv to rgb with conversion and scale.
replace with 1 step function in future version, using same API.

R=harryjin@google.com
BUG=libyuv:471

Review URL: https://codereview.chromium.org/1421553016 .
2015-11-13 11:25:56 -08:00
Frank Barchard
6100f50f13 fix yvu constants for avx2 yuv to rgb
the yvu matrix for yuv to rgb had an incorrect entry, affecting yuv to bgra,
yuv to abgr and yuv to raw.
fix the matrix and reenable avx2 functions.

R=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1411763004 .
2015-11-10 10:45:44 -08:00
Frank Barchard
72a9e282ec disable more avx2 functions that don't link in chrome
libyuv builds/runs, but when integrated into chromium, produces link errors.  unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.

TBR=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1427683004 .
2015-11-09 17:20:02 -08:00
Frank Barchard
fb5ed1f4c5 disable 4 AVX2 YUV to RGB conversions which fails tests.
disable I422ALPHATOARGBROW_AVX2 I422TOARGBROW_AVX2 I422TORGB24ROW_AVX2 I422TORGBAROW_AVX2 in row.h.
SSSE3 versions will be used instead.
Short term fix until issue can be resolved.

R=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1419513009 .
2015-11-09 14:40:08 -08:00
Frank Barchard
98eb102bea set d19 alpha on inner loop
TBR=harryjin@google.com
BUG=libyuv:521

Review URL: https://codereview.chromium.org/1429263004 .
2015-11-06 11:38:21 -08:00
Frank Barchard
431cb3667a YUV to RGB for x64 use registers instead of memory.
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 version saves 3 instructions because the 2nd argument needs to be a register, so a vmovdqu is avoided.

x64 builds using memory:
AVX2  I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)

Now using registers
AVX2  I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)

TBR=harryjin@google.com
BUG=libyuv:520

Review URL: https://codereview.chromium.org/1407353010 .
2015-11-04 16:16:18 -08:00
Frank Barchard
c2bff1a1af add .gn file for gn builds
using a stripped down gn file from webrtc.

BUG=libyuv:411,libyuv:519
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/1417613007 .
2015-11-04 11:09:00 -08:00
Frank Barchard
860cc0357a Neon versions of I420AlphaToARGB
Add alpha version of YUV to RGB to neon code for ARMv7 and aarch64.
For other YUV to RGB conversions, hoist alpha set to 255 out of loop.

TBR=harryjin@google.com
BUG=libyuv:516

Review URL: https://codereview.chromium.org/1413763017 .
2015-11-03 19:21:36 -08:00
Frank Barchard
d95d2169d9 rename yuv matrix constants to be more clear about what they are
R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1429693006 .
2015-11-03 17:09:53 -08:00
Frank Barchard
1f1d140bb6 remove mips dsp detect
DSP code is not actually used, only DSPR2.  Remove the detect.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1405043008 .
2015-11-03 16:57:40 -08:00
Frank Barchard
ce4c2fad1d Raw 24 bit RGB to RGB24 (bgr)
Add unittests that do 1 step conversion vs 2 step conversion.

Tests that end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
87926cec8b remove store bgra, abgr, raw unused macros
TBR=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1420033004 .
2015-11-02 10:40:03 -08:00
Frank Barchard
2c7aa0070a remove I422ToBGRA and use I422ToRGBA internally
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.

Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369 refactor I420ToABGR to use I420ToARGBRow
Using a transposed conversion matrix, I420ToARGB can output ABGR.

R=harryjin@google.com, xhwang@chromium.org
BUG=libyuv:473

Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00
Frank Barchard
254ef01551 disable I420AlphaToARGB for 32 bit intel debug build
R=harryjin@google.com
BUG=libyuv:517

Review URL: https://codereview.chromium.org/1428843003 .
2015-10-29 11:21:36 -07:00
Frank Barchard
cdbdf5b723 Fix debug compilation problems for gcc and 32 bit x86.
In some methods with 7 arguments gcc fails to find enough registers
to compile the assembler code when compiling debug. Simplest solution
is to skip the assembler version in debug of those particular functions

(I422Alpha -> ARGB/ABGR)

R=harryjin@google.com,bratell@opera.com
BUG=libyuv:517

Review URL: https://codereview.chromium.org/1423283002 .
2015-10-28 14:27:29 -07:00
Frank Barchard
811a5ec446 pass clangcl compile options to ignore warnings in gflags.cc
R=ajm@chromium.org, ajm@google.com
BUG=libyuv:513,webrtc:760

Review URL: https://codereview.chromium.org/1427643003 .
2015-10-28 10:58:19 -07:00
Frank Barchard
b86dbf24d3 refactor I420AlphaToABGR to use I420AlphaToARGB internally
swap U and V and transpose conversion matrix, so I420AlphaToARGB and
I420AlphaToABGR share low level code.

Having less code with same performance allows more focused
optimization for future ARM versions.

R=harryjin@google.com
TBR=harryjin@chromium.org
BUG=libyuv:473,libyuv:516

Review URL: https://codereview.chromium.org/1422263002 .
2015-10-27 14:17:21 -07:00
Frank Barchard
cf160cdbaa implement I444ToABGR by swapping uv and transpose matrix
U contributes to B and G.  V contributes to R and G.
By swapping U and V, they contribute to the opposite channels.  Adjust the matrix so the U contribution is in the matrix location such that it will still contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.
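
The swap can be checked with a toy model. The structs, the small integer coefficients and the function names below are hypothetical; real conversions also use bias, fixed point scaling and clamping, which are left out here.

```c
/* Toy YUV to RGB model: U feeds B and G, V feeds R and G. */
typedef struct {
  int ub; /* U contribution to B */
  int ug; /* U contribution to G */
  int vg; /* V contribution to G */
  int vr; /* V contribution to R */
} MatrixSketch;

typedef struct {
  int b, g, r; /* channels in the order the row function writes them */
} PixelSketch;

static PixelSketch ConvertSketch(int y, int u, int v, const MatrixSketch* m) {
  PixelSketch p;
  p.b = y + m->ub * u;
  p.g = y - m->ug * u - m->vg * v;
  p.r = y + m->vr * v;
  return p;
}

/* Exchange the U and V coefficients so that a call with the U and V
 * inputs swapped produces R in the B slot and B in the R slot. */
static MatrixSketch SwapUVMatrixSketch(const MatrixSketch* m) {
  MatrixSketch t;
  t.ub = m->vr;
  t.ug = m->vg;
  t.vg = m->ug;
  t.vr = m->ub;
  return t;
}
```

Calling `ConvertSketch(y, v, u, &swapped)` yields the same G and exchanges the first and third channels, which is the ABGR byte order produced from the ARGB code path.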

As a result the existing I444ToABGRRow functions are no longer needed and are removed.

Previously this function was only Intel AVX2 optimized for Windows.  Now it is also optimized for Arm and GCC.

ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.

R=xhwang@chromium.org
BUG=libyuv:515

Review URL: https://codereview.chromium.org/1414133006 .
2015-10-27 10:21:21 -07:00
Frank Barchard
e8ee175549 add unittest that compares ABGR to ARGB
TBR=harryjin@google.com
BUG=libyuv:515

Review URL: https://codereview.chromium.org/1423663007 .
2015-10-26 17:51:03 -07:00
Frank Barchard
2844662e1c Add avx512bw detection code
R=harryjin@google.com
BUG=libyuv:514

Review URL: https://codereview.chromium.org/1413463004 .
2015-10-26 14:42:49 -07:00
Frank Barchard
1502832a70 switch cpu flags to 0 for uninitialized to avoid compare
R=harryjin@google.com
BUG=libyuv:512

Review URL: https://codereview.chromium.org/1418253002 .
2015-10-23 10:57:42 -07:00
Frank Barchard
ad36ba5c48 initialize cpu flags to fix compile error on windows
R=harryjin@google.com
BUG=libyuv:512

Review URL: https://codereview.chromium.org/1422733003 .
2015-10-22 15:16:31 -07:00
Frank Barchard
00f15e3c6c color unittest allow j420 error of 5 for arm
R=harryjin@google.com
BUG=libyuv:511

Review URL: https://codereview.chromium.org/1412683005 .
2015-10-22 11:25:04 -07:00
Frank Barchard
430bb0a0f0 odd width 444 fix
TBR=harryjin@google.com
BUG=libyuv:510

Review URL: https://codereview.chromium.org/1415583003 .
2015-10-21 20:03:19 -07:00
Frank Barchard
ba4b409d51 Fix ARGBToI411 odd width bug.
The any function for handling ARGBToI411 was not handling the pixel
replication correctly.  On 422 an odd width was handled by duplicating
a pixel of source.  411 needs replication for remainders of 1, 2 or 3
pixels.

The C version was handling odd width but with an average of the remainder
pixels, which does not match the SIMD 'any' handling of the remainder.
This changes the odd width handling to mimic the any version.
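
A rough sketch of the replication idea for one chroma channel subsampled by 4. The function name and the exact padding rule (repeat the last valid sample) are assumptions made for illustration, not the code in this change.

```c
#include <stdint.h>

/* Hypothetical helper: average groups of 4 samples into 1 (4:1:1 style). */
void ToySubsample4Row(const uint8_t* src, uint8_t* dst, int width) {
  int x;
  /* Full groups of 4 samples. */
  for (x = 0; x + 4 <= width; x += 4) {
    dst[x / 4] =
        (uint8_t)((src[x] + src[x + 1] + src[x + 2] + src[x + 3] + 2) / 4);
  }
  /* Remainder of 1, 2 or 3 samples: pad the group by replicating the last
     valid sample, then average the padded group of 4. */
  int rem = width - x;
  if (rem > 0) {
    int sum = 0;
    for (int i = 0; i < 4; ++i) {
      int idx = (i < rem) ? i : rem - 1;
      sum += src[x + idx];
    }
    dst[x / 4] = (uint8_t)((sum + 2) / 4);
  }
}
```

For a remainder of {100, 200}, the padded group is {100, 200, 200, 200}, which averages to 175, whereas a plain average of the two remaining samples would give 150. That difference is the kind of C versus 'any' mismatch the message describes.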

TBR=harryjin@google.com
BUG=libyuv:491

Review URL: https://codereview.chromium.org/1411733004 .
2015-10-21 12:22:24 -07:00
Frank Barchard
9daa550a2e Move cpu_info variable outside ifdef
Fix compile error on arm, mips etc due to undefined variable.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1403373008 .
2015-10-20 16:32:44 -07:00
Frank Barchard
9be6d21ae7 write to cpu_flags once
To make init cpu flags thread safe, there can only be one write to the variable.
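
One way to picture the single-write rule is the sketch below. All names are hypothetical, and a plain `int` store is used only for brevity; a real implementation may need atomics or other guarantees that this sketch does not provide.

```c
#include <stdint.h>

#define TOY_CPU_INITIALIZED 0x1 /* hypothetical "flags are valid" bit */
#define TOY_HAS_FEATURE_A 0x2   /* hypothetical feature bit */

static int toy_cpu_flags = 0; /* 0 means "not initialized yet" */

/* Stand-in for real CPU detection. */
static int ToyDetectCpu(void) {
  return TOY_HAS_FEATURE_A;
}

int ToyInitCpuFlags(void) {
  /* Build the complete value in a local first... */
  int flags = ToyDetectCpu();
  flags |= TOY_CPU_INITIALIZED;
  /* ...then publish it with exactly one write, so a concurrent reader sees
     either 0 or the final value, never a half-built one. */
  toy_cpu_flags = flags;
  return flags;
}

int ToyTestCpuFlag(int flag) {
  int flags = toy_cpu_flags; /* single read */
  if (flags == 0) {
    flags = ToyInitCpuFlags();
  }
  return flags & flag;
}
```

If two threads race through `ToyInitCpuFlags`, both compute and store the same value, so the result is the same whichever store lands last.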

R=richard.winterton@intel.com, harryjin@google.com
BUG=libyuv:508

Review URL: https://codereview.chromium.org/1412793006 .
2015-10-20 16:24:01 -07:00
Frank Barchard
e6a54f223a Call AllowCommandLineReparsing in unit tests
Allows us to ignore flags passed on to us by Chromium build bots
without having to explicitly disable them. (Thanks pbos!)

TESTED=webrtc ran modules_unittests with a bogus flag did not result in an
error.

R=kjellander@chromium.org
BUG=libyuv:507

Review URL: https://codereview.chromium.org/1417573002 .
2015-10-19 16:30:41 -07:00
Frank Barchard
94312b695a add gflags support files from webrtc
files needed for command line support with gtest.
These files are copied directly from webrtc.

TBR=kjellander@chromium.org
BUG=libyuv:507

Review URL: https://codereview.chromium.org/1414483002 .
2015-10-16 18:53:25 -07:00
Henrik Kjellander
8dcec019b6 Add gflags dependency
Unit tests currently use environment variables to change behavior.
Using gflags this can be done via command line.

BUG=libyuv:507
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/1413723002 .
2015-10-16 22:08:43 +02:00
Henrik Kjellander
f80cc26da7 Revert "add gflags to deps to allow command line parameters."
This reverts commit 2dd3d9230ee663e71ed4ad9164033ed672e571de.

Reason: chromium_git is a missing variable, and to properly
add gflags, we need to check in GYP files in third_party/gflags
first, then add the DEPS entry.

BUG=libyuv:507
TBR=fbarchard@chromium.org

Review URL: https://codereview.chromium.org/1406323002 .
2015-10-16 21:46:56 +02:00
Frank Barchard
2dd3d9230e add gflags to deps to allow command line parameters.
unittests currently use environment variables to change behavior.
using gflags this can be done via command line.

R=kjellander@chromium.org
BUG=libyuv:507

Review URL: https://codereview.chromium.org/1402313002 .
2015-10-16 10:57:51 -07:00
Frank Barchard
5d0a871d37 remove have jpeg test
This test is just a printf, not a real test, but somehow
fails on arm.

TBR=harryjin@google.com
BUG=libyuv:506

Review URL: https://codereview.chromium.org/1409913002 .
2015-10-15 19:13:07 -07:00
Frank Barchard
cf19a0c9a2 nv21 any fix
R=harryjin@google.com
BUG=libyuv:507

Review URL: https://codereview.chromium.org/1410643002 .
2015-10-15 16:24:51 -07:00
Frank Barchard
52a5504950 fix for C version of YUV to RGB for Arm
YuvPixel for arm was miscomputing YG.

TBR=harryjin@google.com
BUG=libyuv:506

Review URL: https://codereview.chromium.org/1402333002 .
2015-10-15 12:43:37 -07:00
Frank Barchard
e2417df4cb create color test category of unittests to narrow down arm bug
A hang in color conversion on arm occurs somewhere in yuv to rgb.
Breaking the color test into its own category of test will help
run selective tests to narrow down the issue.

R=harryjin@google.com
BUG=libyuv:506

Review URL: https://codereview.chromium.org/1405543003 .
2015-10-14 16:58:55 -07:00
Frank Barchard
c7c188379b avoid vectors for pnacl which cause linker failure.
R=sergeyu@chromium.org
BUG=chromium:538243

Review URL: https://codereview.chromium.org/1396363004 .
2015-10-14 14:49:48 -07:00
Frank Barchard
26db4de2ae break up unittests into categories
R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1399523004 .
2015-10-13 16:01:07 -07:00
Frank Barchard
4abd096548 fix for yuv to rgb on arm64.
fill in aarch64 yuv constants to match how the code expects them.

TBR=harryjin@google.com
BUG=libyuv:502

Review URL: https://codereview.chromium.org/1396253004 .
2015-10-12 12:02:54 -07:00
Frank Barchard
2e4466e282 change all pix parameters to width for consistency
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1398633002 .
2015-10-07 22:30:36 -07:00
Frank Barchard
2d601aaf34 merge neon source files back into single libyuv library
previously the neon source code was broken into a separate
library built with -mfpu=neon for the neon assembly, while
the C code was built without neon.

In this change, the neon code is added to the main library
and all code built with neon.

TBR=harryjin@google.com
BUG=libyuv:371

Review URL: https://codereview.chromium.org/1392043003 .
2015-10-07 21:16:51 -07:00
Frank Barchard
76a599ec3b fix jpeg and bt.709 yuvconstants for neon64.
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces.  But the jpeg and bt.709
colour conversion constants were still in armv7 form.  This changes
the constants for aarch64 builds to be compatible with the code.

yuv constants are now passed as const *

Remove Yvu constants which were used for the older version of nv21 but not by the new code.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1398623002 .
2015-10-07 19:46:56 -07:00
Frank Barchard
8f0cadede4 port ARGB to 565 dithering AVX2 code to GCC.
Previously the assembly code was only available to Windows.
This CL ports the AVX2 code to GCC syntax.

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1391273003 .
2015-10-07 19:13:59 -07:00
Frank Barchard
cc89e3a77b port ARGB to 565 dithering SSE2 code to GCC.
Previously the assembly code was only available to Windows.
This CL ports the SSE2 code to GCC syntax.

When running a profiler on all the unittests, this function
was the slowest of all functions that still ran in C code.
   3.71%  libyuv_unittest  libyuv_unittest      [.] ARGBToRGB565DitherRow_C

Was
ARGBToRGB565Dither_Opt (2894 ms)
Now
ARGBToRGB565Dither_Opt (432 ms)

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1397673002 .
2015-10-07 18:24:50 -07:00
Frank Barchard
3e38762d6b fix avx2 box filter bug for yuv down sampling.
offset to second group of pixels was off by 16.
should have been 32, not 16.
requires avx2 hardware and wide image for test.

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:492,libyuv:501

Review URL: https://codereview.chromium.org/1395603002 .
2015-10-07 11:02:33 -07:00
Frank Barchard
013080f2d2 Pass yuvconstants to YUV conversions for neon 64 bit
SETUP provided by zhongwei.yao@linaro.org

Previously the 64 bit Neon code had hard coded constants in the setup macro
for YUV conversion, while 32 bit Neon code supported the yuvconstants
parameter.

This change accepts the constants passed to the YUV conversion row function,
allowing different color spaces to be respected, namely JPEG and BT.709,
as well as the existing BT.601.

TBR=harryjin@google.com
BUG=libyuv:472

Review URL: https://codereview.chromium.org/1384323002 .
2015-10-06 22:19:14 -07:00
Frank Barchard
914a9856c7 Reimplement NV21ToARGB to allow different color matrix.
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V.  But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.

TBR=harryjin@google.com
BUG=libyuv:500

Review URL: https://codereview.chromium.org/1388273002 .
2015-10-06 20:34:44 -07:00
Frank Barchard
f00bc9ef46 Add J444ToARGB conversion function.
J444 is JPeg YUV color space with 444 subsampling.
This implementation uses the existing I444ToARGB conversion, which is
BT.601 color space with 444 subsampling, but passing in the jpeg
color matrix constants.
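
The reuse described above can be sketched as one row function that takes the constants as a parameter, plus thin wrappers. The struct, function names and 6-bit fixed-point values below are assumptions for illustration and do not reproduce libyuv's real constants or signatures.

```c
#include <stdint.h>

/* Hypothetical constants, scaled by 64. */
struct ToyYuvConstants {
  int yg;    /* Y gain */
  int ybias; /* Y offset removed before scaling */
  int ub, ug, vg, vr;
};

/* Approximate BT.601 limited range and JPEG full range coefficients. */
static const struct ToyYuvConstants kToyBt601 = {74, 16, 129, 25, 52, 102};
static const struct ToyYuvConstants kToyJpeg = {64, 0, 113, 22, 46, 90};

static uint8_t ToyClamp(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Shared low level for 444 input: one U and one V sample per pixel. */
static void ToyI444ToARGBMatrixRow(const uint8_t* y, const uint8_t* u,
                                   const uint8_t* v, uint8_t* dst_argb,
                                   const struct ToyYuvConstants* c,
                                   int width) {
  for (int x = 0; x < width; ++x) {
    int y1 = (y[x] - c->ybias) * c->yg;
    int ud = u[x] - 128;
    int vd = v[x] - 128;
    dst_argb[0] = ToyClamp((y1 + c->ub * ud) / 64);              /* B */
    dst_argb[1] = ToyClamp((y1 - c->ug * ud - c->vg * vd) / 64); /* G */
    dst_argb[2] = ToyClamp((y1 + c->vr * vd) / 64);              /* R */
    dst_argb[3] = 255;                                           /* A */
    dst_argb += 4;
  }
}

/* The wrappers differ only in which constants they pass. */
void ToyI444ToARGBRow(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                      uint8_t* dst_argb, int width) {
  ToyI444ToARGBMatrixRow(y, u, v, dst_argb, &kToyBt601, width);
}

void ToyJ444ToARGBRow(const uint8_t* y, const uint8_t* u, const uint8_t* v,
                      uint8_t* dst_argb, int width) {
  ToyI444ToARGBMatrixRow(y, u, v, dst_argb, &kToyJpeg, width);
}
```

In this sketch a full-range white input (Y=255, U=V=128) maps to 255 through the JPEG wrapper, while the BT.601 wrapper maps Y=16 to black.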

TBR=harryjin@google.com
BUG=449

Review URL: https://codereview.chromium.org/1387313002 .
2015-10-06 18:46:53 -07:00
Frank Barchard
d70293993f port scale box filter sse2 to gcc
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1393653002 .
2015-10-06 16:54:26 -07:00
Frank Barchard
f4c1ac10f0 Speed up rounding to byte test
R=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1367403007 .
2015-10-02 15:27:13 -07:00
Frank Barchard
3eefeaeb69 test xsave before calling xgetbv.
R=agl@chromium.org, harryjin@google.com
BUG=libyuv:497

Review URL: https://codereview.chromium.org/1382803002 .
2015-09-30 17:25:41 -07:00
Frank Barchard
2cc1a2b233 Remove sse2 functions that also have ssse3
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.

R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1377053003 .
2015-09-30 14:24:44 -07:00
Frank Barchard
febc26a2c9 win64 version of I422AlphaToARGB.
Was
I420AlphaToARGB_Premult (8861 ms)
I420AlphaToARGB_Opt (7119 ms)
Now
I420AlphaToABGR_Premult (2840 ms)
I420AlphaToARGB_Opt (484 ms)

C function switched to 1 step.
Was
I420AlphaToARGB_Premult (8862 ms)
I420AlphaToABGR_Opt (6718 ms)

Now
I420AlphaToARGB_Premult (8706 ms)
I420AlphaToARGB_Opt (6541 ms)

R=harryjin@google.com
BUG=libyuv:496, libyuv:473

Review URL: https://codereview.chromium.org/1359183003 .
2015-09-25 15:06:41 -07:00
Frank Barchard
9a0e12f5f1 AVX2 1 step I422AlphaToARGB for gcc and win.
C     I420AlphaToARGB_Opt (5169 ms)
SSSE3 I420AlphaToARGB_Opt (432 ms)
AVX2  I420AlphaToARGB_Opt (358 ms)

and with premultiplication as 2 step process:
I420AlphaToARGB_Premult (7029 ms)
I420AlphaToARGB_Premult (757 ms)
I420AlphaToARGB_Premult (508 ms)

R=harryjin@google.com
BUG=libyuv:496,libyuv:473

Review URL: https://codereview.chromium.org/1372653003 .
2015-09-25 13:37:42 -07:00
Frank Barchard
e365cdde3b I420Alpha row function in 1 pass.
API change - I420AlphaToARGB takes flag indicating if RGB should be
premultiplied by alpha.
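
As a sketch of what such a flag controls, the helper below finishes one already-converted pixel in a single pass. The function name, parameter order and rounding are assumptions for illustration; the rounding used by the real row functions is not stated here.

```c
#include <stdint.h>

/* Hypothetical helper: store alpha and optionally premultiply B, G, R.
   argb points at 4 bytes in B, G, R, A order. */
void ToyStoreAlphaPixel(uint8_t* argb, uint8_t alpha, int attenuate) {
  if (attenuate) {
    for (int i = 0; i < 3; ++i) {
      /* Scale each color channel by alpha / 255 with rounding. */
      argb[i] = (uint8_t)((argb[i] * alpha + 127) / 255);
    }
  }
  argb[3] = alpha;
}
```

Doing the multiply while the pixel is still in hand avoids a second pass over the destination buffer, which is the difference between the 1 step and 2 step variants mentioned in these messages.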

This version implements an efficient SSSE3 version for Windows.
C version done in 2 steps.

Was
libyuvTest.I420AlphaToARGB_Any (1136 ms)
libyuvTest.I420AlphaToARGB_Unaligned (1210 ms)
libyuvTest.I420AlphaToARGB_Invert (966 ms)
libyuvTest.I420AlphaToARGB_Opt (1031 ms)
libyuvTest.I420AlphaToABGR_Any (1020 ms)
libyuvTest.I420AlphaToABGR_Unaligned (1359 ms)
libyuvTest.I420AlphaToABGR_Invert (1082 ms)
libyuvTest.I420AlphaToABGR_Opt (986 ms)

R=harryjin@google.com
BUG=libyuv:496

Review URL: https://codereview.chromium.org/1367093002 .
2015-09-25 10:29:20 -07:00
Frank Barchard
8fb2048e9f Fix nv12 64 bit gcc increment.
Should be 16 bytes, but was 0x16 causing memory corruption.
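
The size of the slip is easy to see in plain C. The helper below is hypothetical and only does the arithmetic; it is not the assembly that was fixed.

```c
#include <stddef.h>

enum {
  kToyIntendedStep = 16,  /* decimal 16 bytes per iteration */
  kToyMistypedStep = 0x16 /* hexadecimal 16, which is decimal 22 */
};

/* Hypothetical helper: total bytes a pointer moves after n iterations. */
size_t ToyBytesAdvanced(size_t iterations, size_t step) {
  return iterations * step;
}
```

Each iteration with the mistyped step moves the pointer 6 bytes too far, so after 100 iterations it is 600 bytes past where it should be, which is how a one-character difference turns into out-of-bounds access.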

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1368693002 .
2015-09-24 10:19:17 -07:00
Frank Barchard
accc04e6d8 NV12ToARGB_AVX2 ported to gcc
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1364913002 .
2015-09-23 15:54:16 -07:00
Frank Barchard
000cf89ca8 YUY2ToARGB avx2 in 1 step conversion.
Includes UYVYToARGB ssse3 fix.

Was
YUY2ToARGB_Opt (433 ms)
69.79%  libyuv_unittest  libyuv_unittest      [.] I422ToARGBRow_AVX2
20.73%  libyuv_unittest  libyuv_unittest      [.] YUY2ToUV422Row_AVX2
 6.04%  libyuv_unittest  libyuv_unittest      [.] YUY2ToYRow_AVX2
 0.77%  libyuv_unittest  libyuv_unittest      [.] YUY2ToARGBRow_AVX2

Now
YUY2ToARGB_Opt (280 ms)
95.66%  libyuv_unittest  libyuv_unittest      [.] YUY2ToARGBRow_AVX2

BUG=libyuv:494
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1364813002 .
2015-09-23 11:15:18 -07:00
Frank Barchard
2b92ec8d0f Fix git markers introduced on landing previous CL
BUG=none

Review URL: https://codereview.chromium.org/1359023003 .
2015-09-22 15:00:57 -07:00
Frank Barchard
5f3d4270d1 yuy2 to rgb gcc versions
read in read function for yuv conversion

R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1355393002 .
2015-09-22 14:27:33 -07:00
Frank Barchard
03cd8584e7 Read Y channel in read function for yuv conversion.
Allows reader to support YUY2 format.
Also contains fix for win64 build for yuv conversion.

TBR=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1355333002 .
2015-09-22 12:05:16 -07:00
Frank Barchard
f96890a0be yuvconstants for all YUV to RGB conversion functions.
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1363503002 .
2015-09-22 10:26:03 -07:00
Frank Barchard
62c49dc811 move constants into common
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1359443005 .
2015-09-18 16:28:44 -07:00
Frank Barchard
0381673d19 port I444 to ARGB to matrix. Add I444 to ABGR.
R=harryjin@google.com
BUG=libyuv:488,libyuv:490

Review URL: https://codereview.chromium.org/1348763005 .
2015-09-18 14:36:15 -07:00
Frank Barchard
28427a53e2 I444ToABGR for android
Reimplements I444ToARGB as a matrix function.
new I444ToABGR as matrix functions with wrappers and any functions.
Allows for future J444 and H444 versions.
I444ToABGR user level function added.

BUG=libyuv:490, libyuv:449
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1355733002 .
2015-09-18 11:20:58 -07:00
Frank Barchard
158d4079a3 NEON J422ToABGR and H422ToABGR missing prototypes
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1351993003 .
2015-09-17 15:47:55 -07:00
Frank Barchard
bdfd59a728 NEON constants
TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1351553005 .
2015-09-17 15:28:29 -07:00
Frank Barchard
28ce7d94f5 j422toabgr neon port using i422toabgr matrix function.
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1353923003 .
2015-09-17 15:20:55 -07:00
Frank Barchard
6fcbae1409 J422ToARGB Neon but not aarch64
TBR=harryjin@google.com
BUG=libyuv:493

Review URL: https://codereview.chromium.org/1348203004 .
2015-09-17 12:43:05 -07:00
Frank Barchard
6a6b67e7a9 Add H422ToARGB armv7 neon version.
Patch provided by zhongwei.yao@linaro.org

R=fbarchard@chromium.org, fbarchard@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1344393002 .
2015-09-17 10:38:15 -07:00
Frank Barchard
bb0a521c52 j422 not available on aarch64
The aarch64 version does not have I422ToARGBMatrix yet,
so adding this to the ifdef section of row.h

R=harryjin@google.com
TBR=harryjin@google.com, zhongwei.yao@linaro.org
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1347853002 .
2015-09-15 15:26:01 -07:00
Frank Barchard
509c644245 Add J422ToARGB armv7 neon version.
R=fbarchard@chromium.org, fbarchard@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1334173005 .
2015-09-15 15:01:48 -07:00
Frank Barchard
fcacbfb27f validate scan EOI from end for better coverage
R=tpsiaki@google.com
BUG=libyuv:478

Review URL: https://codereview.chromium.org/1344623003 .
2015-09-14 10:58:51 -07:00
Frank Barchard
67a9e30225 neon yuv matrix function
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://codereview.chromium.org/1337973002 .
2015-09-11 11:12:30 -07:00
Frank Barchard
316e1ab996 avx2 width parameter bug fix
R=harryjin@google.com
BUG=libyuv:489

Review URL: https://codereview.chromium.org/1321773004 .
2015-09-09 11:56:35 -07:00
Frank Barchard
8467f14ebb disable avx2
R=harryjin@google.com
BUG=libyuv:489

Review URL: https://codereview.chromium.org/1318893003 .
2015-09-08 11:55:52 -07:00
Frank Barchard
ed55d24d9f H420 functionality
R=harryjin@google.com
BUG=libyuv:488

Review URL: https://webrtc-codereview.appspot.com/54869004 .
2015-09-06 11:01:40 -07:00
Frank Barchard
67b06e66cb I422ToABGR for win64. Moves any functions to accommodate win64 subset of formats.
TBR=harryjin@google.com
BUG=libyuv:488

Review URL: https://webrtc-codereview.appspot.com/57679004 .
2015-09-03 11:00:18 -07:00
Frank Barchard
7060e0d826 I420ToABGRMatrix functions with J420ToABGR wrapper.
Allows direct conversion from JPeg to ABGR for android.

BUG=libyuv:488
R=harryjin@google.com

Review URL: https://webrtc-codereview.appspot.com/55719004 .
2015-09-03 10:42:36 -07:00
Frank Barchard
fbc3d595e9 define yuvconstants structure all the time, so it can be referred to on all builds.
currently only intel code uses this structure, but the prototypes are there for neon and lack of a structure causes a compile error on arm.

R=tpsiaki@google.com
BUG=none

Review URL: https://webrtc-codereview.appspot.com/52799004 .
2015-09-02 14:55:11 -07:00
Frank Barchard
925c3d9e26 I420ToARGB conversion with matrix.
Take color conversion constants as a parameter to row function for I420ToARGBMatrixRow_SSSE3.
Allows future variations of color space using a single low level.

R=harryjin@google.com
BUG=libyuv:488

Review URL: https://webrtc-codereview.appspot.com/56669004 .
2015-09-02 10:45:42 -07:00
Frank Barchard
be11f500f0 Use ebp to point to conversion table.
Proof of concept that conversions can take color matrix as a parameter.

R=harryjin@google.com

BUG=libyuv:472, libyuv:488

Review URL: https://webrtc-codereview.appspot.com/58489004.
2015-08-28 12:00:49 -07:00
Frank Barchard
3c4f5735ce use pointer to inverse table for clangcl
R=harryjin@google.com
TBR=harryjin@google.com
BUG=none

Review URL: https://webrtc-codereview.appspot.com/54859004.
2015-08-26 12:53:03 -07:00
Frank Barchard
d317a70c1d llvm64 link error fix.
R=harryjin@google.com
BUG=libyuv:485

Review URL: https://webrtc-codereview.appspot.com/58479004.
2015-08-24 14:21:04 -07:00
Frank Barchard
4dfdabb552 I420AlphaToABGR for android version of yuva conversion
Same as I420AlphaToARGB but first step converts to ABGR instead of ARGB.

TBR=harryjin@google.com
BUG=libyuv:473

Review URL: https://webrtc-codereview.appspot.com/52779004.
2015-08-20 19:36:59 -07:00
Frank Barchard
2fb6fd74be [Android] Remove reference to third_party/android_testrunner.
Deleting in https://codereview.chromium.org/1290173003

BUG=chromium:267773
R=harryjin@google.com

Review URL: https://webrtc-codereview.appspot.com/54849004.
2015-08-19 16:13:27 -07:00
Frank Barchard
ee9aaea02f i422torgb565 is asm for clangcl as well
Merge branch 'master' of https://chromium.googlesource.com/libyuv/libyuv into convertcl

allow lto for llvm but not gcc

R=harryjin@google.com
BUG=libyuv:469

Review URL: https://webrtc-codereview.appspot.com/52769004.
2015-08-19 10:46:30 -07:00
Frank Barchard
bb66c021ff Re-enable LLVM LTO on Neon targets.
LTO was disabled due to a GCC compiler bug that does not affect LLVM.
This fixes the build in the cfi_vptr==1 configuration, which requires LLVM LTO.

R=pcc@google.com
BUG=chromium:469376

Review URL: https://webrtc-codereview.appspot.com/57659004.
2015-08-18 15:26:52 -07:00
Frank Barchard
94d4269936 clang use scalewin
R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:469

Review URL: https://webrtc-codereview.appspot.com/51329004.
2015-08-18 14:50:27 -07:00
Frank Barchard
cda9d38a4e xmmword cast for clang
clangcl use compare_win for 32 bit, allowing fallback and enabling avx2 code for clang.
move defines/protos to compare_row.h
fix issue with odd width ARGBCopyAlpha functions by copying destination to temp buffer, then doing alpha copy, then copy back to destination.
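
The temp-buffer approach for odd widths can be sketched as follows. The block size of 8 pixels, the function names and the scalar stand-in for the SIMD kernel are all assumptions for illustration.

```c
#include <stdint.h>
#include <string.h>

enum { kToyBlock = 8 }; /* hypothetical pixels per SIMD iteration */

/* Stand-in for a SIMD kernel that only accepts multiples of kToyBlock. */
static void ToyCopyAlphaBlock(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x * 4 + 3] = src[x * 4 + 3]; /* copy alpha, leave B, G, R alone */
  }
}

void ToyCopyAlphaAny(const uint8_t* src, uint8_t* dst, int width) {
  int rem = width % kToyBlock;
  int n = width - rem;
  if (n > 0) {
    ToyCopyAlphaBlock(src, dst, n);
  }
  if (rem > 0) {
    uint8_t tmp_src[kToyBlock * 4] = {0};
    uint8_t tmp_dst[kToyBlock * 4] = {0};
    memcpy(tmp_src, src + n * 4, (size_t)rem * 4);
    /* Copy the destination in first so its existing B, G, R survive. */
    memcpy(tmp_dst, dst + n * 4, (size_t)rem * 4);
    ToyCopyAlphaBlock(tmp_src, tmp_dst, kToyBlock);
    /* Copy back only the pixels that really exist. */
    memcpy(dst + n * 4, tmp_dst, (size_t)rem * 4);
  }
}
```

Because this operation modifies only one channel of the destination, the temp buffer has to start from the destination's current contents; starting from zeros would wipe the color channels of the remainder pixels.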

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:484

Review URL: https://webrtc-codereview.appspot.com/59379004.
2015-08-18 11:13:12 -07:00
Frank Barchard
baf6a3c1bd Using the visual C source allows clangcl to fallback seamlessly to visual c, and supports SSE41 and AVX2 versions.
R=harryjin@google.com
BUG=libyuv:469

Review URL: https://webrtc-codereview.appspot.com/58469004.
2015-08-17 10:47:43 -07:00
Frank Barchard
278d88f872 Copy Alpha odd width support
R=harryjin@google.com
BUG=none

Review URL: https://webrtc-codereview.appspot.com/59369004.
2015-08-13 15:05:14 -07:00
Frank Barchard
8e7a62f22a I420AlphaToARGB conversion for planar YUV with Alpha to ARGB.
R=brucedawson@chromium.org, harryjin@google.com
BUG=libyuv:473

Review URL: https://webrtc-codereview.appspot.com/54829004.
2015-08-12 17:01:24 -07:00
Frank Barchard
58f0020137 use visual c 32 bit code for clangcl
R=harryjin@google.com
BUG=libyuv:483

Review URL: https://webrtc-codereview.appspot.com/54819004.
2015-08-11 10:10:45 -07:00
Frank Barchard
9425c4b01a rotate nv12 any width
BUG=libyuv:464
R=harryjin@google.com

Review URL: https://webrtc-codereview.appspot.com/55709004.
2015-08-07 23:48:38 -07:00