1466 Commits

Author SHA1 Message Date
Frank Barchard
a2142148e9 Remove x64 native_client macros.
Bug: libyuv:702
Test: try bots pass
Change-Id: I76d74b5f02fe9843418108b84742e2f714d1ab0a
Reviewed-on: https://chromium-review.googlesource.com/855656
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 01:27:22 +00:00
Frank Barchard
00d526d4ea H010ToARGB_AVX2 optimized conversion
AVX2 optimized 10 bit YUV to ARGB.

Bug: libyuv:751
Test: H010ToARGB unittest
Change-Id: I705630beb62714b52042c2a5dcdb8b7859e734ae
Reviewed-on: https://chromium-review.googlesource.com/852563
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-09 03:17:33 +00:00
Frank Barchard
55310f92bc Remove NACL_R14 macro
Bug: libyuv:702
Test: try bots still build
Change-Id: I05317e45c885955fcda233bdddbd11ce1d246d90
Reviewed-on: https://chromium-review.googlesource.com/854770
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-08 22:41:15 +00:00
Frank Barchard
50f9e618fa Add H010ToABGR, I010ToABGR and I010ToARGB functions
ABGR output is implemented using the same source code as ARGB, by swapping
the u and v and supplying the mirrored conversion matrix.
ABGR format (RGBA in memory) is popular on Android.

Bug: libyuv:751
Test: H010ToABGR, I010ToABGR and I010ToARGB unittests

Change-Id: I0b5103628c58dcb22a6442c03814d4d5972e0339
Reviewed-on: https://chromium-review.googlesource.com/852985
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-08 17:40:33 +00:00
Frank Barchard
9d2cd6a3ef H010ToAR30 optimized to 2 step conversion
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.

Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)

Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-07 08:36:57 +00:00
Frank Barchard
a64658593e I210ToARGB conversion from 10 bit YUV to RGB
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.

Bug: libyuv:751
Test:  I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-05 02:43:38 +00:00
Frank Barchard
2ed2402fa0 I420ToI010 for 8 to 10 bit YUV conversion.
Convert planar 8 bit formats to planar 16 bit formats.

Includes msan fix for Convert8To16Row_Opt unittest.

I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601.  The function does no color space
 conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling.  The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.

For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN      ] LibYUVConvertTest.I420ToI010_Opt
[       OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-02 21:09:39 +00:00
Frank Barchard
140fc0a261 Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.

Bug: libyuv: 769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
768f103b8b Convert8To16 for better H010 support
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.

Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-28 22:27:24 +00:00
Frank Barchard
c67db60534 HalfFloat_SSE2 use movd from memory
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned

Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-21 19:37:50 +00:00
Frank Barchard
790054ff03 Add AR30ToARGB function
Initial AR30ToARGB function to allow converion
from AR30 to other formats if necessary and/or
for testing.
Not optimized at this point.

Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: I38ef192315240f3caa7aee0218b38d5e88a2849f
Reviewed-on: https://chromium-review.googlesource.com/833025
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-19 01:54:42 +00:00
Frank Barchard
5336217f11 H010Copy function to copy 16 bit planar formats
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToH010_Opt
Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b
Reviewed-on: https://chromium-review.googlesource.com/823445
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-15 03:34:34 +00:00
Frank Barchard
3b81288ece Remove Mips DSPR2 code
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-14 18:22:16 +00:00
Frank Barchard
bb3180ae80 Add I420ToAR30 10 bit RGB
For more complete support of AR30 format, add I420ToAR30 allowing
the new RGB 10 bit format to be used from standard 8 bit I420 format.

Bug: libyuv:751
Test: I420ToAR30 unittest added
Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa
Reviewed-on: https://chromium-review.googlesource.com/823243
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 23:40:58 +00:00
Frank Barchard
c367751430 ARGBToAR30 SSSE3 use pmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 20:12:58 +00:00
Frank Barchard
11dd1b956f ARGBToAR30 use vpmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. vpmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. vpshufb is used to shift and mask 2 channels of R and B

Red Blue
With the 8 bit value in the upper bits, vpmulhuw by (1024+4) will produce a 10
bit value in the low 10 bits of each 16 bit value. This is whats wanted for the
blue channel. The red needs to be shifted 4 left, so multiply by (1024+4)*16 for
red.

Alpha Green
Alpha and Green are already in the high bits so vpand can zero out the other
bits, keeping just 2 upper bits of alpha and 8 bit green. The same multiplier
could be used for Green - (1024+4) putting the 10 bit green in the lsb.  Alpha
would be a simple multiplier to shift it into position.  It wants a gap of 10
above the green.  Green is 10 bits, so there are 6 bits in the low short.  4
more are needed, so a multiplier of 4 gets the 2 bits into the upper 16 bits,
and then a shift of 4 is a multiply of 16, so (4*16) = 64.  Then shift the
result left 10 to position the A and G channels.

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: Ie4f20dce18203bae7b75acb1fd5232db8a8a4f11
Reviewed-on: https://chromium-review.googlesource.com/820046
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-12-12 02:57:54 +00:00
Frank Barchard
0f98c3c1df Add ARGBToAR30Row_SSE2 to speed up H010ToAR30
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.

Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-09 00:11:20 +00:00
Frank Barchard
aabe380890 H010ToAR30 and H010ToARGB optimized YUV buffering
Reduce allocations of row buffers to 1 alloc/free.
Do 2 rows at a time to avoid converting U and V planes twice.

Bug: libyuv:715
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I2f3a03b4875df5e3b969112a78a1a0b28399fa2f
Reviewed-on: https://chromium-review.googlesource.com/816021
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-12-08 18:55:03 +00:00
Frank Barchard
3541e46a7e Add H010ToARGB for 10 bit YUV to ARGB
Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToARGB_Opt
Change-Id: I668d3f3810e59a4fb6611503aae1c8edc7d596e7
Reviewed-on: https://chromium-review.googlesource.com/815015
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-07 20:17:50 +00:00
Frank Barchard
49d9b1039b NV21ToABGR for Android camera conversions
Bug: libyuv:762
Test: NV21ToABGR unittest
Change-Id: I71448ab83930339083f07eeafccf240c6cb41c48
Reviewed-on: https://chromium-review.googlesource.com/795212
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-11-30 20:29:28 +00:00
Frank Barchard
324fa32739 Convert16To8Row_SSSE3 port from AVX2
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used.  This improves performance
on low end CPUs.
Future CL will by pass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.

Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-28 19:22:39 +00:00
Lei Zhang
8445617191 Mark a bunch of kArray variables as const.
This allows the linker to move the variables from the .data section to
the .rodata section.

Bug: libyuv:254
Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1

Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0
Reviewed-on: https://chromium-review.googlesource.com/777033
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-27 23:38:44 +00:00
Frank Barchard
26173eb73e H010ToAR30 for 10 bit bt.709 YUV to 30 bit RGB
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2

Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.

Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-22 23:58:30 +00:00
Frank Barchard
a98d6cdb17 ARGBToAR30 AVX2 conversion function
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0
Reviewed-on: https://chromium-review.googlesource.com/780437
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-21 20:37:01 +00:00
Frank Barchard
12c904a97c H420ToRAW and H420ToRGB24 added for bt.709 support.
Bug: libyuv:760
Test: LibYUVConvertTest.H420ToRAW_Opt
Change-Id: I050385f477309d5db02bb2218088f224c83392ed
Reviewed-on: https://chromium-review.googlesource.com/775785
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-17 01:20:05 +00:00
Frank Barchard
46594be758 add ScalePlane_16 unit tests
Tests ScalePlane vs ScalePlane_16 match.

Bug: libyuv:749
Test: LibYUVScaleTest.ScalePlaneDownBy4_Box_16
Change-Id: I3f71748da404982d5d48bfb11bbd3ae95a1d021c
Reviewed-on: https://chromium-review.googlesource.com/765045
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-16 01:40:48 +00:00
Frank Barchard
630c8ed1e0 Fix for ScaleDownBy4_Linear_16
The unittest compares the results of 8 and 16 bit scaling
and expects them to be the same.  This CL makes the 16 bit
scaling filter logic match.

Bug: libyuv:749
Test: LibYUVScaleTest.DISABLED_ScaleDownBy4_Linear_16
Change-Id: Ifb3ca4d770ef38f9f16abe9b9aeb843b779bf371
Reviewed-on: https://chromium-review.googlesource.com/772370
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-15 23:05:22 +00:00
Frank Barchard
49d1e3b036 MultiplyRow_16_AVX2 for converting 10 bit YUV
When converting from lsb 10 bit formats to msb, the values
need to be shifted to the top 10 bits.  Using a multiply
allows the different numbers of bits to be copied:
// 128 = 9 bits
// 64 = 10 bits
// 16 = 12 bits
// 1 = 16 bits
Bug: libyuv:751
Test: LibYUVPlanarTest.MultiplyRow_16_Opt
Change-Id: I9cf226053a164baa14155215cb175065b1c4f169
Reviewed-on: https://chromium-review.googlesource.com/762951
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 22:02:32 +00:00
Frank Barchard
2f58d126b9 MergeUV10Row_AVX2 use multiply to handle different bit depths
Instead of hardcoded shift, use a multiply by a parameter.
128 = 9 bits
64 = 10 bits
16 = 12 bits
1 = 16 bits

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: Id925edfdbf91243370c90641b50eb8e7625ec329
Reviewed-on: https://chromium-review.googlesource.com/762523
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 03:38:07 +00:00
Frank Barchard
e26b0a7e0e casting for c89 compatibility and lint cleanup
Bug: libyuv:756
Test: CFLAGS="-m32 -static -std=gnu89 -mno-sse -O2" CXXFLAGS="-m32 -x c -static -std=gnu99 -mno-sse -O2" make -f linux.mk libyuv.a
Change-Id: Ic362f93e01ccbb0bea14f361a58585e79297e7d2
Reviewed-on: https://chromium-review.googlesource.com/759423
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Patrik Höglund <phoglund@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 18:22:17 +00:00
Frank Barchard
735ace2ed3 Re-enable x86 assembly without requiring -msse2
clang does not require -msse2 or -msse for inline, except
the "x" parameter.  So change this to "m" for 32 bit.  64 bit
requires sse2 so use "x" for 64 bit.

gcc requires -msse for xmm registers in clobber list.
Reduce compiler requirement from -msse2 to -msse for enabling
assembly.

Bug: libyuv:754, libyuv:757
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255
Reviewed-on: https://chromium-review.googlesource.com/759303
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 00:51:06 +00:00
Frank Barchard
68f852d835 Remove DISABLE_CLANG_MSA
cleanup to remove ifdefs around functions affected by
a clang bug.

gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest

Bug: libyuv:634
Test: build for mips with clang
Change-Id: I278b368dbb2fe89082240e280267d0a27a214c78
Reviewed-on: https://chromium-review.googlesource.com/757980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-08 19:55:14 +00:00
Frank Barchard
d997ac287d Revert "Enable SSE2 code without -msse"
This reverts commit 01e994d74e4e3937ee1a3efdc048320a1e51f818.

Change-Id: Ie76710d0f4e641e071889c5125fd3be23cdcdb59
Reviewed-on: https://chromium-review.googlesource.com/758499
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 19:33:09 +00:00
Frank Barchard
01e994d74e Enable SSE2 code without -msse
Bug: libyuv:754
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I74bf8d032013694e65ea7637bc38d3253db53ff2
Reviewed-on: https://chromium-review.googlesource.com/758043
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 02:54:41 +00:00
Frank Barchard
522fd699e6 AVX512 feature detects for cnl and icl
Key instruction sets added for each microarchitecture:

AVX512BW, AVX512VL, AVX512DQ - skylake server or later
AVX512_VBMI, AVX512_IFMA - cannon lake or later
AVX512_BITALG, AVX512_VBMI2, AVX512_VPOPCNTDQ, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ - ice lake or later

Bug: libyuv:752
Test: ~/intelsde/sde -icl -- out/Release/libyuv_unittest --gtest_filter=*Cpu*
Change-Id: I9ee28904c90009d66721b9f805a440c5fc2da122
Reviewed-on: https://chromium-review.googlesource.com/755617
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-07 00:56:37 +00:00
Frank Barchard
afa98e1f08 HammingDistance_SSSE3 use movd not vmovd
vmovd is an AVX instruction.  This will crash on an older CPU with
only SSSE3 but not AVX.  Use movd instead.

Bug: libyuv:753
Test: ~/intelsde/sde -mrm -- out/Release/libyuv_unittest --gtest_filter=LibYUVCompareTest.BenchmarkHammingDistance_Opt
Change-Id: I1fb0026039d5f83d124f5d03fed7dc0d2d723e49
Reviewed-on: https://chromium-review.googlesource.com/756200
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-07 00:48:52 +00:00
Frank Barchard
a0c32b9e49 MergeUV10Row_AVX2 for converting H010 to P010
H010 is 10 bit planar format with 10 bits in lower bits.
P010 is 10 bit biplanar format with 10 bits in upper bits.
This function weaves the U and V channels and shifts the bits
into the upper bits.

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: I4a0bac0ef1ff95aa1b8d68261ec8e8e86f2d1fbf
Reviewed-on: https://chromium-review.googlesource.com/752692
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-03 18:55:36 +00:00
Frank Barchard
de8e9ae10b HammingDistance_SSE42 register optimized to avoid push
Bug: libyuv:701
Test: objdump to confirm code gen
Change-Id: Ibdcb2cc6bc9bf14b4ccb874c49fc9ff664650e1a
Reviewed-on: https://chromium-review.googlesource.com/745390
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-10-31 21:12:32 +00:00
Frank Barchard
ffc4811863 clang-format fixes
Bug: None
Test: lint passes
Change-Id: I1fd40d3506bab1f4f9100902f633a9c9e7b96337
Reviewed-on: https://chromium-review.googlesource.com/745038
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-10-31 02:25:01 +00:00
Frank Barchard
80077a80c2 HammingDistance_X86 using popcnt assembly
popcnt has a fake dependency on the destination.
This assembly avoids the dependency by using a different
register for each popcnt.

Bug: libyuv:701
Test: LIBYUV_DISABLE_SSSE3=1 out/Release/libyuv_unittest --gtest_filter=*Ham*Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: Ie1d202e2613b7fa8a3c02acd433940e92c80eafa
Reviewed-on: https://chromium-review.googlesource.com/731826
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-23 21:15:12 +00:00
Frank Barchard
8fa02df3c0 mingw fix ifdefs to use gcc source
mingw gcc sets the macro _M_IX86 which is normally only set
by Visual C and clangcl which are Visual C style source code
style for assembly, but gcc is not Visual C compatible.
Add _MSC_VER to most ifdefs to detect that its really Visual C
or clangcl and not mingw gcc so the gcc source code will be used.

Bug: libyuv:744
Test: CXXFLAGS=-m32 CXX=~/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ make -f linux.mk
Change-Id: I3431aa486eb769b145faa8d5eb75ed639f9d6f5e
Reviewed-on: https://chromium-review.googlesource.com/722319
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-17 17:36:35 +00:00
Frank Barchard
e23b27d040 Reduce HammingDistance block size to 32k to avoid overflow
Bug: libyuv:701
Test: HammingDistance unittest with large size
Change-Id: Id41a2c27eb8922d03b3a21dab32fa2e7b015ba38
Reviewed-on: https://chromium-review.googlesource.com/708335
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-10 18:42:47 +00:00
Frank Barchard
60f433fbd9 Revert "ComputeHammingDistance reduce SIMD loop to 1 call when possible."
This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> ComputeHammingDistance reduce SIMD loop to 1 call when possible.
> 
> 32 bit x86 has high overhead due to -fpic.  So this reduces the
> number of calls by 1.
> 
> TBR=kjellander@chromium.org
> Bug: libyuv:701
> Test: BenchmarkHammingDistance
> Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
> Reviewed-on: https://chromium-review.googlesource.com/701755
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: Cheng Wang <wangcheng@google.com>

TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com

Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: libyuv:701
Reviewed-on: https://chromium-review.googlesource.com/707823
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-10 01:16:15 +00:00
Frank Barchard
ec75df5894 ComputeHammingDistance reduce SIMD loop to 1 call when possible.
32 bit x86 has high overhead due to -fpic.  So this reduces the
number of calls by 1.

TBR=kjellander@chromium.org
Bug: libyuv:701
Test: BenchmarkHammingDistance
Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
Reviewed-on: https://chromium-review.googlesource.com/701755
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-10-09 22:51:23 +00:00
Frank Barchard
1734712a6f Fix odd length HammingDistance
If length of HammingDistance was not a multiple of 4,
the result was incorrect.  The old tests did not catch this
so a new test is done to count 1s.

Bug: libyuv:740
Test: LibYUVCompareTest.TestHammingDistance
Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f
Reviewed-on: https://chromium-review.googlesource.com/699614
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-04 22:21:36 +00:00
Frank Barchard
fecd741794 Port HammingDistance to SSSE3
Bug: libyuv:701
Test: BenchmarkHammingDistance_Opt
Change-Id: Ibdd5d382677ebef4f82a62e0d5c3b88614a3b6e4
Reviewed-on: https://chromium-review.googlesource.com/696290
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-10-03 19:11:05 +00:00
Frank Barchard
bde789b176 Hamming Distance SSE2 and AVX2 optimized
Bug: None
Test: None
Change-Id: Id52663f9c957aac3172fba92d888ad1b041d5cf0
Reviewed-on: https://chromium-review.googlesource.com/692981
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-02 22:32:54 +00:00
Frank Barchard
311add63c2 CopyRow_NEON use ldp instead of ld1 for better performance.
Under cache thrashing circumstances, ldp/stp perform better than
ld1/st1 on QC820/QC821 CPUs.  Same performance when hitting cache.

Bug: libyuv:738
Test: LibYUVPlanarTest.TestCopySamples_Opt (445 ms)
Change-Id: Ib6a0a5d5e6a1b7ef667b9bb2edb39d681cf3614c
Reviewed-on: https://chromium-review.googlesource.com/691281
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-29 01:52:29 +00:00
Frank Barchard
efbf15754a Step thru full color test by increments of 5 for better test speed.
Full color test is the slowest of the unittests, and not catching any
additional bugs at the moment.  Step thru range of 0 to 255 in steps of
5 to speed up the test.  255 is 3 * 5 * 17, so any of those primes would
hit 0 and 255 exactly.

Was LibYUVColorTest.TestFullYUV (896 ms)
Now LibYUVColorTest.TestFullYUV (212 ms)

TBR=kjellander@chromium.org
Bug: libyuv:736
Test: LibYUVColorTest.TestFullYUV
Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804
Reviewed-on: https://chromium-review.googlesource.com/672064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-19 02:01:53 +00:00
Frank Barchard
00c501fe43 Cast xgetbv from int64 to int to avoid Visual C warning.
TBR=kjellander@chromium.org
Bug: libyuv:735
Test: try bots
Change-Id: I00dc06689cd0a23847865c0c8edeb538b0cc81ac
Reviewed-on: https://chromium-review.googlesource.com/669142
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-15 22:00:52 +00:00
Frank Barchard
0a3d23c898 fix clang-format-ing for row arm functions
TBR=kjellander@chromium.org
BUG=None
TEST=git cl lint

Change-Id: I45ecd7f8279981ba037dc051f521f6b6d5506f64
Reviewed-on: https://chromium-review.googlesource.com/664345
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-14 21:35:06 +00:00
Frank Barchard
753a91cbcb fix fmov build error on gcc 4.7 for neon64
TBR=kjellander@chromium.org
BUG=libyuv:732
TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt

Change-Id: If80e9510ad5668b080b9384e656c0bd73cf5b4a6
Reviewed-on: https://chromium-review.googlesource.com/663764
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-12 22:46:33 +00:00
Frank Barchard
1e16cb5c38 SplitRGBPlane and MergeRGBPlane functions added
Converts packed RGB to planar and back.

TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added

Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Frank Barchard
8f5e9cd9eb ScaleRowUp2_16_C port of NEON to C
Single pass upsample with bilinear filter.
NEON version optimized - Pixel Sailfish QC821

Was TestScaleRowUp2_16 (5741 ms)
Now TestScaleRowUp2_16 (4484 ms)
C   TestScaleRowUp2_16 (6555 ms)

TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowUp2_16 (709 ms)

Change-Id: Ib04ceb53e0ab644a392c39c3396e313530161d92
Reviewed-on: https://chromium-review.googlesource.com/646701
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-05 21:40:39 +00:00
Manojkumar Bhosale
2621c91bf1 Add MSA optimized HammingDistance and SumSquareError functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Id0126ba5aff38817525b1efa6044f1dc2cfa1a36
Reviewed-on: https://chromium-review.googlesource.com/625739
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-05 21:32:33 +00:00
Frank Barchard
0acc67712f clang format / lint cleanup for arm scale functions
TBR=kjellander@chromium.org
BUG=libyuv:725
TEST=lint

Change-Id: I76f777427f9b1458faba12796fb0011d8e3228d5
Reviewed-on: https://chromium-review.googlesource.com/646586
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-31 22:41:08 +00:00
Frank Barchard
a826dd7112 ARGBScaleDown by 2 with nearest neighbor optimized
TBR=kjellander@chromium.org
BUG=libyuv:723
TEST=ScaleDownBy2_None

Change-Id: I6861e62d3a67dde916b87fdc46eb02f2b4ee9f17
Reviewed-on: https://chromium-review.googlesource.com/644149
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-30 23:22:14 +00:00
Frank Barchard
1c85f98846 Scale down by 2 linear use 'half add' to average pixels.
Use ld2 to load even and odd pixels into different registers
and hadd to half add them to each other.

Previously used paired and shift.

TBR=kjellander@chromium.org
BUG=libyuv:723
TEST=ScaleDownBy2_Linear

Change-Id: I3ec72bcf7d4c746837217496c301eb4e4ad963cf
Reviewed-on: https://chromium-review.googlesource.com/644113
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-30 22:10:32 +00:00
Frank Barchard
e200738d82 Scale Down by 2 use ld2 and urhadd
urhadd is a rounded average.  Linear filter wants to average
horizontally, so use ld2 to separate even and odd pixels.

TBR=jkellander@chromium.org
BUG=None
TEST=LibYUVScaleTest.*ScaleDownBy2*

Change-Id: Id667288a030e72ce8e1c1d6719b69c555c0db063
Reviewed-on: https://chromium-review.googlesource.com/642448
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-30 01:18:11 +00:00
Manojkumar Bhosale
b6e8e9aa97 Add MSA optimized HalfFloatRow function
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a
Reviewed-on: https://chromium-review.googlesource.com/628557
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-29 18:40:08 +00:00
Frank Barchard
f0a9d6d206 Gaussian reorder for benefit of A73
Roughly. instead of 4 loads and 8 multiples, use 1 load and 2 multiples
4 times over.  The original code, as with the C code from clang and gcc,
did all the loads, then all the math, then the store.  The new code
does a load, then the math, then the next load, etc.
This schedules better on current arm 64 cpus.
Number of registers also reduced, reusing the same registers.

HiSilicon ARM A73:

Now
TestGaussRow_Opt (890 ms)
TestGaussCol_Opt (571 ms)

Was
TestGaussRow_Opt (1061 ms)
TestGaussCol_Opt (595 ms)

Qualcomm 821 (Pixel):

Now
TestGaussRow_Opt (571 ms)
TestGaussCol_Opt (474 ms)

Was
TestGaussRow_Opt (751 ms)
TestGaussCol_Opt (520 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt

Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de
Reviewed-on: https://chromium-review.googlesource.com/634448
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-08-25 19:00:05 +00:00
Frank Barchard
ad2409443c GaussRow_NEON from int to short
[ RUN      ] LibYUVPlanarTest.TestGaussRow_Opt
 [       OK ] LibYUVPlanarTest.TestGaussRow_Opt (601 ms)
 [ RUN      ] LibYUVPlanarTest.TestGaussCol_Opt
 [       OK ] LibYUVPlanarTest.TestGaussCol_Opt (522 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt

Change-Id: I1242b98672538e889f3ab48f215d6dabc7144ea7
Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-24 01:09:23 +00:00
Frank Barchard
1cc539f7d6 GaussCol_NEON resample from short to int
Old NEON
LibYUVPlanarTest.TestGaussCol_Opt (916 ms)

New NEON
LibYUVPlanarTest.TestGaussCol_Opt (520 ms)

C vectorized
LibYUVPlanarTest.TestGaussCol_Opt (739 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussCol_Opt

Change-Id: I863b66f700f7a71fcb08a2eabb03240fdaf8a238
Reviewed-on: https://chromium-review.googlesource.com/626938
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-22 23:07:17 +00:00
Frank Barchard
c5bad809b1 Gauss unittest, Scale comments for neon64 half size updated
[ RUN      ] LibYUVPlanarTest.TestGaussRow_Opt
[       OK ] LibYUVPlanarTest.TestGaussRow_Opt (1274 ms)
[ RUN      ] LibYUVPlanarTest.TestGaussCol_Opt
[       OK ] LibYUVPlanarTest.TestGaussCol_Opt (916 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt
Change-Id: Id480f3870c40c2b40dfb9f072cb7118ebad41afc
Reviewed-on: https://chromium-review.googlesource.com/624701
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-21 23:41:46 +00:00
Frank Barchard
0c957d183e Gaussian blur NEON optimized
TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=TestGaussCol_NEON

Change-Id: I52cb6dbfd0cab4a30205c93b6a528ef49e9ab529
Reviewed-on: https://chromium-review.googlesource.com/621708
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-21 21:18:32 +00:00
Frank Barchard
8cd3e4f3f2 Add MSA optimized ScaleFilterCols, ScaleARGBCols, ScaleARGBFilterCols and ScaleRowDown34 functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ib139b9701fc67e24d27a6886377c0cb8b2773fda
Reviewed-on: https://chromium-review.googlesource.com/620791
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-18 17:23:27 +00:00
Frank Barchard
78e44628c6 Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions.
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e
Reviewed-on: https://chromium-review.googlesource.com/616765
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-17 18:39:22 +00:00
Frank Barchard
bb17da97cf Test C vs NEON for ScaleDown2Box_16
TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowDown2Box_16

Change-Id: Ic74d29d6f14983ff26e8af541ef702a0f8bf3f17
Reviewed-on: https://chromium-review.googlesource.com/616189
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-16 22:18:48 +00:00
Frank Barchard
7e59ee4c75 Upsample 8x2 pixels to 16x1 with bilinear filtering
Downsample 16x2 to 8x1 with box filtering

[ RUN      ] LibYUVScaleTest.TestScaleRowUp2_16
[       OK ] LibYUVScaleTest.TestScaleRowUp2_16 (579 ms)
[ RUN      ] LibYUVScaleTest.TestScaleRowDown2Box_16
[       OK ] LibYUVScaleTest.TestScaleRowDown2Box_16 (329 ms)
[----------] 2 tests from LibYUVScaleTest (909 ms total)

TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowUp2_16 and LibYUVScaleTest.TestScaleRowDown2Box_16

Change-Id: I457d44123f2751e5f71bf3935401fff74b8e9db2
Reviewed-on: https://chromium-review.googlesource.com/608876
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-15 22:28:15 +00:00
Frank Barchard
56bbcdf422 Reintroduce the max version of scale
add ScaleMaxSamples_NEON function with max
done on original values.

TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt

Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e
Reviewed-on: https://chromium-review.googlesource.com/614685
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-14 23:33:56 +00:00
Manojkumar Bhosale
dbd7c1a9c5 Add MSA optimized ARGBExtractAlpha, ARGBBlend, ARGBQuantize and ARGBColorMatrix row functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I17bd3f87336f613ad363af7d7b9d7af49d725e56
Reviewed-on: https://chromium-review.googlesource.com/613100
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-14 17:38:31 +00:00
Frank Barchard
83ca1abe09 Change ScaleSumSamples to return Sum of Squares
TBR=kjellander@chromium.org
BUG=libyuv:717
TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt

Change-Id: I5208666f3968c5c4b0f1b0c951f24216d78ee3fe
Reviewed-on: https://chromium-review.googlesource.com/607184
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-09 22:19:45 +00:00
Frank Barchard
8676ad7004 scale float samples and return max value
BUG=libyuv:717
TEST=ScaleSum unittest to compare C vs Arm implementation
TBR=kjellander@chromium.org

Change-Id: Iaa7af5547d979aad4722f868d31b405340115748
Reviewed-on: https://chromium-review.googlesource.com/600534
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-04 23:34:30 +00:00
Frank Barchard
27036e33e8 Revert "include <new> header for benefit of new clang builds"
This reverts commit 1dda4cb0b7bd564e646d6ec2efee497fcd7146ca.

Reason for revert: build error on jpeg FILE

Original change's description:
> include <new> header for benefit of new clang builds
> 
> TBR=kjellander@chromium.org
> BUG=libyuv:712
> TEST=local builds still work
> 
> Change-Id: I040e8edc40aafd820d2a29629fe7aec5c049bc6b
> Reviewed-on: https://chromium-review.googlesource.com/576971
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Commit-Queue: Frank Barchard <fbarchard@google.com>

TBR=kjellander@chromium.org,fbarchard@google.com

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: libyuv:712
Change-Id: I4cf4e26eadb476017dc95e6c9578092204f088a3
Reviewed-on: https://chromium-review.googlesource.com/601211
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-03 22:03:47 +00:00
Frank Barchard
6d083e2d12 clang 6 build disable some msa functions
R=kjellander@chromium.org

Bug: libyuv:715
Test: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
Change-Id: Ia3943b0afc02e05a8bc32350719b296b0b9d5479
Reviewed-on: https://chromium-review.googlesource.com/592720
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-03 17:44:35 +00:00
Frank Barchard
1dda4cb0b7 include <new> header for benefit of new clang builds
TBR=kjellander@chromium.org
BUG=libyuv:712
TEST=local builds still work

Change-Id: I040e8edc40aafd820d2a29629fe7aec5c049bc6b
Reviewed-on: https://chromium-review.googlesource.com/576971
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-07-19 17:47:31 +00:00
Frank Barchard
6c94ad13b5 Remove ARM NaCL macros from source
NaCL has been disabled for awhile, so the code
will still build, but only with C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.

BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org

Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-09 22:22:07 +00:00
Frank Barchard
5f94a33e0c Lint fix for C casting for rotation code on arm
instead of casting int to int64, pass the int
and use %w modifier to use the word version of the register.

TBR=kjellander@chromium.org
BUG=libyuv:706
TEST=git cl lint
R=wangcheng@google.com

Change-Id: Iee5a70f04d928903ca8efac00066b8821a465e36
Reviewed-on: https://chromium-review.googlesource.com/528381
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-06-09 00:51:00 +00:00
Frank Barchard
d981495b42 Hamming Distance using 16 bit accumulators
Summing 16 bit hamming codes restricts the maximum length,
but saves an inner loop instruction.  The outer loop can sum the
values.

32 bit Neon
Now BenchmarkHammingDistance_Opt (78 ms)
Was BenchmarkHammingDistance_Opt (92 ms)

64 bit Neon
Now BenchmarkHammingDistance_Opt (85 ms)
Was BenchmarkHammingDistance_Opt (92 ms)

R=wangcheng@google.com
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance

Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae
Reviewed-on: https://chromium-review.googlesource.com/526932
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-07 23:23:24 +00:00
Frank Barchard
790e0634a8 Port HammingDistance_NEON 32 bit code to 64 bit
The 32 bit version of HammingDistance_NEON accumulates
using vertical add and paired adds, which takes 3 instructions
instead of 4.
The instructions are also portable between 32 and 64 bit.

Was BenchmarkHammingDistance_Opt (105 ms)
Now BenchmarkHammingDistance_Opt (90 ms)

TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance

BenchmarkHammingDistance_Opt (90 ms)

Change-Id: If9e621e0bd2fe2492a1532056f8a1b451ba53d7e
Reviewed-on: https://chromium-review.googlesource.com/526365
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-07 01:04:35 +00:00
Frank Barchard
47d6eaa377 HammingDistance_NEON optimized looping
BenchmarkHammingDistance_Opt (93 ms)
BenchmarkHammingDistance_C (389 ms)

TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance

Change-Id: I4ba920751eb130cac6a276e441a7c309c495554a
Reviewed-on: https://chromium-review.googlesource.com/526401
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-07 00:09:59 +00:00
Frank Barchard
baf5248242 HammingDistance_NEON ported to 32 bit
TBR=kjellander@chromium.org
BUG=libyuv:701
TEST=BenchmarkHammingDistance

Change-Id: I252efd8a27aa11a0fe7d8030d7c8b57f20f04760
Reviewed-on: https://chromium-review.googlesource.com/525232
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-06 17:58:29 +00:00
Frank Barchard
44abf70187 ScaleDown odd functions adjust math so last pixel is half width source.
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16

new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd

Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-06 01:37:26 +00:00
Frank Barchard
7bffe5e1c5 lint warning fixes for CpuID
The CpuId function is a wrapper for the intrinsic, or
implemented with inline if unavailable.  It had been
using uint32, but the intrinsics use int, so it was causing
casting and lint warnings.  This change makes the internal
implementation use int.

Casting was also done for xgetbv, and the cast is simply
removed, and is not causing a build error.

MipCpuCaps was doing strlen to check for white space after the
instruction set.  Arm also does this but with a hard coded offset.
This was causing a cast from size_t to int, which produced a lint
warning.  The change removes the white space detect.
In theory the code could be used to detect SSE vs SSE2, and it would
need to check SSE is followed by a space or end of line.  But this
code is only used on Arm and Mips, where there there is one form
of SIMD detected.  e.g. MSA for mips.  If a new instruction set is
added with a similar name, the write space check could be reintroduced.
But its more likely the code can be rewritten to use a better form
of detection by then. Or remove detection and require the instructions

BUG=libyuv:641
TEST=try bots build on all platforms without error and lint is clean

Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809
Reviewed-on: https://chromium-review.googlesource.com/514524
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Wan-Teh Chang <wtc@chromium.org>
2017-05-25 22:00:17 +00:00
Frank Barchard
8edd2286fd MaskCpuFlags return cpuinfo so InitCpuFlags can call it
Reduce number of atomic references to cpu_info by making
InitCpuFlags call MaskCpuFlags and return the same value.

BUG=libyuv:641
TEST=libyuv_unittests pass

Change-Id: I5dfff8f7a10671bc8ef3ec0ed6f302791e752faa
Reviewed-on: https://chromium-review.googlesource.com/514145
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-24 22:27:03 +00:00
Frank Barchard
651ccc0c3a Fix data races in libyuv::TestCpuFlag().
Detect the compiler's support of C11 atomics, and use C11 atomics when
available.

Note that libyuv::MaskCpuFlags() is still not thread-safe.

BUG=libyuv:641
TEST= cpu_thread_test.cc adds a pthread based test
R=wangcheng@google.com

Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd
Reviewed-on: https://chromium-review.googlesource.com/510079
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-24 02:09:03 +00:00
Frank Barchard
77f6916da2 use __popcnt for visual c HammingDistance_X86
BUG=libyuv:701
TEST=HammingDistance unittest performance is comparable to x64
R=wangcheng@google.com

Change-Id: I8abe861e086e0162ba4c7ba6f1ef7d1c006cd9d4
Reviewed-on: https://chromium-review.googlesource.com/505454
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-05-12 22:59:00 +00:00
Frank Barchard
e0615c0e69 Optimize Hamming Distance C code to do 64 bits at a time.
BUG=libyuv:701
TEST=LibYUVBaseTest.BenchmarkHammingDistance_C
R=wangcheng@google.com

Change-Id: I243003b098bea8ef3809298bbec349ed52a43d8c
Reviewed-on: https://chromium-review.googlesource.com/499487
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-12 17:53:52 +00:00
Frank Barchard
bbbf30eecd Remove volatile from variables to improve performance
BUG=libyuv:703
TEST=compile and disassemble.  see registers used not stack.
R=wangcheng@google.com

Change-Id: Iaa07ee5d0c35252994491bb2868276e161149efd
Reviewed-on: https://chromium-review.googlesource.com/500427
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-09 18:14:21 +00:00
Frank Barchard
2136e349da Hamming code difference of 2 memory blocks
BUG=libyuv:701
TEST=built and disassembled for aarch64
R=kjellander@chromium.org

Change-Id: I7712b1c7934e5dfb55fda1fa7c8405c32d6964ce
Reviewed-on: https://chromium-review.googlesource.com/495327
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-05-08 21:37:51 +00:00
Frank Barchard
945ea1b746 mips switch sgtu to sltu for clang in ndk r14
The verion of clang in ndk r14 (3.9) has a built in llvm assembler
that does not have the sgtu pseudo instruction.
sltu is the actual instruction, so switch the 2 operands and use
the instruction instead of the pseudo op.

BUG=libyuv:700
TEST=try bots build mips without error.

Change-Id: I2d5f94f81acbd56cdedea011e7d9308979e19079
Reviewed-on: https://chromium-review.googlesource.com/494026
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-05-02 21:34:13 +00:00
Vignesh Venkatasubramanian
54289f1bb0 Fix mips build on android ndk r14+
Revert the workaround and fix it properly by passing the
additional necessary flag to the compiler.

BUG=libyuv:700

Change-Id: I1c893a8acb5079decbee6963b689424bf2f99f4f
Reviewed-on: https://chromium-review.googlesource.com/487881
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-04-26 08:39:55 +00:00
Frank Barchard
3b583396bf Disable CopyRow_MIPS
CopyRow_MIPS produces a compile error on some compilers.

TBR=kjellander@chromium.org
BUG=libyuv:700
TEST=try bots

Change-Id: Ie88f2006ef5cf14bffaf80fd4c0dd1caa409c569
Reviewed-on: https://chromium-review.googlesource.com/486127
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-04-25 01:31:13 +00:00
Frank Barchard
fc02cc3806 Add I422ToRGB565
BUG=libyuv:699
TESTED=LibYUVConvertTest.I420ToARGB_RGB565_Opt

Change-Id: I87943bcad056fbbe051301f45c7dc0ae0620c837
Reviewed-on: https://chromium-review.googlesource.com/478578
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-04-17 17:51:17 +00:00
Frank Barchard
8cab2e31d7 I422ToRGB565 fix for odd widths
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.

Bug: libyuv: 657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-04-11 17:24:05 +00:00
Frank Barchard
2adb84e39e make gflags command line parser optional
BUG=libyuv:691
TEST=gn gen out/Release "--args=is_debug=false target_cpu=\"x64\" libyuv_include_tests=true"

Change-Id: Ib481189be884c34d9bbc30bfcf71c7969c6f4dae
Reviewed-on: https://chromium-review.googlesource.com/452736
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-14 01:52:52 +00:00
Frank Barchard
d59d3fcd18 Change parameter for '_Any' functions to param to avoid misnomer
BUG=None
TEST=None

Change-Id: I6940fc4753783afd25f83868635381bf801c65f5
Reviewed-on: https://chromium-review.googlesource.com/452962
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-10 23:32:39 +00:00
Frank Barchard
e6fec061cf lint cleanup for convert RGB24ToI420
RGB24, RAW, RGB565, ARGB1555 and ARGB4444 have conditional
2 pass versus direct path.  2 pass method requires a buffer that
is conditionally allocated.  ifdef's were confusing lint.
simplifed ifdefs to clean up lint warning

BUG=libyuv:692
TEST=lint source/convert.cc

Change-Id: If868718af30b48824a5e3d28f0d7d01d4609ad55
Reviewed-on: https://chromium-review.googlesource.com/451552
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-09 10:32:23 +00:00
Frank Barchard
73a603e120 clang-format 5.0 applied to libyuv
BUG=None
TEST=try bots and lint test

Change-Id: I1ab462adf2d309117862c5eb4b244a61ae202951
Reviewed-on: https://chromium-review.googlesource.com/450658
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-03-08 18:50:12 +00:00
Frank Barchard
136aa9d37c any11p fix for buffer overrun
BUG=libyuv:686
TESTED=untested

Change-Id: Idfae93349dd78b1b633a596631e5397e11b77d0b
Reviewed-on: https://chromium-review.googlesource.com/448320
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-03 19:57:35 +00:00