libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2026-02-07 02:09:50 +08:00

Author	SHA1	Message	Date
Frank Barchard	de8e9ae10b	HammingDistance_SSE42 register optimized to avoid push Bug: libyuv:701 Test: objdump to confirm code gen Change-Id: Ibdcb2cc6bc9bf14b4ccb874c49fc9ff664650e1a Reviewed-on: https://chromium-review.googlesource.com/745390 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2017-10-31 21:12:32 +00:00
Frank Barchard	ffc4811863	clang-format fixes Bug: None Test: lint passes Change-Id: I1fd40d3506bab1f4f9100902f633a9c9e7b96337 Reviewed-on: https://chromium-review.googlesource.com/745038 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-10-31 02:25:01 +00:00
Frank Barchard	80077a80c2	HammingDistance_X86 using popcnt assembly popcnt has a fake dependency on the destination. This assembly avoids the dependency by using a different register for each popcnt. Bug: libyuv:701 Test: LIBYUV_DISABLE_SSSE3=1 out/Release/libyuv_unittest --gtest_filter=HamOpt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --libyuv_cpu_info=-1 Change-Id: Ie1d202e2613b7fa8a3c02acd433940e92c80eafa Reviewed-on: https://chromium-review.googlesource.com/731826 Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-10-23 21:15:12 +00:00
Frank Barchard	8fa02df3c0	mingw fix ifdefs to use gcc source mingw gcc sets the macro _M_IX86 which is normally only set by Visual C and clangcl which are Visual C style source code style for assembly, but gcc is not Visual C compatible. Add _MSC_VER to most ifdefs to detect that its really Visual C or clangcl and not mingw gcc so the gcc source code will be used. Bug: libyuv:744 Test: CXXFLAGS=-m32 CXX=~/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ make -f linux.mk Change-Id: I3431aa486eb769b145faa8d5eb75ed639f9d6f5e Reviewed-on: https://chromium-review.googlesource.com/722319 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-10-17 17:36:35 +00:00
Frank Barchard	e23b27d040	Reduce HammingDistance block size to 32k to avoid overflow Bug: libyuv:701 Test: HammingDistance unittest with large size Change-Id: Id41a2c27eb8922d03b3a21dab32fa2e7b015ba38 Reviewed-on: https://chromium-review.googlesource.com/708335 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-10-10 18:42:47 +00:00
Frank Barchard	60f433fbd9	Revert "ComputeHammingDistance reduce SIMD loop to 1 call when possible." This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8. Reason for revert: <INSERT REASONING HERE> Original change's description: > ComputeHammingDistance reduce SIMD loop to 1 call when possible. > > 32 bit x86 has high overhead due to -fpic. So this reduces the > number of calls by 1. > > TBR=kjellander@chromium.org > Bug: libyuv:701 > Test: BenchmarkHammingDistance > Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051 > Reviewed-on: https://chromium-review.googlesource.com/701755 > Reviewed-by: Frank Barchard <fbarchard@google.com> > Reviewed-by: Cheng Wang <wangcheng@google.com> TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1 No-Presubmit: true No-Tree-Checks: true No-Try: true Bug: libyuv:701 Reviewed-on: https://chromium-review.googlesource.com/707823 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-10-10 01:16:15 +00:00
Frank Barchard	ec75df5894	ComputeHammingDistance reduce SIMD loop to 1 call when possible. 32 bit x86 has high overhead due to -fpic. So this reduces the number of calls by 1. TBR=kjellander@chromium.org Bug: libyuv:701 Test: BenchmarkHammingDistance Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051 Reviewed-on: https://chromium-review.googlesource.com/701755 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-10-09 22:51:23 +00:00
Frank Barchard	1734712a6f	Fix odd length HammingDistance If length of HammingDistance was not a multiple of 4, the result was incorrect. The old tests did not catch this so a new test is done to count 1s. Bug: libyuv:740 Test: LibYUVCompareTest.TestHammingDistance Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f Reviewed-on: https://chromium-review.googlesource.com/699614 Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-10-04 22:21:36 +00:00
Frank Barchard	fecd741794	Port HammingDistance to SSSE3 Bug: libyuv:701 Test: BenchmarkHammingDistance_Opt Change-Id: Ibdd5d382677ebef4f82a62e0d5c3b88614a3b6e4 Reviewed-on: https://chromium-review.googlesource.com/696290 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-10-03 19:11:05 +00:00
Frank Barchard	bde789b176	Hamming Distance SSE2 and AVX2 optimized Bug: None Test: None Change-Id: Id52663f9c957aac3172fba92d888ad1b041d5cf0 Reviewed-on: https://chromium-review.googlesource.com/692981 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-10-02 22:32:54 +00:00
Frank Barchard	311add63c2	CopyRow_NEON use ldp instead of ld1 for better performance. Under cache thrashing circumstances, ldp/stp perform better than ld1/st1 on QC820/QC821 CPUs. Same performance when hitting cache. Bug: libyuv:738 Test: LibYUVPlanarTest.TestCopySamples_Opt (445 ms) Change-Id: Ib6a0a5d5e6a1b7ef667b9bb2edb39d681cf3614c Reviewed-on: https://chromium-review.googlesource.com/691281 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-09-29 01:52:29 +00:00
Frank Barchard	efbf15754a	Step thru full color test by increments of 5 for better test speed. Full color test is the slowest of the unittests, and not catching any additional bugs at the moment. Step thru range of 0 to 255 in steps of 5 to speed up the test. 255 is 3 * 5 * 17, so any of those primes would hit 0 and 255 exactly. Was LibYUVColorTest.TestFullYUV (896 ms) Now LibYUVColorTest.TestFullYUV (212 ms) TBR=kjellander@chromium.org Bug: libyuv:736 Test: LibYUVColorTest.TestFullYUV Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804 Reviewed-on: https://chromium-review.googlesource.com/672064 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-09-19 02:01:53 +00:00
Frank Barchard	00c501fe43	Cast xgetbv from int64 to int to avoid Visual C warning. TBR=kjellander@chromium.org Bug: libyuv:735 Test: try bots Change-Id: I00dc06689cd0a23847865c0c8edeb538b0cc81ac Reviewed-on: https://chromium-review.googlesource.com/669142 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-09-15 22:00:52 +00:00
Frank Barchard	0a3d23c898	fix clang-format-ing for row arm functions TBR=kjellander@chromium.org BUG=None TEST=git cl lint Change-Id: I45ecd7f8279981ba037dc051f521f6b6d5506f64 Reviewed-on: https://chromium-review.googlesource.com/664345 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-09-14 21:35:06 +00:00
Frank Barchard	753a91cbcb	fix fmov build error on gcc 4.7 for neon64 TBR=kjellander@chromium.org BUG=libyuv:732 TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt Change-Id: If80e9510ad5668b080b9384e656c0bd73cf5b4a6 Reviewed-on: https://chromium-review.googlesource.com/663764 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-09-12 22:46:33 +00:00
Frank Barchard	1e16cb5c38	SplitRGBPlane and MergeRGBPlane functions added Converts packed RGB to planar and back. TBR=kjellander@chromium.org BUG=libyuv:728 TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc Reviewed-on: https://chromium-review.googlesource.com/658069 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-09-11 21:02:04 +00:00
Frank Barchard	8f5e9cd9eb	ScaleRowUp2_16_C port of NEON to C Single pass upsample with bilinear filter. NEON version optimized - Pixel Sailfish QC821 Was TestScaleRowUp2_16 (5741 ms) Now TestScaleRowUp2_16 (4484 ms) C TestScaleRowUp2_16 (6555 ms) TBR=kjellander@chromium.org BUG=libyuv:718 TEST=LibYUVScaleTest.TestScaleRowUp2_16 (709 ms) Change-Id: Ib04ceb53e0ab644a392c39c3396e313530161d92 Reviewed-on: https://chromium-review.googlesource.com/646701 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-09-05 21:40:39 +00:00
Manojkumar Bhosale	2621c91bf1	Add MSA optimized HammingDistance and SumSquareError functions TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: Id0126ba5aff38817525b1efa6044f1dc2cfa1a36 Reviewed-on: https://chromium-review.googlesource.com/625739 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-09-05 21:32:33 +00:00
Frank Barchard	0acc67712f	clang format / lint cleanup for arm scale functions TBR=kjellander@chromium.org BUG=libyuv:725 TEST=lint Change-Id: I76f777427f9b1458faba12796fb0011d8e3228d5 Reviewed-on: https://chromium-review.googlesource.com/646586 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-31 22:41:08 +00:00
Frank Barchard	a826dd7112	ARGBScaleDown by 2 with nearest neighbor optimized TBR=kjellander@chromium.org BUG=libyuv:723 TEST=ScaleDownBy2_None Change-Id: I6861e62d3a67dde916b87fdc46eb02f2b4ee9f17 Reviewed-on: https://chromium-review.googlesource.com/644149 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-30 23:22:14 +00:00
Frank Barchard	1c85f98846	Scale down by 2 linear use 'half add' to average pixels. Use ld2 to load even and odd pixels into different registers and hadd to half add them to each other. Previously used paired and shift. TBR=kjellander@chromium.org BUG=libyuv:723 TEST=ScaleDownBy2_Linear Change-Id: I3ec72bcf7d4c746837217496c301eb4e4ad963cf Reviewed-on: https://chromium-review.googlesource.com/644113 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-30 22:10:32 +00:00
Frank Barchard	e200738d82	Scale Down by 2 use ld2 and urhadd urhadd is a rounded average. Linear filter wants to average horizontally, so use ld2 to separate even and odd pixels. TBR=jkellander@chromium.org BUG=None TEST=LibYUVScaleTest.ScaleDownBy2 Change-Id: Id667288a030e72ce8e1c1d6719b69c555c0db063 Reviewed-on: https://chromium-review.googlesource.com/642448 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-30 01:18:11 +00:00
Manojkumar Bhosale	b6e8e9aa97	Add MSA optimized HalfFloatRow function TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a Reviewed-on: https://chromium-review.googlesource.com/628557 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-29 18:40:08 +00:00
Frank Barchard	f0a9d6d206	Gaussian reorder for benefit of A73 Roughly. instead of 4 loads and 8 multiples, use 1 load and 2 multiples 4 times over. The original code, as with the C code from clang and gcc, did all the loads, then all the math, then the store. The new code does a load, then the math, then the next load, etc. This schedules better on current arm 64 cpus. Number of registers also reduced, reusing the same registers. HiSilicon ARM A73: Now TestGaussRow_Opt (890 ms) TestGaussCol_Opt (571 ms) Was TestGaussRow_Opt (1061 ms) TestGaussCol_Opt (595 ms) Qualcomm 821 (Pixel): Now TestGaussRow_Opt (571 ms) TestGaussCol_Opt (474 ms) Was TestGaussRow_Opt (751 ms) TestGaussCol_Opt (520 ms) TBR=kjellander@chromium.org BUG=libyuv:719 TEST=LibYUVPlanarTest.TestGaussRow_Opt Reviewed-on: https://chromium-review.googlesource.com/627478 Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com> Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de Reviewed-on: https://chromium-review.googlesource.com/634448 Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-08-25 19:00:05 +00:00
Frank Barchard	ad2409443c	GaussRow_NEON from int to short [ RUN ] LibYUVPlanarTest.TestGaussRow_Opt [ OK ] LibYUVPlanarTest.TestGaussRow_Opt (601 ms) [ RUN ] LibYUVPlanarTest.TestGaussCol_Opt [ OK ] LibYUVPlanarTest.TestGaussCol_Opt (522 ms) TBR=kjellander@chromium.org BUG=libyuv:719 TEST=LibYUVPlanarTest.TestGaussRow_Opt Change-Id: I1242b98672538e889f3ab48f215d6dabc7144ea7 Reviewed-on: https://chromium-review.googlesource.com/627478 Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-24 01:09:23 +00:00
Frank Barchard	1cc539f7d6	GaussCol_NEON resample from short to int Old NEON LibYUVPlanarTest.TestGaussCol_Opt (916 ms) New NEON LibYUVPlanarTest.TestGaussCol_Opt (520 ms) C vectorized LibYUVPlanarTest.TestGaussCol_Opt (739 ms) TBR=kjellander@chromium.org BUG=libyuv:719 TEST=LibYUVPlanarTest.TestGaussCol_Opt Change-Id: I863b66f700f7a71fcb08a2eabb03240fdaf8a238 Reviewed-on: https://chromium-review.googlesource.com/626938 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-22 23:07:17 +00:00
Frank Barchard	c5bad809b1	Gauss unittest, Scale comments for neon64 half size updated [ RUN ] LibYUVPlanarTest.TestGaussRow_Opt [ OK ] LibYUVPlanarTest.TestGaussRow_Opt (1274 ms) [ RUN ] LibYUVPlanarTest.TestGaussCol_Opt [ OK ] LibYUVPlanarTest.TestGaussCol_Opt (916 ms) TBR=kjellander@chromium.org BUG=libyuv:719 TEST=LibYUVPlanarTest.TestGaussRow_Opt Change-Id: Id480f3870c40c2b40dfb9f072cb7118ebad41afc Reviewed-on: https://chromium-review.googlesource.com/624701 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-21 23:41:46 +00:00
Frank Barchard	0c957d183e	Gaussian blur NEON optimized TBR=kjellander@chromium.org BUG=libyuv:719 TEST=TestGaussCol_NEON Change-Id: I52cb6dbfd0cab4a30205c93b6a528ef49e9ab529 Reviewed-on: https://chromium-review.googlesource.com/621708 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-21 21:18:32 +00:00
Frank Barchard	8cd3e4f3f2	Add MSA optimized ScaleFilterCols, ScaleARGBCols, ScaleARGBFilterCols and ScaleRowDown34 functions TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: Ib139b9701fc67e24d27a6886377c0cb8b2773fda Reviewed-on: https://chromium-review.googlesource.com/620791 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-18 17:23:27 +00:00
Frank Barchard	78e44628c6	Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions. TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e Reviewed-on: https://chromium-review.googlesource.com/616765 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-17 18:39:22 +00:00
Frank Barchard	bb17da97cf	Test C vs NEON for ScaleDown2Box_16 TBR=kjellander@chromium.org BUG=libyuv:718 TEST=LibYUVScaleTest.TestScaleRowDown2Box_16 Change-Id: Ic74d29d6f14983ff26e8af541ef702a0f8bf3f17 Reviewed-on: https://chromium-review.googlesource.com/616189 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-16 22:18:48 +00:00
Frank Barchard	7e59ee4c75	Upsample 8x2 pixels to 16x1 with bilinear filtering Downsample 16x2 to 8x1 with box filtering [ RUN ] LibYUVScaleTest.TestScaleRowUp2_16 [ OK ] LibYUVScaleTest.TestScaleRowUp2_16 (579 ms) [ RUN ] LibYUVScaleTest.TestScaleRowDown2Box_16 [ OK ] LibYUVScaleTest.TestScaleRowDown2Box_16 (329 ms) [----------] 2 tests from LibYUVScaleTest (909 ms total) TBR=kjellander@chromium.org BUG=libyuv:718 TEST=LibYUVScaleTest.TestScaleRowUp2_16 and LibYUVScaleTest.TestScaleRowDown2Box_16 Change-Id: I457d44123f2751e5f71bf3935401fff74b8e9db2 Reviewed-on: https://chromium-review.googlesource.com/608876 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-15 22:28:15 +00:00
Frank Barchard	56bbcdf422	Reintroduce the max version of scale add ScaleMaxSamples_NEON function with max done on original values. TBR=kjellander@chromium.org BUG=libyuv:717 TEST=LibYUVPlanarTest.TestScaleMaxSamples_Opt Change-Id: Id99338860782b10ffd24f66242eb42014c2e229e Reviewed-on: https://chromium-review.googlesource.com/614685 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-14 23:33:56 +00:00
Manojkumar Bhosale	dbd7c1a9c5	Add MSA optimized ARGBExtractAlpha, ARGBBlend, ARGBQuantize and ARGBColorMatrix row functions TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: I17bd3f87336f613ad363af7d7b9d7af49d725e56 Reviewed-on: https://chromium-review.googlesource.com/613100 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-14 17:38:31 +00:00
Frank Barchard	83ca1abe09	Change ScaleSumSamples to return Sum of Squares TBR=kjellander@chromium.org BUG=libyuv:717 TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt Change-Id: I5208666f3968c5c4b0f1b0c951f24216d78ee3fe Reviewed-on: https://chromium-review.googlesource.com/607184 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-09 22:19:45 +00:00
Frank Barchard	8676ad7004	scale float samples and return max value BUG=libyuv:717 TEST=ScaleSum unittest to compare C vs Arm implementation TBR=kjellander@chromium.org Change-Id: Iaa7af5547d979aad4722f868d31b405340115748 Reviewed-on: https://chromium-review.googlesource.com/600534 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-04 23:34:30 +00:00
Frank Barchard	27036e33e8	Revert "include <new> header for benefit of new clang builds" This reverts commit 1dda4cb0b7bd564e646d6ec2efee497fcd7146ca. Reason for revert: build error on jpeg FILE Original change's description: > include <new> header for benefit of new clang builds > > TBR=kjellander@chromium.org > BUG=libyuv:712 > TEST=local builds still work > > Change-Id: I040e8edc40aafd820d2a29629fe7aec5c049bc6b > Reviewed-on: https://chromium-review.googlesource.com/576971 > Reviewed-by: Frank Barchard <fbarchard@google.com> > Commit-Queue: Frank Barchard <fbarchard@google.com> TBR=kjellander@chromium.org,fbarchard@google.com # Not skipping CQ checks because original CL landed > 1 day ago. Bug: libyuv:712 Change-Id: I4cf4e26eadb476017dc95e6c9578092204f088a3 Reviewed-on: https://chromium-review.googlesource.com/601211 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-03 22:03:47 +00:00
Frank Barchard	6d083e2d12	clang 6 build disable some msa functions R=kjellander@chromium.org Bug: libyuv:715 Test: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true" Change-Id: Ia3943b0afc02e05a8bc32350719b296b0b9d5479 Reviewed-on: https://chromium-review.googlesource.com/592720 Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-08-03 17:44:35 +00:00
Frank Barchard	1dda4cb0b7	include <new> header for benefit of new clang builds TBR=kjellander@chromium.org BUG=libyuv:712 TEST=local builds still work Change-Id: I040e8edc40aafd820d2a29629fe7aec5c049bc6b Reviewed-on: https://chromium-review.googlesource.com/576971 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-07-19 17:47:31 +00:00
Frank Barchard	6c94ad13b5	Remove ARM NaCL macros from source NaCL has been disabled for awhile, so the code will still build, but only with C versions. This change removes the MEMACCESS() macros from Neon and Neon64 source. BUG=libyuv:702 TEST=try bots build for arm. R=kjellander@chromium.org Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19 Reviewed-on: https://chromium-review.googlesource.com/528332 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-09 22:22:07 +00:00
Frank Barchard	5f94a33e0c	Lint fix for C casting for rotation code on arm instead of casting int to int64, pass the int and use %w modifier to use the word version of the register. TBR=kjellander@chromium.org BUG=libyuv:706 TEST=git cl lint R=wangcheng@google.com Change-Id: Iee5a70f04d928903ca8efac00066b8821a465e36 Reviewed-on: https://chromium-review.googlesource.com/528381 Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-06-09 00:51:00 +00:00
Frank Barchard	d981495b42	Hamming Distance using 16 bit accumulators Summing 16 bit hamming codes restricts the maximum length, but saves an inner loop instruction. The outer loop can sum the values. 32 bit Neon Now BenchmarkHammingDistance_Opt (78 ms) Was BenchmarkHammingDistance_Opt (92 ms) 64 bit Neon Now BenchmarkHammingDistance_Opt (85 ms) Was BenchmarkHammingDistance_Opt (92 ms) R=wangcheng@google.com TBR=kjellander@chromium.org BUG=libyuv:701 TEST=BenchmarkHammingDistance Change-Id: Ie40f0eac2f3339c33b833b42af5d394b122066ae Reviewed-on: https://chromium-review.googlesource.com/526932 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-07 23:23:24 +00:00
Frank Barchard	790e0634a8	Port HammingDistance_NEON 32 bit code to 64 bit The 32 bit version of HammingDistance_NEON accumulates using vertical add and paired adds, which takes 3 instructions instead of 4. The instructions are also portable between 32 and 64 bit. Was BenchmarkHammingDistance_Opt (105 ms) Now BenchmarkHammingDistance_Opt (90 ms) TBR=kjellander@chromium.org BUG=libyuv:701 TEST=BenchmarkHammingDistance BenchmarkHammingDistance_Opt (90 ms) Change-Id: If9e621e0bd2fe2492a1532056f8a1b451ba53d7e Reviewed-on: https://chromium-review.googlesource.com/526365 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-07 01:04:35 +00:00
Frank Barchard	47d6eaa377	HammingDistance_NEON optimized looping BenchmarkHammingDistance_Opt (93 ms) BenchmarkHammingDistance_C (389 ms) TBR=kjellander@chromium.org BUG=libyuv:701 TEST=BenchmarkHammingDistance Change-Id: I4ba920751eb130cac6a276e441a7c309c495554a Reviewed-on: https://chromium-review.googlesource.com/526401 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-07 00:09:59 +00:00
Frank Barchard	baf5248242	HammingDistance_NEON ported to 32 bit TBR=kjellander@chromium.org BUG=libyuv:701 TEST=BenchmarkHammingDistance Change-Id: I252efd8a27aa11a0fe7d8030d7c8b57f20f04760 Reviewed-on: https://chromium-review.googlesource.com/525232 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-06 17:58:29 +00:00
Frank Barchard	44abf70187	ScaleDown odd functions adjust math so last pixel is half width source. existing test passes out/Release/libyuv_unittest --gtest_filter=Blend --libyuv_width=33 --libyuv_height=16 new test added BUG=libyuv:705 TEST=LibYUVScaleTest.TestScaleOdd Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704 Reviewed-on: https://chromium-review.googlesource.com/524932 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-06 01:37:26 +00:00
Frank Barchard	7bffe5e1c5	lint warning fixes for CpuID The CpuId function is a wrapper for the intrinsic, or implemented with inline if unavailable. It had been using uint32, but the intrinsics use int, so it was causing casting and lint warnings. This change makes the internal implementation use int. Casting was also done for xgetbv, and the cast is simply removed, and is not causing a build error. MipCpuCaps was doing strlen to check for white space after the instruction set. Arm also does this but with a hard coded offset. This was causing a cast from size_t to int, which produced a lint warning. The change removes the white space detect. In theory the code could be used to detect SSE vs SSE2, and it would need to check SSE is followed by a space or end of line. But this code is only used on Arm and Mips, where there is one form of SIMD detected. e.g. MSA for mips. If a new instruction set is added with a similar name, the white space check could be reintroduced. But it's more likely the code can be rewritten to use a better form of detection by then. Or remove detection and require the instructions BUG=libyuv:641 TEST=try bots build on all platforms without error and lint is clean Change-Id: I9f55f8e57bba0f78571bdddbe63b945dea3e8809 Reviewed-on: https://chromium-review.googlesource.com/514524 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: Wan-Teh Chang <wtc@chromium.org>	2017-05-25 22:00:17 +00:00
Frank Barchard	8edd2286fd	MaskCpuFlags return cpuinfo so InitCpuFlags can call it Reduce number of atomic references to cpu_info by making InitCpuFlags call MaskCpuFlags and return the same value. BUG=libyuv:641 TEST=libyuv_unittests pass Change-Id: I5dfff8f7a10671bc8ef3ec0ed6f302791e752faa Reviewed-on: https://chromium-review.googlesource.com/514145 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-05-24 22:27:03 +00:00
Frank Barchard	651ccc0c3a	Fix data races in libyuv::TestCpuFlag(). Detect the compiler's support of C11 atomics, and use C11 atomics when available. Note that libyuv::MaskCpuFlags() is still not thread-safe. BUG=libyuv:641 TEST= cpu_thread_test.cc adds a pthread based test R=wangcheng@google.com Change-Id: If05b1e16da833105a0159ed67ef20f4e61bc7abd Reviewed-on: https://chromium-review.googlesource.com/510079 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-05-24 02:09:03 +00:00
Frank Barchard	77f6916da2	use __popcnt for visual c HammingDistance_X86 BUG=libyuv:701 TEST=HammingDistance unittest performance is comparable to x64 R=wangcheng@google.com Change-Id: I8abe861e086e0162ba4c7ba6f1ef7d1c006cd9d4 Reviewed-on: https://chromium-review.googlesource.com/505454 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-05-12 22:59:00 +00:00
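
Several commits above concern HammingDistance: counting differing bits with popcnt, handling lengths that are not a multiple of 4 (libyuv:740), and limiting the block size to 32k so per-block sums cannot overflow (libyuv:701). The following is a minimal portable C sketch of those three ideas, not the libyuv implementation. The names popcount32, hamming_block and hamming_distance are hypothetical, and the bit-twiddling popcount stands in for the popcnt/SIMD code the commits describe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: portable population count of a 32-bit word.
   The commits use popcnt/SIMD instructions instead; this is a stand-in. */
static uint32_t popcount32(uint32_t x) {
  x = x - ((x >> 1) & 0x55555555u);
  x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
  x = (x + (x >> 4)) & 0x0f0f0f0fu;
  return (x * 0x01010101u) >> 24;
}

/* Count differing bits in one block. A 32 KiB block has at most
   32768 * 8 = 262144 differing bits, which fits easily in 32 bits. */
static uint32_t hamming_block(const uint8_t* a, const uint8_t* b, size_t n) {
  uint32_t diff = 0;
  size_t i = 0;
  /* Main loop: 4 bytes at a time. memcpy avoids unaligned loads. */
  for (; i + 4 <= n; i += 4) {
    uint32_t wa, wb;
    memcpy(&wa, a + i, 4);
    memcpy(&wb, b + i, 4);
    diff += popcount32(wa ^ wb);
  }
  /* Tail: remaining 0..3 bytes, so odd lengths are counted correctly. */
  for (; i < n; ++i) {
    diff += popcount32((uint32_t)(a[i] ^ b[i]));
  }
  return diff;
}

/* Sum per-block counts into a 64-bit total so large inputs cannot overflow. */
uint64_t hamming_distance(const uint8_t* a, const uint8_t* b, size_t n) {
  const size_t kBlockSize = 32768; /* assumed to mirror the "32k" in the log */
  uint64_t total = 0;
  while (n > 0) {
    size_t chunk = n < kBlockSize ? n : kBlockSize;
    total += hamming_block(a, b, chunk);
    a += chunk;
    b += chunk;
    n -= chunk;
  }
  return total;
}
```

Keeping the inner accumulator narrow and summing in an outer loop is the same trade-off the "16 bit accumulators" commit describes: a smaller accumulator limits the block length but keeps the inner loop cheap.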

1379 Commits