libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2025-12-06 16:56:55 +08:00

Author	SHA1	Message	Date
Frank Barchard	6c94ad13b5	Remove ARM NaCL macros from source NaCL has been disabled for awhile, so the code will still build, but only with C versions. This change removes the MEMACCESS() macros from Neon and Neon64 source. BUG=libyuv:702 TEST=try bots build for arm. R=kjellander@chromium.org Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19 Reviewed-on: https://chromium-review.googlesource.com/528332 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-09 22:22:07 +00:00
Frank Barchard	44abf70187	ScaleDown odd functions adjust math so last pixel is half width source. existing test passes out/Release/libyuv_unittest --gtest_filter=Blend --libyuv_width=33 --libyuv_height=16 new test added BUG=libyuv:705 TEST=LibYUVScaleTest.TestScaleOdd Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704 Reviewed-on: https://chromium-review.googlesource.com/524932 Reviewed-by: Cheng Wang <wangcheng@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-06-06 01:37:26 +00:00
Frank Barchard	000d2fa91a	Libyuv MIPS DSPR2 optimizations. Optimized functions: I444ToARGBRow_DSPR2 I422ToARGB4444Row_DSPR2 I422ToARGB1555Row_DSPR2 NV12ToARGBRow_DSPR2 BGRAToUVRow_DSPR2 BGRAToYRow_DSPR2 ABGRToUVRow_DSPR2 ARGBToYRow_DSPR2 ABGRToYRow_DSPR2 RGBAToUVRow_DSPR2 RGBAToYRow_DSPR2 ARGBToUVRow_DSPR2 RGB24ToARGBRow_DSPR2 RAWToARGBRow_DSPR2 RGB565ToARGBRow_DSPR2 ARGB1555ToARGBRow_DSPR2 ARGB4444ToARGBRow_DSPR2 ScaleAddRow_DSPR2 Bug-fixes in functions: ScaleRowDown2_DSPR2 ScaleRowDown4_DSPR2 BUG= Review-Url: https://codereview.chromium.org/2626123003 .	2017-01-11 12:19:13 -08:00
Manojkumar Bhosale	288bfbefb5	Add MSA optimized remaining scale row functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C vectorized) ScaleRowDown2_MSA - ~22.3x ScaleRowDown2_Any_MSA - ~19.9x ScaleRowDown2Linear_MSA - ~31.2x ScaleRowDown2Linear_Any_MSA - ~29.4x ScaleRowDown2Box_MSA - ~20.1x ScaleRowDown2Box_Any_MSA - ~19.6x ScaleRowDown4_MSA - ~11.7x ScaleRowDown4_Any_MSA - ~11.2x ScaleRowDown4Box_MSA - ~15.1x ScaleRowDown4Box_Any_MSA - ~15.1x ScaleRowDown38_MSA - ~1x ScaleRowDown38_Any_MSA - ~1x ScaleRowDown38_2_Box_MSA - ~1.7x ScaleRowDown38_2_Box_Any_MSA - ~1.7x ScaleRowDown38_3_Box_MSA - ~1.7x ScaleRowDown38_3_Box_Any_MSA - ~1.7x ScaleAddRow_MSA - ~1.2x ScaleAddRow_Any_MSA - ~1.15x Performance Gain (vs C non-vectorized) ScaleRowDown2_MSA - ~22.4x ScaleRowDown2_Any_MSA - ~19.8x ScaleRowDown2Linear_MSA - ~31.6x ScaleRowDown2Linear_Any_MSA - ~29.4x ScaleRowDown2Box_MSA - ~20.1x ScaleRowDown2Box_Any_MSA - ~19.6x ScaleRowDown4_MSA - ~11.7x ScaleRowDown4_Any_MSA - ~11.2x ScaleRowDown4Box_MSA - ~15.1x ScaleRowDown4Box_Any_MSA - ~15.1x ScaleRowDown38_MSA - ~3.2x ScaleRowDown38_Any_MSA - ~3.2x ScaleRowDown38_2_Box_MSA - ~2.4x ScaleRowDown38_2_Box_Any_MSA - ~2.3x ScaleRowDown38_3_Box_MSA - ~2.9x ScaleRowDown38_3_Box_Any_MSA - ~2.8x ScaleAddRow_MSA - ~8x ScaleAddRow_Any_MSA - ~7.46x Review-Url: https://codereview.chromium.org/2559683002 .	2016-12-21 13:39:44 +05:30
Manojkumar Bhosale	56b5bbb0be	Add MSA optimized ARGB scaling functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C vectorized) ScaleARGBRowDown2_MSA - ~2.6x ScaleARGBRowDown2Linear_MSA - ~7.9x ScaleARGBRowDown2Box_MSA - ~3.7x ScaleARGBRowDownEven_MSA - ~1.2x ScaleARGBRowDownEvenBox_MSA - ~3.5x ScaleARGBRowDown2_Any_MSA - ~2.6x ScaleARGBRowDown2Linear_Any_MSA - ~7.9x ScaleARGBRowDown2Box_Any_MSA - ~3.6x ScaleARGBRowDownEven_Any_MSA - ~1.2x ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x Performance Gain (vs C non-vectorized) ScaleARGBRowDown2_MSA - 2.6x ScaleARGBRowDown2Linear_MSA - 13.5x ScaleARGBRowDown2Box_MSA - 5.8x ScaleARGBRowDownEven_MSA - 1.2x ScaleARGBRowDownEvenBox_MSA - 3.7x ScaleARGBRowDown2_Any_MSA - 2.6x ScaleARGBRowDown2Linear_Any_MSA - 13.5x ScaleARGBRowDown2Box_Any_MSA - 5.3x ScaleARGBRowDownEven_Any_MSA - 1.2x ScaleARGBRowDownEvenBox_Any_MSA - 3.7x Review URL: https://codereview.chromium.org/2527983002 .	2016-12-07 11:47:15 +05:30
Frank Barchard	e62309f259	clang-format libyuv BUG=libyuv:654 R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2469353005 .	2016-11-07 17:37:23 -08:00
Frank Barchard	fc52d8ded2	Odd width variation of scale down by 2 for subsampling R=dhrosa@google.com, harryjin@google.com BUG=libyuv:538 Review URL: https://codereview.chromium.org/1558093003 .	2016-01-06 15:12:17 -08:00
Frank Barchard	80ca4514ef	change scale down by 4 to use rounding. TBR=harryjin@google.com BUG=libyuv:447 Review URL: https://codereview.chromium.org/1525033005 .	2015-12-15 21:25:18 -08:00
Frank Barchard	ae55e41851	use rounding in scaledown by 2 When scaling down by 2 the formula should round consistently. (a+b+c+d+2)/4 The C version did but the SSE2 version was doing 2 averages. avg(avg(a,b),avg(c,d)) This change uses a sum, then rounds. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:447,libyuv:527 Review URL: https://codereview.chromium.org/1513183004 .	2015-12-14 17:25:36 -08:00
fbarchard@google.com	05416e2d9a	Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows. BUG=425 TESTED=out\release\libyuv_unittest --gtest_filter=ScaleTo1x1 R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/52659004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1428 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-06-09 01:05:18 +00:00
fbarchard@google.com	7c09264ffc	odd width support for scale by even scale factor and box scale down by 4. scale down by 4 uses scale down by 2 internally. BUG=431 TESTED=libyuvTest.ARGBScaleDownBy4_Bilinear Review URL: https://webrtc-codereview.appspot.com/57399004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1412 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-05-26 17:56:51 +00:00
fbarchard@google.com	c38aeec322	scale down by 2 on argb images support odd widths using _any function. BUG=431 TESTED=libyuvTest.ARGBScaleDownBy2_Bilinear Review URL: https://webrtc-codereview.appspot.com/52569004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1410 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-05-22 21:39:21 +00:00
fbarchard@google.com	31806d76d9	scale to 3/4 bug fix for odd widths. multiply to index into source by scale factor should be 4 / 3 not 3 / 4. BUG=433 TESTED=set LIBYUV_WIDTH=1276 out\release\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=.Scale R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/49219004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1391 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-30 17:18:13 +00:00
fbarchard@google.com	9f4636e298	AVX2 port of ScaleDownBy4. BUG=314 TESTED=out\release\libyuv_unittest --gtest_filter=.ScaleDownBy4 Review URL: https://webrtc-codereview.appspot.com/46159004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1390 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-30 01:58:32 +00:00
fbarchard@google.com	4e78b8dc2e	scale to 3/4 or 3/8 with odd width destinations efficiently. previously if width was not multiple of what the simd loop would do (24), scaling would fall back on slower C code. This change allows SIMD to be used for most of the scaling and C for the remainder, improving efficiency. BUG=314 TESTED=set LIBYUV_WIDTH=1896 & ScaleDownBy3by4_* R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/48249004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1380 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-27 21:56:08 +00:00
fbarchard@google.com	1ffb04b43e	Allow ScaleRowDown any functions to accept non-power of 2 for destination SIMD multiple. BUG=none TESTED=local unittests pass R=bcornell@google.com Review URL: https://webrtc-codereview.appspot.com/45129004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1379 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-24 22:32:12 +00:00
fbarchard@google.com	2b7f6b7dee	ScaleAddRows_Any_SSE2 functions for handling odd widths. BUG=425 TESTED=out\release\libyuv_unittest_old --gtest_filter=.ScaleDownBy3_ R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/45219004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1377 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-22 00:51:56 +00:00
yang.zhang@arm.com	5f609856de	Add ScaleARGBFilterCols_NEON for ARM32/64 ARM32/64 NEON versions of ScaleARGBFilterCols_NEON are implemented. BUG=319 TESTED=libyuvTest.* on ARM32/64 with Android R=fbarchard@google.com Change-Id: Ifea62bc25d846bf16cb51d13b408de7bf58dccd4 Review URL: https://webrtc-codereview.appspot.com/46699004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1361 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-07 03:45:29 +00:00
fbarchard@google.com	44b6ba91e4	Scale down by 4 for odd number of destination pixels using 'any' that handles SIMD for multiple of 8 pixels, and C for the remainder. BUG=314 TESTED=local test with width odd Review URL: https://webrtc-codereview.appspot.com/49599004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1355 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-03 22:12:53 +00:00
yang.zhang@arm.com	f23d6222ac	Add ScaleARGBCols_NEON for ARM32/64 ARM32/64 NEON versions of ScaleARGBCols_NEON are implemented. BUG=319 TESTED=libyuvTest.* on ARM32/64 with Android R=fbarchard@google.com Change-Id: Id9ad97f7aa5d8a34cd55ace9e648cb6ff028efd9 Review URL: https://webrtc-codereview.appspot.com/47689004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1351 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-31 03:03:05 +00:00
fbarchard@google.com	72673ac873	linear and point sample scale to half size for AVX2. BUG=314 TESTED=out\release\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=.ScaleDownBy2 R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/44959004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1349 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-30 21:46:08 +00:00
fbarchard@google.com	e6ca9cc2a2	Scale down by 2 AVX2 port. Processes twice as many pixels as SSE2 and takes advantage of 3 argument instructions to reduce register usage and number of instructions. BUG=314 TESTED=libyuvTest.ScaleDownBy2_Box R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/42959004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1347 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-26 23:21:08 +00:00
fbarchard@google.com	d41fbf40dd	Handle scale down by factor of 2 efficiently by calling SIMD for multiple of 16 destination pixels, and C for remainder. BUG=314 TESTED=out\release\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=.ScaleDownBy2 R=bcornell@google.com Review URL: https://webrtc-codereview.appspot.com/48689004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1344 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-24 23:25:30 +00:00
yang.zhang@arm.com	d6d7de5742	Add ScaleFilterCols_NEON for ARM32/64 ARM32/64 NEON versions of ScaleFilterCols_NEON are implemented. BUG=319 TESTED=libyuvTest.* on ARM32/64 with Android R=fbarchard@google.com Change-Id: I5b0838769ffb0182155d7cd6bcc520eb81eb5c4e Review URL: https://webrtc-codereview.appspot.com/41349004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1340 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-19 03:55:05 +00:00

24 Commits
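Commit ae55e41851 contrasts two ways of averaging a 2x2 block: one rounded sum, (a+b+c+d+2)/4, versus two nested averages, avg(avg(a,b),avg(c,d)). The sketch below illustrates why the two can disagree. It is not the libyuv source. The function names are hypothetical, and it assumes each avg rounds up as (x+y+1)/2, which is how the SSE2 byte-average instruction behaves.

```c
#include <stdint.h>

/* Single rounding step: sum the four pixels, add 2, then divide by 4. */
static uint8_t Box2x2SumThenRound(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}

/* Rounding average of two bytes, (x + y + 1) / 2 (assumed rounding mode). */
static uint8_t Avg2(uint8_t x, uint8_t y) {
  return (uint8_t)((x + y + 1) >> 1);
}

/* Two levels of averaging: each level rounds up, so the bias can compound. */
static uint8_t Box2x2NestedAvg(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return Avg2(Avg2(a, b), Avg2(c, d));
}
```

For the block (0, 1, 0, 0) the true mean is 0.25. The single-rounding form gives 0, while the nested form gives 1 because both averaging steps round up.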
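Commit 05416e2d9a describes a box filter that walks the source one row at a time and adds each row into a small accumulation buffer, which is zeroed before each group of N rows. The following is a minimal sketch of that idea in plain C. The names, the 32-bit accumulator type, and the rounding in the final division are assumptions made for illustration, not details taken from libyuv.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Add one source row into the accumulation buffer. */
static void AccumulateRow(const uint8_t* src, uint32_t* acc, int width) {
  for (int x = 0; x < width; ++x) {
    acc[x] += src[x];
  }
}

/* Box-filter box_h source rows into one destination row.
 * Each destination pixel covers a box_w x box_h block of source pixels.
 * acc must hold src_width entries; dst must hold src_width / box_w entries. */
static void BoxRows(const uint8_t* src, int src_stride, int src_width,
                    int box_w, int box_h, uint32_t* acc, uint8_t* dst) {
  /* Reset the buffer before accumulating this group of rows. */
  memset(acc, 0, (size_t)src_width * sizeof(acc[0]));

  /* Row-major pass: each source row is read once, sequentially. */
  for (int y = 0; y < box_h; ++y) {
    AccumulateRow(src + (size_t)y * src_stride, acc, src_width);
  }

  /* Collapse the column sums horizontally and divide by the box area. */
  const uint32_t area = (uint32_t)(box_w * box_h);
  for (int dx = 0; dx < src_width / box_w; ++dx) {
    uint32_t sum = 0;
    for (int i = 0; i < box_w; ++i) {
      sum += acc[dx * box_w + i];
    }
    dst[dx] = (uint8_t)((sum + area / 2) / area);
  }
}
```

The commit message also notes a possible refinement: if the memset turns out to be a bottleneck, the first row can be stored into the buffer without an add, and only the remaining rows accumulated.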
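Several commits above (d41fbf40dd, 44b6ba91e4, 4e78b8dc2e) describe "any" functions that run SIMD over the largest multiple of the vector width and fall back to C for the remaining pixels. The sketch below shows that split for a point-sampled scale down by 2 with a block size of 16 destination pixels. All names are hypothetical, the "SIMD" kernel is a plain C stand-in so the example runs anywhere, and the choice of sampling the first pixel of each pair is an assumption.

```c
#include <stdint.h>

/* Stand-in for a SIMD kernel: only valid when dst_width is a multiple of 16. */
static void RowDown2_Block16(const uint8_t* src, uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; x += 16) {
    for (int i = 0; i < 16; ++i) {
      dst[x + i] = src[2 * (x + i)];
    }
  }
}

/* Scalar version: handles any width, one pixel at a time. */
static void RowDown2_C(const uint8_t* src, uint8_t* dst, int dst_width) {
  for (int x = 0; x < dst_width; ++x) {
    dst[x] = src[2 * x];
  }
}

/* "Any" wrapper: block kernel for the bulk, scalar code for the tail. */
static void RowDown2_Any(const uint8_t* src, uint8_t* dst, int dst_width) {
  const int bulk = dst_width & ~15; /* largest multiple of 16 <= dst_width */
  const int tail = dst_width & 15;  /* leftover destination pixels */
  if (bulk > 0) {
    RowDown2_Block16(src, dst, bulk);
  }
  if (tail > 0) {
    /* Each destination pixel consumes two source pixels. */
    RowDown2_C(src + 2 * bulk, dst + bulk, tail);
  }
}
```

With a destination width of 33, the block kernel handles 32 pixels and the scalar loop handles the last one, so an odd width no longer forces the whole row onto the slow path.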