libyuv

mirror of https://chromium.googlesource.com/libyuv/libyuv synced 2025-12-06 16:56:55 +08:00

Author	SHA1	Message	Date
Frank Barchard	6e6f81b803	Floating point Gaussian kernels On SkylakeX for 720p TestGaussPlane_F32 (657 ms) On Pixel3 TestGaussPlane_F32 (1787 ms) Bug: libyuv:852, b/145611468 Change-Id: I9859af1b9381621067992305727da285f82bdded Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1949667 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Marat Dukhan <maratek@google.com>	2019-12-09 04:45:59 +00:00
Frank Barchard	d82f4baf5f	Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror Bug: libyuv:840, libyuv:849: b/144318948 Change-Id: I303c02ac2b838a09d3e623df7a69ffc085fe3cd2 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1914781 Reviewed-by: Miguel Casas <mcasas@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2019-11-13 20:02:40 +00:00
Frank Barchard	c85a7b3ae3	MMI Optimized functions I422ToARGB for 1080p video Improves playback performance for 1080p video on www.youku.com BUG=libyuv:841 Change-Id: Iabe7693fba276162af0290863f46e214ab86fb6c Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1790959 Reviewed-by: Miguel Casas <mcasas@chromium.org>	2019-09-11 21:06:21 +00:00
Frank Barchard	fec9121b67	SwapUV AVX2 and SSSE3 Based on ARGBShuffle but with count adjusted and new shuffle mask BUG=libyuv:809 Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2019-07-26 18:41:40 +00:00
Frank Barchard	f1c00932df	NV21 unittest and benchmark BUG=libyuv:809 Change-Id: I75afb5612dcd05820479848a90ad16b07a7981bc Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1707229 Reviewed-by: richard winterton <rrwinterton@gmail.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2019-07-18 02:13:02 +00:00
Frank Barchard	f9aacffa02	Fix arm unittest failure by removing unused FloatDivToByteRow. Apply clang-format to fix jpeg if() for lint fix. Change comments about 4th pixel for open source compliance. Rename UVToVU to SwapUV for consistency with MergeUV. BUG=b/135532289, b/136515133 Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936 Reviewed-by: Chong Zhang <chz@google.com> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2019-07-02 20:00:30 +00:00
Frank Barchard	413a8d8041	Add AYUVToNV12 and NV21ToNV12 BUG=libyuv:832 TESTED=out/Release/libyuv_unittest --gtest_filter=ToNV12 --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 R=rrwinterton@gmail.com Change-Id: Id03b4613211fb6a6e163d10daa7c692fe31e36d8 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1560080 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: richard winterton <rrwinterton@gmail.com> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2019-04-12 17:48:45 +00:00
Frank Barchard	20bf569a04	Fix ConvertToI420() for odd crop_y The original src_u calculation of FOURCC_I420 shifted half width if crop_y is odd. This CL fixs the problem and also add a test case for it. Bug: b:115278653 Test: pass libyuv_unittest Change-Id: Ia9732d22e64e13de26df47726ba44ad1c5a06484 Reviewed-on: https://chromium-review.googlesource.com/c/1258743 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2018-10-03 19:14:01 +00:00
Martin Storsjö	9b772abf97	Restore the file mode for source files This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1. Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4 Reviewed-on: https://chromium-review.googlesource.com/1146183 Reviewed-by: Frank Barchard <fbarchard@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2018-08-06 18:53:32 +00:00
lixia zhang	21be9122aa	libyuv:loongson optimize compare/row/scale/rotate files with mmi. Currently, libyuv supports MIPS SIMD Arch(MSA), but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform). In order to improve performance of libyuv on loongson3a platform, this provides optimize 98 functions with mmi. BUG=libyuv:804 Change-Id: I8947626009efad769b3103a867363ece25d79629 Reviewed-on: https://chromium-review.googlesource.com/1122064 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2018-07-20 22:53:04 +00:00
Frank Barchard	85722f5d93	ByteToFloatRow_NEON to convert and scale bytes to floats Each byte is converted to float (0.0 to 255.0) and then multiplied by a scale parameter. Bug: None Test: arm 64 build passes. Change-Id: I04736798540b8d985f60abdf0388e24a209d075b Reviewed-on: https://chromium-review.googlesource.com/930226 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Ian Field <ianfield@google.com>	2018-02-24 00:34:07 +00:00
Frank Barchard	e1f6c1c0b5	tidy applied with readability-inconsistent-declaration-parameter-name Bug: libyuv:750 Test: builds and runs and passes more tidy tests Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281 Reviewed-on: https://chromium-review.googlesource.com/902170 Reviewed-by: Weiyong Yao <braveyao@chromium.org>	2018-02-07 00:24:25 +00:00
Frank Barchard	92e22cf5b6	Lint cleanup after C99 change CL TBR=braveyao@chromium.org Bug: libyuv:774 Test: git cl lint Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743 Reviewed-on: https://chromium-review.googlesource.com/882217 Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2018-01-24 19:16:03 +00:00
Frank Barchard	7e389884a1	Switch to C99 types Append _t to all sized types. uint64 becomes uint64_t etc Bug: libyuv:774 Test: try bots build on all platforms Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac Reviewed-on: https://chromium-review.googlesource.com/879922 Reviewed-by: Miguel Casas <mcasas@chromium.org>	2018-01-23 19:16:05 +00:00
Frank Barchard	2ed2402fa0	I420ToI010 for 8 to 10 bit YUV conversion. Convert planar 8 bit formats to planar 16 bit formats. Includes msan fix for Convert8To16Row_Opt unittest. I420 is YUV bt.601 8 bits per channel with 420 subsampling. I010 is YUV bt.601 10 bits per channel with 420 subsampling. I is color space - bt.601. The function does no color space conversion so H420ToI010 is aliased to this function as well. 0 = 420 subsampling. The chroma channels are half width / height. 10 = 10 bits per channel, stored in low 10 bits of 16 bit samples. For SSSE3 version: out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1 [ RUN ] LibYUVConvertTest.I420ToI010_Opt [ OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms) Bug: libyuv:751 Test: LibYUVConvertTest.I420ToI010_Opt Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2 Reviewed-on: https://chromium-review.googlesource.com/846421 Reviewed-by: Miguel Casas <mcasas@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2018-01-02 21:09:39 +00:00
Frank Barchard	140fc0a261	Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2 LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used. Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3. Bug: libyuv: 769 Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb Reviewed-on: https://chromium-review.googlesource.com/846541 Reviewed-by: Weiyong Yao <braveyao@chromium.org> Reviewed-by: Frank Barchard <fbarchard@chromium.org>	2018-01-02 18:57:39 +00:00
Frank Barchard	5336217f11	H010Copy function to copy 16 bit planar formats Bug: libyuv:751 Test: LibYUVConvertTest.H010ToH010_Opt Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b Reviewed-on: https://chromium-review.googlesource.com/823445 Commit-Queue: Frank Barchard <fbarchard@chromium.org> Reviewed-by: Cheng Wang <wangcheng@google.com> Reviewed-by: richard winterton <rrwinterton@gmail.com>	2017-12-15 03:34:34 +00:00
Frank Barchard	3b81288ece	Remove Mips DSPR2 code Bug: libyuv:765 Test: build for mips still passes Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37 Reviewed-on: https://chromium-review.googlesource.com/826404 Reviewed-by: Weiyong Yao <braveyao@chromium.org> Commit-Queue: Frank Barchard <fbarchard@chromium.org>	2017-12-14 18:22:16 +00:00
Frank Barchard	1e16cb5c38	SplitRGBPlane and MergeRGBPlane functions added Converts packed RGB to planar and back. TBR=kjellander@chromium.org BUG=libyuv:728 TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc Reviewed-on: https://chromium-review.googlesource.com/658069 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Cheng Wang <wangcheng@google.com>	2017-09-11 21:02:04 +00:00
Manojkumar Bhosale	b6e8e9aa97	Add MSA optimized HalfFloatRow function TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a Reviewed-on: https://chromium-review.googlesource.com/628557 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-29 18:40:08 +00:00
Frank Barchard	78e44628c6	Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions. TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e Reviewed-on: https://chromium-review.googlesource.com/616765 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-17 18:39:22 +00:00
Manojkumar Bhosale	dbd7c1a9c5	Add MSA optimized ARGBExtractAlpha, ARGBBlend, ARGBQuantize and ARGBColorMatrix row functions TBR=kjellander@chromium.org R=fbarchard@google.com Bug:libyuv:634 Change-Id: I17bd3f87336f613ad363af7d7b9d7af49d725e56 Reviewed-on: https://chromium-review.googlesource.com/613100 Reviewed-by: Frank Barchard <fbarchard@google.com>	2017-08-14 17:38:31 +00:00
Frank Barchard	73a603e120	clang-format 5.0 applied to libyuv BUG=None TEST=try bots and lint test Change-Id: I1ab462adf2d309117862c5eb4b244a61ae202951 Reviewed-on: https://chromium-review.googlesource.com/450658 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: Henrik Kjellander <kjellander@chromium.org>	2017-03-08 18:50:12 +00:00
Frank Barchard	136aa9d37c	any11p fix for buffer overrun BUG=libyuv:686 TESTED=untested Change-Id: Idfae93349dd78b1b633a596631e5397e11b77d0b Reviewed-on: https://chromium-review.googlesource.com/448320 Reviewed-by: Frank Barchard <fbarchard@google.com> Reviewed-by: Henrik Kjellander <kjellander@chromium.org> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-03-03 19:57:35 +00:00
Manojkumar Bhosale	45b176d153	Add MSA optimized Interpolate/MergeUV/Misc functions BUG=libyuv:634 Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126 Performance Gain (vs C auto-vectorized) InterpolateRow_MSA - ~3.3x InterpolateRow_Any_MSA - ~2.5x ARGBSetRow_MSA - ~1.0x ARGBSetRow_Any_MSA - ~1.0x ARGBToRGB24Row_MSA - ~1.9x ARGBToRGB24Row_Any_MSA - ~1.6x MergeUVRow_MSA - ~1.6x MergeUVRow_Any_MSA - ~1.2x Performance Gain (vs C non-vectorized) InterpolateRow_MSA - ~11.3x InterpolateRow_Any_MSA - ~ 7.9x ARGBSetRow_MSA - ~ 6.2x ARGBSetRow_Any_MSA - ~ 4.0x ARGBToRGB24Row_MSA - ~ 9.9x ARGBToRGB24Row_Any_MSA - ~ 8.4x MergeUVRow_MSA - ~12.7x MergeUVRow_Any_MSA - ~ 8.0x Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126 Reviewed-on: https://chromium-review.googlesource.com/445817 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-02-23 01:42:22 +00:00
Frank Barchard	20ecb366ea	I420ToI400 fix for unused u and v arguments TBR=kjellander@chromium.org BUG=libyuv:679 TEST=None Change-Id: I830f31e0de58c967cb02c71f422df4c19c94c367 Reviewed-on: https://chromium-review.googlesource.com/441949 Reviewed-by: Frank Barchard <fbarchard@google.com> Commit-Queue: Frank Barchard <fbarchard@google.com>	2017-02-14 02:54:58 +00:00
Manojkumar Bhosale	54ce8f23d6	Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C auto-vectorized) ARGBToYJRow_MSA - ~3.2x ARGBToYJRow_Any_MSA - ~2.7x BGRAToYRow_MSA - ~3.2x BGRAToYRow_Any_MSA - ~2.7x ABGRToYRow_MSA - ~3.2x ABGRToYRow_Any_MSA - ~2.6x RGBAToYRow_MSA - ~3.1x RGBAToYRow_Any_MSA - ~2.7x ARGBToUVJRow_MSA - ~5.5x ARGBToUVJRow_Any_MSA - ~4.5x BGRAToUVRow_MSA - ~2.1x BGRAToUVRow_Any_MSA - ~2.0x ABGRToUVRow_MSA - ~2.1x ABGRToUVRow_Any_MSA - ~1.9x RGBAToUVRow_MSA - ~2.2x RGBAToUVRow_Any_MSA - ~1.9x Performance Gain (vs C non-vectorized) ARGBToYJRow_MSA - ~10.9x ARGBToYJRow_Any_MSA - ~9.2x BGRAToYRow_MSA - ~10.9x BGRAToYRow_Any_MSA - ~9.3x ABGRToYRow_MSA - ~11.0x ABGRToYRow_Any_MSA - ~9.3x RGBAToYRow_MSA - ~10.9x RGBAToYRow_Any_MSA - ~9.1x ARGBToUVJRow_MSA - ~12.4x ARGBToUVJRow_Any_MSA - ~10.5x BGRAToUVRow_MSA - ~4.7x BGRAToUVRow_Any_MSA - ~4.4x ABGRToUVRow_MSA - ~4.7x ABGRToUVRow_Any_MSA - ~4.5x RGBAToUVRow_MSA - ~4.8x RGBAToUVRow_Any_MSA - ~4.4x Review-Url: https://codereview.chromium.org/2641153003 .	2017-02-01 10:31:28 +05:30
Manojkumar Bhosale	09b8c971b3	Add MSA optimized NV12/21 To RGB row functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C auto-vectorized) NV12ToARGBRow_MSA - ~1.5x NV12ToARGBRow_Any_MSA - ~1.4x NV12ToRGB565Row_MSA - ~1.4x NV12ToRGB565Row_Any_MSA - ~1.4x NV21ToARGBRow_MSA - ~1.5x NV21ToARGBRow_Any_MSA - ~1.5x SobelRow_MSA - ~4.3x SobelRow_Any_MSA - ~3.4x SobelToPlaneRow_MSA - ~8.0x SobelToPlaneRow_Any_MSA - ~4.7x SobelXYRow_MSA - ~3.0x SobelXYRow_Any_MSA - ~2.5x Performance Gain (vs C non-vectorized) NV12ToARGBRow_MSA - ~6.5x NV12ToARGBRow_Any_MSA - ~6.5x NV12ToRGB565Row_MSA - ~6.2x NV12ToRGB565Row_Any_MSA - ~6.1x NV21ToARGBRow_MSA - ~6.5x NV21ToARGBRow_Any_MSA - ~6.5x SobelRow_MSA - ~14.5x SobelRow_Any_MSA - ~11.3x SobelToPlaneRow_MSA - ~34.2x SobelToPlaneRow_Any_MSA - ~19.4x SobelXYRow_MSA - ~11.1x SobelXYRow_Any_MSA - ~9.1x Review-Url: https://codereview.chromium.org/2636483002 .	2017-01-18 09:24:39 +05:30
Manojkumar Bhosale	a899dea251	Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C vectorized) ARGBAttenuateRow_MSA - ~1.1x ARGBAttenuateRow_Any_MSA - ~1.1x ARGBToRGB565DitherRow_MSA - ~6.4x ARGBToRGB565DitherRow_Any_MSA - ~6.2x ARGBShuffleRow_MSA - ~5.1x ARGBShuffleRow_Any_MSA - ~1.9x ARGBShadeRow_MSA - ~1.1x ARGBGrayRow_MSA - ~2.6x ARGBSepiaRow_MSA - ~11.6x Performance Gain (vs C non-vectorized) ARGBAttenuateRow_MSA - ~2.46x ARGBAttenuateRow_Any_MSA - ~2.45x ARGBToRGB565DitherRow_MSA - ~9.4x ARGBToRGB565DitherRow_Any_MSA - ~12.5x ARGBShuffleRow_MSA - ~5.2x ARGBShuffleRow_Any_MSA - ~1.9x ARGBShadeRow_MSA - ~4.3x ARGBGrayRow_MSA - ~10.5x ARGBSepiaRow_MSA - ~12.2x Review-Url: https://codereview.chromium.org/2559693002 .	2016-12-15 12:06:02 +05:30
Manojkumar Bhosale	83f460be33	Add MSA optimized ARGB Multiply/Add/Subtract row functions R=fbarchard@google.com BUG=libyuv:634 Performance Gain (vs C vectorized) ARGBMultiplyRow_MSA - 1.4x ARGBAddRow_MSA - 8.6x ARGBSubtractRow_MSA - 8.6x ARGBMultiplyRow_Any_MSA - 1.35x ARGBAddRow_Any_MSA - 7.3x ARGBSubtractRow_Any_MSA - 7.2x Performance Gain (vs C non-vectorized) ARGBMultiplyRow_MSA - 4.4x ARGBAddRow_MSA - 27x ARGBSubtractRow_MSA - 22x ARGBMultiplyRow_Any_MSA - 3.5x ARGBAddRow_Any_MSA - 23x ARGBSubtractRow_Any_MSA - 18x Review URL: https://codereview.chromium.org/2529983002 .	2016-12-02 15:21:10 +05:30
Frank Barchard	e62309f259	clang-format libyuv BUG=libyuv:654 R=kjellander@chromium.org Review URL: https://codereview.chromium.org/2469353005 .	2016-11-07 17:37:23 -08:00
Frank Barchard	f5d5bd88d6	Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions R=fbarchard@google.com BUG=libyuv:634 Performance Gains :- (vs C vectorized) I422ToARGBRow_MSA : ~1.6x I422ToRGBARow_MSA : ~1.6x I422ToARGBRow_Any_MSA : ~1.58x I422ToRGBARow_Any_MSA : ~1.6x Performance Gains :- (vs C non-vectorized) I422ToARGBRow_MSA : ~7x I422ToRGBARow_MSA : ~7x I422ToARGBRow_Any_MSA : ~6.9x I422ToRGBARow_Any_MSA : ~6.8x Regarding performance measurement, We have created standalone tests which pass in row's data from a 1920x1080 filled buffer to both the C and MSA functions. And such N iterations are executed to get more accurate timings of C vs MSA. Review URL: https://codereview.chromium.org/2430313005 .	2016-10-24 15:37:08 -07:00
Frank Barchard	451af5e922	scale by 1 for neon implemented void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) { asm volatile ( "1: \n" MEMACCESS(0) "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts "subs %w2, %w2, #8 \n" // 8 pixels per loop "uxtl v2.4s, v1.4h \n" // 8 int's "uxtl2 v1.4s, v1.8h \n" "scvtf v2.4s, v2.4s \n" // 8 floats "scvtf v1.4s, v1.4s \n" "fcvtn v4.4h, v2.4s \n" // 8 floatsgit "fcvtn2 v4.8h, v1.4s \n" MEMACCESS(1) "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts "b.gt 1b \n" : "+r"(src), // %0 "+r"(dst), // %1 "+r"(width) // %2 : : "cc", "memory", "v1", "v2", "v4" ); } void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) { asm volatile ( "1: \n" MEMACCESS(0) "ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts "subs %w2, %w2, #8 \n" // 8 pixels per loop "uxtl v2.4s, v1.4h \n" // 8 int's "uxtl2 v1.4s, v1.8h \n" "scvtf v2.4s, v2.4s \n" // 8 floats "scvtf v1.4s, v1.4s \n" "fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent "fmul v1.4s, v1.4s, %3.s[0] \n" "uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat "uqshrn2 v4.8h, v1.4s, #13 \n" MEMACCESS(1) "st1 {v4.16b}, [%1], #16 \n" // store 8 shorts "b.gt 1b \n" : "+r"(src), // %0 "+r"(dst), // %1 "+r"(width) // %2 : "w"(scale * 1.9259299444e-34f) // %3 : "cc", "memory", "v1", "v2", "v4" ); } TEST=LibYUVPlanarTest.TestHalfFloatPlane_One BUG=libyuv:560 R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2430313008 .	2016-10-21 14:30:03 -07:00
Frank Barchard	f553db2d30	HalfFloatPlane unittest for denormal half floats Halffloats have a limited range. It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals. TEST=TestHalfFloatPlane_denormal BUG=libyuv:560 R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2424233004 .	2016-10-19 18:13:01 -07:00
Frank Barchard	2d80fc3133	Port HalfFloatRow_SSE2 to AVX2 but not using F16C. R=wangcheng@google.com, hubbe@chromium.org BUG=libyuv:560 Review URL: https://codereview.chromium.org/2421993002 .	2016-10-14 19:01:41 -07:00
Frank Barchard	fdcf524aac	Add f16c (halffloat) cpuid R=wangcheng@google.com, hubbe@chromium.org BUG=libyuv:560 Review URL: https://codereview.chromium.org/2418763006 .	2016-10-14 16:34:08 -07:00
Frank Barchard	a5e93766a2	Add ARGBExtractAlpha_AVX2 function Port SSE2 version to AVX2. BUG=libyuv:572 TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=Extract R=wangcheng@google.com, magjed@chromium.org Review URL: https://codereview.chromium.org/2420553002 .	2016-10-13 16:03:43 -07:00
Frank Barchard	af87c11c9a	YUY2ToI422 coalesce rows for small images TBR=wangcheng@google.com BUG=libyuv:647 TESTED=LibYUVConvertTest.YUY2ToI422_Opt Review URL: https://codereview.chromium.org/2393393006 .	2016-10-07 18:35:42 -07:00
Frank Barchard	edd3a84d05	libyuv::YUY2ToY for isolating Y channel of YUY2. This function is the first step of YUY2 To I420. Provided primarily for diagnostics. TBR=wangcheng@google.com BUG=libyuv:647 TESTED=LibYUVConvertTest.YUY2ToY_Opt Review URL: https://codereview.chromium.org/2399153004 .	2016-10-07 17:20:30 -07:00
Frank Barchard	a2891ec77c	Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions R=fbarchard@google.com BUG=libyuv:634 Performance gains as below, YUY2ToI422, YUY2ToI420 :- YUY2ToYRow_MSA : ~10x YUY2ToUVRow_MSA : ~11x YUY2ToUV422Row_MSA : ~9x YUY2ToYRow_Any_MSA : ~6x YUY2ToUVRow_Any_MSA : ~5x YUY2ToUV422Row_Any_MSA : ~4x UYVYToI422, UYVYToI420 :- UYVYToYRow_MSA : ~10x UYVYToUVRow_MSA : ~11x UYVYToUV422Row_MSA : ~9x UYVYToYRow_Any_MSA : ~6x UYVYToUVRow_Any_MSA : ~5x UYVYToUV422Row_Any_MSA : ~4x Review URL: https://codereview.chromium.org/2397693002 .	2016-10-07 10:37:22 -07:00
Frank Barchard	3b88a19ab1	YUY2ToI422_Any_Neon clean up to not require 16 pixels YUY2ToI422_Any_Neon previously required 16 pixels and duplicated the last pixel. The replication was not necessary after a previous change to treat YUY2 to 4 byte macro pixels. TBR=harryjin@google.com BUG=libyuv:648 TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=YUY2ToI422 -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1" Review URL: https://codereview.chromium.org/2399143002 .	2016-10-06 12:11:40 -07:00
Frank Barchard	aa197ee1a3	HalfFloat_SSE2 for Visual C Low level support for 12 bit 420, 422 and 444 YUV video frame conversion. BUG=libyuv:560, chromium:445071 TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows R=hubbe@chromium.org, wangcheng@google.com Review URL: https://codereview.chromium.org/2387713002 .	2016-10-03 10:33:38 -07:00
Frank Barchard	4a14cb2e81	HalfFloat_SSE2 port from C algorithm to SSE2 Low level support for 12 bit 420, 422 and 444 YUV video frame conversion. BUG=libyuv:560, chromium:445071 TEST=untested R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2381493006 .	2016-09-30 09:47:16 -07:00
Frank Barchard	7fc932ddd3	Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion. BUG=libyuv:560,chromium:445071 TEST=untested R=hubbe@chromium.org Review URL: https://codereview.chromium.org/2371293002 .	2016-09-29 15:06:30 -07:00
Frank Barchard	618149084e	Add MIPS SIMD Arch (MSA) optimized ARGBMirrorRow function This patch adds MSA optimized ARGBMirrorRow function in libYUV project. Performance gain ~3x R=fbarchard@google.com BUG=libyuv:634 Review URL: https://codereview.chromium.org/2368313003 .	2016-09-26 16:28:01 -07:00
Frank Barchard	c5323b0fdc	Add MIPS SIMD Arch (MSA) optimized MirrorRow function As per the preparation patch added in Chromium sources at, 2150943003: Add MIPS SIMD Arch (MSA) build flags for GYP/GN builds This patch adds first MSA optimized function in libYUV project. BUG=libyuv:634 R=fbarchard@google.com Review URL: https://codereview.chromium.org/2285683002 .	2016-09-22 16:12:22 -07:00
Frank Barchard	c244a3e9a0	Add SplitUVPlanes and MergeUVPlanes Add public methods SplitUVPlanes and MergeUVPlanes based on the optimized assembly functions that already exists. Also, de-duplicate the CPU dispatching code for these functions by moving them to helper functions. BUG=libyuv:629 R=braveyao@chromium.org Review URL: https://codereview.chromium.org/2277603004 .	2016-08-24 16:47:24 -07:00
Frank Barchard	161e5c4569	Allow NULL for dst_y in planar formats. BUG=libyuv:631 TEST=unittests build/pass BUG=libyuv:631 TEST=unittests build/pass R=harryjin@google.com Review URL: https://codereview.chromium.org/2271053003 .	2016-08-24 10:19:14 -07:00
Frank Barchard	6546096269	ARGBExtractAlpha 16 pixels at a time for ARM arm64 8 TestARGBExtractAlpha (10019 ms) <-original 64 bit code arm64 8 x2 TestARGBExtractAlpha (7639 ms) arm64 16 TestARGBExtractAlpha (7369 ms) <- new 64 bit code thumb32 8 TestARGBExtractAlpha (9505 ms) <- original 32 bit code thumb32 8 x2 TestARGBExtractAlpha (7400 ms) thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code arm32 8 TestARGBExtractAlpha (10002 ms) BUG=libyuv:572 TESTED=local test on nexus 9 R=harryjin@google.com, wangcheng@google.com Review URL: https://codereview.chromium.org/2035573002 .	2016-06-07 10:44:28 -07:00
Frank Barchard	b00d40160a	make unittest allocator align to 64 bytes. blur requires memory be aligned. change the unittest allocator to guarantee 64 byte alignment. re-enable blur any test that fails if memory is unaligned. TBR=harryjin@google.com BUG=libyuv:596,libyuv:594 TESTED=local build passes with row.h removed from tests. Review URL: https://codereview.chromium.org/2019753002 .	2016-05-27 18:02:47 -07:00
Frank Barchard	ade85fb55c	remove row.h from unittests add SIMD_ALIGNED to unittest header. BUG=libyuv:594 TESTED=local build passes with row.h removed from tests. R=harryjin@google.com Review URL: https://codereview.chromium.org/2001373002 .	2016-05-27 10:57:49 -07:00
Magnus Jedvert	942db3016a	Add ARGBExtractAlpha function BUG=libyuv:572 R=fbarchard@google.com Review URL: https://codereview.chromium.org/1995293002 .	2016-05-26 10:30:57 +02:00
Frank Barchard	3c862e3d29	Fix stride bug for msan on I420Interpolate. When using C version of I420Interpolate for msan, a 50% interpolation would cause stride to be cast to int, which could cause erroneous memory reads on 64 bit build. This CL makes the stride use ptrdiff_t for HalfRow_C BUG=libyuv:582 TESTED=try bots tests R=dhrosa@google.com Review URL: https://codereview.chromium.org/1872953002 .	2016-04-08 15:58:53 -07:00
Frank Barchard	0d880e5bc0	rename MIPS_DSPR2 to DSPR2 for consistency When attempting to normalize function names to end in Row_SIMD it was made harder with MIPS_DSPR2 naming convention. Other CPUs do not include the vendor. This should be named consistently. Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other processors. TBR=harryjin@google.com BUG=libyuv:562 Review URL: https://codereview.chromium.org/1677633002 .	2016-02-05 14:49:54 -08:00
Frank Barchard	58cb534962	Fix memory overwrite in YUY2ToNV12 odd wdiths When width was odd Y channel wrote an extra pixel. This change splits the Y from UV into a temporary buffer and memcpy's to the destination. Performance is slower. Was YUY2ToNV12_Any (307 ms) YUY2ToNV12_Unaligned (213 ms) TestYUY2ToNV12 (181 ms) YUY2ToNV12_Opt (177 ms) YUY2ToNV12_Invert (177 ms) Npw YUY2ToNV12_Any (300 ms) YUY2ToNV12_Unaligned (226 ms) YUY2ToNV12_Invert (206 ms) TestYUY2ToNV12 (184 ms) YUY2ToNV12_Opt (181 ms) TBR=harryjin@google.com BUG=libyuv:545 Review URL: https://codereview.chromium.org/1593833002 .	2016-01-19 11:28:09 -08:00
Frank Barchard	fc52d8ded2	Odd width variation of scale down by 2 for subsampling R=dhrosa@google.com, harryjin@google.com BUG=libyuv:538 Review URL: https://codereview.chromium.org/1558093003 .	2016-01-06 15:12:17 -08:00
Frank Barchard	f4447745ae	Add rounding to InterpolateRow for improved quality and consistency. Remove inaccurate specializations for 1/4 and 3/4, since they round incorrectly. Specialize for 100% and 50% are kept due to performance. Make C and ARM code match SSSE3. Make unittests expect zero difference. BUG=libyuv:535 R=harryjin@google.com Review URL: https://codereview.chromium.org/1533643005 .	2015-12-17 15:24:06 -08:00
Frank Barchard	ae55e41851	use rounding in scaledown by 2 When scaling down by 2 the formula should round consistently. (a+b+c+d+2)/4 The C version did but the SSE2 version was doing 2 averages. avg(avg(a,b),avg(c,d)) This change uses a sum, then rounds. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:447,libyuv:527 Review URL: https://codereview.chromium.org/1513183004 .	2015-12-14 17:25:36 -08:00
Frank Barchard	a2ea905679	BlendPlane any width. Benchmark out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=Blend \| sortms Was I420Blend_Any (2321 ms) I420Blend_Unaligned (1684 ms) I420Blend_Opt (1675 ms) I420Blend_Invert (1653 ms) BlendPlane_Invert (1556 ms) BlendPlane_Any (1552 ms) BlendPlane_Unaligned (1548 ms) BlendPlane_Opt (1535 ms) ARGBBlend_Unaligned (659 ms) ARGBBlend_Any (596 ms) ARGBBlend_Invert (591 ms) ARGBBlend_Opt (508 ms) BlendPlaneRow_Unaligned (186 ms) BlendPlaneRow_Opt (171 ms) Now ARGBBlend_Any (621 ms) ARGBBlend_Unaligned (585 ms) ARGBBlend_Invert (564 ms) ARGBBlend_Opt (512 ms) I420Blend_Unaligned (347 ms) I420Blend_Invert (345 ms) I420Blend_Any (337 ms) I420Blend_Opt (327 ms) BlendPlane_Unaligned (187 ms) BlendPlaneRow_Unaligned (187 ms) BlendPlane_Invert (186 ms) BlendPlane_Any (186 ms) BlendPlaneRow_Opt (173 ms) BlendPlane_Opt (171 ms) which is comparable to aligned case out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=Blend \| sortms ARGBBlend_Any (625 ms) ARGBBlend_Unaligned (602 ms) ARGBBlend_Invert (508 ms) ARGBBlend_Opt (506 ms) I420Blend_Any (353 ms) I420Blend_Unaligned (322 ms) I420Blend_Invert (304 ms) I420Blend_Opt (301 ms) BlendPlaneRow_Unaligned (188 ms) BlendPlane_Unaligned (186 ms) BlendPlane_Invert (185 ms) BlendPlane_Any (184 ms) BlendPlaneRow_Opt (173 ms) BlendPlane_Opt (169 ms) R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1513443002 .	2015-12-08 18:59:48 -08:00
Frank Barchard	dee77a4ebe	Optimize yuv alpha blend AVX2 code to do 32 pixels at time. out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt Was LibYUVPlanarTest.I420Blend_Opt (2335 ms) Now LibYUVPlanarTest.I420Blend_Opt (1937 ms) vs SSSE3 LibYUVPlanarTest.I420Blend_Opt (2599 ms) BUG=libyuv:527 R=dhrosa@google.com Review URL: https://codereview.chromium.org/1505673003 .	2015-12-08 18:20:30 -08:00
Frank Barchard	2657688e70	Add support for odd height YUVA alpha blending. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1507683003 .	2015-12-07 12:03:20 -08:00
Frank Barchard	48a919d86e	Bug fix for UYVYToNV12 odd height TBR=harryjin@google.com BUG=libyuv:528 Review URL: https://codereview.chromium.org/1506973002 .	2015-12-07 11:39:48 -08:00
Frank Barchard	bea690b3e0	AVX2 YUV alpha blender and improved unittests AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions. unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:527 Review URL: https://codereview.chromium.org/1505433002 .	2015-12-05 22:23:29 -08:00
Frank Barchard	b6f37bd8ec	Interpolate plane initial implementation. YUV version of interpolation between two images. R=dhrosa@google.com, harryjin@google.com BUG=libyuv:526 Review URL: https://codereview.chromium.org/1479593002 .	2015-11-25 16:11:42 -08:00
Frank Barchard	d95d2169d9	rename yuv matrix constants to be more clear about what they are R=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1429693006 .	2015-11-03 17:09:53 -08:00
Frank Barchard	ce4c2fad1d	Raw 24 bit RGB to RGB24 (bgr) Add unittests that do 1 step conversion vs 2 step conversion. Tests end swapping versions match direct conversions. R=harryjin@google.com BUG=libyuv:518 Review URL: https://codereview.chromium.org/1419103007 .	2015-11-03 10:30:30 -08:00
Frank Barchard	2c7aa0070a	remove I422ToBGRA and use I422ToRGBA internally Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix. Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions. R=harryjin@google.com BUG=libyuv:518 Review URL: https://codereview.chromium.org/1427993004 .	2015-11-02 10:24:12 -08:00
Frank Barchard	5d97b93369	refactor I420ToABGR to use I420ToARGBRow Using a transposed conversion matrix, I420ToARGB can output ABGR. R=harryjin@google.com, xhwang@chromium.org BUG=libyuv:473 Review URL: https://codereview.chromium.org/1413573010 .	2015-10-30 11:56:57 -07:00
Frank Barchard	76a599ec3b	fix jpeg and bt.709 yuvconstants for neon64. yuv constants for bt.601 were previously ported to neon64, as well as the code to respect other color spaces. But the jpeg and bt.709 colour conversion constants were still in armv7 form. This changes the constants for aarch64 builds to be compatible with the code. yuv constants are now passed as const * Remove Yvu constants which were used for older version on nv21 but not new code. TBR=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1398623002 .	2015-10-07 19:46:56 -07:00
Frank Barchard	914a9856c7	Reimplement NV21ToARGB to allow different color matrix. Low level for NV21ToARGB written to accept yuv matrix used by other YUV to ARGB functions. Previously NV21 was implemented for Windows using NV12 with a different matrix that swapped U and V. But the Arm version of the low level does not allow the matrix U and V contributions to be swapped. Using a new low level function that reads NV21 and uses the same yuvconstants as other YUV conversion functions allows an Arm port of this function. TBR=harryjin@google.com BUG=libyuv:500 Review URL: https://codereview.chromium.org/1388273002 .	2015-10-06 20:34:44 -07:00
Frank Barchard	2cc1a2b233	Remove sse2 functions that also have ssse3 ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2 Since vast majority of CPUs have SSSE3 now, removing the SSE2 improves the performance of CPU dispatching. R=harryjin@google.com BUG=none Review URL: https://codereview.chromium.org/1377053003 .	2015-09-30 14:24:44 -07:00
Frank Barchard	f96890a0be	yuvconstants for all YUV to RGB conversion functions. R=harryjin@google.com BUG=libyuv:488 Review URL: https://codereview.chromium.org/1363503002 .	2015-09-22 10:26:03 -07:00
Frank Barchard	278d88f872	Copy Alpha odd width support R=harryjin@google.com BUG=none Review URL: https://webrtc-codereview.appspot.com/59369004.	2015-08-13 15:05:14 -07:00
Frank Barchard	ce98129951	yuy2tonv12 R=bcornell@google.com BUG=libyuv:466 Review URL: https://webrtc-codereview.appspot.com/51309004.	2015-07-17 16:22:59 -07:00
Frank Barchard	faa4b14f85	uyvy to nv12 R=harryjin@google.com BUG=libyuv:466 Review URL: https://webrtc-codereview.appspot.com/50339004.	2015-07-17 14:43:19 -07:00
fbarchard@google.com	bd2d903e1b	odd width support for ARGBSobel functions. Improves performance for images that are not a multiple of 8 pixels. BUG=444 TESTED=libyuvTest.ARGBSobel_Opt R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/54589004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1415 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-05-28 22:22:28 +00:00
fbarchard@google.com	cfce47efc8	Change Sobel to use JPeg Luma calculation instead of extracting G channel. Using luma produces a better sobel that respects all 3 channels of RGB. Historically the G channel was used to improve performance, and because the luma of I420 is a constrained range, hurting quality. Using the JPeg variation of YUV, the luma is more accurate, including cross platform, better optimized for AVX2 and odd widths, and full range. BUG=444 TESTED=ARGBSobelXY_Opt R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/57479004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1414 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-05-27 22:32:26 +00:00
fbarchard@google.com	8f0b32773c	ARGBToUV AVX2 functions hooked up. BUG=none TESTED=RGB565ToI420 R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/46829004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1359 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-04-07 00:10:52 +00:00
fbarchard@google.com	e6ca9cc2a2	Scale down by 2 AVX2 port. Processes twice as many pixels as SSE2 and takes advantage of 3 argument instructions to reduce register usage and number of instructions. BUG=314 TESTED=libyuvTest.ScaleDownBy2_Box R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/42959004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1347 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-26 23:21:08 +00:00
fbarchard@google.com	d28cd77f99	Enable assembly for clangcl build on Windows. Previously assembly was disabled so clangcl would work, but only with C code. As clangcl mimics both Visual C and GCC, ifdefs need to pick one or the other or often you'll end up with both. In this CL we disable most Visual C code and use the GCC versions which allow assembly for both 32 and 64 bit intel. BUG=412 TESTED=clang=1 build on windows R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/51389004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1341 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-03-19 20:36:31 +00:00
fbarchard@google.com	446fa95587	I422ToRGB565, ARGB4444 and ARGB1555 for AVX2 BUG=403 TESTED=avx2 emulator Review URL: https://webrtc-codereview.appspot.com/34359004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1293 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-02-24 23:14:46 +00:00
fbarchard@google.com	fd8054791c	build fixe for InterpolateRow_MIPS_DSPR2 BUG=398 TESTED=untested R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/37999004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1268 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-02-07 01:01:30 +00:00
fbarchard@google.com	b2a6af1be6	Change rectangle low level functions to use more conventional row functions including 'any' variations. Previously the yuv function SetPlane stored 32 bit values. Now a more conventional memset() style function is used for YUV that stores bytes. On Haswell a rep stosb is used for YUV. Overall benefit of this CL is improved performance for 'any' width, and simpler row assembly instead of full image assembly. Previously ARGBRect used a low level function that supported a rectangle in assembly. Now it uses a row function, and relies on row coalesce to combine into a single low level call. BUG=371 TESTED=untested R=brucedawson@google.com, harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/35689004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1222 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-01-12 03:58:24 +00:00
fbarchard@google.com	8e3db2dc73	Support invert for ARGBRect and SetPlane BUG=387 TESTED=ARGBRect_Invert R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/37539004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1219 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-01-07 19:02:01 +00:00
fbarchard@google.com	992c3b089a	Use HAS_ARGBSETROWS_X86 to detect presence of function. BUG=none TESTED=rectangle unittests R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/35639004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1218 16f28f9a-4ce2-e073-06de-1de4eb20be90	2015-01-07 00:11:51 +00:00
fbarchard@google.com	ef67597b48	ARGBMirror use SSE2 pshufd instruction instead of SSSE3 pshufb. BUG=269 TESTED=local benchmark for ARGBMirror R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/32509004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1176 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-11-21 19:25:14 +00:00
fbarchard@google.com	91f240c5db	Move sub before branch for loops. Remove CopyRow_x86 Add CopyRow_Any versions for AVX, SSE2 and Neon. BUG=269 TESTED=local build R=harryjin@google.com, tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/26209004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1175 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-11-20 21:14:27 +00:00
fbarchard@google.com	9dd083a512	ARGBMirror Any BUG=none TESTED=mirror and rotate unittests R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/30159004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1172 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-11-19 00:46:51 +00:00
fbarchard@google.com	59ed448685	MirrorAny functions so assembly can always be used. BUG=none TESTED=untested R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/29069004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1170 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-11-18 01:03:47 +00:00
fbarchard@google.com	9ed836b154	The 'Any' versions of functions can handle any width now, so remove the check from the calling code. This has 2 advantages - less code, and less overhead in calling function when any function is NOT used. Downside is more code for case where any is used. BUG=373 TESTED=libyuv_unittest still passes R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/24129004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1143 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-24 23:29:31 +00:00
fbarchard@google.com	0a6dab42c0	Add check for minimum of 8 pixels for any functions and multiple of 8 not 16 for neon functions. BUG=373 TESTED=try bots R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/23189004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1139 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-23 23:05:12 +00:00
fbarchard@google.com	f2fa453b94	Port I422ToABGR to AVX2. BUG=269 TESTED=intelsde on I422ToABGR R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/23149004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1138 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-23 17:20:22 +00:00
fbarchard@google.com	c000955bc0	Port I422ToRGBA to AVX. BUG=269 TESTED=intelsde on I422ToRGBA R=brucedawson@google.com Review URL: https://webrtc-codereview.appspot.com/28769004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1136 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-22 22:41:39 +00:00
fbarchard@google.com	d81dddd3d0	port I420ToBGRA to AVX2. BUG=269 TESTED=c:\intelsde\sde -ast -hsw -- out\release\libyuv_unittest.exe --gtest_filter=I420ToBGRA R=brucedawson@google.com, harryjin@google.com, magjed@chromium.org Review URL: https://webrtc-codereview.appspot.com/26869004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1127 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-20 19:35:55 +00:00
fbarchard@google.com	055725b8fe	Neon does 8 at a time, so a check is added for any function of I422ToBGRA that width is >= 8 and for fast path that it is a multiple of 8 not 16. BUG=373 TESTED=untested R=brucedawson@google.com Review URL: https://webrtc-codereview.appspot.com/24059004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1126 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-20 18:39:21 +00:00
fbarchard@google.com	f713691a6f	Change elif to endif and if to allow AVX2 as well as SSE2 in future changes instead of one or the other. BUG=none TESTED=try bots R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/30719004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1122 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-16 20:47:22 +00:00
fbarchard@google.com	b720049a54	Make row functions used for planarfunctions and convert use movdqu to relax alignment constraint. Step 1 - make functions unaligned. BUG=365 TESTED=libyuv_unittest passes R=harryjin@google.com Review URL: https://webrtc-codereview.appspot.com/26709004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1111 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-03 21:11:37 +00:00
fbarchard@google.com	044f914c29	Change scale to unaligned movdqu. BUG=365 TESTED=scale unittests R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/22879004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1101 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-10-01 01:16:04 +00:00
fbarchard@google.com	d33bf86b25	CopyRow_AVX which supports unaligned pointers for Sandy Bridge CPU. BUG=363 TESTED=out\release\libyuv_unittest --gtest_filter=ARGBToARGB_ R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/31489004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1097 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-09-29 23:53:18 +00:00
fbarchard@google.com	a9ff15b7bb	check copy has different address. If same, skip the copy to avoid valgrind error. BUG=334 TESTED=unittests still pass R=tpsiaki@google.com Review URL: https://webrtc-codereview.appspot.com/14679004 git-svn-id: http://libyuv.googlecode.com/svn/trunk@1011 16f28f9a-4ce2-e073-06de-1de4eb20be90	2014-06-11 00:16:59 +00:00

1 2 3 4 5 ...

362 Commits