Frank Barchard
6e6f81b803
Floating point Gaussian kernels
...
On SkylakeX for 720p
TestGaussPlane_F32 (657 ms)
On Pixel3
TestGaussPlane_F32 (1787 ms)
Bug: libyuv:852, b/145611468
Change-Id: I9859af1b9381621067992305727da285f82bdded
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1949667
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Marat Dukhan <maratek@google.com>
2019-12-09 04:45:59 +00:00
Frank Barchard
d82f4baf5f
Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror
...
Bug: libyuv:840, libyuv:849: b/144318948
Change-Id: I303c02ac2b838a09d3e623df7a69ffc085fe3cd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1914781
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-13 20:02:40 +00:00
Frank Barchard
c85a7b3ae3
MMI Optimized functions I422ToARGB for 1080p video
...
Improves playback performance for 1080p video on www.youku.com
BUG=libyuv:841
Change-Id: Iabe7693fba276162af0290863f46e214ab86fb6c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1790959
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2019-09-11 21:06:21 +00:00
Frank Barchard
fec9121b67
SwapUV AVX2 and SSSE3
...
Based on ARGBShuffle but with count adjusted and new shuffle mask
BUG=libyuv:809
Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-26 18:41:40 +00:00
Frank Barchard
f1c00932df
NV21 unittest and benchmark
...
BUG=libyuv:809
Change-Id: I75afb5612dcd05820479848a90ad16b07a7981bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1707229
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-18 02:13:02 +00:00
Frank Barchard
f9aacffa02
Fix arm unittest failure by removing unused FloatDivToByteRow.
...
Apply clang-format to fix jpeg if() for lint fix.
Change comments about 4th pixel for open source compliance.
Rename UVToVU to SwapUV for consistency with MergeUV.
BUG=b/135532289, b/136515133
Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936
Reviewed-by: Chong Zhang <chz@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-02 20:00:30 +00:00
Frank Barchard
413a8d8041
Add AYUVToNV12 and NV21ToNV12
...
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*ToNV12* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com
Change-Id: Id03b4613211fb6a6e163d10daa7c692fe31e36d8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1560080
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2019-04-12 17:48:45 +00:00
Frank Barchard
20bf569a04
Fix ConvertToI420() for odd crop_y
...
The original src_u calculation of FOURCC_I420 shifted half width if
crop_y is odd.
This CL fixs the problem and also add a test case for it.
Bug: b:115278653
Test: pass libyuv_unittest
Change-Id: Ia9732d22e64e13de26df47726ba44ad1c5a06484
Reviewed-on: https://chromium-review.googlesource.com/c/1258743
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-10-03 19:14:01 +00:00
Martin Storsjö
9b772abf97
Restore the file mode for source files
...
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.
Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-08-06 18:53:32 +00:00
lixia zhang
21be9122aa
libyuv:loongson optimize compare/row/scale/rotate files with mmi.
...
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).
In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.
BUG=libyuv:804
Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-07-20 22:53:04 +00:00
Frank Barchard
85722f5d93
ByteToFloatRow_NEON to convert and scale bytes to floats
...
Each byte is converted to float (0.0 to 255.0) and then multiplied
by a scale parameter.
Bug: None
Test: arm 64 build passes.
Change-Id: I04736798540b8d985f60abdf0388e24a209d075b
Reviewed-on: https://chromium-review.googlesource.com/930226
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ian Field <ianfield@google.com>
2018-02-24 00:34:07 +00:00
Frank Barchard
e1f6c1c0b5
tidy applied with readability-inconsistent-declaration-parameter-name
...
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2018-02-07 00:24:25 +00:00
Frank Barchard
92e22cf5b6
Lint cleanup after C99 change CL
...
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
7e389884a1
Switch to C99 types
...
Append _t to all sized types.
uint64 becomes uint64_t etc
Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
2ed2402fa0
I420ToI010 for 8 to 10 bit YUV conversion.
...
Convert planar 8 bit formats to planar 16 bit formats.
Includes msan fix for Convert8To16Row_Opt unittest.
I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601. The function does no color space
conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling. The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.
For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN ] LibYUVConvertTest.I420ToI010_Opt
[ OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-02 21:09:39 +00:00
Frank Barchard
140fc0a261
Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
...
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.
Bug: libyuv: 769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
5336217f11
H010Copy function to copy 16 bit planar formats
...
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToH010_Opt
Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b
Reviewed-on: https://chromium-review.googlesource.com/823445
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-15 03:34:34 +00:00
Frank Barchard
3b81288ece
Remove Mips DSPR2 code
...
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-14 18:22:16 +00:00
Frank Barchard
1e16cb5c38
SplitRGBPlane and MergeRGBPlane functions added
...
Converts packed RGB to planar and back.
TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added
Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Manojkumar Bhosale
b6e8e9aa97
Add MSA optimized HalfFloatRow function
...
TBR=kjellander@chromium.org
R=fbarchard@google.com
Bug:libyuv:634
Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a
Reviewed-on: https://chromium-review.googlesource.com/628557
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-29 18:40:08 +00:00
Frank Barchard
78e44628c6
Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions.
...
TBR=kjellander@chromium.org
R=fbarchard@google.com
Bug:libyuv:634
Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e
Reviewed-on: https://chromium-review.googlesource.com/616765
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-17 18:39:22 +00:00
Manojkumar Bhosale
dbd7c1a9c5
Add MSA optimized ARGBExtractAlpha, ARGBBlend, ARGBQuantize and ARGBColorMatrix row functions
...
TBR=kjellander@chromium.org
R=fbarchard@google.com
Bug:libyuv:634
Change-Id: I17bd3f87336f613ad363af7d7b9d7af49d725e56
Reviewed-on: https://chromium-review.googlesource.com/613100
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-14 17:38:31 +00:00
Frank Barchard
73a603e120
clang-format 5.0 applied to libyuv
...
BUG=None
TEST=try bots and lint test
Change-Id: I1ab462adf2d309117862c5eb4b244a61ae202951
Reviewed-on: https://chromium-review.googlesource.com/450658
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-03-08 18:50:12 +00:00
Frank Barchard
136aa9d37c
any11p fix for buffer overrun
...
BUG=libyuv:686
TESTED=untested
Change-Id: Idfae93349dd78b1b633a596631e5397e11b77d0b
Reviewed-on: https://chromium-review.googlesource.com/448320
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-03 19:57:35 +00:00
Manojkumar Bhosale
45b176d153
Add MSA optimized Interpolate/MergeUV/Misc functions
...
BUG=libyuv:634
Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
Performance Gain (vs C auto-vectorized)
InterpolateRow_MSA - ~3.3x
InterpolateRow_Any_MSA - ~2.5x
ARGBSetRow_MSA - ~1.0x
ARGBSetRow_Any_MSA - ~1.0x
ARGBToRGB24Row_MSA - ~1.9x
ARGBToRGB24Row_Any_MSA - ~1.6x
MergeUVRow_MSA - ~1.6x
MergeUVRow_Any_MSA - ~1.2x
Performance Gain (vs C non-vectorized)
InterpolateRow_MSA - ~11.3x
InterpolateRow_Any_MSA - ~ 7.9x
ARGBSetRow_MSA - ~ 6.2x
ARGBSetRow_Any_MSA - ~ 4.0x
ARGBToRGB24Row_MSA - ~ 9.9x
ARGBToRGB24Row_Any_MSA - ~ 8.4x
MergeUVRow_MSA - ~12.7x
MergeUVRow_Any_MSA - ~ 8.0x
Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
Reviewed-on: https://chromium-review.googlesource.com/445817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-23 01:42:22 +00:00
Frank Barchard
20ecb366ea
I420ToI400 fix for unused u and v arguments
...
TBR=kjellander@chromium.org
BUG=libyuv:679
TEST=None
Change-Id: I830f31e0de58c967cb02c71f422df4c19c94c367
Reviewed-on: https://chromium-review.googlesource.com/441949
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 02:54:58 +00:00
Manojkumar Bhosale
54ce8f23d6
Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C auto-vectorized)
ARGBToYJRow_MSA - ~3.2x
ARGBToYJRow_Any_MSA - ~2.7x
BGRAToYRow_MSA - ~3.2x
BGRAToYRow_Any_MSA - ~2.7x
ABGRToYRow_MSA - ~3.2x
ABGRToYRow_Any_MSA - ~2.6x
RGBAToYRow_MSA - ~3.1x
RGBAToYRow_Any_MSA - ~2.7x
ARGBToUVJRow_MSA - ~5.5x
ARGBToUVJRow_Any_MSA - ~4.5x
BGRAToUVRow_MSA - ~2.1x
BGRAToUVRow_Any_MSA - ~2.0x
ABGRToUVRow_MSA - ~2.1x
ABGRToUVRow_Any_MSA - ~1.9x
RGBAToUVRow_MSA - ~2.2x
RGBAToUVRow_Any_MSA - ~1.9x
Performance Gain (vs C non-vectorized)
ARGBToYJRow_MSA - ~10.9x
ARGBToYJRow_Any_MSA - ~9.2x
BGRAToYRow_MSA - ~10.9x
BGRAToYRow_Any_MSA - ~9.3x
ABGRToYRow_MSA - ~11.0x
ABGRToYRow_Any_MSA - ~9.3x
RGBAToYRow_MSA - ~10.9x
RGBAToYRow_Any_MSA - ~9.1x
ARGBToUVJRow_MSA - ~12.4x
ARGBToUVJRow_Any_MSA - ~10.5x
BGRAToUVRow_MSA - ~4.7x
BGRAToUVRow_Any_MSA - ~4.4x
ABGRToUVRow_MSA - ~4.7x
ABGRToUVRow_Any_MSA - ~4.5x
RGBAToUVRow_MSA - ~4.8x
RGBAToUVRow_Any_MSA - ~4.4x
Review-Url: https://codereview.chromium.org/2641153003 .
2017-02-01 10:31:28 +05:30
Manojkumar Bhosale
09b8c971b3
Add MSA optimized NV12/21 To RGB row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C auto-vectorized)
NV12ToARGBRow_MSA - ~1.5x
NV12ToARGBRow_Any_MSA - ~1.4x
NV12ToRGB565Row_MSA - ~1.4x
NV12ToRGB565Row_Any_MSA - ~1.4x
NV21ToARGBRow_MSA - ~1.5x
NV21ToARGBRow_Any_MSA - ~1.5x
SobelRow_MSA - ~4.3x
SobelRow_Any_MSA - ~3.4x
SobelToPlaneRow_MSA - ~8.0x
SobelToPlaneRow_Any_MSA - ~4.7x
SobelXYRow_MSA - ~3.0x
SobelXYRow_Any_MSA - ~2.5x
Performance Gain (vs C non-vectorized)
NV12ToARGBRow_MSA - ~6.5x
NV12ToARGBRow_Any_MSA - ~6.5x
NV12ToRGB565Row_MSA - ~6.2x
NV12ToRGB565Row_Any_MSA - ~6.1x
NV21ToARGBRow_MSA - ~6.5x
NV21ToARGBRow_Any_MSA - ~6.5x
SobelRow_MSA - ~14.5x
SobelRow_Any_MSA - ~11.3x
SobelToPlaneRow_MSA - ~34.2x
SobelToPlaneRow_Any_MSA - ~19.4x
SobelXYRow_MSA - ~11.1x
SobelXYRow_Any_MSA - ~9.1x
Review-Url: https://codereview.chromium.org/2636483002 .
2017-01-18 09:24:39 +05:30
Manojkumar Bhosale
a899dea251
Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBAttenuateRow_MSA - ~1.1x
ARGBAttenuateRow_Any_MSA - ~1.1x
ARGBToRGB565DitherRow_MSA - ~6.4x
ARGBToRGB565DitherRow_Any_MSA - ~6.2x
ARGBShuffleRow_MSA - ~5.1x
ARGBShuffleRow_Any_MSA - ~1.9x
ARGBShadeRow_MSA - ~1.1x
ARGBGrayRow_MSA - ~2.6x
ARGBSepiaRow_MSA - ~11.6x
Performance Gain (vs C non-vectorized)
ARGBAttenuateRow_MSA - ~2.46x
ARGBAttenuateRow_Any_MSA - ~2.45x
ARGBToRGB565DitherRow_MSA - ~9.4x
ARGBToRGB565DitherRow_Any_MSA - ~12.5x
ARGBShuffleRow_MSA - ~5.2x
ARGBShuffleRow_Any_MSA - ~1.9x
ARGBShadeRow_MSA - ~4.3x
ARGBGrayRow_MSA - ~10.5x
ARGBSepiaRow_MSA - ~12.2x
Review-Url: https://codereview.chromium.org/2559693002 .
2016-12-15 12:06:02 +05:30
Manojkumar Bhosale
83f460be33
Add MSA optimized ARGB Multiply/Add/Subtract row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA - 1.4x
ARGBAddRow_MSA - 8.6x
ARGBSubtractRow_MSA - 8.6x
ARGBMultiplyRow_Any_MSA - 1.35x
ARGBAddRow_Any_MSA - 7.3x
ARGBSubtractRow_Any_MSA - 7.2x
Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA - 4.4x
ARGBAddRow_MSA - 27x
ARGBSubtractRow_MSA - 22x
ARGBMultiplyRow_Any_MSA - 3.5x
ARGBAddRow_Any_MSA - 23x
ARGBSubtractRow_Any_MSA - 18x
Review URL: https://codereview.chromium.org/2529983002 .
2016-12-02 15:21:10 +05:30
Frank Barchard
e62309f259
clang-format libyuv
...
BUG=libyuv:654
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
f5d5bd88d6
Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gains :- (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x
Performance Gains :- (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x
Regarding performance measurement, We have created standalone tests which pass in row's data from a 1920x1080 filled buffer to both the C and MSA functions. And such N iterations are executed to get more accurate timings of C vs MSA.
Review URL: https://codereview.chromium.org/2430313005 .
2016-10-24 15:37:08 -07:00
Frank Barchard
451af5e922
scale by 1 for neon implemented
...
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
asm volatile (
"1: \n"
MEMACCESS(0)
"ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
"subs %w2, %w2, #8 \n" // 8 pixels per loop
"uxtl v2.4s, v1.4h \n" // 8 int's
"uxtl2 v1.4s, v1.8h \n"
"scvtf v2.4s, v2.4s \n" // 8 floats
"scvtf v1.4s, v1.4s \n"
"fcvtn v4.4h, v2.4s \n" // 8 floatsgit
"fcvtn2 v4.8h, v1.4s \n"
MEMACCESS(1)
"st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
"b.gt 1b \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(width) // %2
:
: "cc", "memory", "v1", "v2", "v4"
);
}
void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
asm volatile (
"1: \n"
MEMACCESS(0)
"ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
"subs %w2, %w2, #8 \n" // 8 pixels per loop
"uxtl v2.4s, v1.4h \n" // 8 int's
"uxtl2 v1.4s, v1.8h \n"
"scvtf v2.4s, v2.4s \n" // 8 floats
"scvtf v1.4s, v1.4s \n"
"fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent
"fmul v1.4s, v1.4s, %3.s[0] \n"
"uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat
"uqshrn2 v4.8h, v1.4s, #13 \n"
MEMACCESS(1)
"st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
"b.gt 1b \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(width) // %2
: "w"(scale * 1.9259299444e-34f) // %3
: "cc", "memory", "v1", "v2", "v4"
);
}
TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
f553db2d30
HalfFloatPlane unittest for denormal half floats
...
Halffloats have a limited range. It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
2016-10-19 18:13:01 -07:00
Frank Barchard
2d80fc3133
Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
...
R=wangcheng@google.com , hubbe@chromium.org
BUG=libyuv:560
Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00
Frank Barchard
fdcf524aac
Add f16c (halffloat) cpuid
...
R=wangcheng@google.com , hubbe@chromium.org
BUG=libyuv:560
Review URL: https://codereview.chromium.org/2418763006 .
2016-10-14 16:34:08 -07:00
Frank Barchard
a5e93766a2
Add ARGBExtractAlpha_AVX2 function
...
Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com , magjed@chromium.org
Review URL: https://codereview.chromium.org/2420553002 .
2016-10-13 16:03:43 -07:00
Frank Barchard
af87c11c9a
YUY2ToI422 coalesce rows for small images
...
TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToI422_Opt
Review URL: https://codereview.chromium.org/2393393006 .
2016-10-07 18:35:42 -07:00
Frank Barchard
edd3a84d05
libyuv::YUY2ToY for isolating Y channel of YUY2.
...
This function is the first step of YUY2 To I420.
Provided primarily for diagnostics.
TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToY_Opt
Review URL: https://codereview.chromium.org/2399153004 .
2016-10-07 17:20:30 -07:00
Frank Barchard
a2891ec77c
Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains as below,
YUY2ToI422, YUY2ToI420 :-
YUY2ToYRow_MSA : ~10x
YUY2ToUVRow_MSA : ~11x
YUY2ToUV422Row_MSA : ~9x
YUY2ToYRow_Any_MSA : ~6x
YUY2ToUVRow_Any_MSA : ~5x
YUY2ToUV422Row_Any_MSA : ~4x
UYVYToI422, UYVYToI420 :-
UYVYToYRow_MSA : ~10x
UYVYToUVRow_MSA : ~11x
UYVYToUV422Row_MSA : ~9x
UYVYToYRow_Any_MSA : ~6x
UYVYToUVRow_Any_MSA : ~5x
UYVYToUV422Row_Any_MSA : ~4x
Review URL: https://codereview.chromium.org/2397693002 .
2016-10-07 10:37:22 -07:00
Frank Barchard
3b88a19ab1
YUY2ToI422_Any_Neon clean up to not require 16 pixels
...
YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
the last pixel. The replication was not necessary after a previous
change to treat YUY2 to 4 byte macro pixels.
TBR=harryjin@google.com
BUG=libyuv:648
TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"
Review URL: https://codereview.chromium.org/2399143002 .
2016-10-06 12:11:40 -07:00
Frank Barchard
aa197ee1a3
HalfFloat_SSE2 for Visual C
...
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows
R=hubbe@chromium.org , wangcheng@google.com
Review URL: https://codereview.chromium.org/2387713002 .
2016-10-03 10:33:38 -07:00
Frank Barchard
4a14cb2e81
HalfFloat_SSE2 port from C algorithm to SSE2
...
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2381493006 .
2016-09-30 09:47:16 -07:00
Frank Barchard
7fc932ddd3
Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
...
BUG=libyuv:560,chromium:445071
TEST=untested
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2371293002 .
2016-09-29 15:06:30 -07:00
Frank Barchard
618149084e
Add MIPS SIMD Arch (MSA) optimized ARGBMirrorRow function
...
This patch adds MSA optimized ARGBMirrorRow function in libYUV project.
Performance gain ~3x
R=fbarchard@google.com
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2368313003 .
2016-09-26 16:28:01 -07:00
Frank Barchard
c5323b0fdc
Add MIPS SIMD Arch (MSA) optimized MirrorRow function
...
As per the preparation patch added in Chromium sources at,
2150943003: Add MIPS SIMD Arch (MSA) build flags for GYP/GN builds
This patch adds first MSA optimized function in libYUV project.
BUG=libyuv:634
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/2285683002 .
2016-09-22 16:12:22 -07:00
Frank Barchard
c244a3e9a0
Add SplitUVPlanes and MergeUVPlanes
...
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exists. Also, de-duplicate the
CPU dispatching code for these functions by moving them to helper
functions.
BUG=libyuv:629
R=braveyao@chromium.org
Review URL: https://codereview.chromium.org/2277603004 .
2016-08-24 16:47:24 -07:00
Frank Barchard
161e5c4569
Allow NULL for dst_y in planar formats. BUG=libyuv:631 TEST=unittests build/pass
...
BUG=libyuv:631
TEST=unittests build/pass
R=harryjin@google.com
Review URL: https://codereview.chromium.org/2271053003 .
2016-08-24 10:19:14 -07:00
Frank Barchard
6546096269
ARGBExtractAlpha 16 pixels at a time for ARM
...
arm64 8 TestARGBExtractAlpha (10019 ms) <-original 64 bit code
arm64 8 x2 TestARGBExtractAlpha (7639 ms)
arm64 16 TestARGBExtractAlpha (7369 ms) <- new 64 bit code
thumb32 8 TestARGBExtractAlpha (9505 ms) <- original 32 bit code
thumb32 8 x2 TestARGBExtractAlpha (7400 ms)
thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
arm32 8 TestARGBExtractAlpha (10002 ms)
BUG=libyuv:572
TESTED=local test on nexus 9
R=harryjin@google.com , wangcheng@google.com
Review URL: https://codereview.chromium.org/2035573002 .
2016-06-07 10:44:28 -07:00
Frank Barchard
b00d40160a
make unittest allocator align to 64 bytes.
...
blur requires memory be aligned. change the unittest allocator to guarantee 64 byte alignment.
re-enable blur any test that fails if memory is unaligned.
TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.
Review URL: https://codereview.chromium.org/2019753002 .
2016-05-27 18:02:47 -07:00
Frank Barchard
ade85fb55c
remove row.h from unittests
...
add SIMD_ALIGNED to unittest header.
BUG=libyuv:594
TESTED=local build passes with row.h removed from tests.
R=harryjin@google.com
Review URL: https://codereview.chromium.org/2001373002 .
2016-05-27 10:57:49 -07:00
Magnus Jedvert
942db3016a
Add ARGBExtractAlpha function
...
BUG=libyuv:572
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/1995293002 .
2016-05-26 10:30:57 +02:00
Frank Barchard
3c862e3d29
Fix stride bug for msan on I420Interpolate.
...
When using C version of I420Interpolate for msan, a 50% interpolation
would cause stride to be cast to int, which could cause erroneous
memory reads on 64 bit build.
This CL makes the stride use ptrdiff_t for HalfRow_C
BUG=libyuv:582
TESTED=try bots tests
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1872953002 .
2016-04-08 15:58:53 -07:00
Frank Barchard
0d880e5bc0
rename MIPS_DSPR2 to DSPR2 for consistency
...
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .
2016-02-05 14:49:54 -08:00
Frank Barchard
58cb534962
Fix memory overwrite in YUY2ToNV12 odd wdiths
...
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination. Performance
is slower.
Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)
Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545
Review URL: https://codereview.chromium.org/1593833002 .
2016-01-19 11:28:09 -08:00
Frank Barchard
fc52d8ded2
Odd width variation of scale down by 2 for subsampling
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:538
Review URL: https://codereview.chromium.org/1558093003 .
2016-01-06 15:12:17 -08:00
Frank Barchard
f4447745ae
Add rounding to InterpolateRow for improved quality and consistency.
...
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly. Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.
BUG=libyuv:535
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
ae55e41851
use rounding in scaledown by 2
...
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
2015-12-14 17:25:36 -08:00
Frank Barchard
a2ea905679
BlendPlane any width.
...
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)
Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)
which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
dee77a4ebe
Optimize yuv alpha blend AVX2 code to do 32 pixels at time.
...
out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt
Was LibYUVPlanarTest.I420Blend_Opt (2335 ms)
Now LibYUVPlanarTest.I420Blend_Opt (1937 ms)
vs SSSE3
LibYUVPlanarTest.I420Blend_Opt (2599 ms)
BUG=libyuv:527
R=dhrosa@google.com
Review URL: https://codereview.chromium.org/1505673003 .
2015-12-08 18:20:30 -08:00
Frank Barchard
2657688e70
Add support for odd height YUVA alpha blending.
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1507683003 .
2015-12-07 12:03:20 -08:00
Frank Barchard
48a919d86e
Bug fix for UYVYToNV12 odd height
...
TBR=harryjin@google.com
BUG=libyuv:528
Review URL: https://codereview.chromium.org/1506973002 .
2015-12-07 11:39:48 -08:00
Frank Barchard
bea690b3e0
AVX2 YUV alpha blender and improved unittests
...
AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.
unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1505433002 .
2015-12-05 22:23:29 -08:00
Frank Barchard
b6f37bd8ec
Interpolate plane initial implementation.
...
YUV version of interpolation between two images.
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:526
Review URL: https://codereview.chromium.org/1479593002 .
2015-11-25 16:11:42 -08:00
Frank Barchard
d95d2169d9
rename yuv matrix constants to be more clear about what they are
...
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1429693006 .
2015-11-03 17:09:53 -08:00
Frank Barchard
ce4c2fad1d
Raw 24 bit RGB to RGB24 (bgr)
...
Add unittests that do 1 step conversion vs 2 step conversion.
Tests end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
2c7aa0070a
remove I422ToBGRA and use I422ToRGBA internally
...
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369
refactor I420ToABGR to use I420ToARGBRow
...
Using a transposed conversion matrix, I420ToARGB can output ABGR.
R=harryjin@google.com , xhwang@chromium.org
BUG=libyuv:473
Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00
Frank Barchard
76a599ec3b
fix jpeg and bt.709 yuvconstants for neon64.
...
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces. But the jpeg and bt.709
colour conversion constants were still in armv7 form. This changes
the constants for aarch64 builds to be compatible with the code.
yuv constants are now passed as const *
Remove Yvu constants which were used for older version on nv21 but not new code.
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1398623002 .
2015-10-07 19:46:56 -07:00
Frank Barchard
914a9856c7
Reimplement NV21ToARGB to allow different color matrix.
...
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V. But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.
TBR=harryjin@google.com
BUG=libyuv:500
Review URL: https://codereview.chromium.org/1388273002 .
2015-10-06 20:34:44 -07:00
Frank Barchard
2cc1a2b233
Remove sse2 functions that also have ssse3
...
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1377053003 .
2015-09-30 14:24:44 -07:00
Frank Barchard
f96890a0be
yuvconstants for all YUV to RGB conversion functions.
...
R=harryjin@google.com
BUG=libyuv:488
Review URL: https://codereview.chromium.org/1363503002 .
2015-09-22 10:26:03 -07:00
Frank Barchard
278d88f872
Copy Alpha odd width support
...
R=harryjin@google.com
BUG=none
Review URL: https://webrtc-codereview.appspot.com/59369004 .
2015-08-13 15:05:14 -07:00
Frank Barchard
ce98129951
yuy2tonv12
...
R=bcornell@google.com
BUG=libyuv:466
Review URL: https://webrtc-codereview.appspot.com/51309004 .
2015-07-17 16:22:59 -07:00
Frank Barchard
faa4b14f85
uyvy to nv12
...
R=harryjin@google.com
BUG=libyuv:466
Review URL: https://webrtc-codereview.appspot.com/50339004 .
2015-07-17 14:43:19 -07:00
fbarchard@google.com
bd2d903e1b
odd width support for ARGBSobel functions. Improves performance for images that are not a multiple of 8 pixels.
...
BUG=444
TESTED=libyuvTest.ARGBSobel_Opt
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/54589004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1415 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-28 22:22:28 +00:00
fbarchard@google.com
cfce47efc8
Change Sobel to use JPeg Luma calculation instead of extracting G channel. Using luma produces a better sobel that respects all 3 channels of RGB. Historically the G channel was used to improve performance, and because the luma of I420 is a constrained range, hurting quality. Using the JPeg variation of YUV, the luma is more accurate, including cross platform, better optimized for AVX2 and odd widths, and full range.
...
BUG=444
TESTED=ARGBSobelXY_Opt
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/57479004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1414 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-27 22:32:26 +00:00
fbarchard@google.com
8f0b32773c
ARGBToUV AVX2 functions hooked up.
...
BUG=none
TESTED=RGB565ToI420
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/46829004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1359 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-07 00:10:52 +00:00
fbarchard@google.com
e6ca9cc2a2
Scale down by 2 AVX2 port. Processes twice as many pixels as SSE2 and takes advantage of 3 argument instructions to reduce register usage and number of instructions.
...
BUG=314
TESTED=libyuvTest.ScaleDownBy2_Box
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/42959004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1347 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-26 23:21:08 +00:00
fbarchard@google.com
d28cd77f99
Enable assembly for clangcl build on Windows. Previously assembly was disabled so clangcl would work, but only with C code. As clangcl mimics both Visual C and GCC, ifdefs need to pick one or the other or often you'll end up with both. In this CL we disable most Visual C code and use the GCC versions which allow assembly for both 32 and 64 bit intel.
...
BUG=412
TESTED=clang=1 build on windows
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/51389004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1341 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-19 20:36:31 +00:00
fbarchard@google.com
446fa95587
I422ToRGB565, ARGB4444 and ARGB1555 for AVX2
...
BUG=403
TESTED=avx2 emulator
Review URL: https://webrtc-codereview.appspot.com/34359004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1293 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-24 23:14:46 +00:00
fbarchard@google.com
fd8054791c
build fixe for InterpolateRow_MIPS_DSPR2
...
BUG=398
TESTED=untested
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/37999004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1268 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-07 01:01:30 +00:00
fbarchard@google.com
b2a6af1be6
Change rectangle low level functions to use more conventional row functions including 'any' variations. Previously the yuv function SetPlane stored 32 bit values. Now a more conventional memset() style function is used for YUV that stores bytes. On Haswell a rep stosb is used for YUV. Overall benefit of this CL is improved performance for 'any' width, and simpler row assembly instead of full image assembly. Previously ARGBRect used a low level function that supported a rectangle in assembly. Now it uses a row function, and relies on row coalesce to combine into a single low level call.
...
BUG=371
TESTED=untested
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35689004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1222 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-12 03:58:24 +00:00
fbarchard@google.com
8e3db2dc73
Support invert for ARGBRect and SetPlane
...
BUG=387
TESTED=ARGBRect_Invert
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/37539004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1219 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-07 19:02:01 +00:00
fbarchard@google.com
992c3b089a
Use HAS_ARGBSETROWS_X86 to detect presence of function.
...
BUG=none
TESTED=rectangle unittests
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35639004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1218 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-07 00:11:51 +00:00
fbarchard@google.com
ef67597b48
ARGBMirror use SSE2 pshufd instruction instead of SSSE3 pshufb.
...
BUG=269
TESTED=local benchmark for ARGBMirror
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/32509004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1176 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-21 19:25:14 +00:00
fbarchard@google.com
91f240c5db
Move sub before branch for loops.
...
Remove CopyRow_x86
Add CopyRow_Any versions for AVX, SSE2 and Neon.
BUG=269
TESTED=local build
R=harryjin@google.com , tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/26209004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1175 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-20 21:14:27 +00:00
fbarchard@google.com
9dd083a512
ARGBMirror Any
...
BUG=none
TESTED=mirror and rotate unittests
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30159004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1172 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-19 00:46:51 +00:00
fbarchard@google.com
59ed448685
MirrorAny functions so assembly can always be used.
...
BUG=none
TESTED=untested
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29069004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1170 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-18 01:03:47 +00:00
fbarchard@google.com
9ed836b154
The 'Any' versions of functions can handle any width now, so remove the check from the calling code. This has 2 advantages - less code, and less overhead in calling function when any function is NOT used. Downside is more code for case where any is used.
...
BUG=373
TESTED=libyuv_unittest still passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/24129004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1143 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-24 23:29:31 +00:00
fbarchard@google.com
0a6dab42c0
Add check for minimum of 8 pixels for any functions and multiple of 8 not 16 for neon functions.
...
BUG=373
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/23189004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1139 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-23 23:05:12 +00:00
fbarchard@google.com
f2fa453b94
Port I422ToABGR to AVX2.
...
BUG=269
TESTED=intelsde on I422ToABGR
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/23149004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1138 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-23 17:20:22 +00:00
fbarchard@google.com
c000955bc0
Port I422ToRGBA to AVX.
...
BUG=269
TESTED=intelsde on I422ToRGBA
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/28769004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1136 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-22 22:41:39 +00:00
fbarchard@google.com
d81dddd3d0
port I420ToBGRA to AVX2.
...
BUG=269
TESTED=c:\intelsde\sde -ast -hsw -- out\release\libyuv_unittest.exe --gtest_filter=*I420ToBGRA*
R=brucedawson@google.com , harryjin@google.com , magjed@chromium.org
Review URL: https://webrtc-codereview.appspot.com/26869004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1127 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-20 19:35:55 +00:00
fbarchard@google.com
055725b8fe
Neon does 8 at a time, so a check is added for any function of I422ToBGRA that width is >= 8 and for fast path that it is a multiple of 8 not 16.
...
BUG=373
TESTED=untested
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/24059004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1126 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-20 18:39:21 +00:00
fbarchard@google.com
f713691a6f
Change elif to endif and if to allow AVX2 as well as SSE2 in future changes instead of one or the other.
...
BUG=none
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30719004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1122 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-16 20:47:22 +00:00
fbarchard@google.com
b720049a54
Make row functions used for planarfunctions and convert use movdqu to relax alignment constraint. Step 1 - make functions unaligned.
...
BUG=365
TESTED=libyuv_unittest passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/26709004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1111 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-03 21:11:37 +00:00
fbarchard@google.com
044f914c29
Change scale to unaligned movdqu.
...
BUG=365
TESTED=scale unittests
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/22879004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1101 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-01 01:16:04 +00:00
fbarchard@google.com
d33bf86b25
CopyRow_AVX which supports unaligned pointers for Sandy Bridge CPU.
...
BUG=363
TESTED=out\release\libyuv_unittest --gtest_filter=*ARGBToARGB_*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/31489004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1097 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-09-29 23:53:18 +00:00
fbarchard@google.com
a9ff15b7bb
check copy has different address. If same, skip the copy to avoid valgrind error.
...
BUG=334
TESTED=unittests still pass
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/14679004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1011 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-06-11 00:16:59 +00:00