Manojkumar Bhosale
83f460be33
Add MSA optimized ARGB Multiply/Add/Subtract row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA - 1.4x
ARGBAddRow_MSA - 8.6x
ARGBSubtractRow_MSA - 8.6x
ARGBMultiplyRow_Any_MSA - 1.35x
ARGBAddRow_Any_MSA - 7.3x
ARGBSubtractRow_Any_MSA - 7.2x
Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA - 4.4x
ARGBAddRow_MSA - 27x
ARGBSubtractRow_MSA - 22x
ARGBMultiplyRow_Any_MSA - 3.5x
ARGBAddRow_Any_MSA - 23x
ARGBSubtractRow_Any_MSA - 18x
Review URL: https://codereview.chromium.org/2529983002 .
2016-12-02 15:21:10 +05:30
Frank Barchard
da0c29dada
Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA - ~1.6x
ARGBToRGB565Row_Any_MSA - ~1.6x
ARGBToARGB1555Row_MSA - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA - ~2.4x
ARGBToUV444Row_Any_MSA - ~2.4x
Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA - ~2.8x
ARGBToRGB565Row_Any_MSA - ~2.8x
ARGBToARGB1555Row_MSA - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA - ~6.7x
ARGBToUV444Row_Any_MSA - ~6.7x
Review URL: https://codereview.chromium.org/2520003004 .
2016-11-22 10:47:55 -08:00
Frank Barchard
b1504a8e48
Add MSA optimized ARGBToRGB24Row_MSA and ARGBToRAWRow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2487913004 .
2016-11-18 15:05:10 -08:00
Frank Barchard
e62309f259
clang-format libyuv
...
BUG=libyuv:654
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
10ce829bad
Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA : ~1.5x
I422ToRGB565Row_Any_MSA : ~1.5x
I422ToARGB4444Row_MSA : ~1.4x
I422ToARGB4444Row_Any_MSA : ~1.4x
I422ToARGB1555Row_MSA : ~1.4x
I422ToARGB1555Row_Any_MSA : ~1.4x
Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA : ~6.8x
I422ToRGB565Row_Any_MSA : ~6.8x
I422ToARGB4444Row_MSA : ~6.6x
I422ToARGB4444Row_Any_MSA : ~6.6x
I422ToARGB1555Row_MSA : ~6.6x
I422ToARGB1555Row_Any_MSA : ~6.6x
Review URL: https://codereview.chromium.org/2445343007 .
2016-10-27 10:47:35 -07:00
Frank Barchard
532f5708a9
Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA : ~1.4x
I422AlphaToARGBRow_Any_MSA : ~1.4x
I422ToRGB24Row_MSA : ~4.8x
I422ToRGB24Row_Any_MSA : ~4.8x
Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA : ~7.0x
I422AlphaToARGBRow_Any_MSA : ~7.0x
I422ToRGB24Row_MSA : ~7.9x
I422ToRGB24Row_Any_MSA : ~7.7x
Review URL: https://codereview.chromium.org/2454433003 .
2016-10-26 11:12:17 -07:00
Frank Barchard
f5d5bd88d6
Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gains :- (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x
Performance Gains :- (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x
Regarding performance measurement, We have created standalone tests which pass in row's data from a 1920x1080 filled buffer to both the C and MSA functions. And such N iterations are executed to get more accurate timings of C vs MSA.
Review URL: https://codereview.chromium.org/2430313005 .
2016-10-24 15:37:08 -07:00
Frank Barchard
451af5e922
scale by 1 for neon implemented
...
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
asm volatile (
"1: \n"
MEMACCESS(0)
"ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
"subs %w2, %w2, #8 \n" // 8 pixels per loop
"uxtl v2.4s, v1.4h \n" // 8 int's
"uxtl2 v1.4s, v1.8h \n"
"scvtf v2.4s, v2.4s \n" // 8 floats
"scvtf v1.4s, v1.4s \n"
"fcvtn v4.4h, v2.4s \n" // 8 floatsgit
"fcvtn2 v4.8h, v1.4s \n"
MEMACCESS(1)
"st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
"b.gt 1b \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(width) // %2
:
: "cc", "memory", "v1", "v2", "v4"
);
}
void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
asm volatile (
"1: \n"
MEMACCESS(0)
"ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
"subs %w2, %w2, #8 \n" // 8 pixels per loop
"uxtl v2.4s, v1.4h \n" // 8 int's
"uxtl2 v1.4s, v1.8h \n"
"scvtf v2.4s, v2.4s \n" // 8 floats
"scvtf v1.4s, v1.4s \n"
"fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent
"fmul v1.4s, v1.4s, %3.s[0] \n"
"uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat
"uqshrn2 v4.8h, v1.4s, #13 \n"
MEMACCESS(1)
"st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
"b.gt 1b \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(width) // %2
: "w"(scale * 1.9259299444e-34f) // %3
: "cc", "memory", "v1", "v2", "v4"
);
}
TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
f553db2d30
HalfFloatPlane unittest for denormal half floats
...
Halffloats have a limited range. It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
2016-10-19 18:13:01 -07:00
Frank Barchard
78c58ab8aa
Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains : (Auto-vectorized C vs MSA SIMD)
ARGB4444ToYRow_MSA : ~3.0x
ARGB4444ToUVRow_MSA : ~1.8x
ARGB4444ToARGBRow_MSA : ~3.4x
ARGB4444ToYRow_Any_MSA : ~2.8x
ARGB4444ToUVRow_Any_MSA : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x
Review URL: https://codereview.chromium.org/2421843002 .
2016-10-19 11:10:51 -07:00
Frank Barchard
2d80fc3133
Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
...
R=wangcheng@google.com , hubbe@chromium.org
BUG=libyuv:560
Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00
Frank Barchard
a5e93766a2
Add ARGBExtractAlpha_AVX2 function
...
Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com , magjed@chromium.org
Review URL: https://codereview.chromium.org/2420553002 .
2016-10-13 16:03:43 -07:00
Frank Barchard
d363ea6527
Remove I411 support.
...
YUV 411 is very uncommon format. Remove support.
Update documentation to reflect that 411 is deprecated.
Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.
BUG=libyuv:645
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2406123002 .
2016-10-11 11:14:16 -07:00
Frank Barchard
af87c11c9a
YUY2ToI422 coalesce rows for small images
...
TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToI422_Opt
Review URL: https://codereview.chromium.org/2393393006 .
2016-10-07 18:35:42 -07:00
Frank Barchard
a2891ec77c
Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains as below,
YUY2ToI422, YUY2ToI420 :-
YUY2ToYRow_MSA : ~10x
YUY2ToUVRow_MSA : ~11x
YUY2ToUV422Row_MSA : ~9x
YUY2ToYRow_Any_MSA : ~6x
YUY2ToUVRow_Any_MSA : ~5x
YUY2ToUV422Row_Any_MSA : ~4x
UYVYToI422, UYVYToI420 :-
UYVYToYRow_MSA : ~10x
UYVYToUVRow_MSA : ~11x
UYVYToUV422Row_MSA : ~9x
UYVYToYRow_Any_MSA : ~6x
UYVYToUVRow_Any_MSA : ~5x
UYVYToUV422Row_Any_MSA : ~4x
Review URL: https://codereview.chromium.org/2397693002 .
2016-10-07 10:37:22 -07:00
Frank Barchard
3b88a19ab1
YUY2ToI422_Any_Neon clean up to not require 16 pixels
...
YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
the last pixel. The replication was not necessary after a previous
change to treat YUY2 to 4 byte macro pixels.
TBR=harryjin@google.com
BUG=libyuv:648
TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"
Review URL: https://codereview.chromium.org/2399143002 .
2016-10-06 12:11:40 -07:00
Frank Barchard
7018f5be0f
Add MSA optimized I422ToYUY2Row, I422ToUYVYRow functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains :-
I422ToYUY2Row_MSA - ~12x
I422ToYUY2Row_Any_MSA - ~7x
I422ToUYVYRow_MSA - ~12x
I422ToUYVYRow_Any_MSA - ~7x
Review URL: https://codereview.chromium.org/2378753004 .
2016-10-03 18:21:31 -07:00
Frank Barchard
4a14cb2e81
HalfFloat_SSE2 port from C algorithm to SSE2
...
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2381493006 .
2016-09-30 09:47:16 -07:00
Frank Barchard
7fc932ddd3
Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
...
BUG=libyuv:560,chromium:445071
TEST=untested
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2371293002 .
2016-09-29 15:06:30 -07:00
Frank Barchard
618149084e
Add MIPS SIMD Arch (MSA) optimized ARGBMirrorRow function
...
This patch adds MSA optimized ARGBMirrorRow function in libYUV project.
Performance gain ~3x
R=fbarchard@google.com
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2368313003 .
2016-09-26 16:28:01 -07:00
Frank Barchard
c5323b0fdc
Add MIPS SIMD Arch (MSA) optimized MirrorRow function
...
As per the preparation patch added in Chromium sources at,
2150943003: Add MIPS SIMD Arch (MSA) build flags for GYP/GN builds
This patch adds first MSA optimized function in libYUV project.
BUG=libyuv:634
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/2285683002 .
2016-09-22 16:12:22 -07:00
Frank Barchard
6546096269
ARGBExtractAlpha 16 pixels at a time for ARM
...
arm64 8 TestARGBExtractAlpha (10019 ms) <-original 64 bit code
arm64 8 x2 TestARGBExtractAlpha (7639 ms)
arm64 16 TestARGBExtractAlpha (7369 ms) <- new 64 bit code
thumb32 8 TestARGBExtractAlpha (9505 ms) <- original 32 bit code
thumb32 8 x2 TestARGBExtractAlpha (7400 ms)
thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
arm32 8 TestARGBExtractAlpha (10002 ms)
BUG=libyuv:572
TESTED=local test on nexus 9
R=harryjin@google.com , wangcheng@google.com
Review URL: https://codereview.chromium.org/2035573002 .
2016-06-07 10:44:28 -07:00
Magnus Jedvert
942db3016a
Add ARGBExtractAlpha function
...
BUG=libyuv:572
R=fbarchard@google.com
Review URL: https://codereview.chromium.org/1995293002 .
2016-05-26 10:30:57 +02:00
Frank Barchard
fbdc43a03c
fix wrong HAS_ARGBCOPYALPHAROW_SSE2 ifdef
...
TBR=kjellander@chromium.org
BUG=libyuv:593
TESTED=try bots pass.
Review URL: https://codereview.chromium.org/2000393002 .
2016-05-23 16:26:02 -07:00
Frank Barchard
127ff512b3
add perf data files to ignores
...
document play services update
R=jkellander@chromium.org
BUG=none
Review URL: https://codereview.chromium.org/1712463002 .
2016-02-17 21:37:09 -08:00
Frank Barchard
0d880e5bc0
rename MIPS_DSPR2 to DSPR2 for consistency
...
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor. This should be named consistently.
Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.
TBR=harryjin@google.com
BUG=libyuv:562
Review URL: https://codereview.chromium.org/1677633002 .
2016-02-05 14:49:54 -08:00
Frank Barchard
081475b3c8
refactor ARGBToI422 using ARGBToI420 internally
...
R=harryjin@google.com
BUG=libyuv:546
Review URL: https://codereview.chromium.org/1574253004 .
2016-01-12 17:05:49 -08:00
Frank Barchard
f4447745ae
Add rounding to InterpolateRow for improved quality and consistency.
...
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly. Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.
BUG=libyuv:535
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
a2ea905679
BlendPlane any width.
...
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)
Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)
which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:527
Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
526558b2d8
disable debug build of 411 to work around compiler bug
...
TBR=harryjin@google.com
BUG=libyuv:524
Review URL: https://codereview.chromium.org/1461013002 .
2015-11-19 02:25:00 -08:00
Frank Barchard
72a9e282ec
disable more avx2 functions that dont link in chrome
...
libyuv builds/runs, but when integrated into chromium, produces link errors. unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.
TBR=harryjin@google.com
BUG=libyuv:522
Review URL: https://codereview.chromium.org/1427683004 .
2015-11-09 17:20:02 -08:00
Frank Barchard
860cc0357a
Neon versions of I420AlphaToARGB
...
Add alpha version of YUV to RGB to neon code for ARMv7 and aarch64.
For other YUV to RGB conversions, hoist alpha set to 255 out of loop.
TBR=harryjin@google.com
BUG=libyuv:516
Review URL: https://codereview.chromium.org/1413763017 .
2015-11-03 19:21:36 -08:00
Frank Barchard
ce4c2fad1d
Raw 24 bit RGB to RGB24 (bgr)
...
Add unittests that do 1 step conversion vs 2 step conversion.
Tests end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
2c7aa0070a
remove I422ToBGRA and use I422ToRGBA internally
...
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.
Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.
R=harryjin@google.com
BUG=libyuv:518
Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369
refactor I420ToABGR to use I420ToARGBRow
...
Using a transposed conversion matrix, I420ToARGB can output ABGR.
R=harryjin@google.com , xhwang@chromium.org
BUG=libyuv:473
Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00
Frank Barchard
b86dbf24d3
refactor I420AlphaToABGR to use I420AlphaToARGB internally
...
swap U and V and transpose conversion matrix, so I420AlphaToARGB and
I420AlphaToABGR share low level code.
Having less code with same performance allows more focused
optimization for future ARM versions.
R=harryjin@google.com
TBR=harryjin@chromium.org
BUG=libyuv:473,libyuv:516
Review URL: https://codereview.chromium.org/1422263002 .
2015-10-27 14:17:21 -07:00
Frank Barchard
cf160cdbaa
implement I444ToABGR by swapping uv and transpose matrix
...
U contributes to B and G. V contributes to R and G.
By swapping U and V, they contribute to the opposite channels. Adjust the matrix so the U contribution is in the matrix location such that it till contribute to the
new B channel and vice versa.
This allows ABGR versions of YUV conversion to use the same low level code as ARGB, just using a different matrix and swapping U and V pointers.
As a result the existing I444ToABGRRow functions are no longer needed and are removed.
Previously this function was only Intel AVX2 optimized for Windwos. Now it is also optimized for Arm and GCC.
ARMv7 Neon
Was LibYUVConvertTest.I444ToABGR_Opt (75971 ms)
Now LibYUVConvertTest.I444ToABGR_Opt (3672 ms)
20.6 times faster.
R=xhwang@chromium.org
BUG=libyuv:515
Review URL: https://codereview.chromium.org/1414133006 .
2015-10-27 10:21:21 -07:00
Frank Barchard
430bb0a0f0
odd width 444 fix
...
TBR=harryjin@google.com
BUG=libyuv:510
Review URL: https://codereview.chromium.org/1415583003 .
2015-10-21 20:03:19 -07:00
Frank Barchard
90335f6043
bug fix for odd width 16/24 bit to i420
...
A bug was introduced on arm when the code for 'any' width switch to
a temporary stack buffer and simd.
The C version handles odd width by doing 1 pixel, instead of averaging 2.
But the SIMD any version is supposed to replicate the last pixel, then
the subsampling in Neon will average the pixel with itself, producing
the same result.
The previous version did this, but only for ARGB 32 bit, which was to
avoid introducing issues with subsampled YUY2 source. This CL adds
replication for RGB 16 bit values.
TBR=harryjin@google.com
BUG=libyuv:510
Review URL: https://codereview.chromium.org/1418983003 .
2015-10-21 18:23:02 -07:00
Frank Barchard
5bf4de0806
width and 3 bug fix in odd width support of ARGBToI411
...
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1415213002 .
2015-10-21 12:45:08 -07:00
Frank Barchard
ba4b409d51
Fix ARGBToI411 odd width bug.
...
The any function for handling ARGBToI411 was not handling the pixel
replication correctly. On 422 and odd width was handled by duplicating
a pixel of source. 411 needs replication for remainders of 1, 2 or 3
pixels.
The C version was handling odd width but with an average of the remainder
pixels, which does not match the SIMD 'any' handling off remainder.
This changes the odd width handling to mimic the any version.
TBR=harryjin@google.com
BUG=libyuv:491
Review URL: https://codereview.chromium.org/1411733004 .
2015-10-21 12:22:24 -07:00
Frank Barchard
cf19a0c9a2
nv21 any fix
...
R=harryjin@google.com
BUG=libyuv:507
Review URL: https://codereview.chromium.org/1410643002 .
2015-10-15 16:24:51 -07:00
Frank Barchard
76a599ec3b
fix jpeg and bt.709 yuvconstants for neon64.
...
yuv constants for bt.601 were previously ported to neon64, as well
as the code to respect other color spaces. But the jpeg and bt.709
colour conversion constants were still in armv7 form. This changes
the constants for aarch64 builds to be compatible with the code.
yuv constants are now passed as const *
Remove Yvu constants which were used for older version on nv21 but not new code.
TBR=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1398623002 .
2015-10-07 19:46:56 -07:00
Frank Barchard
914a9856c7
Reimplement NV21ToARGB to allow different color matrix.
...
Low level for NV21ToARGB written to accept yuv matrix used by
other YUV to ARGB functions.
Previously NV21 was implemented for Windows using NV12 with a different
matrix that swapped U and V. But the Arm version of the low level does
not allow the matrix U and V contributions to be swapped.
Using a new low level function that reads NV21 and uses the same
yuvconstants as other YUV conversion functions allows an Arm port of
this function.
TBR=harryjin@google.com
BUG=libyuv:500
Review URL: https://codereview.chromium.org/1388273002 .
2015-10-06 20:34:44 -07:00
Frank Barchard
2cc1a2b233
Remove sse2 functions that also have ssse3
...
ARGBBlendRow_SSE2, ARGBAttenuateRow_SSE2, and MirrorRow_SSE2
Since vast majority of CPUs have SSSE3 now, removing the SSE2
improves the performance of CPU dispatching.
R=harryjin@google.com
BUG=none
Review URL: https://codereview.chromium.org/1377053003 .
2015-09-30 14:24:44 -07:00
Frank Barchard
9a0e12f5f1
AVX2 1 step I422AlphaToARGB for gcc and win.
...
C I420AlphaToARGB_Opt (5169 ms)
SSSE3 I420AlphaToARGB_Opt (432 ms)
AVX2 I420AlphaToARGB_Opt (358 ms)
and with premultiplication as 2 step process:
I420AlphaToARGB_Premult (7029 ms)
I420AlphaToARGB_Premult (757 ms)
I420AlphaToARGB_Premult (508 ms)
R=harryjin@google.com
BUG=libyuv:496,libyuv:473
Review URL: https://codereview.chromium.org/1372653003 .
2015-09-25 13:37:42 -07:00
Frank Barchard
e365cdde3b
I420Alpha row function in 1 pass.
...
API change - I420AlphaToARGB takes flag indicating if RGB should be
premultiplied by alpha.
This version implements an efficient SSSE3 version for Windows.
C version done in 2 steps.
Was
libyuvTest.I420AlphaToARGB_Any (1136 ms)
libyuvTest.I420AlphaToARGB_Unaligned (1210 ms)
libyuvTest.I420AlphaToARGB_Invert (966 ms)
libyuvTest.I420AlphaToARGB_Opt (1031 ms)
libyuvTest.I420AlphaToABGR_Any (1020 ms)
libyuvTest.I420AlphaToABGR_Unaligned (1359 ms)
libyuvTest.I420AlphaToABGR_Invert (1082 ms)
libyuvTest.I420AlphaToABGR_Opt (986 ms)
R=harryjin@google.com
BUG=libyuv:496
Review URL: https://codereview.chromium.org/1367093002 .
2015-09-25 10:29:20 -07:00
Frank Barchard
f96890a0be
yuvconstants for all YUV to RGB conversion functions.
...
R=harryjin@google.com
BUG=libyuv:488
Review URL: https://codereview.chromium.org/1363503002 .
2015-09-22 10:26:03 -07:00
Frank Barchard
28427a53e2
I444ToABGR for android
...
Reimplements I444ToARGB as a matrix function.
new I444ToABGR as matrix functions with wrappers and any functions.
Allows for future J444 and H444 versions.
I444ToABGR user level function added.
BUG=libyuv:490, libyuv:449
R=harryjin@google.com
Review URL: https://codereview.chromium.org/1355733002 .
2015-09-18 11:20:58 -07:00
Frank Barchard
28ce7d94f5
j422toabgr neon port using i422toabgr matrix function.
...
R=harryjin@google.com
BUG=libyuv:488
Review URL: https://codereview.chromium.org/1353923003 .
2015-09-17 15:20:55 -07:00
Frank Barchard
6fcbae1409
J422ToARGB Neon but not aarch64
...
TBR=harryjin@google.com
BUG=libyuv:493
Review URL: https://codereview.chromium.org/1348203004 .
2015-09-17 12:43:05 -07:00
Frank Barchard
6a6b67e7a9
Add H422ToARGB armv7 neon version.
...
Patch provided by zhongwei.yao@linaro.org
R=fbarchard@chromium.org , fbarchard@google.com
BUG=libyuv:488
Review URL: https://codereview.chromium.org/1344393002 .
2015-09-17 10:38:15 -07:00
Frank Barchard
509c644245
Add J422ToARGB armv7 neon version.
...
R=fbarchard@chromium.org , fbarchard@google.com
BUG=libyuv:488
Review URL: https://codereview.chromium.org/1334173005 .
2015-09-15 15:01:48 -07:00
Frank Barchard
ed55d24d9f
H420 functionality
...
R=harryjin@google.com
BUG=libyuv:488
Review URL: https://webrtc-codereview.appspot.com/54869004 .
2015-09-06 11:01:40 -07:00
Frank Barchard
67b06e66cb
I422ToABGR for win64. Moves any functions to accomidate win64 subset of formats.
...
TBR=harryjin@google.com
BUG=libyuv:488
Review URL: https://webrtc-codereview.appspot.com/57679004 .
2015-09-03 11:00:18 -07:00
Frank Barchard
7060e0d826
I420ToABGRMatrix functions with J420ToABGR wrapper.
...
Allows direct conversion from JPeg to ABGR for android.
BUG=libyuv:488
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/55719004 .
2015-09-03 10:42:36 -07:00
Frank Barchard
cda9d38a4e
xmmword cast for clang
...
clangcl use compare_win for 32 bit, allowing fallback and enabling avx2 code for clang.
move defines/protos to compare_row.h
fix issue with odd width ARGBCopyAlpha functions by copying destination to temp buffer, then doing alpha copy, then copy back to destination.
R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:484
Review URL: https://webrtc-codereview.appspot.com/59379004 .
2015-08-18 11:13:12 -07:00
Frank Barchard
278d88f872
Copy Alpha odd width support
...
R=harryjin@google.com
BUG=none
Review URL: https://webrtc-codereview.appspot.com/59369004 .
2015-08-13 15:05:14 -07:00
Frank Barchard
93464b926c
Add rotate any support. Fix for sobel for neon which does 16 at a time, not 8. Disable scaling color test that fails on arm. Test is not complete.
...
R=harryjin@google.com
BUG=libyuv:479
Review URL: https://webrtc-codereview.appspot.com/52229004 .
2015-07-28 15:06:20 -07:00
Frank Barchard
97b35daf75
disable faulty avx2 in argb conversions and box filter. and extend temporary buffer to 128 for an avx2 any function.
...
R=harryjin@google.com
BUG=libyuv:462
TESTED=libyuv_unittest run on haswell laptop
Review URL: https://webrtc-codereview.appspot.com/53759004 .
2015-07-07 15:40:24 -07:00
Frank Barchard
0737ff5bd0
128 for avx2
...
R=harryjin@google.com
BUG=libyuv:461
Review URL: https://webrtc-codereview.appspot.com/55649004 .
2015-07-04 09:13:20 -07:00
Frank Barchard
9487b9d6d8
any allow for avx2 32 pixels at a time of argb
...
R=harryjin@google.com
BUG=libyuv:461
Review URL: https://webrtc-codereview.appspot.com/54779004 .
2015-07-01 17:50:48 -07:00
Frank Barchard
553c7f85f1
mirror odd width with simd
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/54769004 .
2015-06-23 17:53:02 -07:00
Frank Barchard
6a9ef1ea36
any 1 to 2 with stride use SIMD
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/54759004 .
2015-06-23 17:08:08 -07:00
Frank Barchard
6dde4f14bd
argb to uv read 4 not 8
...
R=harryjin@google.com
BUG=libyuv:457
Review URL: https://webrtc-codereview.appspot.com/52139004 .
2015-06-23 14:48:37 -07:00
Frank Barchard
54100b91c1
copy 2 rows for interpolate and use SIMD.
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/50279004 .
2015-06-23 10:41:46 -07:00
Frank Barchard
3b5d726a4f
1 to 1 any functions with a parameter use memcpy.
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/57619004 .
2015-06-22 15:08:20 -07:00
Frank Barchard
a0fca88b1d
remove fmemcpy and bump version
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/50269004 .
2015-06-19 17:58:17 -07:00
Frank Barchard
722e87f19f
string.h for memcpy
...
R=harryjin
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/57609004 .
2015-06-19 16:40:22 -07:00
Frank Barchard
dfb2120a42
set us simd
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/55629004 .
2015-06-19 14:18:48 -07:00
Frank Barchard
6608c100e2
copy last 4
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/54749004 .
2015-06-18 17:40:19 -07:00
Frank Barchard
a209d7314b
simd for 1 to 1
...
R=harryjin@google.com , harryjin
BUG=448
Review URL: https://webrtc-codereview.appspot.com/55619004 .
2015-06-17 18:22:11 -07:00
Frank Barchard
72a235af9f
repeat y for yuy2 so that unittests that check the 2nd y on odd widths will match the C and SIMD. The C code duplicates the last Y.
...
R=harryjin@google.com
BUG=libyuv:455
Review URL: https://webrtc-codereview.appspot.com/50249004 .
2015-06-16 16:27:15 -07:00
Frank Barchard
44ff3c333d
split share macro
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/55609004 .
2015-06-16 12:44:15 -07:00
Frank Barchard
2edfe0f0c6
merge
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/52119004 .
2015-06-16 12:17:53 -07:00
Frank Barchard
bff1e18e51
share functions in any
...
R=harryjin@google.com
BUG=libyuv:448
Review URL: https://webrtc-codereview.appspot.com/57599004 .
2015-06-16 12:05:39 -07:00
Frank Barchard
0b3294af6c
disable I422ToYUY2 sse for odd sizes.
...
BUG=455
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/51239004 .
2015-06-16 11:09:03 -07:00
Frank Barchard
68e8d9bebd
Math functions need BPP of 4 for odd width support on first source argument
...
BUG=455
TESTED=ARGBMultply
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/54719004 .
2015-06-16 09:34:51 -07:00
Frank Barchard
b071a3d321
subsample yuy2 dest
...
BUG=455
TESTED=out\release\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=*ARGBToYUY2*
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/58429004 .
2015-06-15 12:01:28 -07:00
Frank Barchard
58ca9f899e
remainder done unconditionally and with a variable
...
BUG=448
TESTED=local build
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/57559004 .
2015-06-12 17:21:41 -07:00
Frank Barchard
242cb2554c
nv12 odd width support using SIMD for remainder
...
BUG=libyuv:448
TESTED=NV21ToRGB565_Any etc
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/53689004 .
2015-06-12 16:07:20 -07:00
Frank Barchard
cae07fb0e0
bump subsampling up
...
BUG=455
TESTED=libyuvTest.ARGBToYUY2_Random
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/58419004 .
2015-06-12 15:25:03 -07:00
Frank Barchard
03da5420bc
use SIMD for I420ToARGB odd widths in a temporary buffer instead of using C for remainder.
...
Enter a description of the change.
use SIMD for I420ToARGB odd widths in a temporary buffer instead of using C for remainder. Currently the C code does not exactly match the SIMD code, so an odd width produces different pixels than an even width, causing a subtle artifact. By using SIMD consistently, there is no difference in even and odd widths. Also the SIMD performance is faster, so even with overhead of memcpy, performance improves.
BUG=447
TESTED=out\release\libyuv_unittest.exe --gtest_filter=*I420ToARGB*
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/55579004 .
2015-06-11 16:38:52 -07:00
fbarchard@google.com
bd2d903e1b
odd width support for ARGBSobel functions. Improves performance for images that are not a multiple of 8 pixels.
...
BUG=444
TESTED=libyuvTest.ARGBSobel_Opt
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/54589004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1415 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-28 22:22:28 +00:00
fbarchard@google.com
cfce47efc8
Change Sobel to use JPeg Luma calculation instead of extracting G channel. Using luma produces a better sobel that respects all 3 channels of RGB. Historically the G channel was used to improve performance, and because the luma of I420 is a constrained range, hurting quality. Using the JPeg variation of YUV, the luma is more accurate, including cross platform, better optimized for AVX2 and odd widths, and full range.
...
BUG=444
TESTED=ARGBSobelXY_Opt
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/57479004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1414 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-27 22:32:26 +00:00
fbarchard@google.com
bb5a009d11
ARGB4444ToARGB and ARGB1555ToARGB ported to AVX2.
...
BUG=421
TESTED=out\release\libyuv_unittest --gtest_filter=*ARGB4444ToARGB*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/48009004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1363 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-07 23:52:57 +00:00
fbarchard@google.com
2827277496
port RGB565ToARGB to AVX2.
...
BUG=421
TESTED=out\release\libyuv_unittest --gtest_filter=*RGB565ToARGB*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/49609004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1357 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-06 19:24:23 +00:00
fbarchard@google.com
0e4388aea3
I422ToRGB24 AVX2 and I422ToRAW
...
BUG=none
TESTED=I422ToRGB24 unittest
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/46619004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1337 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-17 17:25:27 +00:00
yang.zhang@arm.com
e246e6c18f
Add ARGBToRGB565DitherRow_NEON for ARM32/64
...
ARM32/64 NEON versions of ARGBToRGB565DitherRow_NEON are implemented.
BUG=407
TESTED=libyuvTest.* on ARM32/64 with Android
R=fbarchard@google.com
Change-Id: Ia689170fb39db964392e5e1113801592ab0628bf
Review URL: https://webrtc-codereview.appspot.com/49409004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1335 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-17 02:22:25 +00:00
fbarchard@google.com
92f7f421fd
rename I400 to J400 and I400 reference to I400. J400 is a simple replication of values to convert to RGB, which is what the old I400 was. I400 reference is the Y part of the YUV formula, so renaming that to I400.
...
BUG=none
TESTED=libyuvTest (5925 ms total)
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/50369005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1333 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-17 00:01:18 +00:00
fbarchard@google.com
f2fad0faa5
Optimized J422ToARGB.
...
BUG=414
TESTED=J422ToARGB unittest
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/42799004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1328 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-16 18:08:30 +00:00
fbarchard@google.com
685b92b0a6
I400ToARGB_AVX2 port from SSE2 to AVX2.
...
BUG=403
TESTED=libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=*I400ToARGB*
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/46569004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1322 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-11 18:12:17 +00:00
fbarchard@google.com
f5a7b2b48a
I411ToARGB AVX2 version
...
BUG=403
TESTED=I411ToARGB unittest
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/42689004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1321 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-11 00:08:56 +00:00
fbarchard@google.com
cdd80e04c9
Port I444ToARGB to AVX2.
...
BUG=403
TESTED=I444ToARGB unittests
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/45589004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1314 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-09 21:56:48 +00:00
fbarchard@google.com
bdeb9ac584
switch from 8x8 to 4x4 matrix for dithering
...
BUG=407
TESTED=Dither unittests
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/46459004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1310 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-06 18:28:00 +00:00
fbarchard@google.com
0fe4abbc5c
ARGBToRGB565 AVX2 with dithering
...
BUG=407
TESTED=ARGBToRGB565Dither unittest
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/44519004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1309 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-04 22:31:43 +00:00
fbarchard@google.com
9245317e16
ARGBToRGB565 SSE2 port.
...
BUG=407
TESTED=ARGBToRGB565Dither unittest
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/41039004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1308 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-04 00:00:50 +00:00
fbarchard@google.com
933bd40c3c
port ARGBToRGB565 and ARGB1555 to AVX2. Enable functions that use ARGBToRGB565 AVX2 code. Add ARGBToRGB565Dither function.
...
BUG=403
TESTED=local windows build
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/42109004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1302 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-27 21:15:28 +00:00
fbarchard@google.com
bffd326f74
AVX2 version of ARGBToARGB4444
...
BUG=403
TESTED=local build on windows
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/43429004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1297 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-25 17:26:28 +00:00
fbarchard@google.com
d96047761e
AVX2 version of NV12ToARGB
...
BUG=403
TESTED=untested
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/40089004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1295 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-24 23:45:08 +00:00
fbarchard@google.com
446fa95587
I422ToRGB565, ARGB4444 and ARGB1555 for AVX2
...
BUG=403
TESTED=avx2 emulator
Review URL: https://webrtc-codereview.appspot.com/34359004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1293 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-24 23:14:46 +00:00
fbarchard@google.com
e2f1a75474
move mask to last parameter of any functions for consistency.
...
BUG=none
TESTED=local libyuv unittest passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/43419004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1292 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-24 21:18:30 +00:00
fbarchard@google.com
239962fa00
YUY2 and UYVY to ARGB AVX2 versions via wrappers.
...
BUG=403
TESTED=UNTESTED
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/34339004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1291 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-24 18:58:51 +00:00
kjellander@google.com
28d1a582ba
Revert "YUY2ToARGB and UYVYToARGB AVX with C wrapper to call lower level conversions."
...
This reverts r1288 due to breaking compilation on bots:
http://build.chromium.org/p/client.libyuv/builders/Mac64%20Debug/builds/365
http://build.chromium.org/p/client.libyuv/builders/Linux64%20Debug/builds/667
TBR=fbarchard@google.com
TESTED=Reverted locally and all built fine again.
Review URL: https://webrtc-codereview.appspot.com/40879004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1289 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-23 09:22:22 +00:00
fbarchard@google.com
b52606c024
YUY2ToARGB and UYVYToARGB AVX with C wrapper to call lower level conversions.
...
BUG=403
TESTED=convert unittest
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/40839004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1288 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-21 00:49:35 +00:00
fbarchard@google.com
0887315390
Remove bayer format support from libyuv. This format is very rare and used on legacy hardware. Its not well optimized and has bugs related to odd widths. Removing the format will allow tests to pass under more circumstances, run faster and allow focus on higher priority quality and performance issues.
...
BUG=301
TESTED=local unittests build/pass on windows gyp build.
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/38059004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1270 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-09 19:58:19 +00:00
fbarchard@google.com
3982998c7c
YToARGB AVX2 port from SSE2
...
BUG=393
TESTED=YToARGB unittest
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/41679004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1258 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-02-03 01:35:11 +00:00
fbarchard@google.com
63882a356f
Disable YToARGB assembly which is off by 1
...
BUG=392
TESTED=libyuvTest.YToARGB_Opt
Review URL: https://webrtc-codereview.appspot.com/40549004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1250 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-26 17:16:44 +00:00
fbarchard@google.com
b2a6af1be6
Change rectangle low level functions to use more conventional row functions including 'any' variations. Previously the yuv function SetPlane stored 32 bit values. Now a more conventional memset() style function is used for YUV that stores bytes. On Haswell a rep stosb is used for YUV. Overall benefit of this CL is improved performance for 'any' width, and simpler row assembly instead of full image assembly. Previously ARGBRect used a low level function that supported a rectangle in assembly. Now it uses a row function, and relies on row coalesce to combine into a single low level call.
...
BUG=371
TESTED=untested
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35689004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1222 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-01-12 03:58:24 +00:00
fbarchard@google.com
966233e5eb
Remove sub 16 from yuv conversions and change bias to include it.
...
BUG=388
TESTED=out\release\libyuv_unittest --gtest_catch_exceptions=0 --gtest_filter=*420ToARGB_Opt | sortms
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/34609004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1216 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-12-31 01:07:02 +00:00
fbarchard@google.com
40e3457574
J420ToARGB jpeg variation of YUV color space to ARGB.
...
BUG=241
TESTED=J420ToARGB unittest
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/32929004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1212 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-12-29 19:17:53 +00:00
fbarchard@google.com
ada2a3eb12
Fix for ARGBToY on AVX
...
BUG=269
TESTED=local build on osx
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/29229005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1198 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-12-13 01:44:33 +00:00
fbarchard@google.com
ef67597b48
ARGBMirror use SSE2 pshufd instruction instead of SSSE3 pshufb.
...
BUG=269
TESTED=local benchmark for ARGBMirror
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/32509004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1176 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-21 19:25:14 +00:00
fbarchard@google.com
91f240c5db
Move sub before branch for loops.
...
Remove CopyRow_x86
Add CopyRow_Any versions for AVX, SSE2 and Neon.
BUG=269
TESTED=local build
R=harryjin@google.com , tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/26209004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1175 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-20 21:14:27 +00:00
fbarchard@google.com
9dd083a512
ARGBMirror Any
...
BUG=none
TESTED=mirror and rotate unittests
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30159004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1172 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-19 00:46:51 +00:00
fbarchard@google.com
59ed448685
MirrorAny functions so assembly can always be used.
...
BUG=none
TESTED=untested
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29069004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1170 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-18 01:03:47 +00:00
fbarchard@google.com
bb3a4b41e9
vextractf128 requuires a constant argument for which dqword to extract, so add a new macro.
...
BUG=none
TESTED=local build on clang for osx
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30869004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1153 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-04 21:05:55 +00:00
fbarchard@google.com
067892c5a1
Port YUY2ToYRow_AVX2 and UYVYToYRow_AVX2 to gcc/NaCL from Windows AVX code.
...
BUG=269
TESTED=ncval
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/25039004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1151 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-11-03 18:30:17 +00:00
fbarchard@google.com
88ac01aed0
Change YAny functions to share, and use mask for how many bytes at a time for simd vs C.
...
BUG=373
TESTED=libyuv_unittest passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/31819004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1142 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-24 22:58:38 +00:00
fbarchard@google.com
78a3a6b345
Change Any functions that convert 1 to 1 formats, memcpy style, so use C for remainder to allow a minimum width of 1. This has some advantages - allows function to be used even with SIMD that only allows aligned memory. Fewer macros, used by more functions. SIMD is not used unaligned avoiding page/cache split. No overlap so it can be used in place. Disadvantage is it will be slower if close to the maximum number of non-SIMD pixels.
...
BUG=373
TESTED=libyuv_unittest still passes
R=brucedawson@google.com , tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/23209004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1141 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-24 22:17:59 +00:00
fbarchard@google.com
1f151f62a9
add a check that the simd function should be called. allows any functions to support any width, simplifing and speeding up the calling code.
...
BUG=373
TESTED=try bots
R=brucedawson@chromium.org , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/25949004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1140 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-24 00:45:27 +00:00
fbarchard@google.com
f2fa453b94
Port I422ToABGR to AVX2.
...
BUG=269
TESTED=intelsde on I422ToABGR
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/23149004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1138 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-23 17:20:22 +00:00
fbarchard@google.com
c000955bc0
Port I422ToRGBA to AVX.
...
BUG=269
TESTED=intelsde on I422ToRGBA
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/28769004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1136 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-22 22:41:39 +00:00
fbarchard@google.com
af6f25245e
Reenable AVX2 scaling with bug fix for any width
...
BUG=376
TESTED=unittest on scale functions
R=brucedawson@google.com , harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30759004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1135 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-22 01:15:20 +00:00
fbarchard@google.com
d81dddd3d0
port I420ToBGRA to AVX2.
...
BUG=269
TESTED=c:\intelsde\sde -ast -hsw -- out\release\libyuv_unittest.exe --gtest_filter=*I420ToBGRA*
R=brucedawson@google.com , harryjin@google.com , magjed@chromium.org
Review URL: https://webrtc-codereview.appspot.com/26869004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1127 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-20 19:35:55 +00:00
fbarchard@google.com
008ce53ac4
pavgb with memory op requires alignment. This CL disables conversions that use pavgb, and resolves scale by 3/8 unittest for checking alignment works. The 3/8 code used a pavgb with a memory operand. tests are added for scaling and allow unaligning on purpose.
...
BUG=365
TESTED=local change to force unaligned memory fails on some conversions and scaling code.
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/29699004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1114 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-07 01:57:34 +00:00
fbarchard@google.com
ca308327d2
Remove unaligned functions, since most function support unaligned memory now. This reduces complexity and improves performance for unaligned cases because C code can be avoided, and overhead is less. Downside is old cpus (core2 and earlier) will be slower for aligned memory case. Except mips, which has alignment requirement, but remove unaligned variant.
...
BUG=365
TESTED=unittest builds and passes locally
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/24839004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1113 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-07 00:59:31 +00:00
fbarchard@google.com
044f914c29
Change scale to unaligned movdqu.
...
BUG=365
TESTED=scale unittests
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/22879004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1101 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-10-01 01:16:04 +00:00
ashok.bhat@gmail.com
2df5743bd4
Row AArch64 Neon implementation - Part 6
...
BUG=319
TESTED=libyuv_unittest
R=fbarchard@google.com
Change-Id: I5d93eb184ba873d5e7637a3b5a830be39a967c6f
Signed-off-by: Ashok Bhat <ashok.bhat@arm.com>
Review URL: https://webrtc-codereview.appspot.com/15239004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1069 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-08-29 08:12:51 +00:00
ashok.bhat@gmail.com
cb8be2fb2b
Row AArch64 Neon implementation - Part 4
...
BUG=319
TESTED=libyuv_unittest
R=fbarchard@chromium.org , fbarchard@google.com
Change-Id: If145660d999e95246efeedb64a45ba70bf0fe23e
Signed-off-by: Ashok Bhat <ashok.bhat@arm.com>
Review URL: https://webrtc-codereview.appspot.com/13199004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1054 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-08-21 09:55:58 +00:00
fbarchard@google.com
e6dd1fa024
Port I420ToARGB to intrinsics for win64
...
BUG=336
TESTED=out\release_x64\libyuv_unittest --gunit_also_run_disabled_tests --gtest_filter=*I420To*B*
R=bryan.bernhart@intel.com , tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/15809005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1018 16f28f9a-4ce2-e073-06de-1de4eb20be90
2014-06-24 20:45:45 +00:00
fbarchard@google.com
08b24a4232
Bayer GG specialized version for Sobel
...
BUG=none
TEST=Sobel
R=johannkoenig@google.com
Review URL: https://webrtc-codereview.appspot.com/2849004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@826 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-10-25 07:39:43 +00:00
fbarchard@google.com
212a1a5000
ARGBShuffle_SSE2 for lower end CPUs
...
BUG=271
TESTED=out\release\libyuv_unittest --gtest_filter=**R*ToARGB*
R=johannkoenig@google.com , ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/2361004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@807 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-10-05 04:17:50 +00:00
fbarchard@google.com
2154de414c
Port InterpolateRows to AVX2
...
BUG=264
TEST=ARGBInterpolate*
R=changjun.yang@intel.com
Review URL: https://webrtc-codereview.appspot.com/2160004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@777 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-09-03 07:18:21 +00:00
fbarchard@google.com
7fa21d677c
More ifdefs to build all libyuv and not get link errors on missing assembly
...
BUG=253
TEST=nacl validator
R=nfullagar@google.com , ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/2024004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@756 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-08-13 21:54:23 +00:00
fbarchard@google.com
b911428afd
Adapt row interpolator to do YUV as well as ARGB without extrude so it can be used in I420Scale.
...
BUG=237
TEST=Scale*
R=ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/1587004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@710 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-05-30 23:42:27 +00:00
fbarchard@google.com
cd6056c01c
InterpolateAny for unaligned and odd width interpolate. To be used in ARGBScaler in future.
...
BUG=208
TEST=ARGBInterpolate255_Unaligned
Review URL: https://webrtc-codereview.appspot.com/1324004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@662 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-04-15 03:05:08 +00:00
fbarchard@google.com
c297d103f1
I420ToARGB for Haswell.
...
BUG=216
TEST=I420ToARGB
Review URL: https://webrtc-codereview.appspot.com/1314004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@660 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-04-12 07:26:24 +00:00
fbarchard@google.com
c56a55fc72
Sobel and SobelXY Neon port. Improved Bayer - did 8 at time version, and specialized G channel version.
...
BUG=201
TEST=libyuvTest.TestSobel and libyuvTest.TestSobelXY
Review URL: https://webrtc-codereview.appspot.com/1279006
git-svn-id: http://libyuv.googlecode.com/svn/trunk@642 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-04-04 18:33:44 +00:00
fbarchard@google.com
91c50c3a7d
ARGBToYJ_AVX2 port to AVX2.
...
BUG=none
TEST=none
Review URL: https://webrtc-codereview.appspot.com/1272008
git-svn-id: http://libyuv.googlecode.com/svn/trunk@640 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-04-03 23:47:10 +00:00
fbarchard@google.com
050b39a5cb
Recomputed JPeg coefficients normalized to 128. Apply to ARGBGray function reusing YJ function/coefficients and rounding.
...
BUG=201
TESTED=Gray unittest improved
Review URL: https://webrtc-codereview.appspot.com/1269006
git-svn-id: http://libyuv.googlecode.com/svn/trunk@629 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-04-01 20:07:14 +00:00
fbarchard@google.com
cfaa66c041
ARGBToJ420 and ARGBToJ400 - Full range YUV Jpeg style.
...
BUG=159
TEST=*J4*
Review URL: https://webrtc-codereview.appspot.com/1243004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@622 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-26 09:14:46 +00:00
fbarchard@google.com
e8df16bd7c
Sobel use G channel for consistency on all CPUs, better performance and full range of 0 to 255.
...
BUG=201
TESTED=out\release\libyuv_unittest --gtest_filter=*Sobel*
Review URL: https://webrtc-codereview.appspot.com/1225004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@614 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-22 04:47:14 +00:00
fbarchard@google.com
5ca144d214
NV12 to/from I420 coalesce rows for Y and UV independently.
...
BUG=197
TESTED=*NV12*_Opt
Review URL: https://webrtc-codereview.appspot.com/1201004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@607 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-14 17:57:47 +00:00
fbarchard@google.com
07a99dc278
Row coalesce convert_from.cc for I420ToNV12, YUY2ToI422, UYVYToI422
...
BUG=197
TESTED=I420ToNV12_Opt
Review URL: https://webrtc-codereview.appspot.com/1196004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@605 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-13 17:48:22 +00:00
fbarchard@google.com
a3956fc5f4
lint fix for Intel and YANY fix for Neon
...
BUG=none
TEST=none
Review URL: https://webrtc-codereview.appspot.com/1161005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@602 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-12 23:22:32 +00:00
fbarchard@google.com
518833b983
Fix RGB565ToARGB_Any which uses SSE2 that requires ARGB alignment. Add row coalescing to convert_argb.cc. Improve coalescing on planar_functions.cc and convert_from_argb.cc. Use stride * 2 == width to test for even width. Apply coalescing to all functions that have same vertical subsampling.
...
BUG=197
TESTED=libyuv unittest passes where _Opt uses row coalescing.
Review URL: https://webrtc-codereview.appspot.com/1186004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@601 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-12 21:44:56 +00:00
fbarchard@google.com
1096543eaa
ARGBShuffle AVX2
...
BUG=196
TESTED=BGRAToARGB*
Review URL: https://webrtc-codereview.appspot.com/1171006
git-svn-id: http://libyuv.googlecode.com/svn/trunk@596 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-08 23:22:32 +00:00
fbarchard@google.com
51d3e236cb
AVX2 math functions for images
...
BUG=none
TEST=ARGBMultiply ARGBAdd and ARGBSubtract unittests.
Review URL: https://webrtc-codereview.appspot.com/1146006
git-svn-id: http://libyuv.googlecode.com/svn/trunk@588 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-03-04 21:50:23 +00:00
fbarchard@google.com
c0d9c34690
Attenuate and Unattenuate Any variations for sse2, sss3 and avx2
...
BUG=190
TESTED=out\release\libyuv_unittest --gtest_filter=*Unatt*
Review URL: https://webrtc-codereview.appspot.com/1121005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@579 16f28f9a-4ce2-e073-06de-1de4eb20be90
2013-02-21 00:55:47 +00:00