fbarchard@google.com
|
b2a6af1be6
|
Change rectangle low level functions to use more conventional row functions including 'any' variations. Previously the yuv function SetPlane stored 32 bit values. Now a more conventional memset() style function is used for YUV that stores bytes. On Haswell a rep stosb is used for YUV. Overall benefit of this CL is improved performance for 'any' width, and simpler row assembly instead of full image assembly. Previously ARGBRect used a low level function that supported a rectangle in assembly. Now it uses a row function, and relies on row coalesce to combine into a single low level call.
BUG=371
TESTED=untested
R=brucedawson@google.com, harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35689004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1222 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2015-01-12 03:58:24 +00:00 |
|
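The row-coalescing idea described in this commit can be sketched in portable C as follows. This is a minimal illustration, not the libyuv source: the names `set_row_c` and `set_plane` are hypothetical, and the real row function may be an assembly routine rather than `memset()`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Row function: fill one row of bytes, memset() style. */
void set_row_c(uint8_t* dst, uint8_t value, int width) {
  memset(dst, value, (size_t)width);
}

/* Fill a width x height region of a plane one row at a time.
   When the rows are contiguous in memory (stride == width), they are
   coalesced into one long row so the row function is called once. */
void set_plane(uint8_t* dst, int stride, int width, int height,
               uint8_t value) {
  assert(dst != NULL);
  if (width <= 0 || height <= 0) return;
  if (stride == width) {
    width *= height;
    height = 1;
    stride = 0;
  }
  for (int y = 0; y < height; ++y) {
    set_row_c(dst, value, width);
    dst += stride;
  }
}
```

With this shape, only a simple row kernel needs an optimized version; the rectangle logic stays in C.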
fbarchard@google.com
|
992c3b089a
|
Use HAS_ARGBSETROWS_X86 to detect presence of function.
BUG=none
TESTED=rectangle unittests
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/35639004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1218 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2015-01-07 00:11:51 +00:00 |
|
fbarchard@google.com
|
966233e5eb
|
Remove sub 16 from yuv conversions and change bias to include it.
BUG=388
TESTED=out\release\libyuv_unittest --gtest_catch_exceptions=0 --gtest_filter=*420ToARGB_Opt | sortms
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/34609004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1216 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-12-31 01:07:02 +00:00 |
|
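Folding the luma offset into the bias is plain algebra: `(y - 16) * YG + (u - 128) * UB` equals `y * YG + u * UB + BB` when `BB = -16 * YG - 128 * UB`. The sketch below checks this for one channel. The gains `YG` and `UB` and the 6-bit fixed-point scale are illustrative assumptions, not values taken from this commit.

```c
#include <assert.h>
#include <stdint.h>

enum { YG = 74, UB = 127 };        /* illustrative 6-bit fixed-point gains */
enum { BB = -16 * YG - 128 * UB }; /* bias with the luma offset folded in */

uint8_t clamp255(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Original form: subtract 16 from Y and 128 from U before multiplying. */
uint8_t blue_with_sub16(int y, int u) {
  return clamp255(((y - 16) * YG + (u - 128) * UB) >> 6);
}

/* Folded form: multiply the raw values and add one precomputed bias.
   The value being shifted is identical to the one above, so the two
   functions return the same result for every input. */
uint8_t blue_with_bias(int y, int u) {
  return clamp255((y * YG + u * UB + BB) >> 6);
}
```

The folded form saves one subtraction per pixel in the inner loop, since the bias is a constant.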
fbarchard@google.com
|
7892ea1fe1
|
Fix for ARGBToUV on AVX2
BUG=269
TESTED=local testing
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/33669004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1202 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-12-15 18:59:23 +00:00 |
|
fbarchard@google.com
|
ddee77cdbd
|
Fix for I422ToRGBA when I422ToARGB is not enabled for AVX2
BUG=269
TESTED=local windows build
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/32339004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1201 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-12-15 18:28:59 +00:00 |
|
fbarchard@google.com
|
f5f5d15dcd
|
Fix register order for ARGBToUV_AVX2
BUG=269
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29249004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1200 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-12-15 18:07:09 +00:00 |
|
fbarchard@google.com
|
540e8af80c
|
remove add 16 from ARGBToYJ and add rounding, for consistency with Windows version. row.h header macros sorted alphabetically.
BUG=269
TESTED=untested
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/32579005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1185 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-12-02 22:37:47 +00:00 |
|
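A full-range ("J") luma value has no +16 offset, and rounding is done by adding half the divisor before the shift. The sketch below shows that shape. The 7-bit coefficients 38, 75 and 15 are an assumption (BT.601 weights scaled by 128) and the function name is hypothetical.

```c
#include <stdint.h>

/* Full-range luma from RGB with 7-bit fixed-point weights.
   38 + 75 + 15 == 128, so white maps to 255 and black to 0.
   The +64 term is half of 128 and rounds to nearest. */
uint8_t rgb_to_yj(uint8_t r, uint8_t g, uint8_t b) {
  return (uint8_t)((38 * r + 75 * g + 15 * b + 64) >> 7);
}
```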
fbarchard@google.com
|
c5aac16af9
|
Remove loop alignment for the benefit of modern CPUs that don't require alignment.
BUG=none
TESTED=local libyuv unittest passes
R=brucedawson@google.com, tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/32159004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1180 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-24 21:26:22 +00:00 |
|
fbarchard@google.com
|
ef14972df0
|
MergeUV AVX2 use vextractf128 to store results to avoid shuffling.
BUG=none
TESTED=intel sde on unittests
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/33369004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1178 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-22 03:33:33 +00:00 |
|
fbarchard@google.com
|
ef67597b48
|
ARGBMirror use SSE2 pshufd instruction instead of SSSE3 pshufb.
BUG=269
TESTED=local benchmark for ARGBMirror
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/32509004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1176 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-21 19:25:14 +00:00 |
|
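Each ARGB pixel is 4 bytes, so mirroring a row only reorders whole 32-bit elements, which is why a dword shuffle is enough and a byte shuffle is not needed. A scalar C sketch of the same operation, with the hypothetical name `argb_mirror_row`:

```c
#include <stdint.h>
#include <string.h>

/* Mirror a row of ARGB pixels by moving whole 32-bit pixels.
   Bytes inside each pixel keep their order; only pixel order reverses. */
void argb_mirror_row(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    uint32_t pixel;
    memcpy(&pixel, src + 4 * (width - 1 - x), sizeof pixel);
    memcpy(dst + 4 * x, &pixel, sizeof pixel);
  }
}
```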
fbarchard@google.com
|
91f240c5db
|
Move sub before branch for loops.
Remove CopyRow_x86
Add CopyRow_Any versions for AVX, SSE2 and Neon.
BUG=269
TESTED=local build
R=harryjin@google.com, tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/26209004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1175 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-20 21:14:27 +00:00 |
|
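An 'any' width variant typically runs the fast kernel on the largest multiple of its block size and finishes the remainder in C. The sketch below assumes a 16-byte block; `copy_row_x16` is a stand-in for a SIMD kernel, and all names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for a SIMD kernel: width must be a multiple of 16. */
void copy_row_x16(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; x += 16) {
    memcpy(dst + x, src + x, 16);
  }
}

/* Plain C fallback that handles any width. */
void copy_row_c(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[x] = src[x];
  }
}

/* 'Any' wrapper: fast kernel for the aligned part, C for the tail. */
void copy_row_any(const uint8_t* src, uint8_t* dst, int width) {
  int n = width & ~15; /* largest multiple of 16 not exceeding width */
  if (n > 0) {
    copy_row_x16(src, dst, n);
  }
  copy_row_c(src + n, dst + n, width - n);
}
```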
fbarchard@google.com
|
b9d17e1d79
|
Fix offset in addresses for Windows. The compiler now wants the offset within [].
BUG=none
TESTED=local windows build.
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/32479004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1168 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-17 19:50:42 +00:00 |
|
fbarchard@google.com
|
5822505e0a
|
Remove extra unaligned loop from alphablender. Both aligned and unaligned loops were the same, so remove the extra.
BUG=none
TESTED=try bots.
R=brucedawson@google.com, harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29059004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1166 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-17 18:33:07 +00:00 |
|
fbarchard@google.com
|
1eb636d249
|
remove initial lea in mirror functions and add the offset in the address mode.
BUG=none
TESTED=local libyuv unittests on windows
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/26169004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1165 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-17 18:16:23 +00:00 |
|
fbarchard@google.com
|
35508d0979
|
Mirror_AVX2 ported to GCC.
BUG=269
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/32079004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1164 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-13 23:11:10 +00:00 |
|
fbarchard@google.com
|
91000425a3
|
ARGBUnattenuate_AVX2 ported to GCC. Minor cleanup of constants to use broadcast to make 16 byte constant instead of 32 byte.
BUG=269
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30999004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1163 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-13 17:57:33 +00:00 |
|
fbarchard@google.com
|
ec1f854f86
|
Use broadcast to duplicate constants from 16 bytes to 32 bytes to save data space.
BUG=none
TESTED=intelsde
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/32029004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1161 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-11-12 01:45:27 +00:00 |
|
fbarchard@google.com
|
ee4bc0d834
|
vzeroupper moved to just before ret. In one case it was done after ret, which is a bug that would cause a performance stall.
BUG=none
TESTED=try bots
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/24159004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1149 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-30 19:27:21 +00:00 |
|
fbarchard@google.com
|
2edea9454d
|
Fix lint extraneous warning on row_win assembly by disabling the warning for those affected lines.
BUG=none
TESTED=lint row_win.cc
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29969004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1144 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-27 16:27:48 +00:00 |
|
fbarchard@google.com
|
f2fa453b94
|
Port I422ToABGR to AVX2.
BUG=269
TESTED=intelsde on I422ToABGR
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/23149004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1138 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-23 17:20:22 +00:00 |
|
fbarchard@google.com
|
22eb5965fc
|
Optimize I422ToRGBA for AVX2 by hoisting ymm5 initialization and using different register for output of unpack.
BUG=269
TESTED=intelsde on I422ToABGR
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/29889004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1137 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-22 23:39:16 +00:00 |
|
fbarchard@google.com
|
c000955bc0
|
Port I422ToRGBA to AVX.
BUG=269
TESTED=intelsde on I422ToRGBA
R=brucedawson@google.com
Review URL: https://webrtc-codereview.appspot.com/28769004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1136 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-22 22:41:39 +00:00 |
|
fbarchard@google.com
|
af6f25245e
|
Reenable AVX2 scaling with bug fix for any width
BUG=376
TESTED=unittest on scale functions
R=brucedawson@google.com, harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30759004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1135 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-22 01:15:20 +00:00 |
|
fbarchard@google.com
|
4ec55a21cf
|
Use macros to simplify I422ToARGB for AVX code.
BUG=269
TESTED=local build with Visual C
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/24079004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1133 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-21 22:48:32 +00:00 |
|
fbarchard@google.com
|
a063a66de4
|
Change I422ToARGB_AVX2 register usage to match SSSE3. ymm0 = B, ymm1 = G, ymm2 = R.
BUG=269
TESTED=intelsde passes on unittests.
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/28759004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1132 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-21 19:02:06 +00:00 |
|
fbarchard@google.com
|
d81dddd3d0
|
port I420ToBGRA to AVX2.
BUG=269
TESTED=c:\intelsde\sde -ast -hsw -- out\release\libyuv_unittest.exe --gtest_filter=*I420ToBGRA*
R=brucedawson@google.com, harryjin@google.com, magjed@chromium.org
Review URL: https://webrtc-codereview.appspot.com/26869004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1127 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-20 19:35:55 +00:00 |
|
fbarchard@google.com
|
3dbaaf0032
|
switch win64 intrinsics to loadu / storeu for unaligned memory.
BUG=372
TESTED=untested
R=brucedawson@google.com, harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30729004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1124 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-16 23:46:48 +00:00 |
|
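For reference, unaligned loads and stores with SSE2 intrinsics look roughly like the sketch below. `_mm_loadu_si128` and `_mm_storeu_si128` are the standard intrinsics from `<emmintrin.h>`; the function name `copy_row_unaligned`, the guard macro and the fallback path are assumptions made so the example builds on any target.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define SKETCH_HAS_SSE2 1
#endif

/* Copy width bytes without any alignment requirement on src or dst. */
void copy_row_unaligned(const uint8_t* src, uint8_t* dst, int width) {
  int x = 0;
#ifdef SKETCH_HAS_SSE2
  /* loadu/storeu accept any address, unlike the aligned load/store forms. */
  for (; x + 16 <= width; x += 16) {
    __m128i v = _mm_loadu_si128((const __m128i*)(src + x));
    _mm_storeu_si128((__m128i*)(dst + x), v);
  }
#endif
  /* Remaining bytes (or everything, when SSE2 is unavailable). */
  memcpy(dst + x, src + x, (size_t)(width - x));
}
```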
fbarchard@google.com
|
205c1440cf
|
Use movdqu then pavgb to allow unaligned memory for rgb subsampling code. Allows this assembly to be used for unaligned pointers as well as aligned ones with no performance hit when memory is aligned on a modern cpu.
BUG=365
TESTED=libyuvTest.ARGBToI420_Unaligned (453 ms)
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/30679004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1116 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-07 19:47:06 +00:00 |
|
fbarchard@google.com
|
ca308327d2
|
Remove unaligned functions, since most functions support unaligned memory now. This reduces complexity and improves performance for unaligned cases because C code can be avoided and overhead is lower. The downside is that old CPUs (Core 2 and earlier) will be slower for the aligned memory case. MIPS is the exception in that it has an alignment requirement, but its unaligned variants are removed as well.
BUG=365
TESTED=unittest builds and passes locally
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/24839004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1113 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-07 00:59:31 +00:00 |
|
fbarchard@google.com
|
b720049a54
|
Make the row functions used by planar functions and convert use movdqu to relax the alignment constraint. Step 1 - make functions unaligned.
BUG=365
TESTED=libyuv_unittest passes
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/26709004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1111 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-03 21:11:37 +00:00 |
|
fbarchard@google.com
|
d83f63a3b4
|
InterpolateRow, used for scaling, now handles unaligned memory. Remove HalfRow, which is not used.
BUG=367
TESTED=unittest on I422ToI420
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/28639004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1107 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-03 17:37:11 +00:00 |
|
fbarchard@google.com
|
455ae94c60
|
Make rotate SIMD allow unaligned pointers.
BUG=365
TESTED=libyuv_unittest
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/22899004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1102 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-02 17:56:48 +00:00 |
|
fbarchard@google.com
|
044f914c29
|
Change scale to unaligned movdqu.
BUG=365
TESTED=scale unittests
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/22879004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1101 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-10-01 01:16:04 +00:00 |
|
fbarchard@google.com
|
d33bf86b25
|
CopyRow_AVX which supports unaligned pointers for Sandy Bridge CPU.
BUG=363
TESTED=out\release\libyuv_unittest --gtest_filter=*ARGBToARGB_*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/31489004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1097 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-09-29 23:53:18 +00:00 |
|
fbarchard@google.com
|
aec76f2e30
|
add stride to pointer in C and pass as register to inline.
BUG=357
TESTED=clang on ios
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/29489004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1086 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-09-19 22:51:39 +00:00 |
|
fbarchard@google.com
|
6e95f6f7e1
|
ifdef headers to avoid intrinsics if built with gcc 64 bit on windows.
BUG=351
TESTED=untested
R=jzern@chromium.org
Review URL: https://webrtc-codereview.appspot.com/22419004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1058 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-08-21 22:44:49 +00:00 |
|
fbarchard@google.com
|
9e0f21af0b
|
fixes for blank line lint warnings
BUG=348
TESTED=cpplint.py --filter=-casting source/*.cc include/libyuv/*.h
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/18139004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1045 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-08-14 19:42:48 +00:00 |
|
fbarchard@google.com
|
e6dd1fa024
|
Port I420ToARGB to intrinsics for win64
BUG=336
TESTED=out\release_x64\libyuv_unittest --gunit_also_run_disabled_tests --gtest_filter=*I420To*B*
R=bryan.bernhart@intel.com, tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/15809005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1018 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-06-24 20:45:45 +00:00 |
|
fbarchard@google.com
|
a1f5254a95
|
Switch to c style casts for all source and includes.
BUG=303
TESTED=try
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/6629004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@952 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-01-07 03:03:00 +00:00 |
|
fbarchard@google.com
|
5dba58cb1e
|
FixedDiv1 using a single 64/32 divide. Removes size restriction from slope.
BUG=302
TESTED=libyuv scale tests
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/6489004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@940 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2014-01-02 22:32:09 +00:00 |
|
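A 16.16 fixed-point quotient can be computed with one 64-bit by 32-bit divide, which avoids limiting the numerator to 15 bits. The sketch below shows the general technique only; `fixed_div_16_16` is a hypothetical name and the exact FixedDiv1 formula in libyuv may differ.

```c
#include <assert.h>
#include <stdint.h>

/* Return num / div as a 16.16 fixed-point value.
   The numerator is widened to 64 bits before scaling, so large values
   of num do not overflow. The caller must ensure the result fits in
   32 bits. */
int32_t fixed_div_16_16(int32_t num, int32_t div) {
  assert(div != 0);
  return (int32_t)(((int64_t)num * 65536) / div);
}
```

For example, a scale slope from 640 source pixels to 480 destination pixels is `fixed_div_16_16(640, 480)`, roughly 1.3333 in 16.16 form.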
fbarchard@google.com
|
c2295807bd
|
Reduce alignment for loops from 16 bytes to 4 bytes. Reduces outer loop overhead without hurting inner loop time.
BUG=none
TESTED=try bots
R=fbarchard@chromium.org, mflodman@webrtc.org
Review URL: https://webrtc-codereview.appspot.com/4659004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@880 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-12-02 15:57:39 +00:00 |
|
fbarchard@google.com
|
a0630d77f0
|
Report of affine to nacl using %k0
BUG=none
TEST=none
R=johannkoenig@google.com
Review URL: https://webrtc-codereview.appspot.com/3929004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@855 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-11-15 17:42:44 +00:00 |
|
fbarchard@google.com
|
c2a889eb55
|
Bump reciprocal up by 1
BUG=none
TEST=none
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/3599004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@847 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-11-11 05:14:13 +00:00 |
|
fbarchard@google.com
|
191ab18073
|
Use fixed point for small blurs
BUG=none
TEST=libyuvTest.ARGBBlurSmall_Opt
R=ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/3389004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@843 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-11-05 18:19:11 +00:00 |
|
fbarchard@google.com
|
4a4b7374c1
|
Load matrix with one vector and splat to 4 different ones.
BUG=none
TEST=none
R=ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/3299004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@838 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-11-01 21:29:45 +00:00 |
|
fbarchard@google.com
|
11a0d48e45
|
pass parameter for yuv conversion
BUG=267
TEST=Luma
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/3169005
git-svn-id: http://libyuv.googlecode.com/svn/trunk@834 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-10-31 05:47:13 +00:00 |
|
fbarchard@google.com
|
21796c94aa
|
Move constant to its own asm block to save 3 GPR registers for main loop
BUG=267
TESTED=32 bit mac build
Review URL: https://webrtc-codereview.appspot.com/3099004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@832 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-10-29 08:43:13 +00:00 |
|
fbarchard@google.com
|
ca8f826ba3
|
Luma fetch 4 pixels
BUG=267
TEST=Luma*
R=ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/3079004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@831 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-10-28 22:53:22 +00:00 |
|
fbarchard@google.com
|
4c736098d6
|
Use packssdw, which is SSE2, instead of packusdw, which is SSE4.1.
BUG=none
TEST=Sobel* on AMD cpu
R=ryanpetrie@google.com
Review URL: https://webrtc-codereview.appspot.com/3069004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@829 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-10-28 19:12:49 +00:00 |
|
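The two pack instructions differ only in how they saturate 32-bit lanes to 16 bits: packssdw (SSE2) clamps to the signed range, packusdw (SSE4.1) clamps to the unsigned range. A per-lane scalar sketch, with hypothetical function names, shows that they agree whenever the input is already in 0..32767, which is presumably what makes the substitution safe here.

```c
#include <stdint.h>

/* One lane of packssdw: signed saturation to [-32768, 32767]. */
int16_t pack_signed_sat(int32_t v) {
  return (int16_t)(v < -32768 ? -32768 : (v > 32767 ? 32767 : v));
}

/* One lane of packusdw: unsigned saturation to [0, 65535]. */
uint16_t pack_unsigned_sat(int32_t v) {
  return (uint16_t)(v < 0 ? 0 : (v > 65535 ? 65535 : v));
}
```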
fbarchard@google.com
|
6f7e514caa
|
Full metal BCS
BUG=none
TEST=Luma* unittest
R=thorcarpenter@google.com
Review URL: https://webrtc-codereview.appspot.com/3029004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@828 16f28f9a-4ce2-e073-06de-1de4eb20be90
|
2013-10-28 17:10:49 +00:00 |
|