George Steed
776a509891
[AArch64] Unroll ScaleRowDown34_1_Box_NEON
...
We can make use of wider instructions for the loads and stores as well
as the URHADD instructions. In addition, the instruction duplication
that results from unrolling provides a further small improvement on
little cores with limited out-of-order capability.
Reduction in runtimes observed compared to the existing Neon
implementation:
Cortex-A55: -23.5%
Cortex-A510: -35.4%
Cortex-A520: -40.5%
Cortex-A76: -15.1%
Cortex-A715: -6.2%
Cortex-A720: -6.2%
Cortex-X1: -17.9%
Cortex-X2: -18.4%
Cortex-X3: -18.3%
Cortex-X4: -14.0%
Bug: b/42280945
Change-Id: I5905e026a0507870bfc580b702906d6acb4ed6f4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5725170
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-07-19 19:51:45 +00:00
George Steed
be5de19db3
[AArch64] Unroll ScaleRowUp2_Linear_NEON
...
On little cores with limited out-of-order capability this gives a good
improvement.
Reduction in runtimes observed compared to the existing Neon
implementation:
Cortex-A55: -21.3%
Cortex-A520: -33.6%
Cortex-A76: +1.1%
Cortex-A715: =0.0%
Cortex-A720: =0.0%
Cortex-X1: +10.4% (!)
Cortex-X2: -5.3%
Cortex-X3: -4.3%
Cortex-X4: -9.9%
Bug: b/42280945
Change-Id: I45b3510f13c05b19d61052e2f8e447199dbd0551
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5725169
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-07-19 19:51:17 +00:00
Frank Barchard
68659d0d68
UVScale down by 2 fix for C and optimize for NEON
...
- update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open.
Bug: libyuv:958
Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-12 22:49:20 +00:00
Yuan Tong
98ec7c28d5
Fix SSE2 version of ScalePlaneUp2_16_Bilinear
...
- Define HAS_SCALEROWUP2_BILINEAR_16_SSE2: it's now fixed.
- Correct function name to ScaleRowUp2_Bilinear_16_Any_SSE2:
this row function uses only SSE2 instructions.
Bug: libyuv:882
Change-Id: Ib1c7ac5b09997cb5b32bc54109d8c566af762433
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3800842
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-02 20:35:48 +00:00
Frank Barchard
b028453ba6
Disable bilinear 16 bit scale up for SSE2
...
- Undefine HAS_SCALEROWUP2_BILINEAR_16_SSE2
- Save XMM7 in ScaleRowUp2_Bilinear_16_SSE2().
- Rename HAS_SCALEROWUP2LINEAR_xxx to HAS_SCALEROWUP2_LINEAR_xxx
- DetileSplitUVRow_C() is implemented using SplitUVRow_C().
- Changes to unit_test/planar_test.cc.
Bug: libyuv:882
Change-Id: I0a8e8e5fb43bdf58ded87244e802343eacb789f2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3795063
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-01 22:54:48 +00:00
Frank Barchard
eec8dd37e8
Change ScaleUVRowUp2_Bilinear_16_SSE2 to SSE41
...
Bug: libyuv:928
xed -i scale_gcc.o:
SYM ScaleUVRowUp2_Linear_16_SSE2:
XDIS 0: LOGICAL SSE2 660FEFED pxor xmm5, xmm5
XDIS 4: SSE SSE2 660F76E4 pcmpeqd xmm4, xmm4
XDIS 8: SSE SSE2 660F72D41F psrld xmm4, 0x1f
XDIS d: SSE SSE2 660F72F401 pslld xmm4, 0x1
XDIS 12: DATAXFER SSE2 F30F7E07 movq xmm0, qword ptr [rdi]
XDIS 16: DATAXFER SSE2 F30F7E4F04 movq xmm1, qword ptr [rdi+0x4]
XDIS 1b: SSE SSE2 660F61C5 punpcklwd xmm0, xmm5
XDIS 1f: SSE SSE2 660F61CD punpcklwd xmm1, xmm5
XDIS 23: DATAXFER SSE2 660F6FD0 movdqa xmm2, xmm0
XDIS 27: DATAXFER SSE2 660F6FD9 movdqa xmm3, xmm1
XDIS 2b: SSE SSE2 660F70D24E pshufd xmm2, xmm2, 0x4e
XDIS 30: SSE SSE2 660F70DB4E pshufd xmm3, xmm3, 0x4e
XDIS 35: SSE SSE2 660FFED4 paddd xmm2, xmm4
XDIS 39: SSE SSE2 660FFEDC paddd xmm3, xmm4
XDIS 3d: SSE SSE2 660FFED0 paddd xmm2, xmm0
XDIS 41: SSE SSE2 660FFED9 paddd xmm3, xmm1
XDIS 45: SSE SSE2 660FFEC0 paddd xmm0, xmm0
XDIS 49: SSE SSE2 660FFEC9 paddd xmm1, xmm1
XDIS 4d: SSE SSE2 660FFEC2 paddd xmm0, xmm2
XDIS 51: SSE SSE2 660FFECB paddd xmm1, xmm3
XDIS 55: SSE SSE2 660F72D002 psrld xmm0, 0x2
XDIS 5a: SSE SSE2 660F72D102 psrld xmm1, 0x2
XDIS 5f: SSE SSE4 660F382BC1 packusdw xmm0, xmm1
XDIS 64: DATAXFER SSE2 F30F7F06 movdqu xmmword ptr [rsi], xmm0
XDIS 68: MISC BASE 488D7F08 lea rdi, ptr [rdi+0x8]
XDIS 6c: MISC BASE 488D7610 lea rsi, ptr [rsi+0x10]
XDIS 70: BINARY BASE 83EA04 sub edx, 0x4
XDIS 73: COND_BR BASE 7F9D jnle 0x12 <ScaleUVRowUp2_Linear_16_SSE2+0x12>
XDIS 75: RET BASE C3 ret
SYM ScaleUVRowUp2_Bilinear_16_SSE2:
XDIS 0: LOGICAL SSE2 660FEFFF pxor xmm7, xmm7
XDIS 4: SSE SSE2 660F76F6 pcmpeqd xmm6, xmm6
XDIS 8: SSE SSE2 660F72D61F psrld xmm6, 0x1f
XDIS d: SSE SSE2 660F72F603 pslld xmm6, 0x3
XDIS 12: DATAXFER SSE2 F30F7E07 movq xmm0, qword ptr [rdi]
XDIS 16: DATAXFER SSE2 F30F7E4F04 movq xmm1, qword ptr [rdi+0x4]
XDIS 1b: SSE SSE2 660F61C7 punpcklwd xmm0, xmm7
XDIS 1f: SSE SSE2 660F61CF punpcklwd xmm1, xmm7
XDIS 23: DATAXFER SSE2 660F6FD0 movdqa xmm2, xmm0
XDIS 27: DATAXFER SSE2 660F6FD9 movdqa xmm3, xmm1
XDIS 2b: SSE SSE2 660F70D24E pshufd xmm2, xmm2, 0x4e
XDIS 30: SSE SSE2 660F70DB4E pshufd xmm3, xmm3, 0x4e
XDIS 35: SSE SSE2 660FFED0 paddd xmm2, xmm0
XDIS 39: SSE SSE2 660FFED9 paddd xmm3, xmm1
XDIS 3d: SSE SSE2 660FFEC0 paddd xmm0, xmm0
XDIS 41: SSE SSE2 660FFEC9 paddd xmm1, xmm1
XDIS 45: SSE SSE2 660FFEC2 paddd xmm0, xmm2
XDIS 49: SSE SSE2 660FFECB paddd xmm1, xmm3
XDIS 4d: DATAXFER SSE2 F30F7E1477 movq xmm2, qword ptr [rdi+rsi*2]
XDIS 52: DATAXFER SSE2 F30F7E5C7704 movq xmm3, qword ptr [rdi+rsi*2+0x4]
XDIS 58: SSE SSE2 660F61D7 punpcklwd xmm2, xmm7
XDIS 5c: SSE SSE2 660F61DF punpcklwd xmm3, xmm7
XDIS 60: DATAXFER SSE2 660F6FE2 movdqa xmm4, xmm2
XDIS 64: DATAXFER SSE2 660F6FEB movdqa xmm5, xmm3
XDIS 68: SSE SSE2 660F70E44E pshufd xmm4, xmm4, 0x4e
XDIS 6d: SSE SSE2 660F70ED4E pshufd xmm5, xmm5, 0x4e
XDIS 72: SSE SSE2 660FFEE2 paddd xmm4, xmm2
XDIS 76: SSE SSE2 660FFEEB paddd xmm5, xmm3
XDIS 7a: SSE SSE2 660FFED2 paddd xmm2, xmm2
XDIS 7e: SSE SSE2 660FFEDB paddd xmm3, xmm3
XDIS 82: SSE SSE2 660FFED4 paddd xmm2, xmm4
XDIS 86: SSE SSE2 660FFEDD paddd xmm3, xmm5
XDIS 8a: DATAXFER SSE2 660F6FE0 movdqa xmm4, xmm0
XDIS 8e: DATAXFER SSE2 660F6FEA movdqa xmm5, xmm2
XDIS 92: SSE SSE2 660FFEE0 paddd xmm4, xmm0
XDIS 96: SSE SSE2 660FFEEE paddd xmm5, xmm6
XDIS 9a: SSE SSE2 660FFEE0 paddd xmm4, xmm0
XDIS 9e: SSE SSE2 660FFEE5 paddd xmm4, xmm5
XDIS a2: SSE SSE2 660F72D404 psrld xmm4, 0x4
XDIS a7: DATAXFER SSE2 660F6FEA movdqa xmm5, xmm2
XDIS ab: SSE SSE2 660FFEEA paddd xmm5, xmm2
XDIS af: SSE SSE2 660FFEC6 paddd xmm0, xmm6
XDIS b3: SSE SSE2 660FFEEA paddd xmm5, xmm2
XDIS b7: SSE SSE2 660FFEE8 paddd xmm5, xmm0
XDIS bb: SSE SSE2 660F72D504 psrld xmm5, 0x4
XDIS c0: DATAXFER SSE2 660F6FC1 movdqa xmm0, xmm1
XDIS c4: DATAXFER SSE2 660F6FD3 movdqa xmm2, xmm3
XDIS c8: SSE SSE2 660FFEC1 paddd xmm0, xmm1
XDIS cc: SSE SSE2 660FFED6 paddd xmm2, xmm6
XDIS d0: SSE SSE2 660FFEC1 paddd xmm0, xmm1
XDIS d4: SSE SSE2 660FFEC2 paddd xmm0, xmm2
XDIS d8: SSE SSE2 660F72D004 psrld xmm0, 0x4
XDIS dd: DATAXFER SSE2 660F6FD3 movdqa xmm2, xmm3
XDIS e1: SSE SSE2 660FFED3 paddd xmm2, xmm3
XDIS e5: SSE SSE2 660FFECE paddd xmm1, xmm6
XDIS e9: SSE SSE2 660FFED3 paddd xmm2, xmm3
XDIS ed: SSE SSE2 660FFED1 paddd xmm2, xmm1
XDIS f1: SSE SSE2 660F72D204 psrld xmm2, 0x4
XDIS f6: SSE SSE4 660F382BE0 packusdw xmm4, xmm0
XDIS fb: DATAXFER SSE2 F30F7F22 movdqu xmmword ptr [rdx], xmm4
XDIS ff: SSE SSE4 660F382BEA packusdw xmm5, xmm2
XDIS 104: DATAXFER SSE2 F30F7F2C4A movdqu xmmword ptr [rdx+rcx*2], xmm5
XDIS 109: MISC BASE 488D7F08 lea rdi, ptr [rdi+0x8]
XDIS 10d: MISC BASE 488D5210 lea rdx, ptr [rdx+0x10]
XDIS 111: BINARY BASE 4183E804 sub r8d, 0x4
XDIS 115: COND_BR BASE 0F8FF7FEFFFF jnle 0x12 <ScaleUVRowUp2_Bilinear_16_SSE2+0x12>
XDIS 11b: RET BASE C3 ret
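For reference, the arithmetic of the ScaleUVRowUp2_Linear_16_SSE2 loop listed above can be read off the disassembly: xmm4 is built up to the rounding constant 2, each pair of neighbouring samples is combined as 3 * near + far + 2, and the result is shifted right by 2. Only the final packusdw is flagged SSE4 in the listing, which is why the function needs an SSE4.1 name. A minimal scalar sketch of that arithmetic follows. The function name and the src_pairs parameter are hypothetical, and this is not the library's C fallback.

```c
#include <stdint.h>

// Scalar model of the 2x horizontal linear upsample for interleaved 16-bit UV.
// src_uv holds src_pairs (U, V) pairs. For every two neighbouring pairs the
// loop writes two output pairs, so dst_uv must hold 2 * (src_pairs - 1) pairs.
// Hypothetical helper for illustration only.
void LinearUp2UV16(const uint16_t* src_uv, uint16_t* dst_uv, int src_pairs) {
  for (int i = 0; i + 1 < src_pairs; ++i) {
    for (int c = 0; c < 2; ++c) {  // c = 0 is U, c = 1 is V
      uint32_t a = src_uv[2 * i + c];        // left sample
      uint32_t b = src_uv[2 * (i + 1) + c];  // right sample
      // Widen to 32 bits (punpcklwd), weight 3:1, add 2, shift by 2 (psrld).
      dst_uv[4 * i + c] = (uint16_t)((3 * a + b + 2) >> 2);
      dst_uv[4 * i + 2 + c] = (uint16_t)((a + 3 * b + 2) >> 2);
    }
  }
}
```

The widening to 32 bits is what forces a 32-to-16 bit unsigned pack at the end. With SSE2 alone there is only a signed saturating pack (packssdw), so an unsigned pack has to come from SSE4.1.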
Change-Id: Ia20860e9c3c45368822cfd8877167ff0bf973dcc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3587602
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-15 18:46:09 +00:00
Yuan Tong
ebb27d6916
Add YUV to RGB conversion function with filter parameter
...
Add the following functions:
I420ToARGBMatrixFilter
I422ToARGBMatrixFilter
I010ToAR30MatrixFilter
I210ToAR30MatrixFilter
I010ToARGBMatrixFilter
I210ToARGBMatrixFilter
I420AlphaToARGBMatrixFilter
I422AlphaToARGBMatrixFilter
I010AlphaToARGBMatrixFilter
I210AlphaToARGBMatrixFilter
Bug: libyuv:872
Change-Id: Ib33b09fd7d304688c5e06c55e0a576a964665a51
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3430334
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 11:50:35 +00:00
Frank Barchard
2c6bfc02d5
Remove MMI support
...
Bug: libyuv:916
Change-Id: I345b7e271ceb4b32fe91e292915e66be40812810
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3415817
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-26 08:41:33 +00:00
Hao Chen
2f87e9a713
Add optimization functions in scale_lsx.cc file.
...
Optimize 20 functions in source/scale_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Frank Barchard
a04e4f87fb
Fix scale any mask parameter bug for NV12Scale
...
Bug: None
Change-Id: Ib4e174c086162ee709faf4b04c7d5d5847a7de3d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3267488
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-11-08 20:00:04 +00:00
Martin Storsjö
95ff456c33
Fix the mask for odd widths for ScaleRowUp2_Linear*_Any_NEON
...
These NEON functions produce 16 pixels per iteration each, thus
use the mask 15, not 7.
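To illustrate why the mask matters, here is a simplified sketch of how a width can be split into a SIMD part and a scalar remainder with a bit mask. The struct and function names are hypothetical and the real Any wrapper macros differ in detail. The point is only that the mask must be one less than the number of pixels per SIMD iteration.

```c
// Result of splitting a row width for an "Any" style wrapper.
// Hypothetical type for illustration only.
typedef struct {
  int simd_width;  // pixels handed to the SIMD row function
  int remainder;   // pixels left for the C row function
} WidthSplit;

// Splits width using a power-of-2 mask (pixels per SIMD iteration minus 1).
WidthSplit SplitWidth(int width, int mask) {
  WidthSplit s;
  s.remainder = width & mask;
  s.simd_width = width & ~mask;
  return s;
}
```

With a width of 40, mask 15 gives 32 SIMD pixels plus 8 remainder pixels. Mask 7 gives 40 SIMD pixels, which is not a multiple of 16, so a kernel that produces 16 pixels per iteration would overrun the row.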
Change-Id: I1f3eb691a9ca4af705393b2842b18b65f6878926
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2731801
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-03 16:19:17 +00:00
Yuan Tong
c41eabe3d4
Add full 16 bit scaling up by 2x function
...
R=fbarchard@chromium.org
Change-Id: I4a869aefdc16e34357a615727711594c5d8e3a80
Bug: libyuv:882
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2719842
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-02 19:29:02 +00:00
Yuan Tong
d4ecb70610
Add P010ToP410 and P210ToP410
...
These are 16 bit bi-planar convert functions to scale UV plane to
Y plane's size using (bi)linear filter.
libyuv_unittest --gtest_filter=*ToP41*
R=fbarchard@chromium.org
Bug: libyuv:872
Change-Id: I3cb4fafe2b2c9eedd0d91cf4c619abb9ee107bc1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2690102
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-12 14:55:24 +00:00
Frank Barchard
12a4a2372c
Rounding added to scaling upsampler
...
Bug: libyuv:872, b/178521093
Change-Id: I86749f73f5e55d5fd8b87ea6938084cbacb1cda7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2686945
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-02-10 18:51:02 +00:00
Yuan Tong
f7fc83f46d
Add NV12ToNV24 and NV16ToNV24
...
These are bi-planar convert functions to scale UV plane to Y plane's size using (bi)linear filter.
libyuv_unittest --gtest_filter=*ToNV24*
R=fbarchard@chromium.org
Change-Id: I3d98f833feeef00af3c903ac9ad0e41bdcbcb51f
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2682152
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-09 07:38:40 +00:00
Frank Barchard
942c508448
BT.2020 Full Range yuvconstants
...
new color util to compute constants needed based on white point.
[ RUN ] LibYUVColorTest.TestFullYUVV
hist          -2        -1         0         1         2
red            0   1627136  13670144   1479936         0
green     319285   3456836   9243059   3440771    317265
blue           0   1561088  14202112   1014016         0
Bug: libyuv:877, b/178283356
Change-Id: If432ebfab76b01302fdb416a153c4f26ca0832d6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2678859
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-02-06 00:26:55 +00:00
Yuan Tong
fc61dde1eb
Add special optimization for I420ToI444 and I422ToI444
...
These functions use (bi)linear filter, to scale U and V planes to the size of Y plane.
This will help enhance the quality of YUV to RGB conversion.
Also added 10bit and 12bit version:
I010ToI410
I210ToI410
I012ToI412
I212ToI412
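As a rough illustration of the (bi)linear filter, the sketch below upsamples one chroma row pair by 2x horizontally for a single output row. It assumes 9:3:3:1 weights with a rounding constant of 8 and a shift by 4. Those weights are an assumption for this example, the function and parameter names are hypothetical, and edge handling is left out.

```c
#include <stdint.h>

// Bilinear 2x upsample of one output row from two source rows.
// near_row is the source row closer to the output row, far_row the other one.
// Writes 2 * (src_width - 1) samples to dst; edge samples are not handled.
// Hypothetical helper for illustration only.
void BilinearUp2Row(const uint8_t* near_row, const uint8_t* far_row,
                    uint8_t* dst, int src_width) {
  for (int i = 0; i + 1 < src_width; ++i) {
    int a = near_row[i];      // nearest sample for the left output
    int b = near_row[i + 1];  // nearest sample for the right output
    int c = far_row[i];
    int d = far_row[i + 1];
    dst[2 * i] = (uint8_t)((9 * a + 3 * b + 3 * c + d + 8) >> 4);
    dst[2 * i + 1] = (uint8_t)((3 * a + 9 * b + c + 3 * d + 8) >> 4);
  }
}
```

The weights sum to 16, so a flat input is reproduced exactly and the result always fits in 8 bits.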
libyuv_unittest --gtest_filter=LibYUVConvertTest.I42*ToI444*:LibYUVConvertTest.I*1*ToI41*
R=fbarchard@chromium.org
Change-Id: Ie4a711a5ba28f2ff1f44c021f7a5c149022264c5
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2658097
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-03 10:53:02 +00:00
Frank Barchard
b7a1c5ee5d
Scale by even factor low level row function
...
Bug: b/171884264
Change-Id: I6a94bde0aa05e681bb4590ea8beec33a61ddbfc9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2518361
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-11-03 21:25:18 +00:00
Frank Barchard
a4ec5cf9c2
UVScale down use AVX2 and Neon for aarch32
...
Intel SkylakeX
Was SSSE3 UVScaleDownBy4_Box (2496 ms)
Now AVX2 UVScaleDownBy4_Box (1983 ms)
Was SSSE3 UVScaleDownBy2_Box (380 ms)
Now AVX2 UVScaleDownBy2_Box (360 ms)
Pixel 4 aarch32
Was UVScaleDownBy4_Box (4295 ms)
Now UVScaleDownBy4_Box (3307 ms)
Was UVScaleDownBy2_Box (1022 ms)
Now UVScaleDownBy2_Box (778 ms)
Bug: libyuv:838
Change-Id: Ic823fa15e5761c1b9a897da27341adbf1ed39883
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2470196
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-14 06:23:26 +00:00
Frank Barchard
d730dc2f18
2x down sample for UV planes ported to SSSE3 / NEON
...
Bug: libyuv:838
Change-Id: Id9fb3282a3e86143d76b5e0cb557f0523a88b3c8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2465578
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-13 21:42:15 +00:00
Frank Barchard
c85a7b3ae3
MMI Optimized functions I422ToARGB for 1080p video
...
Improves playback performance for 1080p video on www.youku.com
BUG=libyuv:841
Change-Id: Iabe7693fba276162af0290863f46e214ab86fb6c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1790959
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2019-09-11 21:06:21 +00:00
Frank Barchard
b36c86fdfe
Port box filter to NEON
...
Bug: libyuv:821
Change-Id: I4a6b9bee2c2fae199c73c9ec7ecb32bde37c1852
Tested: out/Release/libyuv_unittest --gtest_filter=*ScaleFrom1920x1080_Box --libyuv_width=160 --libyuv_height=90 --libyuv_repeat=1000
Reviewed-on: https://chromium-review.googlesource.com/c/1298598
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-10-25 18:56:29 +00:00
Martin Storsjö
9b772abf97
Restore the file mode for source files
...
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.
Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-08-06 18:53:32 +00:00
lixia zhang
21be9122aa
libyuv:loongson optimize compare/row/scale/rotate files with mmi.
...
Currently, libyuv supports the MIPS SIMD Architecture (MSA),
but it does not support MultiMedia Instructions (MMI), as found on platforms such as loongson3a.
To improve the performance of libyuv on the loongson3a platform,
this change optimizes 98 functions with MMI.
BUG=libyuv:804
Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-07-20 22:53:04 +00:00
Frank Barchard
92e22cf5b6
Lint cleanup after C99 change CL
...
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
7e389884a1
Switch to C99 types
...
Append _t to all sized types.
uint64 becomes uint64_t etc
Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
3b81288ece
Remove Mips DSPR2 code
...
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-14 18:22:16 +00:00
Frank Barchard
8cd3e4f3f2
Add MSA optimized ScaleFilterCols, ScaleARGBCols, ScaleARGBFilterCols and ScaleRowDown34 functions
...
TBR=kjellander@chromium.org
R=fbarchard@google.com
Bug:libyuv:634
Change-Id: Ib139b9701fc67e24d27a6886377c0cb8b2773fda
Reviewed-on: https://chromium-review.googlesource.com/620791
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-18 17:23:27 +00:00
Frank Barchard
6c94ad13b5
Remove ARM NaCL macros from source
...
NaCL has been disabled for a while, so the code
will still build, but only with C versions.
This change removes the MEMACCESS() macros from
Neon and Neon64 source.
BUG=libyuv:702
TEST=try bots build for arm.
R=kjellander@chromium.org
Change-Id: Id581a5c8ff71e18cc69595e7fee9337f97c44a19
Reviewed-on: https://chromium-review.googlesource.com/528332
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-09 22:22:07 +00:00
Frank Barchard
44abf70187
ScaleDown odd functions adjust math so last pixel is half width source.
...
existing test passes
out/Release/libyuv_unittest --gtest_filter=*Blend* --libyuv_width=33 --libyuv_height=16
new test added
BUG=libyuv:705
TEST=LibYUVScaleTest.TestScaleOdd
Change-Id: Ica91812aee2e4ed9bcc18df4962b089c2e4ae704
Reviewed-on: https://chromium-review.googlesource.com/524932
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-06-06 01:37:26 +00:00
Frank Barchard
000d2fa91a
Libyuv MIPS DSPR2 optimizations.
...
Optimized functions:
I444ToARGBRow_DSPR2
I422ToARGB4444Row_DSPR2
I422ToARGB1555Row_DSPR2
NV12ToARGBRow_DSPR2
BGRAToUVRow_DSPR2
BGRAToYRow_DSPR2
ABGRToUVRow_DSPR2
ARGBToYRow_DSPR2
ABGRToYRow_DSPR2
RGBAToUVRow_DSPR2
RGBAToYRow_DSPR2
ARGBToUVRow_DSPR2
RGB24ToARGBRow_DSPR2
RAWToARGBRow_DSPR2
RGB565ToARGBRow_DSPR2
ARGB1555ToARGBRow_DSPR2
ARGB4444ToARGBRow_DSPR2
ScaleAddRow_DSPR2
Bug-fixes in functions:
ScaleRowDown2_DSPR2
ScaleRowDown4_DSPR2
BUG=
Review-Url: https://codereview.chromium.org/2626123003 .
2017-01-11 12:19:13 -08:00
Manojkumar Bhosale
288bfbefb5
Add MSA optimized remaining scale row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ScaleRowDown2_MSA - ~22.3x
ScaleRowDown2_Any_MSA - ~19.9x
ScaleRowDown2Linear_MSA - ~31.2x
ScaleRowDown2Linear_Any_MSA - ~29.4x
ScaleRowDown2Box_MSA - ~20.1x
ScaleRowDown2Box_Any_MSA - ~19.6x
ScaleRowDown4_MSA - ~11.7x
ScaleRowDown4_Any_MSA - ~11.2x
ScaleRowDown4Box_MSA - ~15.1x
ScaleRowDown4Box_Any_MSA - ~15.1x
ScaleRowDown38_MSA - ~1x
ScaleRowDown38_Any_MSA - ~1x
ScaleRowDown38_2_Box_MSA - ~1.7x
ScaleRowDown38_2_Box_Any_MSA - ~1.7x
ScaleRowDown38_3_Box_MSA - ~1.7x
ScaleRowDown38_3_Box_Any_MSA - ~1.7x
ScaleAddRow_MSA - ~1.2x
ScaleAddRow_Any_MSA - ~1.15x
Performance Gain (vs C non-vectorized)
ScaleRowDown2_MSA - ~22.4x
ScaleRowDown2_Any_MSA - ~19.8x
ScaleRowDown2Linear_MSA - ~31.6x
ScaleRowDown2Linear_Any_MSA - ~29.4x
ScaleRowDown2Box_MSA - ~20.1x
ScaleRowDown2Box_Any_MSA - ~19.6x
ScaleRowDown4_MSA - ~11.7x
ScaleRowDown4_Any_MSA - ~11.2x
ScaleRowDown4Box_MSA - ~15.1x
ScaleRowDown4Box_Any_MSA - ~15.1x
ScaleRowDown38_MSA - ~3.2x
ScaleRowDown38_Any_MSA - ~3.2x
ScaleRowDown38_2_Box_MSA - ~2.4x
ScaleRowDown38_2_Box_Any_MSA - ~2.3x
ScaleRowDown38_3_Box_MSA - ~2.9x
ScaleRowDown38_3_Box_Any_MSA - ~2.8x
ScaleAddRow_MSA - ~8x
ScaleAddRow_Any_MSA - ~7.46x
Review-Url: https://codereview.chromium.org/2559683002 .
2016-12-21 13:39:44 +05:30
Manojkumar Bhosale
56b5bbb0be
Add MSA optimized ARGB scaling functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA - ~2.6x
ScaleARGBRowDown2Linear_MSA - ~7.9x
ScaleARGBRowDown2Box_MSA - ~3.7x
ScaleARGBRowDownEven_MSA - ~1.2x
ScaleARGBRowDownEvenBox_MSA - ~3.5x
ScaleARGBRowDown2_Any_MSA - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA - ~3.6x
ScaleARGBRowDownEven_Any_MSA - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x
Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA - 2.6x
ScaleARGBRowDown2Linear_MSA - 13.5x
ScaleARGBRowDown2Box_MSA - 5.8x
ScaleARGBRowDownEven_MSA - 1.2x
ScaleARGBRowDownEvenBox_MSA - 3.7x
ScaleARGBRowDown2_Any_MSA - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA - 5.3x
ScaleARGBRowDownEven_Any_MSA - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x
Review URL: https://codereview.chromium.org/2527983002 .
2016-12-07 11:47:15 +05:30
Frank Barchard
e62309f259
clang-format libyuv
...
BUG=libyuv:654
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
fc52d8ded2
Odd width variation of scale down by 2 for subsampling
...
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:538
Review URL: https://codereview.chromium.org/1558093003 .
2016-01-06 15:12:17 -08:00
Frank Barchard
80ca4514ef
change scale down by 4 to use rounding.
...
TBR=harryjin@google.com
BUG=libyuv:447
Review URL: https://codereview.chromium.org/1525033005 .
2015-12-15 21:25:18 -08:00
Frank Barchard
ae55e41851
use rounding in scaledown by 2
...
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.
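The difference between the two formulas can be shown with a small scalar sketch. It assumes avg() is the rounding-up byte average that the SSE2 pavgb instruction computes, (a + b + 1) >> 1. The function names are hypothetical.

```c
#include <stdint.h>

// Rounding average of two bytes, modelled on pavgb: (a + b + 1) >> 1.
static uint8_t Avg2(uint8_t a, uint8_t b) {
  return (uint8_t)((a + b + 1) >> 1);
}

// Sum first, then round once: (a + b + c + d + 2) / 4.
uint8_t BoxSumRound(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return (uint8_t)((a + b + c + d + 2) >> 2);
}

// Two levels of averaging: each level rounds up, so the bias accumulates.
uint8_t BoxDoubleAvg(uint8_t a, uint8_t b, uint8_t c, uint8_t d) {
  return Avg2(Avg2(a, b), Avg2(c, d));
}
```

For the inputs 0, 1, 0, 0 the sum form gives 0 (the exact mean is 0.25), while the double average gives 1 because both averaging steps round up.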
R=dhrosa@google.com , harryjin@google.com
BUG=libyuv:447,libyuv:527
Review URL: https://codereview.chromium.org/1513183004 .
2015-12-14 17:25:36 -08:00
fbarchard@google.com
05416e2d9a
Box filter for YUV use rows with accumulation buffer for better memory behavior. The old code would do columns accumulated into registers, and then store the result once. This was slow from a memory point of view. The new code does a row of source at a time, updating an accumulation buffer every row. The accumulation buffer is small, and should fit cache. Before each accumulation of N rows, the buffer needs to be reset to zero. If the memset is a bottleneck, it would be faster to do the first row without an add, storing to the accumulation buffer, and then add for the remaining rows.
...
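A minimal scalar sketch of the row-at-a-time scheme described above, including the memset-free variant mentioned in the last sentence. The function names, the 16-bit accumulator type and the parameter layout are assumptions made for this example, not the library's signatures.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

// Adds one source row into the accumulation buffer.
static void AddRow(const uint8_t* src, uint16_t* acc, int width) {
  for (int x = 0; x < width; ++x) {
    acc[x] = (uint16_t)(acc[x] + src[x]);
  }
}

// Sums `rows` source rows per column: clear the buffer, then add every row.
// A 16-bit accumulator holds at most 257 rows of 8-bit samples.
void AccumulateRows(const uint8_t* src, ptrdiff_t stride, int width, int rows,
                    uint16_t* acc) {
  memset(acc, 0, (size_t)width * sizeof(*acc));
  for (int y = 0; y < rows; ++y) {
    AddRow(src + y * stride, acc, width);
  }
}

// Variant without memset: the first row is stored, the rest are added.
// Requires rows >= 1.
void AccumulateRowsNoMemset(const uint8_t* src, ptrdiff_t stride, int width,
                            int rows, uint16_t* acc) {
  for (int x = 0; x < width; ++x) {
    acc[x] = src[x];
  }
  for (int y = 1; y < rows; ++y) {
    AddRow(src + y * stride, acc, width);
  }
}
```

Both variants read the source one row at a time, so accesses are sequential, and the only state carried between rows is the small accumulation buffer.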
BUG=425
TESTED=out\release\libyuv_unittest --gtest_filter=*ScaleTo1x1*
R=harryjin@google.com
Review URL: https://webrtc-codereview.appspot.com/52659004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1428 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-06-09 01:05:18 +00:00
fbarchard@google.com
7c09264ffc
odd width support for scale by even scale factor and box scale down by 4. scale down by 4 uses scale down by 2 internally.
...
BUG=431
TESTED=libyuvTest.ARGBScaleDownBy4_Bilinear
Review URL: https://webrtc-codereview.appspot.com/57399004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1412 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-26 17:56:51 +00:00
fbarchard@google.com
c38aeec322
scale down by 2 on argb images support odd widths using _any function.
...
BUG=431
TESTED=libyuvTest.ARGBScaleDownBy2_Bilinear
Review URL: https://webrtc-codereview.appspot.com/52569004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1410 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-05-22 21:39:21 +00:00
fbarchard@google.com
31806d76d9
Scale to 3/4 bug fix for odd widths. The multiplier used to index into the source should be the scale factor 4 / 3, not 3 / 4.
...
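The index arithmetic can be sketched as follows. When scaling to 3/4, every 3 destination pixels consume 4 source pixels, so a destination offset maps to a larger source offset. Both function names are hypothetical, and the second one only models the kind of mistake described here.

```c
// Source offset that corresponds to a destination offset when scaling to 3/4.
// dst_x is assumed to be a multiple of 3.
int SourceOffset34(int dst_x) {
  return dst_x * 4 / 3;
}

// The inverted factor, shown for contrast only: it lands too early in the row.
int SourceOffset34Inverted(int dst_x) {
  return dst_x * 3 / 4;
}
```

After 24 destination pixels the remainder code should continue at source pixel 32. With the inverted factor it would restart at pixel 18 and re-read data that had already been consumed.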
BUG=433
TESTED=set LIBYUV_WIDTH=1276 out\release\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=*.Scale*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/49219004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1391 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-30 17:18:13 +00:00
fbarchard@google.com
9f4636e298
AVX2 port of ScaleDownBy4.
...
BUG=314
TESTED=out\release\libyuv_unittest --gtest_filter=*.ScaleDownBy4*
Review URL: https://webrtc-codereview.appspot.com/46159004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1390 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-30 01:58:32 +00:00
fbarchard@google.com
4e78b8dc2e
scale to 3/4 or 3/8 with odd width destinations efficiently. previously if width was not multiple of what the simd loop would do (24), scaling would fall back on slower C code. This change allows SIMD to be used for most of the scaling and C for the remainder, improving efficiency.
...
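A minimal sketch of the split described above, assuming a SIMD loop that handles 24 destination pixels per iteration. Because 24 is not a power of 2, the remainder is taken with a modulo rather than a bit mask. All names are hypothetical, and a plain C point-sampling function stands in for the SIMD kernel.

```c
#include <stdint.h>

typedef void (*RowFn)(const uint8_t* src, uint8_t* dst, int dst_width);

// Point-sampled 3/4 scale: every 4 source pixels produce 3 destination pixels.
// dst_width is assumed to be a multiple of 3.
void PointDown34(const uint8_t* src, uint8_t* dst, int dst_width) {
  for (int x = 0; x + 3 <= dst_width; x += 3) {
    dst[0] = src[0];
    dst[1] = src[1];
    dst[2] = src[3];
    dst += 3;
    src += 4;
  }
}

// Runs simd_fn on the largest multiple of 24 pixels, then C on the remainder.
void ScaleDown34Any(RowFn simd_fn, const uint8_t* src, uint8_t* dst,
                    int dst_width) {
  int r = dst_width % 24;  // pixels the SIMD loop cannot handle
  int n = dst_width - r;   // multiple of 24 for the SIMD loop
  if (n > 0) {
    simd_fn(src, dst, n);
  }
  // n destination pixels consumed n * 4 / 3 source pixels.
  PointDown34(src + n * 4 / 3, dst + n, r);
}
```

For a destination width of 27, the SIMD function handles 24 pixels and the C loop handles the last 3, instead of the whole row falling back to C.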
BUG=314
TESTED=set LIBYUV_WIDTH=1896 & ScaleDownBy3by4_*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/48249004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1380 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-27 21:56:08 +00:00
fbarchard@google.com
1ffb04b43e
Allow ScaleRowDown any functions to accept non-power of 2 for destination SIMD multiple.
...
BUG=none
TESTED=local unittests pass
R=bcornell@google.com
Review URL: https://webrtc-codereview.appspot.com/45129004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1379 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-24 22:32:12 +00:00
fbarchard@google.com
2b7f6b7dee
ScaleAddRows_Any_SSE2 functions for handling odd widths.
...
BUG=425
TESTED=out\release\libyuv_unittest_old --gtest_filter=*.ScaleDownBy3_*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/45219004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1377 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-22 00:51:56 +00:00
yang.zhang@arm.com
5f609856de
Add ScaleARGBFilterCols_NEON for ARM32/64
...
ARM32/64 NEON versions of ScaleARGBFilterCols_NEON are implemented.
BUG=319
TESTED=libyuvTest.* on ARM32/64 with Android
R=fbarchard@google.com
Change-Id: Ifea62bc25d846bf16cb51d13b408de7bf58dccd4
Review URL: https://webrtc-codereview.appspot.com/46699004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1361 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-07 03:45:29 +00:00
fbarchard@google.com
44b6ba91e4
Scale down by 4 for odd number of destination pixels using 'any' that handles SIMD for multiple of 8 pixels, and C for the remainder.
...
BUG=314
TESTED=local test with width odd
Review URL: https://webrtc-codereview.appspot.com/49599004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1355 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-04-03 22:12:53 +00:00
yang.zhang@arm.com
f23d6222ac
Add ScaleARGBCols_NEON for ARM32/64
...
ARM32/64 NEON versions of ScaleARGBCols_NEON are implemented.
BUG=319
TESTED=libyuvTest.* on ARM32/64 with Android
R=fbarchard@google.com
Change-Id: Id9ad97f7aa5d8a34cd55ace9e648cb6ff028efd9
Review URL: https://webrtc-codereview.appspot.com/47689004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1351 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-31 03:03:05 +00:00
fbarchard@google.com
72673ac873
linear and point sample scale to half size for AVX2.
...
BUG=314
TESTED=out\release\libyuv_unittest.exe --gtest_catch_exceptions=0 --gtest_filter=*.ScaleDownBy2*
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/44959004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1349 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-30 21:46:08 +00:00
fbarchard@google.com
e6ca9cc2a2
Scale down by 2 AVX2 port. Processes twice as many pixels as SSE2 and takes advantage of 3 argument instructions to reduce register usage and number of instructions.
...
BUG=314
TESTED=libyuvTest.ScaleDownBy2_Box
R=tpsiaki@google.com
Review URL: https://webrtc-codereview.appspot.com/42959004
git-svn-id: http://libyuv.googlecode.com/svn/trunk@1347 16f28f9a-4ce2-e073-06de-1de4eb20be90
2015-03-26 23:21:08 +00:00