Frank Barchard
2adb84e39e
make gflags command line parser optional
...
BUG=libyuv:691
TEST=gn gen out/Release "--args=is_debug=false target_cpu=\"x64\" libyuv_include_tests=true"
Change-Id: Ib481189be884c34d9bbc30bfcf71c7969c6f4dae
Reviewed-on: https://chromium-review.googlesource.com/452736
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-14 01:52:52 +00:00
Frank Barchard
d59d3fcd18
Change parameter for '_Any' functions to param to avoid misnomer
...
BUG=None
TEST=None
Change-Id: I6940fc4753783afd25f83868635381bf801c65f5
Reviewed-on: https://chromium-review.googlesource.com/452962
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-10 23:32:39 +00:00
Frank Barchard
e6fec061cf
lint cleanup for convert RGB24ToI420
...
RGB24, RAW, RGB565, ARGB1555 and ARGB4444 have conditional
2 pass versus direct path. 2 pass method requires a buffer that
is conditionally allocated. ifdef's were confusing lint.
simplifed ifdefs to clean up lint warning
BUG=libyuv:692
TEST=lint source/convert.cc
Change-Id: If868718af30b48824a5e3d28f0d7d01d4609ad55
Reviewed-on: https://chromium-review.googlesource.com/451552
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-09 10:32:23 +00:00
Frank Barchard
73a603e120
clang-format 5.0 applied to libyuv
...
BUG=None
TEST=try bots and lint test
Change-Id: I1ab462adf2d309117862c5eb4b244a61ae202951
Reviewed-on: https://chromium-review.googlesource.com/450658
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-03-08 18:50:12 +00:00
Frank Barchard
136aa9d37c
any11p fix for buffer overrun
...
BUG=libyuv:686
TESTED=untested
Change-Id: Idfae93349dd78b1b633a596631e5397e11b77d0b
Reviewed-on: https://chromium-review.googlesource.com/448320
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-03 19:57:35 +00:00
Frank Barchard
91ee9b729e
Fix missing return in MipsCpuCaps.
...
Previously if MipsCpuCaps were called with something other than
dspr2 or msa, the file was closed but still used.
This change assumed the function is only called internally twice:
once for msa and once for dspr2. If msa is not being detected,
the function assumed dspr2 was being tested and returns dspr2 was
true.
BUG=libyuv:687
TEST=try bots
Change-Id: I80b328eb5ffc7baf5f1ee5a79c16d75c45ff26cc
Reviewed-on: https://chromium-review.googlesource.com/447831
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-01 23:07:03 +00:00
Manojkumar Bhosale
45b176d153
Add MSA optimized Interpolate/MergeUV/Misc functions
...
BUG=libyuv:634
Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
Performance Gain (vs C auto-vectorized)
InterpolateRow_MSA - ~3.3x
InterpolateRow_Any_MSA - ~2.5x
ARGBSetRow_MSA - ~1.0x
ARGBSetRow_Any_MSA - ~1.0x
ARGBToRGB24Row_MSA - ~1.9x
ARGBToRGB24Row_Any_MSA - ~1.6x
MergeUVRow_MSA - ~1.6x
MergeUVRow_Any_MSA - ~1.2x
Performance Gain (vs C non-vectorized)
InterpolateRow_MSA - ~11.3x
InterpolateRow_Any_MSA - ~ 7.9x
ARGBSetRow_MSA - ~ 6.2x
ARGBSetRow_Any_MSA - ~ 4.0x
ARGBToRGB24Row_MSA - ~ 9.9x
ARGBToRGB24Row_Any_MSA - ~ 8.4x
MergeUVRow_MSA - ~12.7x
MergeUVRow_Any_MSA - ~ 8.0x
Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
Reviewed-on: https://chromium-review.googlesource.com/445817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-23 01:42:22 +00:00
Manojkumar Bhosale
eed66b2028
Add MSA optimized I444/I400/J400/YUY2/UYVY to ARGB row functions
...
BUG=libyuv:634
Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be
Performance Gain (vs C auto-vectorized)
I444ToARGBRow_MSA - ~1.6x
I444ToARGBRow_Any_MSA - ~1.6x
I400ToARGBRow_MSA - ~5.5x
I400ToARGBRow_Any_MSA - ~5.3x
J400ToARGBRow_MSA - ~1.0x
J400ToARGBRow_Any_MSA - ~1.0x
YUY2ToARGBRow_MSA - ~1.6x
YUY2ToARGBRow_Any_MSA - ~1.6x
UYVYToARGBRow_MSA - ~1.6x
UYVYToARGBRow_Any_MSA - ~1.6x
Performance Gain (vs C non-vectorized)
I444ToARGBRow_MSA - ~7.3x
I444ToARGBRow_Any_MSA - ~7.1x
I400ToARGBRow_MSA - ~5.5x
I400ToARGBRow_Any_MSA - ~5.2x
J400ToARGBRow_MSA - ~6.8x
J400ToARGBRow_Any_MSA - ~5.7x
YUY2ToARGBRow_MSA - ~7.2x
YUY2ToARGBRow_Any_MSA - ~7.0x
UYVYToARGBRow_MSA - ~7.1x
UYVYToARGBRow_Any_MSA - ~6.9x
Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be
Reviewed-on: https://chromium-review.googlesource.com/439246
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-21 23:22:07 +00:00
Frank Barchard
bbe8c233f2
scale warning fixes for unused parameters
...
BUG=libyuv:680
TEST=builds and runs with no warnings
Change-Id: I7d60ef44292fa6ad4f7c4e2e2657359b864d2dab
Reviewed-on: https://chromium-review.googlesource.com/442670
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-02-15 21:38:59 +00:00
Frank Barchard
20ecb366ea
I420ToI400 fix for unused u and v arguments
...
TBR=kjellander@chromium.org
BUG=libyuv:679
TEST=None
Change-Id: I830f31e0de58c967cb02c71f422df4c19c94c367
Reviewed-on: https://chromium-review.googlesource.com/441949
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 02:54:58 +00:00
Frank Barchard
1d5630ba2b
mjpeg_decoder - fix for unused parameter.
...
TBR=kjellander@chromium.org
BUG=libyuv:678
TEST=None
Change-Id: I5206de2026b56f155531a76f94aeb6cb1b0cd6c9
Reviewed-on: https://chromium-review.googlesource.com/442090
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 02:22:57 +00:00
Frank Barchard
177d344ed7
enable unused parameter warning
...
android.mk builds have unused parameter warning on by default.
This change for GN makes libyuv build the same way.
BUG=libyuv:681
TEST=build on linux with clang and ninja.
Change-Id: I76c627d446b96653f147725bca915d94a42ce9a6
Reviewed-on: https://chromium-review.googlesource.com/441194
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 01:33:57 +00:00
Frank Barchard
6825b161d7
HalfFloat SSE2/AVX2 optimized port scheduling.
...
Uses 1 add instead of 2 leas to reduce port pressure on ports 1 and 5
used for SIMD instructions.
BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -arch HSW out/Release/obj/libyuv/row_gcc.o
Change-Id: I3965ee5dcb49941a535efa611b5988d977f5b65c
Reviewed-on: https://chromium-review.googlesource.com/433391
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-11 01:02:06 +00:00
Frank Barchard
7a54d0a302
row_msa fix clang build warnings.
...
BUG=libyuv:634
TEST=untested
Change-Id: Ib7f0c99e669ddba0a1efbd15895880281ad6303e
Reviewed-on: https://chromium-review.googlesource.com/435303
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-02-07 09:36:28 +00:00
Frank Barchard
76e7f104ae
documentation updates
...
BUG=None
TEST=Untested
Change-Id: I8ab95654255d1aa9cf05a664ecf59ee6c0757e66
Reviewed-on: https://chromium-review.googlesource.com/434941
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-02 18:31:32 +00:00
Frank Barchard
0fb5675902
Fix dspr2 rename changes. Fix unused variables
...
TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=try bots
Review-Url: https://codereview.chromium.org/2675583002 .
2017-02-01 18:51:06 -08:00
Manojkumar Bhosale
54ce8f23d6
Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C auto-vectorized)
ARGBToYJRow_MSA - ~3.2x
ARGBToYJRow_Any_MSA - ~2.7x
BGRAToYRow_MSA - ~3.2x
BGRAToYRow_Any_MSA - ~2.7x
ABGRToYRow_MSA - ~3.2x
ABGRToYRow_Any_MSA - ~2.6x
RGBAToYRow_MSA - ~3.1x
RGBAToYRow_Any_MSA - ~2.7x
ARGBToUVJRow_MSA - ~5.5x
ARGBToUVJRow_Any_MSA - ~4.5x
BGRAToUVRow_MSA - ~2.1x
BGRAToUVRow_Any_MSA - ~2.0x
ABGRToUVRow_MSA - ~2.1x
ABGRToUVRow_Any_MSA - ~1.9x
RGBAToUVRow_MSA - ~2.2x
RGBAToUVRow_Any_MSA - ~1.9x
Performance Gain (vs C non-vectorized)
ARGBToYJRow_MSA - ~10.9x
ARGBToYJRow_Any_MSA - ~9.2x
BGRAToYRow_MSA - ~10.9x
BGRAToYRow_Any_MSA - ~9.3x
ABGRToYRow_MSA - ~11.0x
ABGRToYRow_Any_MSA - ~9.3x
RGBAToYRow_MSA - ~10.9x
RGBAToYRow_Any_MSA - ~9.1x
ARGBToUVJRow_MSA - ~12.4x
ARGBToUVJRow_Any_MSA - ~10.5x
BGRAToUVRow_MSA - ~4.7x
BGRAToUVRow_Any_MSA - ~4.4x
ABGRToUVRow_MSA - ~4.7x
ABGRToUVRow_Any_MSA - ~4.5x
RGBAToUVRow_MSA - ~4.8x
RGBAToUVRow_Any_MSA - ~4.4x
Review-Url: https://codereview.chromium.org/2641153003 .
2017-02-01 10:31:28 +05:30
Frank Barchard
b89bcda273
Add comments for ARGBToUV_C and ARGBToUVJ_C
...
ARGBToUV_C and ARGBToUVJ_C are generated functions with subtle
difference in rounding. Adding comment to make them easier to find.
TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=untested
Change-Id: I9912d256a1e04c58475d33bdb472c37484f6cab9
Reviewed-on: https://chromium-review.googlesource.com/434980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-01-30 23:44:05 +00:00
Frank Barchard
54f2094a5e
Rename mips source files to dspr2.
...
Add MSA detect to unittest.
Change macro to disable DSPR2 code to LIBYUV_DISABLE_DSPR2
BUG=libyuv:634
TEST=try bots
Change-Id: I9e0aa2452204fc529bb6f9e6fd93c4e1c379bba6
Reviewed-on: https://chromium-review.googlesource.com/433463
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-01-27 23:11:43 +00:00
Frank Barchard
749e316ed8
Remove commented out code
...
TEST=None
BUG=libyuv:672
Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169
Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169
Reviewed-on: https://chromium-review.googlesource.com/430321
Reviewed-by: Aaron Gable <agable@chromium.org>
2017-01-20 02:03:12 +00:00
Manojkumar Bhosale
09b8c971b3
Add MSA optimized NV12/21 To RGB row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C auto-vectorized)
NV12ToARGBRow_MSA - ~1.5x
NV12ToARGBRow_Any_MSA - ~1.4x
NV12ToRGB565Row_MSA - ~1.4x
NV12ToRGB565Row_Any_MSA - ~1.4x
NV21ToARGBRow_MSA - ~1.5x
NV21ToARGBRow_Any_MSA - ~1.5x
SobelRow_MSA - ~4.3x
SobelRow_Any_MSA - ~3.4x
SobelToPlaneRow_MSA - ~8.0x
SobelToPlaneRow_Any_MSA - ~4.7x
SobelXYRow_MSA - ~3.0x
SobelXYRow_Any_MSA - ~2.5x
Performance Gain (vs C non-vectorized)
NV12ToARGBRow_MSA - ~6.5x
NV12ToARGBRow_Any_MSA - ~6.5x
NV12ToRGB565Row_MSA - ~6.2x
NV12ToRGB565Row_Any_MSA - ~6.1x
NV21ToARGBRow_MSA - ~6.5x
NV21ToARGBRow_Any_MSA - ~6.5x
SobelRow_MSA - ~14.5x
SobelRow_Any_MSA - ~11.3x
SobelToPlaneRow_MSA - ~34.2x
SobelToPlaneRow_Any_MSA - ~19.4x
SobelXYRow_MSA - ~11.1x
SobelXYRow_Any_MSA - ~9.1x
Review-Url: https://codereview.chromium.org/2636483002 .
2017-01-18 09:24:39 +05:30
Frank Barchard
a7c87e19f0
add Intel Code Analyst markers
...
add macros to enable/disable code analyst around blocks of code.
Normally these macros should not be used, but if performance
details are wanted for intel code, enable them around the code
and then run via the iaca tool, available on the intel website.
BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com
Review-Url: https://codereview.chromium.org/2626193002 .
2017-01-13 15:50:24 -08:00
Manojkumar Bhosale
73a6f100a9
Add MSA optimized rotate functions (used 16x16 transpose)
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
TransposeWx16_MSA - ~6.0x
TransposeWx16_Any_MSA - ~4.7x
TransposeUVWx16_MSA - ~6.3x
TransposeUVWx16_Any_MSA - ~5.4x
Performance Gain (vs C non-vectorized)
TransposeWx16_MSA - ~6.0x
TransposeWx16_Any_MSA - ~4.8x
TransposeUVWx16_MSA - ~6.3x
TransposeUVWx16_Any_MSA - ~5.4x
Review-Url: https://codereview.chromium.org/2617703002 .
2017-01-13 15:50:02 +05:30
Manojkumar Bhosale
7c64163ff4
Add MSA optimized RAW/RGB/ARGB to ARGB/Y/UV row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGB1555ToARGBRow_MSA - 1.85
ARGB1555ToARGBRow_Any_MSA - 1.82
RGB565ToARGBRow_MSA - 2.14
RGB565ToARGBRow_Any_MSA - 2.08
RGB24ToARGBRow_MSA - 8.57
RGB24ToARGBRow_Any_MSA - 7.42
RAWToARGBRow_MSA - 8.57
RAWToARGBRow_Any_MSA - 7.42
ARGB1555ToYRow_MSA - 2.60
ARGB1555ToYRow_Any_MSA - 2.47
RGB565ToYRow_MSA - 2.45
RGB565ToYRow_Any_MSA - 2.33
RGB24ToYRow_MSA - 2.23
RGB24ToYRow_Any_MSA - 2.01
RAWToYRow_MSA - 2.25
RAWToYRow_Any_MSA - 2.02
ARGB1555ToUVRow_MSA - 1.40
ARGB1555ToUVRow_Any_MSA - 1.37
RGB565ToUVRow_MSA - 1.68
RGB565ToUVRow_Any_MSA - 1.63
RGB24ToUVRow_MSA - 3.02
RGB24ToUVRow_Any_MSA - 2.87
RAWToUVRow_MSA - 3.04
RAWToUVRow_Any_MSA - 2.85
Performance Gain (vs C non-vectorized)
ARGB1555ToARGBRow_MSA - 4.66
ARGB1555ToARGBRow_Any_MSA - 4.45
RGB565ToARGBRow_MSA - 5.58
RGB565ToARGBRow_Any_MSA - 5.34
RGB24ToARGBRow_MSA - 8.57
RGB24ToARGBRow_Any_MSA - 7.42
RAWToARGBRow_MSA - 8.57
RAWToARGBRow_Any_MSA - 7.42
ARGB1555ToYRow_MSA - 6.38
ARGB1555ToYRow_Any_MSA - 5.98
RGB565ToYRow_MSA - 6.42
RGB565ToYRow_Any_MSA - 6.05
RGB24ToYRow_MSA - 7.87
RGB24ToYRow_Any_MSA - 7.01
RAWToYRow_MSA - 7.98
RAWToYRow_Any_MSA - 7.01
ARGB1555ToUVRow_MSA - 5.39
ARGB1555ToUVRow_Any_MSA - 5.06
RGB565ToUVRow_MSA - 6.39
RGB565ToUVRow_Any_MSA - 5.90
RGB24ToUVRow_MSA - 3.04
RGB24ToUVRow_Any_MSA - 2.87
RAWToUVRow_MSA - 3.04
RAWToUVRow_Any_MSA - 2.88
Review-Url: https://codereview.chromium.org/2600713002 .
2017-01-13 15:43:37 +05:30
Frank Barchard
cb11559408
ConvertToARGB: Allows rotation on ARGB input
...
BUG=libyuv:668
TEST=run unit tests
R=fbarchard@google.com
Review-Url: https://codereview.chromium.org/2620183002 .
2017-01-11 14:38:25 -08:00
Frank Barchard
000d2fa91a
Libyuv MIPS DSPR2 optimizations.
...
Optimized functions:
I444ToARGBRow_DSPR2
I422ToARGB4444Row_DSPR2
I422ToARGB1555Row_DSPR2
NV12ToARGBRow_DSPR2
BGRAToUVRow_DSPR2
BGRAToYRow_DSPR2
ABGRToUVRow_DSPR2
ARGBToYRow_DSPR2
ABGRToYRow_DSPR2
RGBAToUVRow_DSPR2
RGBAToYRow_DSPR2
ARGBToUVRow_DSPR2
RGB24ToARGBRow_DSPR2
RAWToARGBRow_DSPR2
RGB565ToARGBRow_DSPR2
ARGB1555ToARGBRow_DSPR2
ARGB4444ToARGBRow_DSPR2
ScaleAddRow_DSPR2
Bug-fixes in functions:
ScaleRowDown2_DSPR2
ScaleRowDown4_DSPR2
BUG=
Review-Url: https://codereview.chromium.org/2626123003 .
2017-01-11 12:19:13 -08:00
Manojkumar Bhosale
288bfbefb5
Add MSA optimized remaining scale row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ScaleRowDown2_MSA - ~22.3x
ScaleRowDown2_Any_MSA - ~19.9x
ScaleRowDown2Linear_MSA - ~31.2x
ScaleRowDown2Linear_Any_MSA - ~29.4x
ScaleRowDown2Box_MSA - ~20.1x
ScaleRowDown2Box_Any_MSA - ~19.6x
ScaleRowDown4_MSA - ~11.7x
ScaleRowDown4_Any_MSA - ~11.2x
ScaleRowDown4Box_MSA - ~15.1x
ScaleRowDown4Box_Any_MSA - ~15.1x
ScaleRowDown38_MSA - ~1x
ScaleRowDown38_Any_MSA - ~1x
ScaleRowDown38_2_Box_MSA - ~1.7x
ScaleRowDown38_2_Box_Any_MSA - ~1.7x
ScaleRowDown38_3_Box_MSA - ~1.7x
ScaleRowDown38_3_Box_Any_MSA - ~1.7x
ScaleAddRow_MSA - ~1.2x
ScaleAddRow_Any_MSA - ~1.15x
Performance Gain (vs C non-vectorized)
ScaleRowDown2_MSA - ~22.4x
ScaleRowDown2_Any_MSA - ~19.8x
ScaleRowDown2Linear_MSA - ~31.6x
ScaleRowDown2Linear_Any_MSA - ~29.4x
ScaleRowDown2Box_MSA - ~20.1x
ScaleRowDown2Box_Any_MSA - ~19.6x
ScaleRowDown4_MSA - ~11.7x
ScaleRowDown4_Any_MSA - ~11.2x
ScaleRowDown4Box_MSA - ~15.1x
ScaleRowDown4Box_Any_MSA - ~15.1x
ScaleRowDown38_MSA - ~3.2x
ScaleRowDown38_Any_MSA - ~3.2x
ScaleRowDown38_2_Box_MSA - ~2.4x
ScaleRowDown38_2_Box_Any_MSA - ~2.3x
ScaleRowDown38_3_Box_MSA - ~2.9x
ScaleRowDown38_3_Box_Any_MSA - ~2.8x
ScaleAddRow_MSA - ~8x
ScaleAddRow_Any_MSA - ~7.46x
Review-Url: https://codereview.chromium.org/2559683002 .
2016-12-21 13:39:44 +05:30
Manojkumar Bhosale
a899dea251
Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBAttenuateRow_MSA - ~1.1x
ARGBAttenuateRow_Any_MSA - ~1.1x
ARGBToRGB565DitherRow_MSA - ~6.4x
ARGBToRGB565DitherRow_Any_MSA - ~6.2x
ARGBShuffleRow_MSA - ~5.1x
ARGBShuffleRow_Any_MSA - ~1.9x
ARGBShadeRow_MSA - ~1.1x
ARGBGrayRow_MSA - ~2.6x
ARGBSepiaRow_MSA - ~11.6x
Performance Gain (vs C non-vectorized)
ARGBAttenuateRow_MSA - ~2.46x
ARGBAttenuateRow_Any_MSA - ~2.45x
ARGBToRGB565DitherRow_MSA - ~9.4x
ARGBToRGB565DitherRow_Any_MSA - ~12.5x
ARGBShuffleRow_MSA - ~5.2x
ARGBShuffleRow_Any_MSA - ~1.9x
ARGBShadeRow_MSA - ~4.3x
ARGBGrayRow_MSA - ~10.5x
ARGBSepiaRow_MSA - ~12.2x
Review-Url: https://codereview.chromium.org/2559693002 .
2016-12-15 12:06:02 +05:30
Manojkumar Bhosale
6fa5e4eb78
Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
TransposeWx8_MSA - ~2.7x
TransposeWx8_Any_MSA - ~2.1x
TransposeUVWx8_MSA - ~2.5x
TransposeUVWx8_Any_MSA - ~2.7x
Performance Gain (vs C non-vectorized)
TransposeWx8_MSA - ~4.6x
TransposeWx8_Any_MSA - ~2.9x
TransposeUVWx8_MSA - ~4.4x
TransposeUVWx8_Any_MSA - ~3.7x
Review URL: https://codereview.chromium.org/2553403002 .
2016-12-15 10:06:01 +05:30
Frank Barchard
b18fd21d3c
Android420ToI420 - use ptrdiff_t for difference of u and v pointers
...
The difference was assigned to an int, causing a warning on Visual C.
BUG=662
TEST=tested with try bots.
R=devangelakos@google.com
Review-Url: https://codereview.chromium.org/2574373002 .
2016-12-14 11:53:55 -08:00
Frank Barchard
dde8ba7009
ConvertFromI420: use halfstride instead of halfwidth
...
BUG=libyuv:660
TEST=try bots
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2554213003 .
2016-12-07 10:16:16 -08:00
Manojkumar Bhosale
56b5bbb0be
Add MSA optimized ARGB scaling functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA - ~2.6x
ScaleARGBRowDown2Linear_MSA - ~7.9x
ScaleARGBRowDown2Box_MSA - ~3.7x
ScaleARGBRowDownEven_MSA - ~1.2x
ScaleARGBRowDownEvenBox_MSA - ~3.5x
ScaleARGBRowDown2_Any_MSA - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA - ~3.6x
ScaleARGBRowDownEven_Any_MSA - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x
Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA - 2.6x
ScaleARGBRowDown2Linear_MSA - 13.5x
ScaleARGBRowDown2Box_MSA - 5.8x
ScaleARGBRowDownEven_MSA - 1.2x
ScaleARGBRowDownEvenBox_MSA - 3.7x
ScaleARGBRowDown2_Any_MSA - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA - 5.3x
ScaleARGBRowDownEven_Any_MSA - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x
Review URL: https://codereview.chromium.org/2527983002 .
2016-12-07 11:47:15 +05:30
Manojkumar Bhosale
83f460be33
Add MSA optimized ARGB Multiply/Add/Subtract row functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA - 1.4x
ARGBAddRow_MSA - 8.6x
ARGBSubtractRow_MSA - 8.6x
ARGBMultiplyRow_Any_MSA - 1.35x
ARGBAddRow_Any_MSA - 7.3x
ARGBSubtractRow_Any_MSA - 7.2x
Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA - 4.4x
ARGBAddRow_MSA - 27x
ARGBSubtractRow_MSA - 22x
ARGBMultiplyRow_Any_MSA - 3.5x
ARGBAddRow_Any_MSA - 23x
ARGBSubtractRow_Any_MSA - 18x
Review URL: https://codereview.chromium.org/2529983002 .
2016-12-02 15:21:10 +05:30
Frank Barchard
da0c29dada
Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA - ~1.6x
ARGBToRGB565Row_Any_MSA - ~1.6x
ARGBToARGB1555Row_MSA - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA - ~2.4x
ARGBToUV444Row_Any_MSA - ~2.4x
Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA - ~2.8x
ARGBToRGB565Row_Any_MSA - ~2.8x
ARGBToARGB1555Row_MSA - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA - ~6.7x
ARGBToUV444Row_Any_MSA - ~6.7x
Review URL: https://codereview.chromium.org/2520003004 .
2016-11-22 10:47:55 -08:00
Frank Barchard
b1504a8e48
Add MSA optimized ARGBToRGB24Row_MSA and ARGBToRAWRow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2487913004 .
2016-11-18 15:05:10 -08:00
Frank Barchard
3028e1bd97
clang-format row_gcc.cc with some functions disabled
...
BUG=libyuv:654
TEST=try bots build
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2484083003 .
2016-11-07 18:37:29 -08:00
Frank Barchard
e62309f259
clang-format libyuv
...
BUG=libyuv:654
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
f2c27dafa2
HalfFloat neon armv7 fix for destination pointer.
...
Improved unittests detect different in arm64 rounding.
TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360"
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2478313004 .
2016-11-07 12:13:04 -08:00
Frank Barchard
eca08525cb
HalfFloat Neon for ARMv7.
...
64 bit version made similar to 32 bit with registers 1 for load and store results, and 2 and 3 as expanded float temporary values.
TEST=out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2467723002 .
2016-11-01 11:36:51 -07:00
Frank Barchard
10ce829bad
Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA : ~1.5x
I422ToRGB565Row_Any_MSA : ~1.5x
I422ToARGB4444Row_MSA : ~1.4x
I422ToARGB4444Row_Any_MSA : ~1.4x
I422ToARGB1555Row_MSA : ~1.4x
I422ToARGB1555Row_Any_MSA : ~1.4x
Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA : ~6.8x
I422ToRGB565Row_Any_MSA : ~6.8x
I422ToARGB4444Row_MSA : ~6.6x
I422ToARGB4444Row_Any_MSA : ~6.6x
I422ToARGB1555Row_MSA : ~6.6x
I422ToARGB1555Row_Any_MSA : ~6.6x
Review URL: https://codereview.chromium.org/2445343007 .
2016-10-27 10:47:35 -07:00
Frank Barchard
532f5708a9
Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA : ~1.4x
I422AlphaToARGBRow_Any_MSA : ~1.4x
I422ToRGB24Row_MSA : ~4.8x
I422ToRGB24Row_Any_MSA : ~4.8x
Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA : ~7.0x
I422AlphaToARGBRow_Any_MSA : ~7.0x
I422ToRGB24Row_MSA : ~7.9x
I422ToRGB24Row_Any_MSA : ~7.7x
Review URL: https://codereview.chromium.org/2454433003 .
2016-10-26 11:12:17 -07:00
Frank Barchard
2488b3105b
White spaces, comments and lint fixes for msa.
...
no functional changes.
TBR=kjellander@chromium.org
BUG=libyuv:634
Review URL: https://codereview.chromium.org/2446313002 .
2016-10-25 11:36:54 -07:00
Frank Barchard
c2073823b4
use __OPTIMIZE__ macro to determine debug vs release.
...
Debug builds of x86 gcc/clang can run out of register.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.
BUG=libyuv:602
TEST=untested
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2451503002 .
2016-10-24 18:02:48 -07:00
Frank Barchard
f5d5bd88d6
Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance Gains :- (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x
Performance Gains :- (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x
Regarding performance measurement, We have created standalone tests which pass in row's data from a 1920x1080 filled buffer to both the C and MSA functions. And such N iterations are executed to get more accurate timings of C vs MSA.
Review URL: https://codereview.chromium.org/2430313005 .
2016-10-24 15:37:08 -07:00
Frank Barchard
451af5e922
scale by 1 for neon implemented
...
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
asm volatile (
"1: \n"
MEMACCESS(0)
"ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
"subs %w2, %w2, #8 \n" // 8 pixels per loop
"uxtl v2.4s, v1.4h \n" // 8 int's
"uxtl2 v1.4s, v1.8h \n"
"scvtf v2.4s, v2.4s \n" // 8 floats
"scvtf v1.4s, v1.4s \n"
"fcvtn v4.4h, v2.4s \n" // 8 floatsgit
"fcvtn2 v4.8h, v1.4s \n"
MEMACCESS(1)
"st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
"b.gt 1b \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(width) // %2
:
: "cc", "memory", "v1", "v2", "v4"
);
}
void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
asm volatile (
"1: \n"
MEMACCESS(0)
"ld1 {v1.16b}, [%0], #16 \n" // load 8 shorts
"subs %w2, %w2, #8 \n" // 8 pixels per loop
"uxtl v2.4s, v1.4h \n" // 8 int's
"uxtl2 v1.4s, v1.8h \n"
"scvtf v2.4s, v2.4s \n" // 8 floats
"scvtf v1.4s, v1.4s \n"
"fmul v2.4s, v2.4s, %3.s[0] \n" // adjust exponent
"fmul v1.4s, v1.4s, %3.s[0] \n"
"uqshrn v4.4h, v2.4s, #13 \n" // isolate halffloat
"uqshrn2 v4.8h, v1.4s, #13 \n"
MEMACCESS(1)
"st1 {v4.16b}, [%1], #16 \n" // store 8 shorts
"b.gt 1b \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(width) // %2
: "w"(scale * 1.9259299444e-34f) // %3
: "cc", "memory", "v1", "v2", "v4"
);
}
TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
550cf829fb
HalfFloat avx2 unpack bug fix.
...
AVX unpack parameters were reverse ordered causing incorrect results
on AVX2 hardware.
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*
BUG=libyuv:560
R=wangcheng@google.com
Review URL: https://codereview.chromium.org/2438893002 .
2016-10-20 15:49:00 -07:00
Frank Barchard
f553db2d30
HalfFloatPlane unittest for denormal half floats
...
Halffloats have a limited range. It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero. This test ensures they behave the same in C and SIMD and tests the performance of denormals.
TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org
Review URL: https://codereview.chromium.org/2424233004 .
2016-10-19 18:13:01 -07:00
Frank Barchard
78c58ab8aa
Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions
...
R=fbarchard@google.com
BUG=libyuv:634
Performance gains : (Auto-vectorized C vs MSA SIMD)
ARGB4444ToYRow_MSA : ~3.0x
ARGB4444ToUVRow_MSA : ~1.8x
ARGB4444ToARGBRow_MSA : ~3.4x
ARGB4444ToYRow_Any_MSA : ~2.8x
ARGB4444ToUVRow_Any_MSA : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x
Review URL: https://codereview.chromium.org/2421843002 .
2016-10-19 11:10:51 -07:00
Frank Barchard
e16e3a629f
cpu_id cleanup. no functional change.
...
remove old comment about initialize to zero.
remove ifdef and replace with macro defined to zero.
BUG=None
TEST=try bots
R=kjellander@chromium.org
Review URL: https://codereview.chromium.org/2425623004 .
2016-10-18 12:26:02 -07:00
Frank Barchard
2d80fc3133
Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
...
R=wangcheng@google.com , hubbe@chromium.org
BUG=libyuv:560
Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00