1415 Commits

Author SHA1 Message Date
Manojkumar Bhosale
45b176d153 Add MSA optimized Interpolate/MergeUV/Misc functions
BUG=libyuv:634

Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126

Performance Gain (vs C auto-vectorized)
InterpolateRow_MSA      - ~3.3x
InterpolateRow_Any_MSA  - ~2.5x
ARGBSetRow_MSA          - ~1.0x
ARGBSetRow_Any_MSA      - ~1.0x
ARGBToRGB24Row_MSA      - ~1.9x
ARGBToRGB24Row_Any_MSA  - ~1.6x
MergeUVRow_MSA          - ~1.6x
MergeUVRow_Any_MSA      - ~1.2x

Performance Gain (vs C non-vectorized)
InterpolateRow_MSA      - ~11.3x
InterpolateRow_Any_MSA  - ~ 7.9x
ARGBSetRow_MSA          - ~ 6.2x
ARGBSetRow_Any_MSA      - ~ 4.0x
ARGBToRGB24Row_MSA      - ~ 9.9x
ARGBToRGB24Row_Any_MSA  - ~ 8.4x
MergeUVRow_MSA          - ~12.7x
MergeUVRow_Any_MSA      - ~ 8.0x

Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
Reviewed-on: https://chromium-review.googlesource.com/445817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-23 01:42:22 +00:00
Manojkumar Bhosale
eed66b2028 Add MSA optimized I444/I400/J400/YUY2/UYVY to ARGB row functions
BUG=libyuv:634

Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be

Performance Gain (vs C auto-vectorized)
I444ToARGBRow_MSA       - ~1.6x
I444ToARGBRow_Any_MSA   - ~1.6x
I400ToARGBRow_MSA       - ~5.5x
I400ToARGBRow_Any_MSA   - ~5.3x
J400ToARGBRow_MSA       - ~1.0x
J400ToARGBRow_Any_MSA   - ~1.0x
YUY2ToARGBRow_MSA       - ~1.6x
YUY2ToARGBRow_Any_MSA   - ~1.6x
UYVYToARGBRow_MSA       - ~1.6x
UYVYToARGBRow_Any_MSA   - ~1.6x

Performance Gain (vs C non-vectorized)
I444ToARGBRow_MSA       - ~7.3x
I444ToARGBRow_Any_MSA   - ~7.1x
I400ToARGBRow_MSA       - ~5.5x
I400ToARGBRow_Any_MSA   - ~5.2x
J400ToARGBRow_MSA       - ~6.8x
J400ToARGBRow_Any_MSA   - ~5.7x
YUY2ToARGBRow_MSA       - ~7.2x
YUY2ToARGBRow_Any_MSA   - ~7.0x
UYVYToARGBRow_MSA       - ~7.1x
UYVYToARGBRow_Any_MSA   - ~6.9x

Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be
Reviewed-on: https://chromium-review.googlesource.com/439246
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-21 23:22:07 +00:00
Frank Barchard
bbe8c233f2 scale warning fixes for unused parameters
BUG=libyuv:680
TEST=builds and runs with no warnings

Change-Id: I7d60ef44292fa6ad4f7c4e2e2657359b864d2dab
Reviewed-on: https://chromium-review.googlesource.com/442670
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-02-15 21:38:59 +00:00
Frank Barchard
20ecb366ea I420ToI400 fix for unused u and v arguments
TBR=kjellander@chromium.org
BUG=libyuv:679
TEST=None

Change-Id: I830f31e0de58c967cb02c71f422df4c19c94c367
Reviewed-on: https://chromium-review.googlesource.com/441949
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 02:54:58 +00:00
Frank Barchard
1d5630ba2b mjpeg_decoder - fix for unused parameter.
TBR=kjellander@chromium.org
BUG=libyuv:678
TEST=None

Change-Id: I5206de2026b56f155531a76f94aeb6cb1b0cd6c9
Reviewed-on: https://chromium-review.googlesource.com/442090
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 02:22:57 +00:00
Frank Barchard
177d344ed7 enable unused parameter warning
android.mk builds have unused parameter warning on by default.
This change for GN makes libyuv build the same way.

BUG=libyuv:681
TEST=build on linux with clang and ninja.

Change-Id: I76c627d446b96653f147725bca915d94a42ce9a6
Reviewed-on: https://chromium-review.googlesource.com/441194
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-14 01:33:57 +00:00
Frank Barchard
6825b161d7 HalfFloat SSE2/AVX2 optimized port scheduling.
Uses 1 add instead of 2 leas to reduce port pressure on ports 1 and 5
used for SIMD instructions.

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -arch HSW out/Release/obj/libyuv/row_gcc.o

Change-Id: I3965ee5dcb49941a535efa611b5988d977f5b65c
Reviewed-on: https://chromium-review.googlesource.com/433391
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-11 01:02:06 +00:00
Frank Barchard
7a54d0a302 row_msa fix clang build warnings.
BUG=libyuv:634
TEST=untested

Change-Id: Ib7f0c99e669ddba0a1efbd15895880281ad6303e
Reviewed-on: https://chromium-review.googlesource.com/435303
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-02-07 09:36:28 +00:00
Frank Barchard
76e7f104ae documentation updates
BUG=None
TEST=Untested

Change-Id: I8ab95654255d1aa9cf05a664ecf59ee6c0757e66
Reviewed-on: https://chromium-review.googlesource.com/434941
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-02 18:31:32 +00:00
Frank Barchard
0fb5675902 Fix dspr2 rename changes. Fix unused variables
TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=try bots

Review-Url: https://codereview.chromium.org/2675583002 .
2017-02-01 18:51:06 -08:00
Manojkumar Bhosale
54ce8f23d6 Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C auto-vectorized)
ARGBToYJRow_MSA       - ~3.2x
ARGBToYJRow_Any_MSA   - ~2.7x
BGRAToYRow_MSA        - ~3.2x
BGRAToYRow_Any_MSA    - ~2.7x
ABGRToYRow_MSA        - ~3.2x
ABGRToYRow_Any_MSA    - ~2.6x
RGBAToYRow_MSA        - ~3.1x
RGBAToYRow_Any_MSA    - ~2.7x
ARGBToUVJRow_MSA      - ~5.5x
ARGBToUVJRow_Any_MSA  - ~4.5x
BGRAToUVRow_MSA       - ~2.1x
BGRAToUVRow_Any_MSA   - ~2.0x
ABGRToUVRow_MSA       - ~2.1x
ABGRToUVRow_Any_MSA   - ~1.9x
RGBAToUVRow_MSA       - ~2.2x
RGBAToUVRow_Any_MSA   - ~1.9x

Performance Gain (vs C non-vectorized)
ARGBToYJRow_MSA       - ~10.9x
ARGBToYJRow_Any_MSA   -  ~9.2x
BGRAToYRow_MSA        - ~10.9x
BGRAToYRow_Any_MSA    -  ~9.3x
ABGRToYRow_MSA        - ~11.0x
ABGRToYRow_Any_MSA    -  ~9.3x
RGBAToYRow_MSA        - ~10.9x
RGBAToYRow_Any_MSA    -  ~9.1x
ARGBToUVJRow_MSA      - ~12.4x
ARGBToUVJRow_Any_MSA  - ~10.5x
BGRAToUVRow_MSA       -  ~4.7x
BGRAToUVRow_Any_MSA   -  ~4.4x
ABGRToUVRow_MSA       -  ~4.7x
ABGRToUVRow_Any_MSA   -  ~4.5x
RGBAToUVRow_MSA       -  ~4.8x
RGBAToUVRow_Any_MSA   -  ~4.4x

Review-Url: https://codereview.chromium.org/2641153003 .
2017-02-01 10:31:28 +05:30
Frank Barchard
b89bcda273 Add comments for ARGBToUV_C and ARGBToUVJ_C
ARGBToUV_C and ARGBToUVJ_C are generated functions with subtle
difference in rounding.  Adding comment to make them easier to find.

TBR=kjellander@chromium.org
BUG=libyuv:634
TEST=untested

Change-Id: I9912d256a1e04c58475d33bdb472c37484f6cab9
Reviewed-on: https://chromium-review.googlesource.com/434980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-01-30 23:44:05 +00:00
Frank Barchard
54f2094a5e Rename mips source files to dspr2.
Add MSA detect to unittest.
Change macro to disable DSPR2 code to LIBYUV_DISABLE_DSPR2

BUG=libyuv:634
TEST=try bots

Change-Id: I9e0aa2452204fc529bb6f9e6fd93c4e1c379bba6
Reviewed-on: https://chromium-review.googlesource.com/433463
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-01-27 23:11:43 +00:00
Frank Barchard
749e316ed8 Remove commented out code
TEST=None
BUG=libyuv:672
Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169

Change-Id: Ia5949fb20913e4397e62d6a302c89a27dbd7e169
Reviewed-on: https://chromium-review.googlesource.com/430321
Reviewed-by: Aaron Gable <agable@chromium.org>
2017-01-20 02:03:12 +00:00
Manojkumar Bhosale
09b8c971b3 Add MSA optimized NV12/21 To RGB row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C auto-vectorized)
NV12ToARGBRow_MSA       - ~1.5x
NV12ToARGBRow_Any_MSA   - ~1.4x
NV12ToRGB565Row_MSA     - ~1.4x
NV12ToRGB565Row_Any_MSA - ~1.4x
NV21ToARGBRow_MSA       - ~1.5x
NV21ToARGBRow_Any_MSA   - ~1.5x
SobelRow_MSA            - ~4.3x
SobelRow_Any_MSA        - ~3.4x
SobelToPlaneRow_MSA     - ~8.0x
SobelToPlaneRow_Any_MSA - ~4.7x
SobelXYRow_MSA          - ~3.0x
SobelXYRow_Any_MSA      - ~2.5x

Performance Gain (vs C non-vectorized)
NV12ToARGBRow_MSA       - ~6.5x
NV12ToARGBRow_Any_MSA   - ~6.5x
NV12ToRGB565Row_MSA     - ~6.2x
NV12ToRGB565Row_Any_MSA - ~6.1x
NV21ToARGBRow_MSA       - ~6.5x
NV21ToARGBRow_Any_MSA   - ~6.5x
SobelRow_MSA            - ~14.5x
SobelRow_Any_MSA        - ~11.3x
SobelToPlaneRow_MSA     - ~34.2x
SobelToPlaneRow_Any_MSA - ~19.4x
SobelXYRow_MSA          - ~11.1x
SobelXYRow_Any_MSA      - ~9.1x

Review-Url: https://codereview.chromium.org/2636483002 .
2017-01-18 09:24:39 +05:30
Frank Barchard
a7c87e19f0 add Intel Code Analyst markers
add macros to enable/disable code analyst around blocks of code.

Normally these macros should not be used, but if performance
details are wanted for intel code, enable them around the code
and then run via the iaca tool, available on the intel website.

BUG=libyuv:670
TEST=~/iaca-lin64/bin/iaca.sh -64 out/Release/libyuv_unittest
R=wangcheng@google.com

Review-Url: https://codereview.chromium.org/2626193002 .
2017-01-13 15:50:24 -08:00
Manojkumar Bhosale
73a6f100a9 Add MSA optimized rotate functions (used 16x16 transpose)
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
TransposeWx16_MSA        - ~6.0x
TransposeWx16_Any_MSA    - ~4.7x
TransposeUVWx16_MSA      - ~6.3x
TransposeUVWx16_Any_MSA  - ~5.4x

Performance Gain (vs C non-vectorized)
TransposeWx16_MSA        - ~6.0x
TransposeWx16_Any_MSA    - ~4.8x
TransposeUVWx16_MSA      - ~6.3x
TransposeUVWx16_Any_MSA  - ~5.4x

Review-Url: https://codereview.chromium.org/2617703002 .
2017-01-13 15:50:02 +05:30
Manojkumar Bhosale
7c64163ff4 Add MSA optimized RAW/RGB/ARGB to ARGB/Y/UV row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGB1555ToARGBRow_MSA     - 1.85
ARGB1555ToARGBRow_Any_MSA - 1.82
RGB565ToARGBRow_MSA       - 2.14
RGB565ToARGBRow_Any_MSA   - 2.08
RGB24ToARGBRow_MSA        - 8.57
RGB24ToARGBRow_Any_MSA    - 7.42
RAWToARGBRow_MSA          - 8.57
RAWToARGBRow_Any_MSA      - 7.42
ARGB1555ToYRow_MSA        - 2.60
ARGB1555ToYRow_Any_MSA    - 2.47
RGB565ToYRow_MSA          - 2.45
RGB565ToYRow_Any_MSA      - 2.33
RGB24ToYRow_MSA           - 2.23
RGB24ToYRow_Any_MSA       - 2.01
RAWToYRow_MSA             - 2.25
RAWToYRow_Any_MSA         - 2.02
ARGB1555ToUVRow_MSA       - 1.40
ARGB1555ToUVRow_Any_MSA   - 1.37
RGB565ToUVRow_MSA         - 1.68
RGB565ToUVRow_Any_MSA     - 1.63
RGB24ToUVRow_MSA          - 3.02
RGB24ToUVRow_Any_MSA      - 2.87
RAWToUVRow_MSA            - 3.04
RAWToUVRow_Any_MSA        - 2.85

Performance Gain (vs C non-vectorized)
ARGB1555ToARGBRow_MSA     - 4.66
ARGB1555ToARGBRow_Any_MSA - 4.45
RGB565ToARGBRow_MSA       - 5.58
RGB565ToARGBRow_Any_MSA   - 5.34
RGB24ToARGBRow_MSA        - 8.57
RGB24ToARGBRow_Any_MSA    - 7.42
RAWToARGBRow_MSA          - 8.57
RAWToARGBRow_Any_MSA      - 7.42
ARGB1555ToYRow_MSA        - 6.38
ARGB1555ToYRow_Any_MSA    - 5.98
RGB565ToYRow_MSA          - 6.42
RGB565ToYRow_Any_MSA      - 6.05
RGB24ToYRow_MSA           - 7.87
RGB24ToYRow_Any_MSA       - 7.01
RAWToYRow_MSA             - 7.98
RAWToYRow_Any_MSA         - 7.01
ARGB1555ToUVRow_MSA       - 5.39
ARGB1555ToUVRow_Any_MSA   - 5.06
RGB565ToUVRow_MSA         - 6.39
RGB565ToUVRow_Any_MSA     - 5.90
RGB24ToUVRow_MSA          - 3.04
RGB24ToUVRow_Any_MSA      - 2.87
RAWToUVRow_MSA            - 3.04
RAWToUVRow_Any_MSA        - 2.88

Review-Url: https://codereview.chromium.org/2600713002 .
2017-01-13 15:43:37 +05:30
Frank Barchard
cb11559408 ConvertToARGB: Allows rotation on ARGB input
BUG=libyuv:668
TEST=run unit tests
R=fbarchard@google.com

Review-Url: https://codereview.chromium.org/2620183002 .
2017-01-11 14:38:25 -08:00
Frank Barchard
000d2fa91a Libyuv MIPS DSPR2 optimizations.
Optimized functions:

I444ToARGBRow_DSPR2
I422ToARGB4444Row_DSPR2
I422ToARGB1555Row_DSPR2
NV12ToARGBRow_DSPR2
BGRAToUVRow_DSPR2
BGRAToYRow_DSPR2
ABGRToUVRow_DSPR2
ARGBToYRow_DSPR2
ABGRToYRow_DSPR2
RGBAToUVRow_DSPR2
RGBAToYRow_DSPR2
ARGBToUVRow_DSPR2
RGB24ToARGBRow_DSPR2
RAWToARGBRow_DSPR2
RGB565ToARGBRow_DSPR2
ARGB1555ToARGBRow_DSPR2
ARGB4444ToARGBRow_DSPR2
ScaleAddRow_DSPR2

Bug-fixes in functions:

ScaleRowDown2_DSPR2
ScaleRowDown4_DSPR2

BUG=

Review-Url: https://codereview.chromium.org/2626123003 .
2017-01-11 12:19:13 -08:00
Manojkumar Bhosale
288bfbefb5 Add MSA optimized remaining scale row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ScaleRowDown2_MSA            - ~22.3x
ScaleRowDown2_Any_MSA        - ~19.9x
ScaleRowDown2Linear_MSA      - ~31.2x
ScaleRowDown2Linear_Any_MSA  - ~29.4x
ScaleRowDown2Box_MSA         - ~20.1x
ScaleRowDown2Box_Any_MSA     - ~19.6x
ScaleRowDown4_MSA            - ~11.7x
ScaleRowDown4_Any_MSA        - ~11.2x
ScaleRowDown4Box_MSA         - ~15.1x
ScaleRowDown4Box_Any_MSA     - ~15.1x
ScaleRowDown38_MSA           - ~1x
ScaleRowDown38_Any_MSA       - ~1x
ScaleRowDown38_2_Box_MSA     - ~1.7x
ScaleRowDown38_2_Box_Any_MSA - ~1.7x
ScaleRowDown38_3_Box_MSA     - ~1.7x
ScaleRowDown38_3_Box_Any_MSA - ~1.7x
ScaleAddRow_MSA              - ~1.2x
ScaleAddRow_Any_MSA          - ~1.15x

Performance Gain (vs C non-vectorized)
ScaleRowDown2_MSA            - ~22.4x
ScaleRowDown2_Any_MSA        - ~19.8x
ScaleRowDown2Linear_MSA      - ~31.6x
ScaleRowDown2Linear_Any_MSA  - ~29.4x
ScaleRowDown2Box_MSA         - ~20.1x
ScaleRowDown2Box_Any_MSA     - ~19.6x
ScaleRowDown4_MSA            - ~11.7x
ScaleRowDown4_Any_MSA        - ~11.2x
ScaleRowDown4Box_MSA         - ~15.1x
ScaleRowDown4Box_Any_MSA     - ~15.1x
ScaleRowDown38_MSA           - ~3.2x
ScaleRowDown38_Any_MSA       - ~3.2x
ScaleRowDown38_2_Box_MSA     - ~2.4x
ScaleRowDown38_2_Box_Any_MSA - ~2.3x
ScaleRowDown38_3_Box_MSA     - ~2.9x
ScaleRowDown38_3_Box_Any_MSA - ~2.8x
ScaleAddRow_MSA              - ~8x
ScaleAddRow_Any_MSA          - ~7.46x

Review-Url: https://codereview.chromium.org/2559683002 .
2016-12-21 13:39:44 +05:30
Manojkumar Bhosale
a899dea251 Add MSA optimized ARGB Attenuate/RGB565/Shuffle/Shader/Gray/Sepia row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBAttenuateRow_MSA          - ~1.1x
ARGBAttenuateRow_Any_MSA      - ~1.1x
ARGBToRGB565DitherRow_MSA     - ~6.4x
ARGBToRGB565DitherRow_Any_MSA - ~6.2x
ARGBShuffleRow_MSA            - ~5.1x
ARGBShuffleRow_Any_MSA        - ~1.9x
ARGBShadeRow_MSA              - ~1.1x
ARGBGrayRow_MSA               - ~2.6x
ARGBSepiaRow_MSA              - ~11.6x

Performance Gain (vs C non-vectorized)
ARGBAttenuateRow_MSA          - ~2.46x
ARGBAttenuateRow_Any_MSA      - ~2.45x
ARGBToRGB565DitherRow_MSA     - ~9.4x
ARGBToRGB565DitherRow_Any_MSA - ~12.5x
ARGBShuffleRow_MSA            - ~5.2x
ARGBShuffleRow_Any_MSA        - ~1.9x
ARGBShadeRow_MSA              - ~4.3x
ARGBGrayRow_MSA               - ~10.5x
ARGBSepiaRow_MSA              - ~12.2x

Review-Url: https://codereview.chromium.org/2559693002 .
2016-12-15 12:06:02 +05:30
Manojkumar Bhosale
6fa5e4eb78 Add MSA optimized TransposeWx8_MSA and TransposeUVWx8_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
TransposeWx8_MSA          - ~2.7x
TransposeWx8_Any_MSA      - ~2.1x
TransposeUVWx8_MSA        - ~2.5x
TransposeUVWx8_Any_MSA    - ~2.7x

Performance Gain (vs C non-vectorized)
TransposeWx8_MSA          - ~4.6x
TransposeWx8_Any_MSA      - ~2.9x
TransposeUVWx8_MSA        - ~4.4x
TransposeUVWx8_Any_MSA    - ~3.7x

Review URL: https://codereview.chromium.org/2553403002 .
2016-12-15 10:06:01 +05:30
Frank Barchard
b18fd21d3c Android420ToI420 - use ptrdiff_t for difference of u and v pointers
The difference was assigned to an int, causing a warning on Visual C.

BUG=662
TEST=tested with try bots.
R=devangelakos@google.com

Review-Url: https://codereview.chromium.org/2574373002 .
2016-12-14 11:53:55 -08:00
Frank Barchard
dde8ba7009 ConvertFromI420: use halfstride instead of halfwidth
BUG=libyuv:660
TEST=try bots
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2554213003 .
2016-12-07 10:16:16 -08:00
Manojkumar Bhosale
56b5bbb0be Add MSA optimized ARGB scaling functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ScaleARGBRowDown2_MSA           - ~2.6x
ScaleARGBRowDown2Linear_MSA     - ~7.9x
ScaleARGBRowDown2Box_MSA        - ~3.7x
ScaleARGBRowDownEven_MSA        - ~1.2x
ScaleARGBRowDownEvenBox_MSA     - ~3.5x

ScaleARGBRowDown2_Any_MSA       - ~2.6x
ScaleARGBRowDown2Linear_Any_MSA - ~7.9x
ScaleARGBRowDown2Box_Any_MSA    - ~3.6x
ScaleARGBRowDownEven_Any_MSA    - ~1.2x
ScaleARGBRowDownEvenBox_Any_MSA - ~3.5x

Performance Gain (vs C non-vectorized)
ScaleARGBRowDown2_MSA           - 2.6x
ScaleARGBRowDown2Linear_MSA     - 13.5x
ScaleARGBRowDown2Box_MSA        - 5.8x
ScaleARGBRowDownEven_MSA        - 1.2x
ScaleARGBRowDownEvenBox_MSA     - 3.7x

ScaleARGBRowDown2_Any_MSA       - 2.6x
ScaleARGBRowDown2Linear_Any_MSA - 13.5x
ScaleARGBRowDown2Box_Any_MSA    - 5.3x
ScaleARGBRowDownEven_Any_MSA    - 1.2x
ScaleARGBRowDownEvenBox_Any_MSA - 3.7x

Review URL: https://codereview.chromium.org/2527983002 .
2016-12-07 11:47:15 +05:30
Manojkumar Bhosale
83f460be33 Add MSA optimized ARGB Multiply/Add/Subtract row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBMultiplyRow_MSA       - 1.4x
ARGBAddRow_MSA            - 8.6x
ARGBSubtractRow_MSA       - 8.6x

ARGBMultiplyRow_Any_MSA   - 1.35x
ARGBAddRow_Any_MSA        - 7.3x
ARGBSubtractRow_Any_MSA   - 7.2x

Performance Gain (vs C non-vectorized)
ARGBMultiplyRow_MSA       - 4.4x
ARGBAddRow_MSA            - 27x
ARGBSubtractRow_MSA       - 22x

ARGBMultiplyRow_Any_MSA   - 3.5x
ARGBAddRow_Any_MSA        - 23x
ARGBSubtractRow_Any_MSA   - 18x

Review URL: https://codereview.chromium.org/2529983002 .
2016-12-02 15:21:10 +05:30
Frank Barchard
da0c29dada Add MSA optimized ARGBToRGB565Row_MSA, ARGBToARGB1555Row_MSA, ARGBToARGB4444Row_MSA, ARGBToUV444Row_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
ARGBToRGB565Row_MSA       - ~1.6x
ARGBToRGB565Row_Any_MSA   - ~1.6x
ARGBToARGB1555Row_MSA     - ~1.3x
ARGBToARGB1555Row_Any_MSA - ~1.3x
ARGBToARGB4444Row_MSA     - ~3.8x
ARGBToARGB4444Row_Any_MSA - ~3.8x
ARGBToUV444Row_MSA        - ~2.4x
ARGBToUV444Row_Any_MSA    - ~2.4x

Performance Gain (vs C non-vectorized)
ARGBToRGB565Row_MSA       - ~2.8x
ARGBToRGB565Row_Any_MSA   - ~2.8x
ARGBToARGB1555Row_MSA     - ~2.2x
ARGBToARGB1555Row_Any_MSA - ~2.2x
ARGBToARGB4444Row_MSA     - ~6.8x
ARGBToARGB4444Row_Any_MSA - ~6.6x
ARGBToUV444Row_MSA        - ~6.7x
ARGBToUV444Row_Any_MSA    - ~6.7x

Review URL: https://codereview.chromium.org/2520003004 .
2016-11-22 10:47:55 -08:00
Frank Barchard
b1504a8e48 Add MSA optimized ARGBToRGB24Row_MSA and ARGBToRAWRow_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2487913004 .
2016-11-18 15:05:10 -08:00
Frank Barchard
3028e1bd97 clang-format row_gcc.cc with some functions disabled
BUG=libyuv:654
TEST=try bots build
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2484083003 .
2016-11-07 18:37:29 -08:00
Frank Barchard
e62309f259 clang-format libyuv
BUG=libyuv:654
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2469353005 .
2016-11-07 17:37:23 -08:00
Frank Barchard
f2c27dafa2 HalfFloat neon armv7 fix for destination pointer.
Improved unittests detect different in arm64 rounding.

TEST=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*Half* -a "--libyuv_width=640 --libyuv_height=360"
BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2478313004 .
2016-11-07 12:13:04 -08:00
Frank Barchard
eca08525cb HalfFloat Neon for ARMv7.
64 bit version made similar to 32 bit with registers 1 for load and store results, and 2 and 3 as expanded float temporary values.

TEST=out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2467723002 .
2016-11-01 11:36:51 -07:00
Frank Barchard
10ce829bad Add MSA optimized I422ToRGB565Row_MSA, I422ToARGB4444Row_MSA and I422ToARGB1555Row_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422ToRGB565Row_MSA             : ~1.5x
I422ToRGB565Row_Any_MSA         : ~1.5x
I422ToARGB4444Row_MSA           : ~1.4x
I422ToARGB4444Row_Any_MSA       : ~1.4x
I422ToARGB1555Row_MSA           : ~1.4x
I422ToARGB1555Row_Any_MSA       : ~1.4x

Performance Gain (vs C non-vectorized)
I422ToRGB565Row_MSA             : ~6.8x
I422ToRGB565Row_Any_MSA         : ~6.8x
I422ToARGB4444Row_MSA           : ~6.6x
I422ToARGB4444Row_Any_MSA       : ~6.6x
I422ToARGB1555Row_MSA           : ~6.6x
I422ToARGB1555Row_Any_MSA       : ~6.6x

Review URL: https://codereview.chromium.org/2445343007 .
2016-10-27 10:47:35 -07:00
Frank Barchard
532f5708a9 Add MSA optimized I422AlphaToARGBRow_MSA and I422ToRGB24Row_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C vectorized)
I422AlphaToARGBRow_MSA      : ~1.4x
I422AlphaToARGBRow_Any_MSA  : ~1.4x
I422ToRGB24Row_MSA          : ~4.8x
I422ToRGB24Row_Any_MSA      : ~4.8x

Performance Gain (vs C non-vectorized)
I422AlphaToARGBRow_MSA      : ~7.0x
I422AlphaToARGBRow_Any_MSA  : ~7.0x
I422ToRGB24Row_MSA          : ~7.9x
I422ToRGB24Row_Any_MSA      : ~7.7x

Review URL: https://codereview.chromium.org/2454433003 .
2016-10-26 11:12:17 -07:00
Frank Barchard
2488b3105b White spaces, comments and lint fixes for msa.
no functional changes.

TBR=kjellander@chromium.org
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2446313002 .
2016-10-25 11:36:54 -07:00
Frank Barchard
c2073823b4 use __OPTIMIZE__ macro to determine debug vs release.
Debug builds of x86 gcc/clang can run out of register.
Previously NDEBUG or _DEBUG was used to detect a debug build.
But those macros are not set by gentoo builds.
This CL switches to the compiler predefine __OPTIMIZE__ which is
built into clang and gcc.

BUG=libyuv:602
TEST=untested
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2451503002 .
2016-10-24 18:02:48 -07:00
Frank Barchard
f5d5bd88d6 Add MSA optimized I422ToARGBRow_MSA and I422ToRGBARow_MSA functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gains :- (vs C vectorized)

I422ToARGBRow_MSA     : ~1.6x
I422ToRGBARow_MSA     : ~1.6x

I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x

Performance Gains :- (vs C non-vectorized)

I422ToARGBRow_MSA     : ~7x
I422ToRGBARow_MSA     : ~7x

I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x

Regarding performance measurement, We have created standalone tests which pass in row's data from a 1920x1080 filled buffer to both the C and MSA functions. And such N iterations are executed to get more accurate timings of C vs MSA.

Review URL: https://codereview.chromium.org/2430313005 .
2016-10-24 15:37:08 -07:00
Frank Barchard
451af5e922 scale by 1 for neon implemented
void HalfFloat1Row_NEON(const uint16* src, uint16* dst, float, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fcvtn      v4.4h, v2.4s                   \n"  // 8 floatsgit
    "fcvtn2     v4.8h, v1.4s                   \n"
   MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  :
  : "cc", "memory", "v1", "v2", "v4"
  );
}

void HalfFloatRow_NEON(const uint16* src, uint16* dst, float scale, int width) {
  asm volatile (
  "1:                                          \n"
    MEMACCESS(0)
    "ld1        {v1.16b}, [%0], #16            \n"  // load 8 shorts
    "subs       %w2, %w2, #8                   \n"  // 8 pixels per loop
    "uxtl       v2.4s, v1.4h                   \n"  // 8 int's
    "uxtl2      v1.4s, v1.8h                   \n"
    "scvtf      v2.4s, v2.4s                   \n"  // 8 floats
    "scvtf      v1.4s, v1.4s                   \n"
    "fmul       v2.4s, v2.4s, %3.s[0]          \n"  // adjust exponent
    "fmul       v1.4s, v1.4s, %3.s[0]          \n"
    "uqshrn     v4.4h, v2.4s, #13              \n"  // isolate halffloat
    "uqshrn2    v4.8h, v1.4s, #13              \n"
   MEMACCESS(1)
    "st1        {v4.16b}, [%1], #16            \n"  // store 8 shorts
    "b.gt       1b                             \n"
  : "+r"(src),    // %0
    "+r"(dst),    // %1
    "+r"(width)   // %2
  : "w"(scale * 1.9259299444e-34f)    // %3
  : "cc", "memory", "v1", "v2", "v4"
  );
}

TEST=LibYUVPlanarTest.TestHalfFloatPlane_One
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2430313008 .
2016-10-21 14:30:03 -07:00
Frank Barchard
550cf829fb HalfFloat avx2 unpack bug fix.
AVX unpack parameters were reverse ordered causing incorrect results
on AVX2 hardware.

TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Half*

BUG=libyuv:560
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2438893002 .
2016-10-20 15:49:00 -07:00
Frank Barchard
f553db2d30 HalfFloatPlane unittest for denormal half floats
Halffloats have a limited range.  It shouldnt normally come up, but if the scale value passed in produces a small value, the half floats will be denormals, which are slow and/or flust to zero.  This test ensures they behave the same in C and SIMD and tests the performance of denormals.

TEST=TestHalfFloatPlane_denormal
BUG=libyuv:560
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2424233004 .
2016-10-19 18:13:01 -07:00
Frank Barchard
78c58ab8aa Add MSA optimized ARGB4444ToI420 and ARGB4444ToARGB functions
R=fbarchard@google.com
BUG=libyuv:634

Performance gains : (Auto-vectorized C vs MSA SIMD)

ARGB4444ToYRow_MSA        : ~3.0x
ARGB4444ToUVRow_MSA       : ~1.8x
ARGB4444ToARGBRow_MSA     : ~3.4x

ARGB4444ToYRow_Any_MSA    : ~2.8x
ARGB4444ToUVRow_Any_MSA   : ~1.7x
ARGB4444ToARGBRow_Any_MSA : ~3.2x

Review URL: https://codereview.chromium.org/2421843002 .
2016-10-19 11:10:51 -07:00
Frank Barchard
e16e3a629f cpu_id cleanup. no functional change.
remove old comment about initialize to zero.
remove ifdef and replace with macro defined to zero.

BUG=None
TEST=try bots
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2425623004 .
2016-10-18 12:26:02 -07:00
Frank Barchard
2d80fc3133 Port HalfFloatRow_SSE2 to AVX2 but not using F16C.
R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2421993002 .
2016-10-14 19:01:41 -07:00
Frank Barchard
fdcf524aac Add f16c (halffloat) cpuid
R=wangcheng@google.com, hubbe@chromium.org
BUG=libyuv:560

Review URL: https://codereview.chromium.org/2418763006 .
2016-10-14 16:34:08 -07:00
Frank Barchard
5333e94e70 Port ARGBExtractAlpha_AVX2 function to windows.
BUG=libyuv:572
TEST=try bots
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2416783004 .
2016-10-13 23:20:57 -07:00
Frank Barchard
a5e93766a2 Add ARGBExtractAlpha_AVX2 function
Port SSE2 version to AVX2.
BUG=libyuv:572
TEST=/usr/local/google/home/fbarchard/intelsde/sde -skx -- out/Release/libyuv_unittest --gtest_filter=*Extract*
R=wangcheng@google.com, magjed@chromium.org

Review URL: https://codereview.chromium.org/2420553002 .
2016-10-13 16:03:43 -07:00
Frank Barchard
d363ea6527 Remove I411 support.
YUV 411 is very uncommon format.  Remove support.

Update documentation to reflect that 411 is deprecated.

Simplify tests for YUV to only test with the new side by side YUV but keep old 3 plane test around with a macro for now.

BUG=libyuv:645
R=kjellander@chromium.org

Review URL: https://codereview.chromium.org/2406123002 .
2016-10-11 11:14:16 -07:00
Frank Barchard
af87c11c9a YUY2ToI422 coalesce rows for small images
TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToI422_Opt

Review URL: https://codereview.chromium.org/2393393006 .
2016-10-07 18:35:42 -07:00
Frank Barchard
edd3a84d05 libyuv::YUY2ToY for isolating Y channel of YUY2.
This function is the first step of YUY2 To I420.
Provided primarily for diagnostics.

TBR=wangcheng@google.com
BUG=libyuv:647
TESTED=LibYUVConvertTest.YUY2ToY_Opt

Review URL: https://codereview.chromium.org/2399153004 .
2016-10-07 17:20:30 -07:00
Frank Barchard
a2891ec77c Add MSA optimized YUY2ToI422, YUY2ToI420, UYVYToI422, UYVYToI420 functions
R=fbarchard@google.com
BUG=libyuv:634

Performance gains as below,

YUY2ToI422, YUY2ToI420 :-

YUY2ToYRow_MSA          : ~10x
YUY2ToUVRow_MSA         : ~11x
YUY2ToUV422Row_MSA      : ~9x
YUY2ToYRow_Any_MSA      : ~6x
YUY2ToUVRow_Any_MSA     : ~5x
YUY2ToUV422Row_Any_MSA  : ~4x

UYVYToI422, UYVYToI420 :-

UYVYToYRow_MSA          : ~10x
UYVYToUVRow_MSA         : ~11x
UYVYToUV422Row_MSA      : ~9x
UYVYToYRow_Any_MSA      : ~6x
UYVYToUVRow_Any_MSA     : ~5x
UYVYToUV422Row_Any_MSA  : ~4x

Review URL: https://codereview.chromium.org/2397693002 .
2016-10-07 10:37:22 -07:00
Frank Barchard
3b88a19ab1 YUY2ToI422_Any_Neon clean up to not require 16 pixels
YUY2ToI422_Any_Neon previously required 16 pixels and duplicated
the last pixel.  The replication was not necessary after a previous
change to treat YUY2 to 4 byte macro pixels.

TBR=harryjin@google.com
BUG=libyuv:648
TESTED=util/android/test_runner.py gtest -s libyuv_unittest -t 7200 --verbose --release --gtest_filter=*YUY2ToI422* -a "--libyuv_width=17 --libyuv_height=7 --libyuv_repeat=999 --libyuv_flags=1"

Review URL: https://codereview.chromium.org/2399143002 .
2016-10-06 12:11:40 -07:00
Frank Barchard
7018f5be0f Add MSA optimized I422ToYUY2Row, I422ToUYVYRow functions
R=fbarchard@google.com
BUG=libyuv:634

Performance gains :-

I422ToYUY2Row_MSA     - ~12x
I422ToYUY2Row_Any_MSA - ~7x

I422ToUYVYRow_MSA     - ~12x
I422ToUYVYRow_Any_MSA - ~7x

Review URL: https://codereview.chromium.org/2378753004 .
2016-10-03 18:21:31 -07:00
Frank Barchard
aa197ee1a3 HalfFloat_SSE2 for Visual C
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=LibYUVPlanarTest.TestHalfFloatPlane on windows
R=hubbe@chromium.org, wangcheng@google.com

Review URL: https://codereview.chromium.org/2387713002 .
2016-10-03 10:33:38 -07:00
Frank Barchard
4a14cb2e81 HalfFloat_SSE2 port from C algorithm to SSE2
Low level support for 12 bit 420, 422 and 444 YUV video frame conversion.

BUG=libyuv:560, chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2381493006 .
2016-09-30 09:47:16 -07:00
Frank Barchard
7fc932ddd3 Add low level support for 12 bit 420, 422 and 444 YUV video frame conversion.
BUG=libyuv:560,chromium:445071
TEST=untested
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2371293002 .
2016-09-29 15:06:30 -07:00
Frank Barchard
c11e9b7fb7 bt709 coefficients for video constrained space
Original bt709 color space coefficients were full range yuv for higher
quality.  This change makes the coefficients use the video constrained
color space the same as bt601 which is 16 to 240 for Y and 16 to 235 for
chroma channels.

BUG=libyuv:639
TEST=libyuv unittests run locally
R=hubbe@chromium.org

Review URL: https://codereview.chromium.org/2367253003 .
2016-09-28 15:07:46 -07:00
Frank Barchard
6732bcbde9 ShortToHalfFloat_AVX2 function
BUG=libyuv:560
TEST=local compile for windows
R=wangcheng@google.com

Review URL: https://codereview.chromium.org/2364293002 .
2016-09-27 14:18:32 -07:00
Frank Barchard
618149084e Add MIPS SIMD Arch (MSA) optimized ARGBMirrorRow function
This patch adds MSA optimized ARGBMirrorRow function in libYUV project.

Performance gain ~3x

R=fbarchard@google.com
BUG=libyuv:634

Review URL: https://codereview.chromium.org/2368313003 .
2016-09-26 16:28:01 -07:00
Frank Barchard
34a29bf756 fix warning on visual C for mips cpu detect
follow up warning fixs
cpu_id.cc(167): warning C4267: 'initializing': conversion from 'size_t' to 'int', possible loss of data
lint warning: cpu_id.cc:171:  Missing space before ( in if(  [whitespace/parens] [5]

TBR=manojkumar.bhosale@imgtec.com
BUG=libyuv:634
TEST=try bots for windows.

Review URL: https://codereview.chromium.org/2365813002 .
2016-09-22 18:25:52 -07:00
Frank Barchard
c5323b0fdc Add MIPS SIMD Arch (MSA) optimized MirrorRow function
As per the preparation patch added in Chromium sources at,
2150943003: Add MIPS SIMD Arch (MSA) build flags for GYP/GN builds

This patch adds first MSA optimized function in libYUV project.

BUG=libyuv:634
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/2285683002 .
2016-09-22 16:12:22 -07:00
Frank Barchard
6ad3aa6ae4 fix multi-line comment warning
../../source/scale_neon.cc:576:1: error: multi-line comment [-Werror=comment]
 // #define BLENDER(a, b, f) (uint8)((int)(a) + \
 ^

BUG=None
TEST=try bots

Review URL: https://codereview.chromium.org/2344203003 .
2016-09-16 15:16:39 -07:00
Frank Barchard
8279df963e Scale by 3/8 only if source is multiple of 8 tall.
BUG=libyuv:635
TEST=try bots
R=harryjin@google.com

Review URL: https://codereview.chromium.org/2347733002 .
2016-09-16 14:57:47 -07:00
Frank Barchard
137aa63afe Fix some comment typos
BUG=None
TEST=try bots

Review URL: https://codereview.chromium.org/2346633002 .
2016-09-15 15:38:19 -07:00
Frank Barchard
de944ed8c7 YuvConstants declare alignment for externs as well as declarations
On visual c 2013 and earlier a warning is generated if externs
are not declared with the same alignment as the declaration, when
using /ltcg

BUG=libyuv:633
TEST=standalong test built with cl /Bv /GL /Ox /nologo a.cc b.cc /link /ltcg
R=skal@google.com

Review URL: https://codereview.chromium.org/2291533004 .
2016-08-30 11:06:46 -07:00
Frank Barchard
c244a3e9a0 Add SplitUVPlanes and MergeUVPlanes
Add public methods SplitUVPlanes and MergeUVPlanes based on the
optimized assembly functions that already exists. Also, de-duplicate the
CPU dispatching code for these functions by moving them to helper
functions.

BUG=libyuv:629
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2277603004 .
2016-08-24 16:47:24 -07:00
Frank Barchard
161e5c4569 Allow NULL for dst_y in planar formats. BUG=libyuv:631 TEST=unittests build/pass
BUG=libyuv:631
TEST=unittests build/pass
R=harryjin@google.com

Review URL: https://codereview.chromium.org/2271053003 .
2016-08-24 10:19:14 -07:00
Frank Barchard
17d31e6a4a NV12 allow NULL for Y
The conversion from NV12 and other Bi or Tri planar formats, differs only in the UV handling.  The helper function supports passing a NULL for the dst_y channel indicating you only want to do the UV conversion.

TBR=harryjin@google.com
TEST=LibYUVConvertTest.NV12ToI420_NullY (601 ms)
BUG=libyuv:626

Review URL: https://codereview.chromium.org/2276703002 .
2016-08-23 19:05:25 -07:00
Frank Barchard
d58297a2df NV12ToI420 use SplitPlane function
TBR=magjed@chromium.org
BUG=libyuv:629
TEST=LibYUVConvertTest.NV12ToI420_Opt

Review URL: https://codereview.chromium.org/2267303002 .
2016-08-22 18:35:55 -07:00
Frank Barchard
36ae08ce1c Suppress MJPEG fprintf() runtime warning
TBR=harryjin@google.com
BUG=libyuv:630
TEST=local build and try bots pass

Review URL: https://codereview.chromium.org/2264293002 .
2016-08-22 16:30:36 -07:00
Frank Barchard
46a8eaaf0c fix typo in YUV
R=braveyao@chromium.org
BUG=None

Review URL: https://codereview.chromium.org/2152623002 .
2016-07-13 17:17:19 -07:00
Frank Barchard
1aa4ddd21c Attribute aligned 32 for YUV conversion structure on Intel
Fix for unaligned memory exception.

R=braveyao@chromium.org
BUG=libyuv:616

Review URL: https://codereview.chromium.org/2152553002 .
2016-07-13 12:19:26 -07:00
Frank Barchard
abcb70f183 Test nv21 layout of Android420ToI420 function.
to Y,U,V and a pixel stride for U and V.  The pixel stride is expected to be 1 or 2.

[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Any
[       OK ] LibYUVConvertTest.Android420ToI420_1_Any (253 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Unaligned
[       OK ] LibYUVConvertTest.Android420ToI420_1_Unaligned (250 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Invert
[       OK ] LibYUVConvertTest.Android420ToI420_1_Invert (254 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_1_Opt
[       OK ] LibYUVConvertTest.Android420ToI420_1_Opt (247 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Any
[       OK ] LibYUVConvertTest.Android420ToI420_2_Any (132 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Unaligned
[       OK ] LibYUVConvertTest.Android420ToI420_2_Unaligned (122 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Invert
[       OK ] LibYUVConvertTest.Android420ToI420_2_Invert (124 ms)
[ RUN      ] LibYUVConvertTest.Android420ToI420_2_Opt
[       OK ] LibYUVConvertTest.Android420ToI420_2_Opt (119 ms)

TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2146733002 .
2016-07-12 18:34:04 -07:00
Frank Barchard
84e04699c2 Add libyuv:Android420ToI420 function which takes 3 pointers
to Y,U,V and a pixel stride for U and V.  The pixel stride is expected to be 1 or 2.

TEST=LibYUVConvertTest.Android420ToI420_Opt
BUG=libyuv:604
R=braveyao@chromium.org

Review URL: https://codereview.chromium.org/2114843002 .
2016-07-12 16:23:51 -07:00
Frank Barchard
2f101fdbda mingw64 fix - guard row_win.cc against mingw build.
The old guard only checked for defined(_M_X64) which is defined by mingw64.  Add a test for defined(_MSC_VER) which is defined for clangcl and visual c but not mingw.  mingw should use row_gcc.cc for both 32 and 64 bit.

R=harryjin@google.com
BUG=webm:1252,libyuv:613
TEST=local gcc/clang builds on linux tested and try bots for others.

Review URL: https://codereview.chromium.org/2105603002 .
2016-06-28 10:21:27 -07:00
Frank Barchard
b8ddb5a2a7 rounding for arm filter
R=wangcheng@google.com, harryjin@google.com
BUG=libyuv:607

Review URL: https://codereview.chromium.org/2093913004 .
2016-06-24 16:07:49 -07:00
Frank Barchard
1b3e4aee47 make count a memory variable for 32 bit
32 bit clang runs out of registers and compiler does core dump.
force 32 bit build to use memory variable for counter.

BUG=libyuv:612
TBR=harryjin@google.com

Review URL: https://codereview.chromium.org/2091913003 .
2016-06-23 20:42:10 -07:00
Frank Barchard
cc88adc620 YUV scale filter columns improved filtering accuracy
upscale a YUV image.  observe change in hue.. green especially.
disable ScaleFilterCols_SSSE3, falling back on ScaleFilterCols_C
observe hue.. green especially, is better.

was ScaleFrom1280x720_Bilinear (1620 ms)
now ScaleFrom1280x720_Bilinear (1907 ms)

BUG=libyuv:605
TEST=try bots
R=harryjin@google.com, wangcheng@google.com

Review URL: https://codereview.chromium.org/2084533006 .
2016-06-23 20:16:55 -07:00
Niels Möller
365ed3851c Treat YU12 as an alias for I420. Simplify setting of inv_crop_height.
BUG=
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/2020193002 .
2016-06-16 12:49:17 +02:00
Frank Barchard
fd3e676e91 android_full_debug x86 fix - use +rm for width count
Work around for android full debug build runnign out of registers.
5 functions were running out of registers causing the compiler error
error: 'asm' operand has impossible constraints
These functions mostly have 4 pointers, a counter (width) and a tempory
eax register.  With fpic and debug using stackframes, 2 registers are
unavailable.  So a total of 8 registers are used.
Although fpic and stack frame dont apply to assembly, the compiler
reserves 2 registers.  The optimized version builds, so its likely
freeing up the registers once it knows they are not used.
These functions used to build, so compile options and/or compiler may
have updated.. likely fpic was turned on.
An attribute can be done to disable each, and will avoid using the
2 GPR registers, but they are still reserved and unavailable in debug
builds on current compilers (gcc 4.9 and clang 3.8).

R=dhrosa@google.com
BUG=libyuv:602

Review URL: https://codereview.chromium.org/2066933002 .
2016-06-14 15:25:28 -07:00
Frank Barchard
026be3cd85 neon64 use width int directly.
width %w size modifier the int width can be passed directly to arm assembly.
For functions that take input constants, the outputs are declared as early
write using &, meaning the outputs use used before all inputs are consumed.

R=harryjin@google.com
BUG=libyuv:598

Review URL: https://codereview.chromium.org/2043073003 .
2016-06-08 10:26:53 -07:00
Frank Barchard
17e8a4d3df Remove ifdefs for neon in row_neon*.cc
ifdefs on a function level are not needed for neon functions, unless
they are conditionally enabled in row.h.  No functions are conditionally
enabled at this time, so all ifdefs can be removed from row_neon.cc and
row_neon64.cc

TBR=kjellander@chromium.org
BUG=libyuv:599

Review URL: https://codereview.chromium.org/2044223002 .
2016-06-07 14:34:13 -07:00
Frank Barchard
6546096269 ARGBExtractAlpha 16 pixels at a time for ARM
arm64   8     TestARGBExtractAlpha (10019 ms) <-original 64 bit code
arm64   8 x2  TestARGBExtractAlpha (7639 ms)
arm64   16    TestARGBExtractAlpha (7369 ms) <- new 64 bit code
thumb32 8     TestARGBExtractAlpha (9505 ms) <- original 32 bit code
thumb32 8 x2  TestARGBExtractAlpha (7400 ms)
thumb32 8 x2i TestARGBExtractAlpha (7266 ms) <- new 32 bit code
arm32   8     TestARGBExtractAlpha (10002 ms)

BUG=libyuv:572
TESTED=local test on nexus 9
R=harryjin@google.com, wangcheng@google.com

Review URL: https://codereview.chromium.org/2035573002 .
2016-06-07 10:44:28 -07:00
Frank Barchard
b00d40160a make unittest allocator align to 64 bytes.
blur requires memory be aligned.  change the unittest allocator to guarantee 64 byte alignment.
re-enable blur any test that fails if memory is unaligned.

TBR=harryjin@google.com
BUG=libyuv:596,libyuv:594
TESTED=local build passes with row.h removed from tests.

Review URL: https://codereview.chromium.org/2019753002 .
2016-05-27 18:02:47 -07:00
Frank Barchard
ade85fb55c remove row.h from unittests
add SIMD_ALIGNED to unittest header.

BUG=libyuv:594
TESTED=local build passes with row.h removed from tests.
R=harryjin@google.com

Review URL: https://codereview.chromium.org/2001373002 .
2016-05-27 10:57:49 -07:00
Magnus Jedvert
942db3016a Add ARGBExtractAlpha function
BUG=libyuv:572
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/1995293002 .
2016-05-26 10:30:57 +02:00
Frank Barchard
74a69522da white space fixes for MIPS
TBR=kjellander@chromium.org
BUG=None

Review URL: https://codereview.chromium.org/2005053004 .
2016-05-24 14:17:18 -07:00
Frank Barchard
7edf572e28 remove includes for duplicate functions
R=harryjin@google.com
BUG=libyuv:592
TESTED=local builds work with fewer headers

Review URL: https://codereview.chromium.org/2006943002 .
2016-05-23 17:38:26 -07:00
Frank Barchard
fbdc43a03c fix wrong HAS_ARGBCOPYALPHAROW_SSE2 ifdef
TBR=kjellander@chromium.org
BUG=libyuv:593
TESTED=try bots pass.

Review URL: https://codereview.chromium.org/2000393002 .
2016-05-23 16:26:02 -07:00
Frank Barchard
cf101116c9 Remove initialize to zero on output variables for inline.
Inline that uses temporary variables is currently initializing them
to 0 and passing in as output "+r".
This CL replaces the output constraint to "=&r" for most meaning an
output with early write (before inputs).  This allows the initialize
to zero step to be removed, saving 1 instruction.

BUG=libyuv:580
TESTED=local libyuv build on gcc/linux and try bots
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1895743008 .
2016-04-18 16:24:26 -07:00
Frank Barchard
9c53ff2c57 Fix temporary stride for ConvertToARGB with rotation.
BUG=libyuv:578
TESTED=local unittests pass
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1879783002 .
2016-04-11 15:21:04 -07:00
Frank Barchard
3c862e3d29 Fix stride bug for msan on I420Interpolate.
When using C version of I420Interpolate for msan, a 50% interpolation
would cause stride to be cast to int, which could cause erroneous
memory reads on 64 bit build.
This CL makes the stride use ptrdiff_t for HalfRow_C

BUG=libyuv:582
TESTED=try bots tests
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1872953002 .
2016-04-08 15:58:53 -07:00
Frank Barchard
c7372a323a add if defined(_MSC_FULL_VER) for NaCL
TBR=kjellander@chromium.org
BUG=libyuv:573
TESTED=try bots

Review URL: https://codereview.chromium.org/1850053002 .
2016-04-01 17:48:23 -07:00
Frank Barchard
76aee8ced7 Remove most clang-cl special cases from cpu_id.cc
They are not needed, and due to them there was a call to _xgetbv()
without a declaration of the function.  This used to work because we
implicitly included intrin.h in all translation units with clang-cl, but
we want to stop doing that.

BUG=chromium:592745
R=fbarchard@google.com

Review URL: https://codereview.chromium.org/1780473003 .
2016-03-10 14:01:26 -08:00
Frank Barchard
ee99b85126 Port ARGBToRGB565 from aarch64 neon to 32 bit
The 64 bit version of ARGBToRGB565 to 32 bit. 64 bit is using sri which shifts and inserts, saving some masking.  The instruction is available for neon 32 bit as well.

R=magjed@chromium.org, harryjin@google.com
BUG=libyuv:571

Review URL: https://codereview.chromium.org/1724393002 .
2016-02-29 12:22:25 -08:00
Frank Barchard
22e062a448 Port ARGBToJ420 to AVX2
ARGBToJ420 had an SSSE3 version, but not AVX2.
ARGBToI420 had an AVX2, so adapt that code to J420.

TBR=harryjin@google.com
BUG=libyuv:553

Review URL: https://codereview.chromium.org/1702373004 .
2016-02-17 23:16:39 -08:00
Frank Barchard
127ff512b3 add perf data files to ignores
document play services update

R=jkellander@chromium.org
BUG=none

Review URL: https://codereview.chromium.org/1712463002 .
2016-02-17 21:37:09 -08:00
Frank Barchard
cc33dc68c7 Port I411ToARGBRow to AVX2.
An SSSE3 version already exists, and an AVX2 version is available for
Visual C.  This ports the function to AVX2 completing the AVX2 ports of
all YUV to RGB functions for AVX2 on gcc.

TBR=harryjin@google.com
BUG=libyuv:555

Review URL: https://codereview.chromium.org/1687253002 .
2016-02-12 10:26:10 -08:00
Frank Barchard
0e554b18fe port NV12ToRGB565Row_AVX2 to gcc
NV12ToRGB565Row for Intel is implemented as a 2 step conversion:
NV12ToARGBRow_SSSE3 and ARGBToRGB565Row_SSE2

NV12ToARGBRow has an AVX2 version, so this CL implements
NV12ToRGB565Row_AVX2 with call to NV12ToARGBRow_AVX2 and
ARGBToRGB565Row_SSE2.

R=harryjin@google.com
BUG=libyuv:554

Review URL: https://codereview.chromium.org/1687953002 .
2016-02-10 11:13:41 -08:00
Frank Barchard
c39509c8e5 add avx2 wrappers for functions that can call I422ToARGBRow_AVX2
R=harryjin@google.com
BUG=libyuv:557

Review URL: https://codereview.chromium.org/1687713002 .
2016-02-09 17:14:29 -08:00
Frank Barchard
0d880e5bc0 rename MIPS_DSPR2 to DSPR2 for consistency
When attempting to normalize function names to end in Row_SIMD it was made
harder with MIPS_DSPR2 naming convention.
Other CPUs do not include the vendor.  This should be named consistently.

Removed the DISABLE_MIPS in favour of DISABLE_ASM for consistency with other
processors.

TBR=harryjin@google.com
BUG=libyuv:562

Review URL: https://codereview.chromium.org/1677633002 .
2016-02-05 14:49:54 -08:00
Frank Barchard
05ed0c539c rework scale code for ubsan
For more info on ubsan, see
http://dev.chromium.org/developers/testing/undefinedbehaviorsanitizer

TESTED=Passing compilation using:
GYP_DEFINES="ubsan=1"
GYP_DEFINES="ubsan_vptr=1"

R=harryjin@google.com, pbos@webrtc.org
BUG=libyuv:563

Review URL: https://codereview.chromium.org/1654253004 .
2016-02-02 11:01:49 -08:00
Frank Barchard
9e39c1f271 ubsan overflow fix for multiply by 0x01010101
This is an UBSan error reported by libjingle

[ RUN      ] WebRtcVideoFrameTest.ConvertToYUY2BufferStride
[000:000] (videoframe.cc:375): Validate frame passed. format: I420 bpp: 12 size: 1280x720 bytes: 1382400 expected: 1382400 sample[0..3]: 73, 73, 73, 73
../../chromium/src/third_party/libyuv/source/row_gcc.cc:2903:25: runtime error: signed integer overflow: 128 * 16843009 cannot be represented in type 'int'
[8/614] WebRtcVideoFrameTest.ConvertToYUY2BufferStride returned/aborted with exit code 1 (32 ms)
[9/614] WebRtcVideoFrameTest.ConvertToYUY2BufferInverted (29 ms)
Note: Google Test filter = WebRtcVideoFrameTest.ConvertToYUY2BufferInverted

The source is uint8 and the multiply is by 0x01010101 to replicate the byte to 4 bytes.
Changing the constant to 0x01010101u should avoid overflow.

R=harryjin@google.com
TBR=harryjin@google.com
BUG=libyuv:563

Review URL: https://codereview.chromium.org/1657533005 .
2016-02-01 12:29:04 -08:00
Frank Barchard
58cb534962 Fix memory overwrite in YUY2ToNV12 odd wdiths
When width was odd Y channel wrote an extra pixel.
This change splits the Y from UV into a temporary
buffer and memcpy's to the destination.  Performance
is slower.

Was
YUY2ToNV12_Any (307 ms)
YUY2ToNV12_Unaligned (213 ms)
TestYUY2ToNV12 (181 ms)
YUY2ToNV12_Opt (177 ms)
YUY2ToNV12_Invert (177 ms)

Npw
YUY2ToNV12_Any (300 ms)
YUY2ToNV12_Unaligned (226 ms)
YUY2ToNV12_Invert (206 ms)
TestYUY2ToNV12 (184 ms)
YUY2ToNV12_Opt (181 ms)
TBR=harryjin@google.com
BUG=libyuv:545

Review URL: https://codereview.chromium.org/1593833002 .
2016-01-19 11:28:09 -08:00
Frank Barchard
8377c798fb Fix I420ToNV21 for wrong dst_stride_y parameter.
I420ToNV21 passes the wrong dst_stride_y when it calls I420ToNV12; parameter 8 (convert_from.cc:448) is src_stride_y but should be dst_stride_y.  This causes image corruption when converting I420 -> NV21 with mismatched luminance strides.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:547

Review URL: https://codereview.chromium.org/1582793008 .
2016-01-14 17:38:54 -08:00
Frank Barchard
081475b3c8 refactor ARGBToI422 using ARGBToI420 internally
R=harryjin@google.com
BUG=libyuv:546

Review URL: https://codereview.chromium.org/1574253004 .
2016-01-12 17:05:49 -08:00
Frank Barchard
23c6a83561 Fix ifdef mismatch for mirroruv
Macro define and macro ifdef didnt match, leading to C code
being used.  Make macro match function name.

TBR=harryjin@google.com
BUG=libyuv:543

Review URL: https://codereview.chromium.org/1579023002 .
2016-01-11 16:33:36 -08:00
Frank Barchard
0e462e6f45 Remove use_sysroot=0
use_sysroot=0 is required for webrtc on linux intel builds, but
libyuv doesnt use the affected libraries, so removing this.

R=harryjin@google.com, sbc@chromium.org
BUG=libyuv:534,libyuv:542

Review URL: https://codereview.chromium.org/1566303002 .
2016-01-11 14:57:50 -08:00
Frank Barchard
fc52d8ded2 Odd width variation of scale down by 2 for subsampling
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:538

Review URL: https://codereview.chromium.org/1558093003 .
2016-01-06 15:12:17 -08:00
Frank Barchard
36615d62a0 fix for InterpolateRow_AVX2
port scaledownby4_avx2 to gcc

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1546763002 .
2015-12-22 12:29:54 -08:00
Frank Barchard
71deb7ba3a bug fix - remove shift from InterpolateRow_AVX2
TBR=harryjin@google.com
BUG=libyuv:537

Review URL: https://codereview.chromium.org/1547703002 .
2015-12-22 10:28:48 -08:00
Frank Barchard
2cb2e9e1ad fix for InterpolateRow_AVX2
TBR=harryjin@google.com
BUG=libyuv:535

Review URL: https://codereview.chromium.org/1543773002 .
2015-12-21 18:35:12 -08:00
Frank Barchard
3f4d86053e avx2 interpolate use 8 bit
BUG=libyuv:535
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1535833003 .
2015-12-21 10:57:32 -08:00
Frank Barchard
f4447745ae Add rounding to InterpolateRow for improved quality and consistency.
Remove inaccurate specializations for 1/4 and 3/4, since they round
incorrectly.  Specialize for 100% and 50% are kept due to performance.
Make C and ARM code match SSSE3.
Make unittests expect zero difference.

BUG=libyuv:535
R=harryjin@google.com

Review URL: https://codereview.chromium.org/1533643005 .
2015-12-17 15:24:06 -08:00
Frank Barchard
1ccbf8fb7b use memory for loop counter to work around nearly out of registers
TBR=harryjin@google.com
BUG=libyuv:533

Review URL: https://codereview.chromium.org/1535433003 .
2015-12-16 17:13:37 -08:00
Frank Barchard
80ca4514ef change scale down by 4 to use rounding.
TBR=harryjin@google.com
BUG=libyuv:447

Review URL: https://codereview.chromium.org/1525033005 .
2015-12-15 21:25:18 -08:00
Frank Barchard
70445ef2ef avx2 scale down by 2 for gcc
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1520423003 .
2015-12-15 10:59:20 -08:00
Frank Barchard
ae55e41851 use rounding in scaledown by 2
When scaling down by 2 the formula should round consistently.
(a+b+c+d+2)/4
The C version did but the SSE2 version was doing 2 averages.
avg(avg(a,b),avg(c,d))
This change uses a sum, then rounds.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:447,libyuv:527

Review URL: https://codereview.chromium.org/1513183004 .
2015-12-14 17:25:36 -08:00
Frank Barchard
b3bbcc1f4e add ifdef for AVX2 so vs2010 can still compile
R=harryjin@google.com
BUG=libyuv:531

Review URL: https://codereview.chromium.org/1515503005 .
2015-12-09 15:23:51 -08:00
Frank Barchard
cb44936403 fix typo in avx2 gcc blend.
was using wrong register on 32 pixel version.

R=harryjin@google.com, dhrosa@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1511433006 .
2015-12-09 10:38:46 -08:00
Frank Barchard
353ffbab80 fix for gcc compile error: variable duplicate define
TBR=harryjin@google.com
BUG=libyuv:529

Review URL: https://codereview.chromium.org/1512793002 .
2015-12-08 19:03:43 -08:00
Frank Barchard
a2ea905679 BlendPlane any width.
Benchmark
out\release\libyuv_unittest --libyuv_width=1279 --libyuv_height=719 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms

Was
I420Blend_Any (2321 ms)
I420Blend_Unaligned (1684 ms)
I420Blend_Opt (1675 ms)
I420Blend_Invert (1653 ms)
BlendPlane_Invert (1556 ms)
BlendPlane_Any (1552 ms)
BlendPlane_Unaligned (1548 ms)
BlendPlane_Opt (1535 ms)
ARGBBlend_Unaligned (659 ms)
ARGBBlend_Any (596 ms)
ARGBBlend_Invert (591 ms)
ARGBBlend_Opt (508 ms)
BlendPlaneRow_Unaligned (186 ms)
BlendPlaneRow_Opt (171 ms)

Now
ARGBBlend_Any (621 ms)
ARGBBlend_Unaligned (585 ms)
ARGBBlend_Invert (564 ms)
ARGBBlend_Opt (512 ms)
I420Blend_Unaligned (347 ms)
I420Blend_Invert (345 ms)
I420Blend_Any (337 ms)
I420Blend_Opt (327 ms)
BlendPlane_Unaligned (187 ms)
BlendPlaneRow_Unaligned (187 ms)
BlendPlane_Invert (186 ms)
BlendPlane_Any (186 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (171 ms)

which is comparable to aligned case
out\release\libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --gtest_filter=*Blend* | sortms
ARGBBlend_Any (625 ms)
ARGBBlend_Unaligned (602 ms)
ARGBBlend_Invert (508 ms)
ARGBBlend_Opt (506 ms)
I420Blend_Any (353 ms)
I420Blend_Unaligned (322 ms)
I420Blend_Invert (304 ms)
I420Blend_Opt (301 ms)
BlendPlaneRow_Unaligned (188 ms)
BlendPlane_Unaligned (186 ms)
BlendPlane_Invert (185 ms)
BlendPlane_Any (184 ms)
BlendPlaneRow_Opt (173 ms)
BlendPlane_Opt (169 ms)

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1513443002 .
2015-12-08 18:59:48 -08:00
Frank Barchard
dee77a4ebe Optimize yuv alpha blend AVX2 code to do 32 pixels at time.
out/Release/libyuv_unittest --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --gtest_filter=*I420Blend_Opt

Was LibYUVPlanarTest.I420Blend_Opt (2335 ms)
Now LibYUVPlanarTest.I420Blend_Opt (1937 ms)

vs SSSE3
LibYUVPlanarTest.I420Blend_Opt (2599 ms)

BUG=libyuv:527
R=dhrosa@google.com

Review URL: https://codereview.chromium.org/1505673003 .
2015-12-08 18:20:30 -08:00
Frank Barchard
fae1a10545 Work around bug in xgetbv for Visual Studio.
xgetbv is generating bad code, falsely disabling AVX2 and AVX512.
disable optimization for the function affected on older versions of Visual C 32 bit.

R=brucedawson@chromium.org, dhrosa@google.com, harryjin@google.com
BUG=libyuv:529

Review URL: https://codereview.chromium.org/1503393004 .
2015-12-08 18:13:32 -08:00
Frank Barchard
2657688e70 Add support for odd height YUVA alpha blending.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1507683003 .
2015-12-07 12:03:20 -08:00
Frank Barchard
b0b22f88b9 Unroll C version of YUV blender for improved performance.
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1502343003 .
2015-12-07 12:02:45 -08:00
Frank Barchard
48a919d86e Bug fix for UYVYToNV12 odd height
TBR=harryjin@google.com
BUG=libyuv:528

Review URL: https://codereview.chromium.org/1506973002 .
2015-12-07 11:39:48 -08:00
Frank Barchard
bea690b3e0 AVX2 YUV alpha blender and improved unittests
AVX2 version can process 16 pixels at a time for improved memory bandwidth and fewer instructions.

unittests improved to test unaligned memory, and test exactness when alpha is 0 or 255.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1505433002 .
2015-12-05 22:23:29 -08:00
Frank Barchard
fa2618ee26 Port BlendPlaneRow_SSSE3 to GCC
R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1490273006 .
2015-12-04 11:19:41 -08:00
Frank Barchard
8af0ebf816 planar blend use signed images
R=dhrosa@google.com, harryjin@google.com, jzern@chromium.org
BUG=libyuv:527

Review URL: https://codereview.chromium.org/1491533002 .
2015-12-02 14:20:17 -08:00
Frank Barchard
b6f37bd8ec Interpolate plane initial implementation.
YUV version of interpolation between two images.

R=dhrosa@google.com, harryjin@google.com
BUG=libyuv:526

Review URL: https://codereview.chromium.org/1479593002 .
2015-11-25 16:11:42 -08:00
Frank Barchard
526558b2d8 disable debug build of 411 to work around compiler bug
TBR=harryjin@google.com
BUG=libyuv:524

Review URL: https://codereview.chromium.org/1461013002 .
2015-11-19 02:25:00 -08:00
Frank Barchard
b7dfb72559 fix for I411 build error on 32 bit x86
TBR=harrjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1461693004 .
2015-11-19 01:45:14 -08:00
Frank Barchard
528356a128 syntax fix for gcc movzwl
TBR=harryjin@google.com
BUG=libtyv:525

Review URL: https://codereview.chromium.org/1460723003 .
2015-11-18 13:14:15 -08:00
Frank Barchard
50f8cb2db3 port I411 movzx 2 byte reader to gcc
previously the I411 format used movd to read U, V pixels.
But this reads 4 bytes, and can cause a memory exception.
pinsrw can be used, but fails on drmemory 1.5, and is slow.
So in this change a movzxw is used to read 2 bytes into EBX,
then copy to xmm0 with movd.
Slightly slower, but no memory exception
Was LibYUVConvertTest.I411ToARGB_Opt (577 ms)
Now LibYUVConvertTest.I411ToARGB_Opt (608 ms)

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1457783004 .
2015-11-18 13:05:39 -08:00
Frank Barchard
5eefbe2330 Fix for drmemory failure on I411ToARGB
Before
I420ToARGB_Opt (594 ms)
I422ToARGB_Opt (483 ms)
I411ToARGB_Opt (748 ms) ***
I444ToARGB_Opt (452 ms)
I400ToARGB_Opt (218 ms)

After
I420ToARGB_Opt (591 ms)
I422ToARGB_Opt (454 ms)
I411ToARGB_Opt (502 ms)  ***
I444ToARGB_Opt (441 ms)
I400ToARGB_Opt (216 ms)

TBR=harryjin@google.com
BUG=libyuv:525

Review URL: https://codereview.chromium.org/1459513002 .
2015-11-17 18:00:52 -08:00
Frank Barchard
0815568a50 test for unaligned vs aligned for CopyRow_SSE2
improves performance on older CPUs where movdqa is faster.
TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1455463002 .
2015-11-17 00:04:03 -08:00
Frank Barchard
1019e4537f port I444ToARGB avx2 code from Visual C to GCC.
SSSE3
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (418 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (417 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (411 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (419 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (432 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (435 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (421 ms)
[----------] 8 tests from LibYUVConvertTest (3389 ms total)

AVX2
Note: Google Test filter = *I444ToARGB*
[==========] Running 8 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 8 tests from LibYUVConvertTest
[ RUN      ] LibYUVConvertTest.I444ToARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_Any (340 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_Unaligned (325 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_Invert (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_Opt (316 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Any
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Any (315 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Unaligned (341 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Invert
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Invert (331 ms)
[ RUN      ] LibYUVConvertTest.I444ToARGB_ARGB_Opt
[       OK ] LibYUVConvertTest.I444ToARGB_ARGB_Opt (329 ms)
[----------] 8 tests from LibYUVConvertTest (2615 ms total)

TBR=harryjin@google.com
BUG=libyuv:492

Review URL: https://codereview.chromium.org/1445893002 .
2015-11-13 18:31:22 -08:00
Frank Barchard
60adcbaf32 scale with conversion using 2 steps with unittest
a prototype function to implement the yuv to rgb with conversion and scale.
replace with 1 step function in future version, using same API.

R=harryjin@google.com
BUG=libyuv:471

Review URL: https://codereview.chromium.org/1421553016 .
2015-11-13 11:25:56 -08:00
Frank Barchard
6100f50f13 fix yvu constants for avx2 yuv to rgb
the yvu matrix for yuv to rgb had an incorrect entry, affecting yuv to bgra,
yuv to abgr and yuv to raw.
fix the matrix and reenable avx2 functions.

R=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1411763004 .
2015-11-10 10:45:44 -08:00
Frank Barchard
72a9e282ec disable more avx2 functions that dont link in chrome
libyuv builds/runs, but when integrated into chromium, produces link errors.  unclear why but this disables affected functions.
will followup with re-enabling them once the root cause in the runtime error is found.

TBR=harryjin@google.com
BUG=libyuv:522

Review URL: https://codereview.chromium.org/1427683004 .
2015-11-09 17:20:02 -08:00
Frank Barchard
98eb102bea set d19 alpha on inner loop
TBR=harryjin@google.com
BUG=libyuv:521

Review URL: https://codereview.chromium.org/1429263004 .
2015-11-06 11:38:21 -08:00
Frank Barchard
431cb3667a YUV to RGB for x64 use registers instead of memory.
On Arm the YVU to RGB conversions move constants into registers.
This change does the same for 64 bit intel builds where additional
registers are available.
The AVX2 saves 3 instructions by because the 2nd argument needs to be a register, so a vmovdqu was avoided.

x64 builds using memory:
AVX2  I420ToARGB_Opt (3059 ms)
SSSE3 I420ToARGB_Opt (3959 ms)

Now using registers
AVX2  I420ToARGB_Opt (2906 ms)
SSSE3 I420ToARGB_Opt (3928 ms)

TBR=harryjin@google.com
BUG=libyuv:520

Review URL: https://codereview.chromium.org/1407353010 .
2015-11-04 16:16:18 -08:00
Frank Barchard
860cc0357a Neon versions of I420AlphaToARGB
Add alpha version of YUV to RGB to neon code for ARMv7 and aarch64.
For other YUV to RGB conversions, hoist alpha set to 255 out of loop.

TBR=harryjin@google.com
BUG=libyuv:516

Review URL: https://codereview.chromium.org/1413763017 .
2015-11-03 19:21:36 -08:00
Frank Barchard
d95d2169d9 rename yuv matrix constants to be more clear about what they are
R=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1429693006 .
2015-11-03 17:09:53 -08:00
Frank Barchard
1f1d140bb6 remove mips dsp detect
DSP code is not actually used, only DSPR2.  Remove the detect.

TBR=harryjin@google.com
BUG=none

Review URL: https://codereview.chromium.org/1405043008 .
2015-11-03 16:57:40 -08:00
Frank Barchard
ce4c2fad1d Raw 24 bit RGB to RGB24 (bgr)
Add unittests that do 1 step conversion vs 2 step conversion.

Tests end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1419103007 .
2015-11-03 10:30:30 -08:00
Frank Barchard
87926cec8b remove store bgra, abgr, raw unused macros
TBR=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1420033004 .
2015-11-02 10:40:03 -08:00
Frank Barchard
2c7aa0070a remove I422ToBGRA and use I422ToRGBA internally
Removes low levels for I420ToBGRA and I420ToRAW and reimplements them as I420ToRGBA and I420ToRGB24 with transposed color matrix.

Adds unittests that do 1 step conversion vs 2 steps to test end swapping versions match direct conversions.

R=harryjin@google.com
BUG=libyuv:518

Review URL: https://codereview.chromium.org/1427993004 .
2015-11-02 10:24:12 -08:00
Frank Barchard
5d97b93369 refactor I420ToABGR to use I420ToARGBRow
Using a transposed conversion matrix, I420ToARGB can output ABGR.

R=harryjin@google.com, xhwang@chromium.org
BUG=libyuv:473

Review URL: https://codereview.chromium.org/1413573010 .
2015-10-30 11:56:57 -07:00