332 Commits

Author SHA1 Message Date
George Steed
f2e78e1304 [AArch64] Use Neon dot-product instructions in ARGBToYMatrixRow
Using the dot-product instructions here allows us to avoid needing LD4
for loading individual colour channels, which gives a big benefit on
some micro-architectures where such instructions perform significantly
worse than LD1. In addition the dot-product instructions have higher
throughput compared to the Neon

Observed reduction in runtimes for selected kernels moving from *_NEON
to *_NEON_DotProd:

     Kernel | Cortex-A55 | Cortex-A510 | Cortex-A76 | Cortex-X2
ABGRToYJRow |      -6.5% |      -22.5% |     -43.5% |    -71.2%
 ABGRToYRow |      -6.5% |      -22.5% |     -43.5% |    -68.3%
ARGBToYJRow |      -6.5% |      -22.5% |     -43.5% |    -68.1%
 ARGBToYRow |      -6.5% |      -22.5% |     -43.5% |    -68.1%
 BGRAToYRow |      -6.5% |      -22.5% |     -42.3% |    -68.4%
RGBAToYJRow |      -6.5% |      -22.5% |     -42.2% |    -73.7%
 RGBAToYRow |      -6.5% |      -22.5% |     -42.3% |    -64.9%

Bug: libyuv:977
Change-Id: If244190a7bdacf7e6e6b16af7e6853ee13ff6585
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/5424737
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2024-04-09 03:09:36 +00:00
Lu Wang
8670bcf17f Optimize the following 19 functions with LSX in row_lsx.cc.
UYVYToYRow_LSX, UYVYToUVRow_LSX, UYVYToUV422Row_LSX,
ARGBToUVRow_LSX, ARGBToRGB24Row_LSX, ARGBToRAWRow_LSX,
ARGBToRGB565Row_LSX, ARGBToARGB1555Row_LSX, ARGBToARGB4444Row_LSX,
ARGBToUV444Row_LSX, ARGBMultiplyRow_LSX, ARGBAddRow_LSX,
ARGBSubtractRow_LSX, ARGBAttenuateRow_LSX, ARGBToRGB565DitherRow_LSX,
ARGBShuffleRow_LSX, ARGBShadeRow_LSX, ARGBGrayRow_LSX,
ARGBSepiaRow_LSX

Bug: libyuv:913
Change-Id: I02c0c9d68b229c4a66c96837e9b928c2f5dda1f3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4546814
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-19 18:55:58 +00:00
Lu Wang
1d940cc570 Optimize the following functions with LSX.
MirrorRow_LSX, MirrorUVRow_LSX, ARGBMirrorRow_LSX,
I422ToYUY2Row_LSX, I422ToUYVYRow_LSX, I422ToARGBRow_LSX,
I422ToRGBARow_LSX, I422AlphaToARGBRow_LSX, I422ToRGB24Row_LSX,
I422ToRGB565Row_LSX, I422ToARGB4444Row_LSX, I422ToARGB1555Row_LSX,
YUY2ToYRow_LSX, YUY2ToUVRow_LSX, YUY2ToUV422Row_LSX

Bug: libyuv:913
Change-Id: I46cec605001d7ddd73846eed6d0a77f936b6dc53
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4515191
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-10 00:25:48 +00:00
Frank Barchard
ee3e71c7ce Any functions use memset(vin, 0, sizeof(vin)) for GCC warning fix
- Fix -Wmemset-elt-size warning for GCC
- Use vin for inputs and vout for outputs

Bug: None
Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-04-10 20:44:10 +00:00
James Zern
0200037a5a row_any,ANYDETILE: fix -Wmemset-elt-size warning
under gcc 12.2.0 using -Wall:

source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const
                       uint16_t*, ptrdiff_t, uint16_t*, int)’:
source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to
number of elements without multiplication by element size
[-Wmemset-elt-size]
 2287 |     memset(temp, 0, 16 * BPP); /* for msan */
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’
 2308 | ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15)

This increases the memset to the full buffer size, which may not be
strictly necessary.

Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-08 19:01:02 +00:00
Frank Barchard
88b050f337 MergeUV AVX512BW use assembly
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V

AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2   MergeUVPlane_Opt (945 ms)
AVX2   MergeUVPlane_16_Opt (2167 ms)

Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2   MergeUVPlane_Opt (943 ms)
SSE2   MergeUVPlane_Opt (973 ms)
AVX2   MergeUVPlane_16_Opt (2102 ms)

Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-22 21:19:08 +00:00
Frank Barchard
2bdc210be9 MergeUV_AVX512BW for I420ToNV12
On Skylake Xeon 640x360 100000 iterations
AVX512   MergeUVPlane_Opt (1196 ms)
AVX2     MergeUVPlane_Opt (1565 ms)
SSE2     MergeUVPlane_Opt (1780 ms)
Pixel 7  MergeUVPlane_Opt (1177 ms)

Bug: None
Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-13 20:14:57 +00:00
Hao Chen
0809713775 Refine some functions on the Longarch platform.
Add ARGBToYMatrixRow_LSX/LASX, RGBAToYMatrixRow_LSX/LASX and
RGBToYMatrixRow_LSX/LASX functions with RgbConstants argument.

Bug: libyuv:912
Change-Id: I956e639d1f0da4a47a55b79c9d41dcd29e29bdc5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4167860
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 18:54:14 +00:00
Frank Barchard
ea26d7adb1 DetilePlane_16 AVX version
- fix ifdefs for DetilePlane_16 to use 16 bit versions, not 8 bit.  (no functional change)

Bug: b/258474032
Change-Id: Ic07e02d9801e21126ebee0ceb5779aa712a493ce
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4034812
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-18 23:59:06 +00:00
Frank Barchard
2d2cee418a Add Detile_16 planar function for 10 bit MT2T format
- Neon and SSE2
- Any for odd widths

Pixel 2 little core AArch32 build
C
TestDetilePlane_16 (1275 ms)
TestDetilePlane (1203 ms)
Neon
TestDetilePlane_16 (693 ms)
TestDetilePlane (660 ms)

Bug: b/258474032
Change-Id: Idbd09c5e9324e4deef5f1d54090d4b63cc7db812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4031848
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-17 02:47:57 +00:00
Frank Barchard
00950840d1 YUY2ToNV12 using YUY2ToY and YUY2ToNVUV
- Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps
  - Was SplitUV, memcpy Y, InterpolateUV
  - Now YUY2ToY, YUY2ToNVUV
- rollback LIBYUV_UNLIMITED_DATA

3840x2160 1000 iterations:

Pixel 2 Cortex A73
Was YUY2ToNV12_Opt (6515 ms)
Now YUY2ToNV12_Opt (3350 ms)

AB7 Mediatek P35 Cortex A53
Was YUY2ToNV12_Opt (6435 ms)
Now YUY2ToNV12_Opt (3301 ms)

Skylake AVX2 x64
Was YUY2ToNV12_Opt (1872 ms)
Now YUY2ToNV12_Opt (1657 ms)

SSE2 x64
Was YUY2ToNV12_Opt (2008 ms)
Now YUY2ToNV12_Opt (1691 ms)

Windows Skylake AVX2 32 bit x86
Was YUY2ToNV12_Opt (2161 ms)
Now YUY2ToNV12_Opt (1628 ms)

Bug: libyuv:943
Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-30 22:41:21 +00:00
Frank Barchard
248172e2ba I422ToRGB24, I422ToRAW, I422ToRGB24MatrixFilter conversion functions added.
- YUV to RGB use linear for first and last row.
- add assert(yuvconstants)
- rename pointers to match row functions.
- use macros that match row functions.
- use 12 bit upsampler for conversions of 10 and 12 bits

Cortex A53 AArch32
I420ToRGB24_Opt (3627 ms)
I422ToRGB24_Opt (4099 ms)
I444ToRGB24_Opt (4186 ms)
I420ToRGB24Filter_Opt (5451 ms)
I422ToRGB24Filter_Opt (5430 ms)

AVX2
Was I420ToRGB24Filter_Opt (583 ms)
Now I420ToRGB24Filter_Opt (560 ms)

Neon Cortex A7
Was I420ToRGB24Filter_Opt (5447 ms)
Now I420ToRGB24Filter_Opt (5439 ms)

Bug: libyuv:938


Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 02:00:52 +00:00
Frank Barchard
3e38ce5058 SSE2 MM21->YUY2 conversion
Add SSE2 optimization for MM21ToYUY2 conversion.

Bug: b/238137982
Change-Id: I189f712514308322f651b082b496bce9c015c4ee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832525
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-08-17 18:39:05 +00:00
Frank Barchard
65e7c9d570 MM21ToYUY2 and ABGRToJ420 conversion
MM21 to YUY2 use zip1 for performance

Cortex A510
Was MM21ToYUY2 (612 ms)
Now MM21ToYUY2 (573 ms)

Prefetches help Cortex A53
Was MM21ToYUY2 (4998 ms)
Now MM21ToYUY2 (1900 ms)

Pixel 4 Cortex A76
Was MM21ToYUY2 (215 ms)
Now MM21ToYUY2 (173 ms)

ABGRToJ420
- NEON, SSSE3 and AVX2 row functions
- J400, J420 and J422 formats.
- Added AVX2 for UV on ARGBToJ420.  Was SSSE3

Same code/performance as ARGBToJ420 but with constants re-ordered.
Pixel 4
ABGRToJ420_Opt (623 ms)
ABGRToJ422_Opt (702 ms)
ABGRToJ400_Opt (238 ms)

Skylake Xeon
With LIBYUV_BIT_EXACT which uses C for UV
ABGRToJ420_Opt (988 ms)
ABGRToJ422_Opt (1872 ms)
ABGRToJ400_Opt (186 ms)
Skylake Xeon using AVX2
ABGRToJ420_Opt (251 ms)
ABGRToJ422_Opt (245 ms)
ABGRToJ400_Opt (184 ms)
Skylake Xeon using SSSE3
ABGRToJ420_Opt (328 ms)
ABGRToJ422_Opt (362 ms)
ABGRToJ400_Opt (185 ms)

Bug: b/238137982
Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-16 22:07:38 +00:00
Frank Barchard
6900494d90 Merge/SplitRGB fix -mcmodel=large x86 and InterpolateRow_16To8_NEON
MergeRGB and SplitRGB use a register to point to 9 shuffle tables.

- fixes an out of registers error with -mcmodel=large

InterpolateRow_16To8_NEON improves performance for I210ToI420:

On Pixel 4 for 720p x1000 images
Was I210ToI420_Opt (608 ms)
Now I210ToI420_Opt (336 ms)

On Skylake Xeon
Was I210ToI420_Opt (259 ms)
Now I210ToI420_Opt (209 ms)


Bug: libyuv:931, libyuv:930
Change-Id: I20f8244803f06da511299bf1a2ffc7945eb35221
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717054
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-06-29 00:00:46 +00:00
Frank Barchard
e906ba9fe9 InterpolateRow_Any test if fraction is 0 and dont memcpy 2nd row.
Bug: b/228605787
Change-Id: Ia8912e4c1599401320ee82882a2593e78bf56582
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3708833
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-06-17 18:15:09 +00:00
Frank Barchard
30f9b28048 Add I210ToI420
Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: Ib135d0b4ff17665f6a4ab60edb782a7b314219a4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3696042
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2022-06-09 08:07:50 +00:00
Frank Barchard
d011314f14 Revert "I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix"
This reverts commit 60254a1d846a93a4d7559009004cdd91bcc04d82.

Reason for revert: breaks PaintCanvasVideoRendererTest.HighBitDepth

Original change's description:
> I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix
>
> - Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit
> - Add NEON InterpolateRow_16 for fast 10 bit scaling
> - When scaling up, set step to interpolate toward height - 1 to avoid buffer overread
> - When scaling down, center the 2 rows used for source to achieve filtering.
> - CopyPlane check for 0 size and return
>
> Bug:  libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
> Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1
> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552
> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
> Reviewed-by: richard winterton <rrwinterton@gmail.com>

Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: Icc05bb340db0e7fe864061fb501d0a861c764116
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3692886
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2022-06-07 09:16:05 +00:00
Frank Barchard
60254a1d84 I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix
- Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit
- Add NEON InterpolateRow_16 for fast 10 bit scaling
- When scaling up, set step to interpolate toward height - 1 to avoid buffer overread
- When scaling down, center the 2 rows used for source to achieve filtering.
- CopyPlane check for 0 size and return

Bug:  libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2022-06-07 01:41:56 +00:00
Frank Barchard
eb2c88e499 Convert16To8 NEON
Pixel 3
Was C    I010ToI420_Opt (749 ms)
Now NEON I010ToI420_Opt (356 ms)

Pixel 4
Was C    I010ToI420_Opt (581 ms)
Now NEON I010ToI420_Opt (163 ms)

Bug: b/233233302, b/233634772
Change-Id: I60a84648a66f77d97c0a7822b29bd18b8e3a3355
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3661401
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-05-24 18:07:16 +00:00
Frank Barchard
95b14b2446 RAWToJ400 faster version for ARM
- Unrolled to 16 pixels
- Take constants via structure, allowing different colorspace and channel order
- Use ADDHN to add 16.5 and take upper 8 bits of 16 bit values, narrowing to 8 bits
- clang-format applied, affecting mips code

On Cortex A510
Was RAWToJ400_Opt (1623 ms)
Now RAWToJ400_Opt (862 ms)

C   RAWToJ400_Opt (1627 ms)

Bug: b/220171611
Change-Id: I06a9baf9650ebe2802fb6ff6dfbd524e2c06ada0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-18 07:22:36 +00:00
Hao Chen
91bae707e1 Optimize functions for LASX in row_lasx.cc.
1. Optimize 18 functions in source/row_lasx.cc file.
2. Make small modifications to LSX.
3. Remove some unnecessary content.

Bug: libyuv:912
Change-Id: Ifd1d85366efb9cdb3b99491e30fa450ff1848661
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3507640
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 08:52:54 +00:00
Justin Green
b4ddbaf549 Add support for MM21.
Add support for MM21 to NV12 and I420 conversion, and add SIMD
optimizations for arm, aarch64, SSE2, and SSSE3 machines.

Bug: libyuv:915, b/215425056
Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Justin Green <greenjustin@google.com>
2022-02-03 17:01:49 +00:00
Frank Barchard
2c6bfc02d5 Remove MMI support
Bug: libyuv:916
Change-Id: I345b7e271ceb4b32fe91e292915e66be40812810
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3415817
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-26 08:41:33 +00:00
Hao Chen
dfe046d272 Add optimization functions in row_lsx.cc file.
Optimize 44 functions in source/row_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913
Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
de8ae8c679 Add optimization functions in row_lasx.cc file.
Optimize 32 functions in source/row_lasx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:912
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Change-Id: I7d3f649f753f72ca9bd052d5e0562dbc6f6ccfed
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351466
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
51de1e16f2 Add supports for loongarch LSX and LASX.
1. Add supports for LSX and LASX.
2. Three optimization functions are added in loongarch/row_lasx.cc file:
   I422ToARGBRow_LASX,I422ToRGBARow_LASX,I422AlphaToARGBRow_LASX.

Bug: libyuv:912, Bug: libyuv:913
Change-Id: I043c2704f99a5215724b5c0b7f97e6bf5f7a199b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329189
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-20 19:25:38 +00:00
Frank Barchard
90ffd5cba9 I420ToARGB for AVX512
On Skylake Xeon
AVX512  I420ToARGB_Opt (2050 ms)
AVX2    I420ToARGB_Opt (2533 ms)
SSSE3   I420ToARGB_Opt (3688 ms)

Bug: libyuv:911
Change-Id: I2214cc15dec24b06541895ca59d88990edbb2216
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3382100
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-14 09:17:33 +00:00
Frank Barchard
d7a2d5da87 J400ToARGB optimized for Exynos using ZIP+ST1
Bug: 204562143
Change-Id: I56c98198c02bd0dd1283f1c14837730c92832c39
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3328702
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 01:00:07 +00:00
Frank Barchard
000806f373 NV21ToYUV24 replace ST3 with ST1. ARGBToAR64 replace ST2 with ST1
On Samsung S8 Exynos M2
Was ST3 NV21ToYUV24_Opt (769 ms)
Now ST1 NV21ToYUV24_Opt (473 ms)
Was ST2 ARGBToAR64_Opt (1759 ms)
Now ST1 ARGBToAR64_Opt (987 ms)

Skylake Xeon, AVX2 version:
Was NV21ToYUV24_Opt (885 ms)
Now NV21ToYUV24_Opt (194 ms)

Bug: b/204562143, b/124413599
Change-Id: Icc9cb64d822cd11937789a4e04fbb773b3e33aa3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3290664
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-11-24 07:38:49 +00:00
Frank Barchard
b179f1847a Enable SIMD for exact RGB to Y conversions
Bug: libyuv:908, b/202888439
Change-Id: Icc5470b85d91b441ded9958ee04b4f32246646f0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3230489
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-10-19 07:54:50 +00:00
Frank Barchard
55b97cb48f BIT_EXACT for unattenuate and attenuate.
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnatenuate, which mimics the AVX code.

Apply clang format to cleanup code.

Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-10-15 19:46:02 +00:00
Frank Barchard
daf9778a24 Fix for failed compile with armv-7a neon gcc
Bug: libyuv:907
Change-Id: I955e83c72b57ce5ba45730030b32f337be610a21
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3216739
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-10-12 18:17:50 +00:00
Frank Barchard
287158925b use width + 1 for odd width tests
Bug: libyuv:894, libyuv:898, libyuv:899
Change-Id: Ieba8eaeb8b06f0323824967776673e339b263220
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2809701
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-04-09 20:17:55 +00:00
Yuan Tong
2cd098f83b Fix MergeAR64Plane on odd width
R=fbarchard@chromium.org

Bug: libyuv:898
Change-Id: I031e008ea91baba1c7598efa0eda70750cbfce85
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810066
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-04-08 09:18:34 +00:00
Frank Barchard
60db98b6fa clang-tidy applied
Bug: libyuv:886, libyuv:889
Change-Id: I2d14d03c19402381256d3c6d988e0b7307bdffd8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2800147
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-04-01 21:42:47 +00:00
Yuan Tong
8a13626e42 Add MergeAR30Plane, MergeAR64Plane, MergeARGB16To8Plane
These functions merge high bit depth planar RGB pixels into packed format.

Change-Id: I506935a164b069e6b2fed8bf152cb874310c0916
Bug: libyuv:886, libyuv:889
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-03-31 20:46:02 +00:00
Yuan Tong
f37014fcff Add support for AR64 format
Add following conversions:
ARGB,ABGR <-> AR64,AB64
AR64 <-> AB64

R=fbarchard@chromium.org

Change-Id: I5ca5b40a98bffea11981e136afae4a511ba6c564
Bug: libyuv:886
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2746780
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-03-13 20:55:21 +00:00
Frank Barchard
ba033a11e3 Add 12 bit YUV to 10 bit RGB
Bug: libyuv:843
Change-Id: I0104c8fcaeed09e83d2fd654c6a5e7d41bcb74cf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2727775
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-03-05 01:09:37 +00:00
Yuan Tong
cdabad5bfa Add more 10 bit YUV To RGB function
The following functions are added:
planar YUV:
 I410ToAR30, I410ToARGB
planar YUVA:
 I010AlphaToARGB, I210AlphaToARGB, I410AlphaToARGB
biplanar YUV:
 P010ToARGB, P210ToARGB
 P010ToAR30, P210ToAR30

biplanar functions can also handle 12 bit and 16 bit samples.

libyuv_unittest --gtest_filter=LibYUVConvertTest.*10*ToA*:LibYUVConvertTest.*P?1?ToA*

R=fbarchard@chromium.org

Bug: libyuv:751, libyuv:844
Change-Id: I2be02244dfa23335e1e7bc241fb0613990208de5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2707003
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-03 15:48:47 +00:00
Yuan Tong
a8c181050c Add 10/12 bit YUV To YUV functions
The following functions (and their 12 bit variant) are added:

planar, 10->10:
 I410ToI010, I210ToI010

planar, 10->8:
 I410ToI444, I210ToI422

planar<->biplanar, 10->10:
 I010ToP010, I210ToP210, I410ToP410
 P010ToI010, P210ToI210, P410ToI410

R=fbarchard@chromium.org

Change-Id: I9aa2bafa0d6a6e1e38ce4e20cbb437e10f9b0158
Bug: libyuv:834, libyuv:873
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2709822
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-02-25 23:16:54 +00:00
Frank Barchard
c28d404936 win32 build fix for I422ToRGBA
Bug:  libyuv:877, b/178713286
Change-Id: Iad55df99083b9a4bb9306e052e0e687e58570d96
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2657701
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-29 10:07:08 +00:00
Yuan Tong
a85cc26fde Add MergeARGBPlane and SplitARGBPlane
These functions convert between planar and interleaved ARGB,
optionally fill 255 to alpha / discard alpha.

This can help handle YUV(A) with Identity matrix, which is
basically planar ARGB.

libyuv_unittest --gtest_filter=LibYUVPlanarTest.*ARGBPlane*:LibYUVPlanarTest.*XRGBPlane*

R=fbarchard@google.com

Change-Id: I522a189b434f490ba1723ce51317727e7c5eb112
Bug: libyuv:877
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2649887
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-27 19:33:51 +00:00
Yuan Tong
08d0dce5fc Add I422AlphaToARGB and I444AlphaToARGB
Bug: libyuv:878
Change-Id: I64c314326ac7ae5242acc64e20016e30adc6d17f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2639439
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-23 00:40:33 +00:00
Frank Barchard
d6833cda38 ARGBSetRow_Any do memset for msan
Bug: b/169296991
Change-Id: Ia000cdbca0d0d95465e09535b67775ad3b885038
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2434383
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-09-28 19:16:12 +00:00
Shiyou Yin
ce5b333853 ARGBToI420 MMI and MSA version match C.
In commit 0b8bb6, C version has been updated.
This patch update the MMI and MSA version to mach C version.

Change-Id: Ib28da3629a8465990c8e2185278a95af8c27a31d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227754
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2020-06-04 04:51:05 +00:00
Shiyou Yin
db63668a24 Add MirrorUVRow_MSA.
Change-Id: Ic498d1175c3f916d0101b0fd8603b5cae994138b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227753
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-06-04 04:12:24 +00:00
Frank Barchard
8869628c24 Remove unnecessary include of convert_argb
Bug: libyuv:861, b/156642185
Change-Id: I3ddbe2f7b61629ed18b6879203203a51b3700773
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2219047
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-05-28 18:58:37 +00:00
Frank Barchard
da41bca02b I400ToARGBMatrix Pass a color matrix to use different coefficients
32 bit
Neon I400ToARGB_Opt (1937 ms)
64 bit
C I400ToARGB_Opt (8957 ms)
NEON I400ToARGB_Opt (2147 ms)

x86
cI400ToARGB_Opt (1110 ms)
AVX2 I400ToARGB_Opt (213 ms)
SSE2 I400ToARGB_Opt (225 ms)

Bug: libyuv:861, b/156642185
Change-Id: I96b6f4ebba6ff9c4ed8803291ce098de6f93fa4f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2209718
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-20 20:33:12 +00:00
Frank Barchard
d426247a3b YUV to RGB Matrix functions for color space support
Make all Matrix versions of conversions public.

Bug: libyuv:861, b/156642185
Change-Id: Ida067c95dd041b612e2bab64dbface58b257038a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2202748
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
2020-05-19 16:59:29 +00:00
Frank Barchard
7a61759f78 NV12Mirror and MirrorUVPlane functions added
HalfMergeUV AVX2 version

Skylake Xeon performance for 1280x720
NV12Mirror_Any (109 ms)
NV12Mirror_Unaligned (113 ms)
NV12Mirror_Invert (107 ms)
NV12Mirror_Opt (108 ms)
NV12Mirror_NullY (19 ms)

Slightly faster than comparable I420Mirror
I420Mirror_Any (113 ms)
I420Mirror_Unaligned (110 ms)
I420Mirror_Invert (109 ms)
I420Mirror_Opt (110 ms)

BUG=libyuv:840, libyuv:858

Change-Id: I686b1b778383bfa10ecd1655e986bdc99e76d132
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2176066
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-04 22:32:14 +00:00
Frank Barchard
aabcc477bd RGB24Mirror function
Bug: b/151960427
Change-Id: I413db0011a4ed87eefc0dd166bb8e076b5aa4b1d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2116639
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-03-24 20:13:08 +00:00
Frank Barchard
3db22ebc4b RAWToJ400 and RGBToJ400 use 2 step row function for Intel. RAWToJ400 Was 3996 ms, now 3309. 20.7% faster.
Call a row function for each row, based on ARGBToI400 code.
But implement row functions as 2 step conversion.  Adds the
row functions:
RAWToYJ, RGBToYJ, SSSE3 and AVX2 versions, and Any versions.
The smaller row buffer is more cache friendly on large images.

The max cache size can be configured, and is currently:
// Maximum temporary width for wrappers to process at a time, in pixels.
And the row buffer is
  SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4]);
So 8192 bytes are used for the row buffer, leaving the rest for source
and destination buffers.

blaze-bin/third_party/libyuv/libyuv_test '--gunit_filter=*R*To?400_Opt' --libyuv_width=3600 --libyuv_height=2500 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 | sortms

Was
RAWToJ400_Opt (3996 ms)
ARGBToI400_Opt (3964 ms)
RGB24ToJ400_Opt (3960 ms)
ARGBToJ400_Opt (3909 ms)
RGBAToJ400_Opt (3885 ms)

Now
ARGBToJ400_Opt (4091 ms)
ARGBToI400_Opt (3936 ms)
RGBAToJ400_Opt (3428 ms)
RGB24ToJ400_Opt (3324 ms)
RAWToJ400_Opt (3309 ms)

Bug: libyuv:854, b/147753855
Change-Id: Ieb65fbda94e812c737f4c3c74107354b73c4bcd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2016203
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-01-23 03:23:38 +00:00
Frank Barchard
1cea4235af RAWToJ400 for big endian RGB to grey scale.
On Pixel 3
Was
BM_ConvertToGray/1280/720/3                        2360958 ns      2334984 ns         2999
BM_ConvertToGray/1279/721/3                        2360289 ns      2334329 ns         2994
BM_ConvertGrayTensorflowCoefficients/1280/720/3    2983296 ns      2947113 ns         2259
BM_ConvertGrayTensorflowCoefficients/1279/721/3    2871205 ns      2835359 ns         2170

Now
BM_ConvertToGray/1280/720/3                        2358469 ns      2334068 ns         2997
BM_ConvertToGray/1279/721/3                        2364584 ns      2336892 ns         2995
BM_ConvertGrayTensorflowCoefficients/1280/720/3     281312 ns       278244 ns        25170
BM_ConvertGrayTensorflowCoefficients/1279/721/3     351310 ns       347229 ns        20217

BUG=libyuv:854

Change-Id: If2192affc2d3219e0fb824737d75b9374a25d709
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2003236
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-01-16 00:29:11 +00:00
Frank Barchard
d82f4baf5f Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror
Bug: libyuv:840, libyuv:849: b/144318948
Change-Id: I303c02ac2b838a09d3e623df7a69ffc085fe3cd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1914781
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-13 20:02:40 +00:00
Frank Barchard
22f8aad8bc RAWToRGBA for 3 channel OCR
Replace ARM64 only row function with high level function
that implements SSSE3, 32 bit Neon and C.

Compared to 2 step RAWToARGB + ARGBToRGBA on row level:
3.1x faster on ARM
6.2% faster on Intel

BUG=b/140748379

Change-Id: Ia8636d9e4fcdbe10b8c2e81610a54728e29845cd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1860914
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-10-14 22:27:37 +00:00
Frank Barchard
fce0fed542 ARGBToY use 8 bit precision instead of 7 bit.
Neon and GCC Intel optimized, but win32 and mips not optimized.

BUG=libyuv:842, b/141482243

Change-Id: Ia56fa85c8cc1db51f374bd0c89b56d21ec94afa7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1825642
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2019-10-07 23:01:10 +00:00
Frank Barchard
c85a7b3ae3 MMI Optimized functions I422ToARGB for 1080p video
Improves playback performance for 1080p video on www.youku.com

BUG=libyuv:841

Change-Id: Iabe7693fba276162af0290863f46e214ab86fb6c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1790959
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2019-09-11 21:06:21 +00:00
Frank Barchard
0bb2773a39 AVX2 versions of ABGRToNV12 and ABGRToNV21
BUG=libyuv:833

Change-Id: I9b6653e9c304b4e0805b7d3c8408ce57009c8559
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1740682
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-08-07 18:16:34 +00:00
Frank Barchard
fec9121b67 SwapUV AVX2 and SSSE3
Based on ARGBShuffle but with count adjusted and new shuffle mask

BUG=libyuv:809

Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-26 18:41:40 +00:00
Frank Barchard
f9aacffa02 Fix arm unittest failure by removing unused FloatDivToByteRow.
Apply clang-format to fix jpeg if() for lint fix.
Change comments about 4th pixel for open source compliance.
Rename UVToVU to SwapUV for consistency with MergeUV.

BUG=b/135532289, b/136515133

Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936
Reviewed-by: Chong Zhang <chz@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-02 20:00:30 +00:00
Frank Barchard
413a8d8041 Add AYUVToNV12 and NV21ToNV12
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*ToNV12* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1

R=rrwinterton@gmail.com

Change-Id: Id03b4613211fb6a6e163d10daa7c692fe31e36d8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1560080
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2019-04-12 17:48:45 +00:00
Frank Barchard
5b6042fa0d add YUV24 and AYUV formats
Alternatives to RGB24 and AYUV for working with GPU.

BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*NV21To???24* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com

Change-Id: I5559c63f4bd4c847492fcb1571f7b03c58146689
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1501735
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-03-05 02:53:56 +00:00
Martin Storsjö
9b772abf97 Restore the file mode for source files
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.

Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-08-06 18:53:32 +00:00
lixia zhang
21be9122aa libyuv:loongson optimize compare/row/scale/rotate files with mmi.
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).

In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.

BUG=libyuv:804

Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-07-20 22:53:04 +00:00
Frank Barchard
a7fb978e30 ARGBExtractAlphaRow_Any_AVX2 fix pixel count mask
Mask was set to 32, but should have been 31.
BUG=libyuv:798
TESTED=try bots tested

Change-Id: I6120928873a4a2f1efef907d8e8296ca8c20bb03
Reviewed-on: https://chromium-review.googlesource.com/1054830
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-05-11 07:13:58 +00:00
Frank Barchard
83aa7512c1 AVX512 VMBI version of ARGBToRGB24
Use VMBI instructions but on AVX2 registers to avoid clockrate change.

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-10 02:04:48 +00:00
Frank Barchard
1d509f2178 ARGBToRGB24_AVX2 version
AVX2 port of SSSE3 conversion to output 24 bit RGB

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I14f7815522d1b790ecd2bb39d9a3441e803b694a
Reviewed-on: https://chromium-review.googlesource.com/953303
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-08 02:38:21 +00:00
Frank Barchard
3009890c11 NV21ToRGB24_AVX2 and SSSE3
Use 2 step conversion for NV21ToRGB24 to leverage AVX2
low levels instead of C.

Was C
NV21ToRGB24_Opt (882 ms)

Now SSSE3
NV21ToRGB24_Opt (218 ms)

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I58faf766bbec4cc595aab2e217f6c874dd4b4363
Reviewed-on: https://chromium-review.googlesource.com/951629
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-07 03:58:48 +00:00
Frank Barchard
85722f5d93 ByteToFloatRow_NEON to convert and scale bytes to floats
Each byte is converted to float (0.0 to 255.0) and then multiplied
by a scale parameter.

Bug: None
Test: arm 64 build passes.
Change-Id: I04736798540b8d985f60abdf0388e24a209d075b
Reviewed-on: https://chromium-review.googlesource.com/930226
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ian Field <ianfield@google.com>
2018-02-24 00:34:07 +00:00
Frank Barchard
0ea50cbc74 NV21ToRGB24_NEON conversion
32 bit thumb2 performance:
NV12ToARGB_Opt (472 ms)
NV21ToARGB_Opt (466 ms)
NV12ToRGB24_Opt (457 ms)
NV21ToRGB24_Opt (457 ms)
NV12ToRGB565_Opt (501 ms)

Bug: libyuv:778
Test: add new NV21ToRGB24 test
Change-Id: I330585789835c79ee4b4da61d164716598268df3
Reviewed-on: https://chromium-review.googlesource.com/924646
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-22 22:24:24 +00:00
Frank Barchard
664c735677 I420ToYUY2_AVX2 port
I420 and I422 To YUY2 and UYVY ported from SSE2 to AVX2.

Was SSE2
I420ToYUY2_Opt (135 ms)
I420ToUYVY_Opt (148 ms)
I422ToYUY2_Opt (145 ms)
I422ToUYVY_Opt (142 ms)

Now AVX2
I420ToYUY2_Opt (133 ms)
I420ToUYVY_Opt (130 ms)
I422ToYUY2_Opt (127 ms)
I422ToUYVY_Opt (137 ms)

Bug: libyuv:556
Test: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*I42?To*UY*Opt
Change-Id: Ic35f97cee02dc009fd98785589ba17c7cf50bb35
Reviewed-on: https://chromium-review.googlesource.com/892493
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-01 00:33:25 +00:00
Frank Barchard
ffec313dbe ABGRToAR30 used AVX2 with reversed shuffler
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.

Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-29 22:31:31 +00:00
Frank Barchard
ed96b7b2c7 AVX2 port of H010ToAR30_AVX2
Was SSSE3 H010ToAR30_Opt (635 ms)
Now AVX2  H010ToAR30_Opt (448 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I17b1a0e3268c4a9836e09683dd3377fb1ce60932
Reviewed-on: https://chromium-review.googlesource.com/889906
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-27 00:14:27 +00:00
Frank Barchard
c95fd57993 AVX2 port of I010ToAR30_AVX2
Was SSSE3 I420ToAR30_Opt (635 ms)
Now AVX2  I420ToAR30_Opt (446 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I261be19ec981136a8f453ae0d3211532a790e5c5
Reviewed-on: https://chromium-review.googlesource.com/887750
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-26 02:12:07 +00:00
Frank Barchard
92e22cf5b6 Lint cleanup after C99 change CL
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
7e389884a1 Switch to C99 types
Append _t to all sized types.
uint64 becomes uint64_t etc

Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
8af6ea4100 I420ToAR30 in 1 step SSSE3 assembly
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToAR30_Opt
Change-Id: Ie89c3eb2526354cf11175746bc8af72be83a1e00
Reviewed-on: https://chromium-review.googlesource.com/877541
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-23 01:33:10 +00:00
Frank Barchard
09db0c4ce2 H010ToAR30 in 1 step with SSSE3 assembly
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.

[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-19 19:46:58 +00:00
Frank Barchard
00d526d4ea H010ToARGB_AVX2 optimized conversion
AVX2 optimized 10 bit YUV to ARGB.

Bug: libyuv:751
Test: H010ToARGB unittest
Change-Id: I705630beb62714b52042c2a5dcdb8b7859e734ae
Reviewed-on: https://chromium-review.googlesource.com/852563
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-09 03:17:33 +00:00
Frank Barchard
9d2cd6a3ef H010ToAR30 optimized to 2 step conversion
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.

Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)

Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-07 08:36:57 +00:00
Frank Barchard
a64658593e I210ToARGB conversion from 10 bit YUV to RGB
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.

Bug: libyuv:751
Test:  I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-05 02:43:38 +00:00
Frank Barchard
140fc0a261 Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.

Bug: libyuv: 769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
768f103b8b Convert8To16 for better H010 support
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.

Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-28 22:27:24 +00:00
Frank Barchard
3b81288ece Remove Mips DSPR2 code
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-14 18:22:16 +00:00
Frank Barchard
c367751430 ARGBToAR30 SSSE3 use pmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 20:12:58 +00:00
Frank Barchard
0f98c3c1df Add ARGBToAR30Row_SSE2 to speed up H010ToAR30
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.

Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-09 00:11:20 +00:00
Frank Barchard
324fa32739 Convert16To8Row_SSSE3 port from AVX2
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used.  This improves performance
on low end CPUs.
Future CL will by pass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.

Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-28 19:22:39 +00:00
Frank Barchard
a98d6cdb17 ARGBToAR30 AVX2 conversion function
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0
Reviewed-on: https://chromium-review.googlesource.com/780437
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-21 20:37:01 +00:00
Frank Barchard
1e16cb5c38 SplitRGBPlane and MergeRGBPlane functions added
Converts packed RGB to planar and back.

TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added

Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Manojkumar Bhosale
b6e8e9aa97 Add MSA optimized HalfFloatRow function
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a
Reviewed-on: https://chromium-review.googlesource.com/628557
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-29 18:40:08 +00:00
Frank Barchard
78e44628c6 Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions.
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e
Reviewed-on: https://chromium-review.googlesource.com/616765
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-17 18:39:22 +00:00
Manojkumar Bhosale
dbd7c1a9c5 Add MSA optimized ARGBExtractAlpha, ARGBBlend, ARGBQuantize and ARGBColorMatrix row functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I17bd3f87336f613ad363af7d7b9d7af49d725e56
Reviewed-on: https://chromium-review.googlesource.com/613100
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-14 17:38:31 +00:00
Frank Barchard
8cab2e31d7 I422ToRGB565 fix for odd widths
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.

Bug: libyuv: 657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
2017-04-11 17:24:05 +00:00
Frank Barchard
d59d3fcd18 Change parameter for '_Any' functions to param to avoid misnomer
BUG=None
TEST=None

Change-Id: I6940fc4753783afd25f83868635381bf801c65f5
Reviewed-on: https://chromium-review.googlesource.com/452962
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-10 23:32:39 +00:00
Frank Barchard
136aa9d37c any11p fix for buffer overrun
BUG=libyuv:686
TESTED=untested

Change-Id: Idfae93349dd78b1b633a596631e5397e11b77d0b
Reviewed-on: https://chromium-review.googlesource.com/448320
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-03-03 19:57:35 +00:00
Manojkumar Bhosale
45b176d153 Add MSA optimized Interpolate/MergeUV/Misc functions
BUG=libyuv:634

Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126

Performance Gain (vs C auto-vectorized)
InterpolateRow_MSA      - ~3.3x
InterpolateRow_Any_MSA  - ~2.5x
ARGBSetRow_MSA          - ~1.0x
ARGBSetRow_Any_MSA      - ~1.0x
ARGBToRGB24Row_MSA      - ~1.9x
ARGBToRGB24Row_Any_MSA  - ~1.6x
MergeUVRow_MSA          - ~1.6x
MergeUVRow_Any_MSA      - ~1.2x

Performance Gain (vs C non-vectorized)
InterpolateRow_MSA      - ~11.3x
InterpolateRow_Any_MSA  - ~ 7.9x
ARGBSetRow_MSA          - ~ 6.2x
ARGBSetRow_Any_MSA      - ~ 4.0x
ARGBToRGB24Row_MSA      - ~ 9.9x
ARGBToRGB24Row_Any_MSA  - ~ 8.4x
MergeUVRow_MSA          - ~12.7x
MergeUVRow_Any_MSA      - ~ 8.0x

Change-Id: If8d60bd57f01fe95bc2fd26196466574195cc126
Reviewed-on: https://chromium-review.googlesource.com/445817
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-23 01:42:22 +00:00
Manojkumar Bhosale
eed66b2028 Add MSA optimized I444/I400/J400/YUY2/UYVY to ARGB row functions
BUG=libyuv:634

Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be

Performance Gain (vs C auto-vectorized)
I444ToARGBRow_MSA       - ~1.6x
I444ToARGBRow_Any_MSA   - ~1.6x
I400ToARGBRow_MSA       - ~5.5x
I400ToARGBRow_Any_MSA   - ~5.3x
J400ToARGBRow_MSA       - ~1.0x
J400ToARGBRow_Any_MSA   - ~1.0x
YUY2ToARGBRow_MSA       - ~1.6x
YUY2ToARGBRow_Any_MSA   - ~1.6x
UYVYToARGBRow_MSA       - ~1.6x
UYVYToARGBRow_Any_MSA   - ~1.6x

Performance Gain (vs C non-vectorized)
I444ToARGBRow_MSA       - ~7.3x
I444ToARGBRow_Any_MSA   - ~7.1x
I400ToARGBRow_MSA       - ~5.5x
I400ToARGBRow_Any_MSA   - ~5.2x
J400ToARGBRow_MSA       - ~6.8x
J400ToARGBRow_Any_MSA   - ~5.7x
YUY2ToARGBRow_MSA       - ~7.2x
YUY2ToARGBRow_Any_MSA   - ~7.0x
UYVYToARGBRow_MSA       - ~7.1x
UYVYToARGBRow_Any_MSA   - ~6.9x

Change-Id: Ida80027c36a938a3bcf6f4480626f8eb9495e1be
Reviewed-on: https://chromium-review.googlesource.com/439246
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-02-21 23:22:07 +00:00
Manojkumar Bhosale
54ce8f23d6 Add MSA optimized ARGB/ABGR/BGRA/RGBA To Y/UV row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C auto-vectorized)
ARGBToYJRow_MSA       - ~3.2x
ARGBToYJRow_Any_MSA   - ~2.7x
BGRAToYRow_MSA        - ~3.2x
BGRAToYRow_Any_MSA    - ~2.7x
ABGRToYRow_MSA        - ~3.2x
ABGRToYRow_Any_MSA    - ~2.6x
RGBAToYRow_MSA        - ~3.1x
RGBAToYRow_Any_MSA    - ~2.7x
ARGBToUVJRow_MSA      - ~5.5x
ARGBToUVJRow_Any_MSA  - ~4.5x
BGRAToUVRow_MSA       - ~2.1x
BGRAToUVRow_Any_MSA   - ~2.0x
ABGRToUVRow_MSA       - ~2.1x
ABGRToUVRow_Any_MSA   - ~1.9x
RGBAToUVRow_MSA       - ~2.2x
RGBAToUVRow_Any_MSA   - ~1.9x

Performance Gain (vs C non-vectorized)
ARGBToYJRow_MSA       - ~10.9x
ARGBToYJRow_Any_MSA   -  ~9.2x
BGRAToYRow_MSA        - ~10.9x
BGRAToYRow_Any_MSA    -  ~9.3x
ABGRToYRow_MSA        - ~11.0x
ABGRToYRow_Any_MSA    -  ~9.3x
RGBAToYRow_MSA        - ~10.9x
RGBAToYRow_Any_MSA    -  ~9.1x
ARGBToUVJRow_MSA      - ~12.4x
ARGBToUVJRow_Any_MSA  - ~10.5x
BGRAToUVRow_MSA       -  ~4.7x
BGRAToUVRow_Any_MSA   -  ~4.4x
ABGRToUVRow_MSA       -  ~4.7x
ABGRToUVRow_Any_MSA   -  ~4.5x
RGBAToUVRow_MSA       -  ~4.8x
RGBAToUVRow_Any_MSA   -  ~4.4x

Review-Url: https://codereview.chromium.org/2641153003 .
2017-02-01 10:31:28 +05:30
Manojkumar Bhosale
09b8c971b3 Add MSA optimized NV12/21 To RGB row functions
R=fbarchard@google.com
BUG=libyuv:634

Performance Gain (vs C auto-vectorized)
NV12ToARGBRow_MSA       - ~1.5x
NV12ToARGBRow_Any_MSA   - ~1.4x
NV12ToRGB565Row_MSA     - ~1.4x
NV12ToRGB565Row_Any_MSA - ~1.4x
NV21ToARGBRow_MSA       - ~1.5x
NV21ToARGBRow_Any_MSA   - ~1.5x
SobelRow_MSA            - ~4.3x
SobelRow_Any_MSA        - ~3.4x
SobelToPlaneRow_MSA     - ~8.0x
SobelToPlaneRow_Any_MSA - ~4.7x
SobelXYRow_MSA          - ~3.0x
SobelXYRow_Any_MSA      - ~2.5x

Performance Gain (vs C non-vectorized)
NV12ToARGBRow_MSA       - ~6.5x
NV12ToARGBRow_Any_MSA   - ~6.5x
NV12ToRGB565Row_MSA     - ~6.2x
NV12ToRGB565Row_Any_MSA - ~6.1x
NV21ToARGBRow_MSA       - ~6.5x
NV21ToARGBRow_Any_MSA   - ~6.5x
SobelRow_MSA            - ~14.5x
SobelRow_Any_MSA        - ~11.3x
SobelToPlaneRow_MSA     - ~34.2x
SobelToPlaneRow_Any_MSA - ~19.4x
SobelXYRow_MSA          - ~11.1x
SobelXYRow_Any_MSA      - ~9.1x

Review-Url: https://codereview.chromium.org/2636483002 .
2017-01-18 09:24:39 +05:30