235 Commits

Author SHA1 Message Date
Frank Barchard
2bdc210be9 MergeUV_AVX512BW for I420ToNV12
On Skylake Xeon 640x360 100000 iterations
AVX512   MergeUVPlane_Opt (1196 ms)
AVX2     MergeUVPlane_Opt (1565 ms)
SSE2     MergeUVPlane_Opt (1780 ms)
Pixel 7  MergeUVPlane_Opt (1177 ms)

Bug: None
Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-13 20:14:57 +00:00
Frank Barchard
ea26d7adb1 DetilePlane_16 AVX version
- fix ifdefs for DetilePlane_16 to use 16 bit versions, not 8 bit.  (no functional change)

Bug: b/258474032
Change-Id: Ic07e02d9801e21126ebee0ceb5779aa712a493ce
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4034812
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-18 23:59:06 +00:00
Frank Barchard
8713ba3f0b Add vzeroupper to AVX row functions
- move power of two macro to planar functions source
- revert row.h IS_ALIGNED change

Bug: b/258474032
Change-Id: If87bb8d55c9b9930dd3e378614f8e4faae0870e9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4035166
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-11-17 23:00:08 +00:00
Frank Barchard
2d2cee418a Add Detile_16 planar function for 10 bit MT2T format
- Neon and SSE2
- Any for odd widths
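
For orientation, detiling one row of 16 bit samples can be sketched in plain C as below. This is a simplified illustration and not the code from this change: the name `detile_row_16`, the 16 sample tile width (`TILE_W`) and the `tile_stride` parameter are assumptions, and widths that are not a multiple of the tile width (the case the "Any" variants cover) are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tile width in samples; an assumption for this sketch. */
#define TILE_W 16

/* Copy one output row out of a run of horizontally adjacent tiles.
 * src points at this row inside the first tile; tile_stride is the
 * distance in samples from a row in one tile to the same row in the next.
 * width must be a multiple of TILE_W (no remainder handling here). */
static void detile_row_16(const uint16_t* src, ptrdiff_t tile_stride,
                          uint16_t* dst, int width) {
  assert(width >= 0 && width % TILE_W == 0);
  for (int x = 0; x < width; x += TILE_W) {
    memcpy(dst, src, TILE_W * sizeof(uint16_t));
    dst += TILE_W;
    src += tile_stride;
  }
}
```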

Pixel 2 little core AArch32 build
C
TestDetilePlane_16 (1275 ms)
TestDetilePlane (1203 ms)
Neon
TestDetilePlane_16 (693 ms)
TestDetilePlane (660 ms)

Bug: b/258474032
Change-Id: Idbd09c5e9324e4deef5f1d54090d4b63cc7db812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4031848
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-17 02:47:57 +00:00
Frank Barchard
00950840d1 YUY2ToNV12 using YUY2ToY and YUY2ToNVUV
- Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps
  - Was SplitUV, memcpy Y, InterpolateUV
  - Now YUY2ToY, YUY2ToNVUV
- rollback LIBYUV_UNLIMITED_DATA
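
The two step structure (one pass for Y, one pass for UV) can be sketched in scalar C as follows. The function names are hypothetical, and averaging the chroma of two source rows with rounding is an assumption about how the vertical subsample is done; the real row functions are SIMD and may differ in detail.

```c
#include <assert.h>
#include <stdint.h>

/* Step 1: pull the luma bytes out of one packed YUY2 row (Y0 U Y1 V ...). */
static void yuy2_row_to_y(const uint8_t* src_yuy2, uint8_t* dst_y, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst_y[x] = src_yuy2[2 * x];
  }
}

/* Step 2: average the chroma of two YUY2 rows into one interleaved UV row,
 * as used by the NV12 chroma plane. width is in pixels and must be even. */
static void yuy2_rows_to_nvuv(const uint8_t* row0, const uint8_t* row1,
                              uint8_t* dst_uv, int width) {
  assert(width >= 0 && width % 2 == 0);
  for (int x = 0; x < width; x += 2) {
    dst_uv[x + 0] = (uint8_t)((row0[2 * x + 1] + row1[2 * x + 1] + 1) >> 1);
    dst_uv[x + 1] = (uint8_t)((row0[2 * x + 3] + row1[2 * x + 3] + 1) >> 1);
  }
}
```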

3840x2160 1000 iterations:

Pixel 2 Cortex A73
Was YUY2ToNV12_Opt (6515 ms)
Now YUY2ToNV12_Opt (3350 ms)

AB7 Mediatek P35 Cortex A53
Was YUY2ToNV12_Opt (6435 ms)
Now YUY2ToNV12_Opt (3301 ms)

Skylake AVX2 x64
Was YUY2ToNV12_Opt (1872 ms)
Now YUY2ToNV12_Opt (1657 ms)

SSE2 x64
Was YUY2ToNV12_Opt (2008 ms)
Now YUY2ToNV12_Opt (1691 ms)

Windows Skylake AVX2 32 bit x86
Was YUY2ToNV12_Opt (2161 ms)
Now YUY2ToNV12_Opt (1628 ms)

Bug: libyuv:943
Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-30 22:41:21 +00:00
Frank Barchard
f9fda6e7d8 Fix shift amount for SSSE3 assembly for I012 format conversions
Bug: libyuv:938, libyuv:942
Change-Id: I6fb6e7e17fa941785e398bc630f465baf72fcabd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906091
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 23:07:53 +00:00
Frank Barchard
8fc02134c8 10/12 bit YUV replicate upper bits to low bits before converting to RGB
- shift high bits of 10 and 12 bit into lower bits
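
As an illustration of the general technique (not the code from this change), replicating the upper bits into the lower bits when widening a sample to 16 bits can be written as below. The helper name `replicate_to_16` is hypothetical. Without the replication step, a full scale 10 bit value of 1023 would widen to 65472 rather than 65535.

```c
#include <assert.h>
#include <stdint.h>

/* Widen a depth-bit sample (depth 8..16) to 16 bits by shifting it to the
 * most significant bits and copying its top bits into the vacated low bits. */
static uint16_t replicate_to_16(uint16_t v, int depth) {
  assert(depth >= 8 && depth <= 16);
  uint32_t hi = (uint32_t)v << (16 - depth);      /* MSB-align the sample */
  uint32_t lo = (uint32_t)v >> (2 * depth - 16);  /* top (16 - depth) bits */
  return (uint16_t)(hi | lo);
}
```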

Bug: libyuv:941, libyuv:942
Change-Id: I14381dbf226ef27dcce06893ea88860835639baa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906085
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 20:56:43 +00:00
Frank Barchard
248172e2ba I422ToRGB24, I422ToRAW, I422ToRGB24MatrixFilter conversion functions added.
- YUV to RGB use linear for first and last row.
- add assert(yuvconstants)
- rename pointers to match row functions.
- use macros that match row functions.
- use 12 bit upsampler for conversions of 10 and 12 bits

Cortex A53 AArch32
I420ToRGB24_Opt (3627 ms)
I422ToRGB24_Opt (4099 ms)
I444ToRGB24_Opt (4186 ms)
I420ToRGB24Filter_Opt (5451 ms)
I422ToRGB24Filter_Opt (5430 ms)

AVX2
Was I420ToRGB24Filter_Opt (583 ms)
Now I420ToRGB24Filter_Opt (560 ms)

Neon Cortex A7
Was I420ToRGB24Filter_Opt (5447 ms)
Now I420ToRGB24Filter_Opt (5439 ms)

Bug: libyuv:938


Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 02:00:52 +00:00
Frank Barchard
f71c83552d I420ToRGB24MatrixFilter function added
- Implemented as 3 steps: Upsample UV to 4:4:4, I444ToARGB, ARGBToRGB24
- Fix some build warnings for missing prototypes.

Pixel 4
I420ToRGB24_Opt (743 ms)
I420ToRGB24Filter_Opt (1331 ms)

Windows with Skylake Xeon:
x86 32 bit
I420ToRGB24_Opt (387 ms)
I420ToRGB24Filter_Opt (571 ms)
x64 64 bit
I420ToRGB24_Opt (384 ms)
I420ToRGB24Filter_Opt (582 ms)


Bug: libyuv:938, libyuv:830
Change-Id: Ie27f70816ec084437014f8a1c630ae011ee2348c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3900298
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-16 19:46:47 +00:00
Frank Barchard
3e38ce5058 SSE2 MM21->YUY2 conversion
Add SSE2 optimization for MM21ToYUY2 conversion.

Bug: b/238137982
Change-Id: I189f712514308322f651b082b496bce9c015c4ee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832525
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-08-17 18:39:05 +00:00
Frank Barchard
65e7c9d570 MM21ToYUY2 and ABGRToJ420 conversion
MM21 to YUY2 use zip1 for performance

Cortex A510
Was MM21ToYUY2 (612 ms)
Now MM21ToYUY2 (573 ms)

Prefetches help Cortex A53
Was MM21ToYUY2 (4998 ms)
Now MM21ToYUY2 (1900 ms)

Pixel 4 Cortex A76
Was MM21ToYUY2 (215 ms)
Now MM21ToYUY2 (173 ms)

ABGRToJ420
- NEON, SSSE3 and AVX2 row functions
- J400, J420 and J422 formats.
- Added AVX2 for UV on ARGBToJ420.  Was SSSE3

Same code/performance as ARGBToJ420 but with constants re-ordered.
Pixel 4
ABGRToJ420_Opt (623 ms)
ABGRToJ422_Opt (702 ms)
ABGRToJ400_Opt (238 ms)

Skylake Xeon
With LIBYUV_BIT_EXACT which uses C for UV
ABGRToJ420_Opt (988 ms)
ABGRToJ422_Opt (1872 ms)
ABGRToJ400_Opt (186 ms)
Skylake Xeon using AVX2
ABGRToJ420_Opt (251 ms)
ABGRToJ422_Opt (245 ms)
ABGRToJ400_Opt (184 ms)
Skylake Xeon using SSSE3
ABGRToJ420_Opt (328 ms)
ABGRToJ422_Opt (362 ms)
ABGRToJ400_Opt (185 ms)

Bug: b/238137982
Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-16 22:07:38 +00:00
Frank Barchard
6900494d90 Merge/SplitRGB fix -mcmodel=large x86 and InterpolateRow_16To8_NEON
MergeRGB and SplitRGB use a register to point to 9 shuffle tables.

- fixes an out of registers error with -mcmodel=large

InterpolateRow_16To8_NEON improves performance for I210ToI420:

On Pixel 4 for 720p x1000 images
Was I210ToI420_Opt (608 ms)
Now I210ToI420_Opt (336 ms)

On Skylake Xeon
Was I210ToI420_Opt (259 ms)
Now I210ToI420_Opt (209 ms)


Bug: libyuv:931, libyuv:930
Change-Id: I20f8244803f06da511299bf1a2ffc7945eb35221
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717054
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-06-29 00:00:46 +00:00
Justin Green
b4ddbaf549 Add support for MM21.
Add support for MM21 to NV12 and I420 conversion, and add SIMD
optimizations for arm, aarch64, SSE2, and SSSE3 machines.

Bug: libyuv:915, b/215425056
Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Justin Green <greenjustin@google.com>
2022-02-03 17:01:49 +00:00
Frank Barchard
90ffd5cba9 I420ToARGB for AVX512
On Skylake Xeon
AVX512  I420ToARGB_Opt (2050 ms)
AVX2    I420ToARGB_Opt (2533 ms)
SSSE3   I420ToARGB_Opt (3688 ms)

Bug: libyuv:911
Change-Id: I2214cc15dec24b06541895ca59d88990edbb2216
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3382100
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-14 09:17:33 +00:00
Frank Barchard
78625492cb InterpolateRow_AVX2 use AVX2 instead of ERMS for 100%
Bug: b/210066781
Change-Id: I709e403f03bd6b9f8fe693b165b242b784076fe0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329072
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-15 03:54:18 +00:00
Frank Barchard
fdc71956bd InterpolateRow_AVX2 - extend width count to 64 bits
Bug: b/210066781
Change-Id: Ib9052d8edfce29b95ca02a6f7254d3ff35d2b64d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329070
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 04:11:18 +00:00
Frank Barchard
d7a2d5da87 J400ToARGB optimized for Exynos using ZIP+ST1
Bug: 204562143
Change-Id: I56c98198c02bd0dd1283f1c14837730c92832c39
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3328702
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 01:00:07 +00:00
Frank Barchard
000806f373 NV21ToYUV24 replace ST3 with ST1. ARGBToAR64 replace ST2 with ST1
On Samsung S8 Exynos M2
Was ST3 NV21ToYUV24_Opt (769 ms)
Now ST1 NV21ToYUV24_Opt (473 ms)
Was ST2 ARGBToAR64_Opt (1759 ms)
Now ST1 ARGBToAR64_Opt (987 ms)

Skylake Xeon, AVX2 version:
Was NV21ToYUV24_Opt (885 ms)
Now NV21ToYUV24_Opt (194 ms)

Bug: b/204562143, b/124413599
Change-Id: Icc9cb64d822cd11937789a4e04fbb773b3e33aa3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3290664
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-11-24 07:38:49 +00:00
Frank Barchard
55b97cb48f BIT_EXACT for unattenuate and attenuate.
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnattenuate, which mimics the AVX code.

Apply clang format to cleanup code.

Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-10-15 19:46:02 +00:00
Frank Barchard
49ebc996aa Make 2 step transitive tests measure 2 step time.
Add tests of all macros used by libyuv public headers

When a 1 step conversion is added, a 2 step test can compare
the old 2 step method to the 1 step.  A 1 step unittest is
also added which compares C to SIMD.  Making the 2 step
conversions measure performance of the 2 steps allows the
old 2 step performance to be compared to 1 step.

All macros used in public headers are added to an ifdef test.
Showing them in a unittest allows some diagnostics when
a test is failing.

Bug: libyuv:901
Change-Id: I7ffa6ed0cb3b506fa1b7fd4b7b1b729658c3c266
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2857916
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-30 18:14:57 +00:00
Yuan Tong
c9843de02a Optimize unlimited data for Intel
Use unsigned coefficient and signed UV value in YUVTORGB.

R=fbarchard@chromium.org

Bug: libyuv:862, libyuv:863
Change-Id: I32e58b2cee383fb98104c055beb0867a7ad05bfe
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2850016
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-27 20:35:27 +00:00
Frank Barchard
5e05f26a2b Switch win32 to row_gcc for clangcl.
Bug: libyuv:900, libyuv:848, b/178283356, b/185922513
Change-Id: I7697953753391c555a778198db36412c853fb29e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2844962
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2021-04-22 19:32:32 +00:00
Yuan Tong
a1814576bf Unlimited data for Intel
Use unsigned coefficients on Intel.
Make C, NEON and AVX2 match under LIBYUV_UNLIMITED_DATA.

Bug: libyuv:862, libyuv:863
Change-Id: I6c02147ea3c1875c4fc23863435aea86dcf5880a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2830180
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-04-19 20:29:10 +00:00
Frank Barchard
287158925b use width + 1 for odd width tests
Bug: libyuv:894, libyuv:898, libyuv:899
Change-Id: Ieba8eaeb8b06f0323824967776673e339b263220
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2809701
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-04-09 20:17:55 +00:00
Frank Barchard
d1bfc6ead6 gcc fix for row_gcc.cc vbroadcastss
Bug: libyuv:893
Change-Id: I5b70e6a94356878deb348cbd19c9e1e50b2a18aa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2808793
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-06 21:31:29 +00:00
Frank Barchard
60db98b6fa clang-tidy applied
Bug: libyuv:886, libyuv:889
Change-Id: I2d14d03c19402381256d3c6d988e0b7307bdffd8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2800147
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-04-01 21:42:47 +00:00
Yuan Tong
8a13626e42 Add MergeAR30Plane, MergeAR64Plane, MergeARGB16To8Plane
These functions merge high bit depth planar RGB pixels into packed format.

Change-Id: I506935a164b069e6b2fed8bf152cb874310c0916
Bug: libyuv:886, libyuv:889
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-03-31 20:46:02 +00:00
Frank Barchard
312c02a5aa Fixes for SplitUVPlane_16 and MergeUVPlane_16
Planar functions pass depth instead of scale factor.
Row functions pass shift instead of depth.  Add assert to C.
AVX shift instruction expects a single shift value in XMM.
Neon pass shift as input (not output).
Split Neon reimplemented as left shift on shorts by negative to achieve right shift.
Add planar unittests
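
The split between the two levels can be sketched as follows for the merge direction. The names are hypothetical, the planes are assumed to be contiguous (stride equal to width), and the relation `shift = 16 - depth` is an assumption about how the planar function derives the value it hands to the row function.

```c
#include <assert.h>
#include <stdint.h>

/* Row level: takes a shift. Interleaves U and V and moves each sample
 * up to the most significant bits of the 16 bit output. */
static void merge_uv_row_16(const uint16_t* src_u, const uint16_t* src_v,
                            uint16_t* dst_uv, int shift, int width) {
  assert(shift >= 0 && shift <= 8);
  for (int x = 0; x < width; ++x) {
    dst_uv[2 * x + 0] = (uint16_t)(src_u[x] << shift);
    dst_uv[2 * x + 1] = (uint16_t)(src_v[x] << shift);
  }
}

/* Plane level: takes a bit depth and derives the row shift from it. */
static void merge_uv_plane_16(const uint16_t* src_u, const uint16_t* src_v,
                              uint16_t* dst_uv, int width, int height,
                              int depth) {
  assert(depth >= 8 && depth <= 16);
  const int shift = 16 - depth;
  for (int y = 0; y < height; ++y) {
    merge_uv_row_16(src_u + y * width, src_v + y * width,
                    dst_uv + 2 * y * width, shift, width);
  }
}
```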

Bug: libyuv:888
Change-Id: I8fe62d3d777effc5321c361cd595c58b7f93807e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2782086
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-03-24 21:37:10 +00:00
Frank Barchard
d8f1bfc981 Add RAWToJ420
Add J420 output from RAW.
Optimize RGB24 and RAW To J420 on ARM by using NEON for the 2 step conversion.

Also fix sign-compare warning that was breaking Windows build

Bug: libyuv:887, b/183534734
Change-Id: I8c39334552dc0b28414e638708db413d6adf8d6e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2783382
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-03-23 23:45:54 +00:00
Yuan Tong
f37014fcff Add support for AR64 format
Add following conversions:
ARGB,ABGR <-> AR64,AB64
AR64 <-> AB64
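
A scalar sketch of what these conversions involve is given below. The function names are hypothetical. Widening by multiplying with 0x0101, narrowing by dropping the low byte, and swapping the first and third channel for AR64 <-> AB64 are the conventional choices and are assumed here rather than taken from the change.

```c
#include <assert.h>
#include <stdint.h>

/* 8 bit per channel to 16 bit per channel: 0x12 becomes 0x1212. */
static void argb_to_ar64_row(const uint8_t* src_argb, uint16_t* dst_ar64,
                             int width) {
  assert(width >= 0);
  for (int i = 0; i < width * 4; ++i) {
    dst_ar64[i] = (uint16_t)(src_argb[i] * 0x0101);
  }
}

/* 16 bit per channel to 8 bit per channel: keep the high byte. */
static void ar64_to_argb_row(const uint16_t* src_ar64, uint8_t* dst_argb,
                             int width) {
  assert(width >= 0);
  for (int i = 0; i < width * 4; ++i) {
    dst_argb[i] = (uint8_t)(src_ar64[i] >> 8);
  }
}

/* AR64 <-> AB64: exchange the first and third channel of every pixel. */
static void ar64_to_ab64_row(const uint16_t* src, uint16_t* dst, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    uint16_t c0 = src[4 * x + 0];
    uint16_t c2 = src[4 * x + 2];
    dst[4 * x + 0] = c2;
    dst[4 * x + 1] = src[4 * x + 1];
    dst[4 * x + 2] = c0;
    dst[4 * x + 3] = src[4 * x + 3];
  }
}
```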

R=fbarchard@chromium.org

Change-Id: I5ca5b40a98bffea11981e136afae4a511ba6c564
Bug: libyuv:886
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2746780
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-03-13 20:55:21 +00:00
Frank Barchard
ba033a11e3 Add 12 bit YUV to 10 bit RGB
Bug: libyuv:843
Change-Id: I0104c8fcaeed09e83d2fd654c6a5e7d41bcb74cf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2727775
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-03-05 01:09:37 +00:00
Yuan Tong
cdabad5bfa Add more 10 bit YUV To RGB function
The following functions are added:
planar YUV:
 I410ToAR30, I410ToARGB
planar YUVA:
 I010AlphaToARGB, I210AlphaToARGB, I410AlphaToARGB
biplanar YUV:
 P010ToARGB, P210ToARGB
 P010ToAR30, P210ToAR30

biplanar functions can also handle 12 bit and 16 bit samples.

libyuv_unittest --gtest_filter=LibYUVConvertTest.*10*ToA*:LibYUVConvertTest.*P?1?ToA*

R=fbarchard@chromium.org

Bug: libyuv:751, libyuv:844
Change-Id: I2be02244dfa23335e1e7bc241fb0613990208de5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2707003
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-03 15:48:47 +00:00
Yuan Tong
a8c181050c Add 10/12 bit YUV To YUV functions
The following functions (and their 12 bit variant) are added:

planar, 10->10:
 I410ToI010, I210ToI010

planar, 10->8:
 I410ToI444, I210ToI422

planar<->biplanar, 10->10:
 I010ToP010, I210ToP210, I410ToP410
 P010ToI010, P210ToI210, P410ToI410

R=fbarchard@chromium.org

Change-Id: I9aa2bafa0d6a6e1e38ce4e20cbb437e10f9b0158
Bug: libyuv:834, libyuv:873
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2709822
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-02-25 23:16:54 +00:00
Frank Barchard
d768774299 add yuvconstants util
miscellaneous cleanup of other code/comments

Bug: libyuv:873, libyuv:877
Change-Id: I0d8caf9a65908ff8898b25494f7c724775f84fa3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2692930
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-12 19:45:16 +00:00
Yuan Tong
d4ecb70610 Add P010ToP410 and P210ToP410
These are 16 bit bi-planar convert functions to scale UV plane to
Y plane's size using (bi)linear filter.
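
To make the linear filter concrete, a 2x horizontal upsample of one row of 16 bit samples might look like the sketch below. It is an illustration under assumptions: the name is hypothetical, it handles a single channel rather than interleaved UV, and the 3:1 weights, the `+ 2` rounding term and the edge clamping are common choices that may not match the library exactly.

```c
#include <assert.h>
#include <stdint.h>

/* Produce 2 * src_width output samples. Each output is 3/4 of the nearest
 * source sample plus 1/4 of its neighbour, rounded to nearest. */
static void upsample_2x_linear_16(const uint16_t* src, uint16_t* dst,
                                  int src_width) {
  assert(src_width >= 1);
  for (int i = 0; i < src_width; ++i) {
    uint32_t c = src[i];
    uint32_t l = src[i > 0 ? i - 1 : 0];              /* clamp at left edge */
    uint32_t r = src[i + 1 < src_width ? i + 1 : i];  /* clamp at right edge */
    dst[2 * i + 0] = (uint16_t)((3 * c + l + 2) >> 2);
    dst[2 * i + 1] = (uint16_t)((3 * c + r + 2) >> 2);
  }
}
```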

libyuv_unittest --gtest_filter=*ToP41*

R=fbarchard@chromium.org

Bug: libyuv:872
Change-Id: I3cb4fafe2b2c9eedd0d91cf4c619abb9ee107bc1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2690102
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-12 14:55:24 +00:00
Frank Barchard
12a4a2372c Rounding added to scaling upsampler
Bug: libyuv:872, b/178521093
Change-Id: I86749f73f5e55d5fd8b87ea6938084cbacb1cda7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2686945
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-02-10 18:51:02 +00:00
Frank Barchard
c28d404936 win32 build fix for I422ToRGBA
Bug: libyuv:877, b/178713286
Change-Id: Iad55df99083b9a4bb9306e052e0e687e58570d96
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2657701
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-29 10:07:08 +00:00
Frank Barchard
39240f7149 Fix in row_gcc.cc to change subq to sub
subq is only available for x64
sub works for both 32 bit x86 and 64 bit x64

Fix in row_gcc.cc for 32 bit x86 running out of registers.

Fix in row_neon.cc for split function argb parameter name.

Bug: libyuv:877, b/178283356, b/178713286
Change-Id: If2b12a2d6168eab08005a2cdf2c17a470a924dd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2656771
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-01-28 19:34:29 +00:00
Yuan Tong
a85cc26fde Add MergeARGBPlane and SplitARGBPlane
These functions convert between planar and interleaved ARGB,
optionally fill 255 to alpha / discard alpha.

This can help handle YUV(A) with Identity matrix, which is
basically planar ARGB.
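
A scalar sketch of the merge and split behaviour, including the optional alpha handling, is shown below. The function names and the use of a NULL pointer to select "fill 255" or "discard alpha" are assumptions for illustration. The packed byte order B, G, R, A reflects libyuv's little-endian ARGB convention.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Interleave planar R, G, B (and optional A) into packed ARGB.
 * Pass src_a == NULL to fill alpha with 255 (opaque). */
static void merge_argb_row(const uint8_t* src_r, const uint8_t* src_g,
                           const uint8_t* src_b, const uint8_t* src_a,
                           uint8_t* dst_argb, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst_argb[4 * x + 0] = src_b[x];
    dst_argb[4 * x + 1] = src_g[x];
    dst_argb[4 * x + 2] = src_r[x];
    dst_argb[4 * x + 3] = src_a ? src_a[x] : 255;
  }
}

/* Deinterleave packed ARGB into planes.
 * Pass dst_a == NULL to discard alpha. */
static void split_argb_row(const uint8_t* src_argb, uint8_t* dst_r,
                           uint8_t* dst_g, uint8_t* dst_b, uint8_t* dst_a,
                           int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    dst_b[x] = src_argb[4 * x + 0];
    dst_g[x] = src_argb[4 * x + 1];
    dst_r[x] = src_argb[4 * x + 2];
    if (dst_a) {
      dst_a[x] = src_argb[4 * x + 3];
    }
  }
}
```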

libyuv_unittest --gtest_filter=LibYUVPlanarTest.*ARGBPlane*:LibYUVPlanarTest.*XRGBPlane*

R=fbarchard@google.com

Change-Id: I522a189b434f490ba1723ce51317727e7c5eb112
Bug: libyuv:877
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2649887
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-27 19:33:51 +00:00
Frank Barchard
37480f12c6 Add BT.709 Full Range yuv constants.
MAKEYUVCONSTANTS macro to generate struct for YUV to RGB
Fix I444AlphaToARGB unit test for ARM by adjusting C version to match Neon implementation.

Bug: libyuv:879, libyuv:878, libyuv:877, libyuv:862, b/178283356
Change-Id: Iedb171fbf668316e7d45ab9e3481de6205ed31e2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2646472
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-01-26 18:36:56 +00:00
Yuan Tong
08d0dce5fc Add I422AlphaToARGB and I444AlphaToARGB
Bug: libyuv:878
Change-Id: I64c314326ac7ae5242acc64e20016e30adc6d17f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2639439
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-23 00:40:33 +00:00
Frank Barchard
b7a1c5ee5d Scale by even factor low level row function
Bug: b/171884264
Change-Id: I6a94bde0aa05e681bb4590ea8beec33a61ddbfc9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2518361
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-11-03 21:25:18 +00:00
Frank Barchard
385418a8e2 I420ToARGB prototype added to convert_from.h
Duplicate I420ToARGB prototype from convert_argb.h into convert_from.h for webrtc
Apply clang format for whitespace consistency.

Bug: libyuv:838, b/151375918
Change-Id: I0f667ca5350192710dbb135e92e73e18b46135e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2446613
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-02 21:05:10 +00:00
Frank Barchard
7a52fde1c4 NV12Scale function using split/merge on UV channel
Bug: libyuv:718, libyuv:838, b/168918847
Change-Id: I78b27baac50f0ce955e00cb6aaf7dfe5a0cb1e3d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2432067
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-09-28 20:13:21 +00:00
Frank Barchard
1837f0022e Rollback of ARGBAttenuate
ARGBAttenuate AVX2 different than NEON/C

Was
C     ARGBAttenuate_Opt (1151 ms)
SSSE3 ARGBAttenuate_Opt (455 ms)
AVX2  ARGBAttenuate_Opt (296 ms)

Now
C     ARGBAttenuate_Opt (1765 ms)
SSSE3 ARGBAttenuate_Opt (355 ms)
AVX2  ARGBAttenuate_Opt (299 ms)

BUG=b/153564664

Change-Id: I2f027339552e399b90cc5ffeffde4255e9ff175b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2294488
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2020-07-13 21:55:13 +00:00
Frank Barchard
8869628c24 Remove unnecessary include of convert_argb
Bug: libyuv:861, b/156642185
Change-Id: I3ddbe2f7b61629ed18b6879203203a51b3700773
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2219047
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-05-28 18:58:37 +00:00
Frank Barchard
94af5319f4 Remove M420 and refactor NV12ToI420
M420 is a row biplanar variation of NV12 supported on Microsoft webcams.
The code was hardcoded to bt.601 and should be jpeg, but the format is
very old and rare.  It is a variation on NV12, so if someone needs it, it
can be re-implemented easily.

Bug: libyuv:858
Change-Id: I246167dba3c190cc76af741b8e91e58e68fde28f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2212608
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-05-26 18:48:00 +00:00
Frank Barchard
da41bca02b I400ToARGBMatrix Pass a color matrix to use different coefficients
32 bit
Neon I400ToARGB_Opt (1937 ms)
64 bit
C I400ToARGB_Opt (8957 ms)
NEON I400ToARGB_Opt (2147 ms)

x86
C I400ToARGB_Opt (1110 ms)
AVX2 I400ToARGB_Opt (213 ms)
SSE2 I400ToARGB_Opt (225 ms)

Bug: libyuv:861, b/156642185
Change-Id: I96b6f4ebba6ff9c4ed8803291ce098de6f93fa4f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2209718
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-20 20:33:12 +00:00
Frank Barchard
d426247a3b YUV to RGB Matrix functions for color space support
Make all Matrix versions of conversions public.

Bug: libyuv:861, b/156642185
Change-Id: Ida067c95dd041b612e2bab64dbface58b257038a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2202748
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
2020-05-19 16:59:29 +00:00
Frank Barchard
84da59c168 ARGBAttenuate AVX2 rewritten to match NEON/C code
Bug: 665
Change-Id: If26fb389dabbca870a0e720f5258d6c9b2cde156
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2196904
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-13 03:58:10 +00:00
Frank Barchard
7a61759f78 NV12Mirror and MirrorUVPlane functions added
HalfMergeUV AVX2 version

Skylake Xeon performance for 1280x720
NV12Mirror_Any (109 ms)
NV12Mirror_Unaligned (113 ms)
NV12Mirror_Invert (107 ms)
NV12Mirror_Opt (108 ms)
NV12Mirror_NullY (19 ms)

Slightly faster than comparable I420Mirror
I420Mirror_Any (113 ms)
I420Mirror_Unaligned (110 ms)
I420Mirror_Invert (109 ms)
I420Mirror_Opt (110 ms)

BUG=libyuv:840, libyuv:858

Change-Id: I686b1b778383bfa10ecd1655e986bdc99e76d132
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2176066
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-04 22:32:14 +00:00
Frank Barchard
2f48ffd42b HalfMergeUVPlane function and optimized I444ToNV12 and I444ToNV21
Bug: libyuv:858
Change-Id: Ie1f03a9acaff02ee8059cf1e5c2c2e5afcde8592
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2154608
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-04-17 19:22:29 +00:00
Frank Barchard
d4c3f45eb6 libyuv r1749 upstream for I444ToNV12
Bug: libyuv:858
Change-Id: Iacf70938ace6258e5bbd397cd78414f1025474c5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2154331
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-17 09:16:46 +00:00
Frank Barchard
7e05059557 Apply clang format to libyuv source
Bug: None
Change-Id: Ifd16b59d7f0dbf4402dd5741bb89d1ec06dfaac8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2131868
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Hsiu Wang <hsiu@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-01 18:07:34 +00:00
Frank Barchard
aabcc477bd RGB24Mirror function
Bug: b/151960427
Change-Id: I413db0011a4ed87eefc0dd166bb8e076b5aa4b1d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2116639
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-03-24 20:13:08 +00:00
Frank Barchard
b5e223ac4c Upstream all libyuv changes to version 1746
- Prefetch for all arm functions - helps performance at higher resolutions
- Make MirrorPlane function public.
Bug: libyuv:855
Change-Id: I4020face6b52767ee78d81870314285d63e98b95
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2113650
Reviewed-by: Hsiu Wang <hsiu@google.com>
2020-03-21 20:19:44 +00:00
Frank Barchard
d82f4baf5f Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror
Bug: libyuv:840, libyuv:849, b/144318948
Change-Id: I303c02ac2b838a09d3e623df7a69ffc085fe3cd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1914781
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-13 20:02:40 +00:00
Frank Barchard
1f12946068 Add U444ToABGR, J444ToABGR, H444ToABGR, H444ToARGB and ConvertToARGB support
BUG=960620, libyuv:845, b/129864744

Change-Id: I9f80cda3be8e13298c596fac514f65a23a38d3d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1900310
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-05 22:11:20 +00:00
Frank Barchard
22f8aad8bc RAWToRGBA for 3 channel OCR
Replace ARM64 only row function with high level function
that implements SSSE3, 32 bit Neon and C.

Compared to 2 step RAWToARGB + ARGBToRGBA on row level:
3.1x faster on ARM
6.2% faster on Intel

BUG=b/140748379

Change-Id: Ia8636d9e4fcdbe10b8c2e81610a54728e29845cd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1860914
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-10-14 22:27:37 +00:00
Frank Barchard
fce0fed542 ARGBToY use 8 bit precision instead of 7 bit.
Neon and GCC Intel optimized, but win32 and mips not optimized.
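
The effect of the extra bit of coefficient precision can be seen in a small scalar comparison. The coefficients below are the widely used BT.601 limited range integer approximations and their halved 7 bit counterparts; they are assumed for illustration and not copied from this change, and the function names are hypothetical. With these values, white maps to 237 at 7 bits but to the nominal 235 at 8 bits.

```c
#include <assert.h>

/* 7 bit fixed point: coefficients scaled by 128, offset added afterwards. */
static int rgb_to_y_7bit(int r, int g, int b) {
  assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
  return ((33 * r + 65 * g + 13 * b) >> 7) + 16;
}

/* 8 bit fixed point: coefficients scaled by 256, with 16.5 folded into
 * the constant (0x1080 = 16 * 256 + 128) so the result is rounded. */
static int rgb_to_y_8bit(int r, int g, int b) {
  assert(r >= 0 && r <= 255 && g >= 0 && g <= 255 && b >= 0 && b <= 255);
  return (66 * r + 129 * g + 25 * b + 0x1080) >> 8;
}
```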

BUG=libyuv:842, b/141482243

Change-Id: Ia56fa85c8cc1db51f374bd0c89b56d21ec94afa7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1825642
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2019-10-07 23:01:10 +00:00
Frank Barchard
0bb2773a39 AVX2 versions of ABGRToNV12 and ABGRToNV21
BUG=libyuv:833

Change-Id: I9b6653e9c304b4e0805b7d3c8408ce57009c8559
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1740682
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-08-07 18:16:34 +00:00
Frank Barchard
9b63884a3e Add ABGRToNV21 and ABGRToNV12
Fix ARGBToUVJRow_AVX2 constants for win32

BUG=libyuv:833, libyuv:839

Change-Id: Id4731a573d40d7a9b46fcc31c2fee295483e1ff6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1739509
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
2019-08-07 01:29:13 +00:00
Frank Barchard
fec9121b67 SwapUV AVX2 and SSSE3
Based on ARGBShuffle but with count adjusted and new shuffle mask
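
In scalar terms the operation is a byte swap within each interleaved chroma pair, which is why a shuffle based row function fits. A minimal C sketch with a hypothetical name, where `width` counts UV pairs:

```c
#include <assert.h>
#include <stdint.h>

/* Convert interleaved UV (NV12 chroma) to VU (NV21 chroma) or back.
 * Both bytes are read before either is written, so src and dst may alias. */
static void swap_uv_row(const uint8_t* src_uv, uint8_t* dst_vu, int width) {
  assert(width >= 0);
  for (int x = 0; x < width; ++x) {
    uint8_t u = src_uv[2 * x + 0];
    uint8_t v = src_uv[2 * x + 1];
    dst_vu[2 * x + 0] = v;
    dst_vu[2 * x + 1] = u;
  }
}
```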

BUG=libyuv:809

Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-26 18:41:40 +00:00
Frank Barchard
f9aacffa02 Fix arm unittest failure by removing unused FloatDivToByteRow.
Apply clang-format to fix jpeg if() for lint fix.
Change comments about 4th pixel for open source compliance.
Rename UVToVU to SwapUV for consistency with MergeUV.

BUG=b/135532289, b/136515133

Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936
Reviewed-by: Chong Zhang <chz@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-02 20:00:30 +00:00
Frank Barchard
413a8d8041 Add AYUVToNV12 and NV21ToNV12
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*ToNV12* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1

R=rrwinterton@gmail.com

Change-Id: Id03b4613211fb6a6e163d10daa7c692fe31e36d8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1560080
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2019-04-12 17:48:45 +00:00
Frank Barchard
5b6042fa0d add YUV24 and AYUV formats
Alternatives to RGB24 and AYUV for working with GPU.

BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*NV21To???24* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com

Change-Id: I5559c63f4bd4c847492fcb1571f7b03c58146689
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1501735
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-03-05 02:53:56 +00:00
Frank Barchard
a9626b9daf Disable AVX512 for iOS simulator xcode 9 builds.
iOS simulator has the option to build with xcode instead of clang.
GN use_xcode_clang=true enables the xcode build.
As of Xcode 9.2, the clang version used does not support
AVX512.  The version reported is version 9, but for normal clang,
version 7 is sufficient for AVX512.
When a version of Xcode does support AVX512, the version check can
be updated to allow AVX512 for newer versions of Xcode.
With Xcode 9.2 the following macro is set:
__APPLE_CC__ 6000

Bug: libyuv:789
Test: gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"x86\" use_xcode_clang=true"
Change-Id: I5a9a0b4a2760c7d09a4bcb464b3668979113b07e
Reviewed-on: https://chromium-review.googlesource.com/991595
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-04-03 18:45:14 +00:00
Frank Barchard
816b7b1279 Add __attribute__ ((__target__ ("avx512vbmi")))
Bug: libyuv:789
Test: builds locally on linux with clang
Change-Id: I3000494d4b0b18f59d7852bc1bc0c9e422d2d63a
Reviewed-on: https://chromium-review.googlesource.com/987331
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-30 17:17:35 +00:00
Frank Barchard
548ec65656 Require clang 6 for AVX512 support
row.h adds CLANG_HAS_AVX512
function ifdefs in row.h for avx512
source code ifdefed function by function for
avx512 and avx2.

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: If32b51459685d0d5785c5c1e94c8f668f8e74b55
Reviewed-on: https://chromium-review.googlesource.com/982402
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-28 02:38:39 +00:00
Frank Barchard
83aa7512c1 AVX512 VBMI version of ARGBToRGB24
Use VBMI instructions but on AVX2 registers to avoid clockrate change.

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-10 02:04:48 +00:00
Frank Barchard
1d509f2178 ARGBToRGB24_AVX2 version
AVX2 port of SSSE3 conversion to output 24 bit RGB

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I14f7815522d1b790ecd2bb39d9a3441e803b694a
Reviewed-on: https://chromium-review.googlesource.com/953303
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-08 02:38:21 +00:00
Frank Barchard
e1f6c1c0b5 tidy applied with readability-inconsistent-declaration-parameter-name
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2018-02-07 00:24:25 +00:00
Frank Barchard
5790a765b9 I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-02 23:57:35 +00:00
Frank Barchard
7ff53f324c I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq
I422ToYUY2Row_AVX2 optimized from 7 cycles per 32 pixels to 6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
lea        0x10(%1),%1
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

I422ToYUY2Row_AVX2 optimization

Improve performance of AVX2 code by avoiding vpermq

Bug: libyuv:556
Test: /usr/local/google/home/fbarchard/iaca-lin64/bin/iaca.sh -reduceout -arch BDW out/Release/obj/libyuv_internal/row_gcc.o
Change-Id: Ie36732da23ecea1ffcc6b297bacc962780b59ef1
Reviewed-on: https://chromium-review.googlesource.com/898067
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-02 18:57:49 +00:00
Frank Barchard
664c735677 I420ToYUY2_AVX2 port
I420 and I422 To YUY2 and UYVY ported from SSE2 to AVX2.

Was SSE2
I420ToYUY2_Opt (135 ms)
I420ToUYVY_Opt (148 ms)
I422ToYUY2_Opt (145 ms)
I422ToUYVY_Opt (142 ms)

Now AVX2
I420ToYUY2_Opt (133 ms)
I420ToUYVY_Opt (130 ms)
I422ToYUY2_Opt (127 ms)
I422ToUYVY_Opt (137 ms)

Bug: libyuv:556
Test: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*I42?To*UY*Opt
Change-Id: Ic35f97cee02dc009fd98785589ba17c7cf50bb35
Reviewed-on: https://chromium-review.googlesource.com/892493
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-01 00:33:25 +00:00
Frank Barchard
ffec313dbe ABGRToAR30 used AVX2 with reversed shuffler
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.

Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-29 22:31:31 +00:00
Frank Barchard
ed96b7b2c7 AVX2 port of H010ToAR30_AVX2
Was SSSE3 H010ToAR30_Opt (635 ms)
Now AVX2  H010ToAR30_Opt (448 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I17b1a0e3268c4a9836e09683dd3377fb1ce60932
Reviewed-on: https://chromium-review.googlesource.com/889906
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-27 00:14:27 +00:00
Frank Barchard
c95fd57993 AVX2 port of I010ToAR30_AVX2
Was SSSE3 I420ToAR30_Opt (635 ms)
Now AVX2  I420ToAR30_Opt (446 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I261be19ec981136a8f453ae0d3211532a790e5c5
Reviewed-on: https://chromium-review.googlesource.com/887750
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-26 02:12:07 +00:00
Frank Barchard
92e22cf5b6 Lint cleanup after C99 change CL
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
7e389884a1 Switch to C99 types
Append _t to all sized types.
uint64 becomes uint64_t etc

Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
8af6ea4100 I420ToAR30 in 1 step SSSE3 assembly
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToAR30_Opt
Change-Id: Ie89c3eb2526354cf11175746bc8af72be83a1e00
Reviewed-on: https://chromium-review.googlesource.com/877541
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-23 01:33:10 +00:00
Frank Barchard
09db0c4ce2 H010ToAR30 in 1 step with SSSE3 assembly
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.

[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-19 19:46:58 +00:00
Frank Barchard
ecab5430c2 Remove MEMOPREG x64 NaCL macros
MEMOPREG macros are deprecated in row.h

Regular expressions to remove MEMOPREG macros:

MEMOPREG(movd, 0x00, [u_buf], [v_buf], 1, xmm1)                            \
MEMOPREG\((.*), (.*), (.*), (.*), (.*), (.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6           \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: If8743abd9af2e8c549d0c7d3d49733a9b0f0ca86
Reviewed-on: https://chromium-review.googlesource.com/865964
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-16 19:10:44 +00:00
Frank Barchard
b33e0f97e7 Remove MEMOPMEM x64 NaCL macros
MEMOPMEM macros are deprecated in row.h

Usage examples
    MEMOPMEM(vmovdqu,ymm0,0x00,0,1,1)          //  vmovdqu %%ymm0,(%0,%1)
    MEMOPMEM(movdqu,xmm2,0x00,1,0,1)

Regular expressions to remove MEMOPMEM macros:

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    %%\2,\3(%\4,%\5,\6)\7 \\n"

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    %%\2,\3(%\4,%\5,\6)            \\n"

TBR=braveyao@chromium.org
Bug: libyuv:702
Test: try bots pass
Change-Id: Id8c6963d544d16e39bb6a9a0536babfb7f554b3a
Reviewed-on: https://chromium-review.googlesource.com/865934
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-13 01:33:21 +00:00
Frank Barchard
a875ed173d Remove VMEMOPREG x64 NaCL macros
VMEMOPREG macros are deprecated in row.h

Usage examples
    VMEMOPREG(vpavgb,0x00,0,4,1,ymm0,ymm0)     // vpavgb (%0,%4,1),%%ymm0,%%ymm0
    VMEMOPREG(vpavgb,0x20,0,4,1,ymm1,ymm1)

Regular expressions to remove VMEMOPREG macros:

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6,%%\7      \\n"

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6,%%\7            \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: I472446606f7fd568fdf33aaacc22d5ed78673dab
Reviewed-on: https://chromium-review.googlesource.com/865640
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 22:54:24 +00:00
Frank Barchard
030042a2ff Remove VEXTOPMEM x64 NaCL macros
VEXTOPMEM macros are deprecated in row.h

Usage examples
    VEXTOPMEM(vextractf128,1,ymm0,0x0,1,2,1) // vextractf128 $1,%%ymm0,(%1,%2,1)

Regular expression to remove VEXTOPMEM macros:

VEXTOPMEM\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1 $\2,%\3,\4(%\5,%\6,\7)        \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I177edf9813128408e74816672dd25abb03a5e1ca
Reviewed-on: https://chromium-review.googlesource.com/865283
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 21:16:34 +00:00
Frank Barchard
5088f00165 Remove MEMACCESS x64 NaCL macros
MEMACCESS macros are deprecated in row.h

Usage examples
    "movdqu    " MEMACCESS(0) ",%%xmm0         \n"
    "movdqu    " MEMACCESS2(0x10,0) ",%%xmm1   \n"

Regular expressions to remove MEMACCESS macros:

" MEMACCESS2\((.*),(.*)\) "(.*)\\n"
\1(%\2)\3              \\n"

" MEMACCESS\((.*)\) "(.*)\\n"
(%\1)\2            \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I42f62d5dede8ef2ea643e78c204371a7659d25e6
Reviewed-on: https://chromium-review.googlesource.com/862803
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 20:37:41 +00:00
Frank Barchard
e3797d1765 Remove MEMOPARG x64 NaCL macros
MEMOPARG macros are deprecated in row.h

  #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"

Usage examples
    MEMOPARG(movzwl,0x00,1,3,1,k2)             //  movzwl  (%1,%3,1),%k2
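
The macro body quoted above builds one instruction string by stringizing each argument, which is why a plain string literal can replace every call site. A minimal sketch of that equivalence follows; MEMOPARG_SKETCH is a stand-in name for illustration and is not the definition that was removed from row.h.

```c
#include <string.h>

/* Stand-in for the quoted macro body: each argument is stringized with #
   and the adjacent string literals are concatenated by the compiler. */
#define MEMOPARG_SKETCH(opcode, offset, base, index, scale, arg) \
  #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"

/* The macro form, using the arguments from the usage example. */
static const char kViaMacro[] = MEMOPARG_SKETCH(movzwl, 0x00, 1, 3, 1, k2);

/* The hand-written literal that a macro removal would leave behind. */
static const char kDirect[] = "movzwl 0x00(%1,%3,1),%k2\n";

/* Returns 1 when both forms are the same string. */
static int macro_matches_literal(void) {
  return strcmp(kViaMacro, kDirect) == 0;
}
```

Because both forms produce identical text, replacing the macro with the literal should not change the generated assembly.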

Regular expression to remove MEMOPARG macro:

MEMOPARG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1    \2(%\3,%\4,\5),%\6                \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I4a5ad2abf5017e651576f4c8c784be1c8dbf5a83
Reviewed-on: https://chromium-review.googlesource.com/863108
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-12 18:26:06 +00:00
Frank Barchard
3694891922 Remove MEMLEA x64 NaCL macros
Bug: libyuv:702
Test: try bots pass
Change-Id: I0ee094551734368f2179c298e7bf423ec80a929c
Reviewed-on: https://chromium-review.googlesource.com/857845
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 19:16:16 +00:00
Frank Barchard
a2142148e9 Remove x64 native_client macros.
Bug: libyuv:702
Test: try bots pass
Change-Id: I76d74b5f02fe9843418108b84742e2f714d1ab0a
Reviewed-on: https://chromium-review.googlesource.com/855656
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 01:27:22 +00:00
Frank Barchard
00d526d4ea H010ToARGB_AVX2 optimized conversion
AVX2 optimized 10 bit YUV to ARGB.

Bug: libyuv:751
Test: H010ToARGB unittest
Change-Id: I705630beb62714b52042c2a5dcdb8b7859e734ae
Reviewed-on: https://chromium-review.googlesource.com/852563
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-09 03:17:33 +00:00
Frank Barchard
55310f92bc Remove NACL_R14 macro
Bug: libyuv:702
Test: try bots still build
Change-Id: I05317e45c885955fcda233bdddbd11ce1d246d90
Reviewed-on: https://chromium-review.googlesource.com/854770
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-08 22:41:15 +00:00
Frank Barchard
9d2cd6a3ef H010ToAR30 optimized to 2 step conversion
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.

Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)

Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-07 08:36:57 +00:00
Frank Barchard
a64658593e I210ToARGB conversion from 10 bit YUV to RGB
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.

Bug: libyuv:751
Test:  I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-05 02:43:38 +00:00
Frank Barchard
140fc0a261 Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.

Bug: libyuv:769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
768f103b8b Convert8To16 for better H010 support
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.

Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-28 22:27:24 +00:00
Frank Barchard
c67db60534 HalfFloat_SSE2 use movd from memory
pshufd requires 16 byte aligned memory or a register.
Use movd to load the float into a register, avoiding a segfault if the
memory holding the float is misaligned.

Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-21 19:37:50 +00:00
Frank Barchard
5336217f11 H010Copy function to copy 16 bit planar formats
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToH010_Opt
Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b
Reviewed-on: https://chromium-review.googlesource.com/823445
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-15 03:34:34 +00:00
Frank Barchard
bb3180ae80 Add I420ToAR30 10 bit RGB
For more complete support of AR30 format, add I420ToAR30 allowing
the new RGB 10 bit format to be used from standard 8 bit I420 format.
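
As a rough illustration of what producing AR30 from 8 bit channels involves, the scalar sketch below widens each colour channel to 10 bits and packs the result into one 32 bit word. The bit layout (blue in the low 10 bits, then green, then red, with a 2 bit alpha on top) is an assumption of this sketch, and pack_ar30_sketch is a hypothetical helper, not a libyuv function.

```c
#include <stdint.h>

/* Widen 8 bits to 10 by copying the top 2 bits into the new low bits,
   so 0 maps to 0 and 255 maps to 1023. */
static uint32_t widen_8_to_10(uint8_t v) {
  return ((uint32_t)v << 2) | ((uint32_t)v >> 6);
}

/* Pack 8 bit R, G, B, A into a 2:10:10:10 word.
   Assumed layout: bits 0-9 blue, 10-19 green, 20-29 red, 30-31 alpha. */
static uint32_t pack_ar30_sketch(uint8_t r, uint8_t g, uint8_t b, uint8_t a) {
  uint32_t b10 = widen_8_to_10(b);
  uint32_t g10 = widen_8_to_10(g);
  uint32_t r10 = widen_8_to_10(r);
  uint32_t a2 = (uint32_t)a >> 6; /* keep only the top 2 alpha bits */
  return b10 | (g10 << 10) | (r10 << 20) | (a2 << 30);
}
```

Under that assumed layout, opaque white (255, 255, 255, 255) packs to 0xFFFFFFFF and pure blue with zero alpha packs to 0x000003FF.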

Bug: libyuv:751
Test: I420ToAR30 unittest added
Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa
Reviewed-on: https://chromium-review.googlesource.com/823243
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 23:40:58 +00:00
Frank Barchard
c367751430 ARGBToAR30 SSSE3 use pmulhuw to replicate fields
AR30 is optimized with 3 techniques:
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time: R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B.
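
Technique 1 can be modelled in scalar C. The sketch below emulates one 16 bit lane of pmulhuw and checks that a single multiply-high gives the same result as the shift-and-or form of bit replication. The multiplier 1028 and the placement of the channel in the high byte of the lane are assumptions chosen for this sketch, not values read from the commit.

```c
#include <stdint.h>

/* Scalar model of one 16 bit lane of pmulhuw: the high 16 bits of an
   unsigned 16 x 16 -> 32 bit product. */
static uint16_t mulhi_u16(uint16_t a, uint16_t b) {
  return (uint16_t)(((uint32_t)a * (uint32_t)b) >> 16);
}

/* Reference form: widen 8 bits to 10 by copying the top 2 bits into the
   new low bits. */
static uint16_t replicate_8_to_10(uint8_t v) {
  return (uint16_t)((v << 2) | (v >> 6));
}

/* Same result with one multiply-high. With the channel in the high byte
   of the lane (v << 8), (v * 256 * 1028) >> 16 equals floor(v * 257 / 64),
   which is 4 * v + floor(v / 64). */
static uint16_t replicate_8_to_10_mulhi(uint8_t v) {
  return mulhi_u16((uint16_t)(v << 8), 1028);
}

/* Compare both forms over every 8 bit input; returns the mismatch count. */
static int count_mismatches(void) {
  int bad = 0;
  for (int v = 0; v < 256; ++v) {
    if (replicate_8_to_10((uint8_t)v) != replicate_8_to_10_mulhi((uint8_t)v)) {
      ++bad;
    }
  }
  return bad;
}
```

The multiply folds the shift and the or into one operation per lane, which is the appeal of this approach in SIMD code.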

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 20:12:58 +00:00