1997 Commits

Author SHA1 Message Date
Frank Barchard
c994782086 Enable RVV if qemu is detected
- include a fix for jpeg unittests to do at least 1 iteration
- include a fix for scale uv to only use linearup2 if filter is linear

Tested on qemu with Intel host:
[ RUN      ] LibYUVBaseTest.TestCpuHas
Cpu Flags 805306369
Has RISCV 268435456
Has RVV 536870912
Has RVVZVFH 0
Has X86 0

Bug: libyuv:956, libyuv:959, libyuv:960
Change-Id: I4a1b66f83d82ba127780f52526153d586db90111
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4429570
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Randall Bosetti <rlb@google.com>
2023-04-18 20:29:04 +00:00
Darren Hsieh
44396e6e9a Add ARGBToRAWRow_RVV, ARGBToRGB24Row_RVV, RGB24ToARGBRow_RVV
* Run on SiFive internal FPGA:

ARGBToRAW_Opt (~1.55x vs scalar)

ARGBToRGB24_Opt (~1.44x vs scalar)

RGB24ToARGB_Opt (~1.77x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Bug: libyuv:956

Change-Id: I26722f6848cd68684d95d9a7ee06ce0416e7985d
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4413083
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-13 19:33:16 +00:00
Frank Barchard
68659d0d68 UVScale down by 2 fix for C and optimize for NEON
- update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open.

Bug: libyuv:958
Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-12 22:49:20 +00:00
Frank Barchard
ee3e71c7ce Any functions use memset(vin, 0, sizeof(vin)) for GCC warning fix
- Fix -Wmemset-elt-size warning for GCC
- Use vin for inputs and vout for outputs

Bug: None
Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-04-10 20:44:10 +00:00
Darren Hsieh
724e7aee03 Fix macro define typo in scale_uv.cc
The correct define can be found in scale_row.h

Change-Id: I633ed47006c7bd8014038493005c2d934489ff18
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4411353
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-10 16:55:48 +00:00
James Zern
0200037a5a row_any,ANYDETILE: fix -Wmemset-elt-size warning
under gcc 12.2.0 using -Wall:

source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const
                       uint16_t*, ptrdiff_t, uint16_t*, int)’:
source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to
number of elements without multiplication by element size
[-Wmemset-elt-size]
 2287 |     memset(temp, 0, 16 * BPP); /* for msan */
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’
 2308 | ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15)

This increases the memset to the full buffer size, which may not be
strictly necessary.

Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-08 19:01:02 +00:00
Darren Hsieh
e8af6cb2e4 Add RAWToARGBRow_RVV,RAWToRGBARow_RVV,RAWToRGB24Row_RVV
* Run on SiFive internal FPGA:

RAWToARGB_Opt (~2x vs scalar)

RAWToRGBA_Opt (~2x vs scalar)

RAWToRGB24_Opt (~1.5x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Change-Id: I21a13d646589ea2aa3822cb9225f5191068c285b
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408357
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-07 18:45:08 +00:00
Darren Hsieh
aa47d668d8 Add riscv cpu info detection.
* Supports:

 * The standard single-letter Vector detection.

 * Vector fp16 detection.

Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Change-Id: Ia7ee1bd8ec1a990f1b2b1700805942e99c0aa87b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4401738
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-06 15:58:29 +00:00
Wan-Teh Chang
ec48e4328e Add assertions for the Clang static analyzer
The Clang static analyzer (scan-build) in LLVM 14 warns about
array index out of bounds in scaletbl[boxwidth - minboxwidth] in
ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two
elements. It's not clear the index boxwidth - minboxwidth is either 0 or
1.

Change-Id: I072476e86950154beffe6b1a89915755118b3cbd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2023-04-05 21:58:22 +00:00
Frank Barchard
464c51a035 AArch32 YUVTORGB_SETUP use load and dup to avoid modifying pointer
- Allows code to be optimized with clang 17 -flto-thin
- Bump version number to 1864 to allow detection of fix
- Apply clang format to standardize formatting; No impact on code generated

Bug: chromium:1424089
Change-Id: Ib745836b27915a5e4cb1d7d928ee52659360612b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370052
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-03-24 19:32:30 +00:00
Frank Barchard
1a971f8cc3 clang 17 -flto-thin bug fix for Neon YUVtoRGB and ARGBToRGB565Dither
- YUV to RGB AArch32 kRGBCoeffBias rewind pointer
- ARGBToRGB565Dither declare width and source pointers as modified


Bug: chromium:1424089
Change-Id: I987180652331bab16ce27d8d166399a687ee890e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370099
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-03-24 10:59:40 +00:00
Frank Barchard
3f219a3501 GCC warning fix for MT2T
- Fix redundent assignment compile warning in GCC
- Apply clang-format
- Bump version to 1863

Bug: libyuv:955
Change-Id: If2b6588cd5a7f068a1745fe7763e90caa7277101
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4344729
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2023-03-16 06:57:20 +00:00
Justin Green
76468711d5 M2T2 Unpack fixes
Fix the algorithm for unpacking the lower 2 bits of M2T2 pixels.

Bug: b:258474032
Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978
Commit-Queue: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-03-14 14:59:26 +00:00
Frank Barchard
f9b23b9cc0 Transpose 4x4 for SSE2 and AVX2
Skylake Xeon
AVX2 Transpose4x4_Opt (290 ms)
SSE2 Transpose4x4_Opt (302 ms)
C    Transpose4x4_Opt (522 ms)

AMD Zen2
AVX2 Transpose4x4_Opt (136 ms)
SSE2 Transpose4x4_Opt (137 ms)
C    Transpose4x4_Opt (431 ms)

Bug: None
Change-Id: I4997dbd5c5387c22bfd6c5960b421504e4bc8a2a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4292946
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-03-03 17:46:23 +00:00
Frank Barchard
88b050f337 MergeUV AVX512BW use assembly
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V

AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2   MergeUVPlane_Opt (945 ms)
AVX2   MergeUVPlane_16_Opt (2167 ms)

Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2   MergeUVPlane_Opt (943 ms)
SSE2   MergeUVPlane_Opt (973 ms)
AVX2   MergeUVPlane_16_Opt (2102 ms)

Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-22 21:19:08 +00:00
Frank Barchard
2bdc210be9 MergeUV_AVX512BW for I420ToNV12
On Skylake Xeon 640x360 100000 iterations
AVX512   MergeUVPlane_Opt (1196 ms)
AVX2     MergeUVPlane_Opt (1565 ms)
SSE2     MergeUVPlane_Opt (1780 ms)
Pixel 7  MergeUVPlane_Opt (1177 ms)

Bug: None
Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-13 20:14:57 +00:00
Sergio Garcia Murillo
b2528b0be9 Add support for odd width and height in I410ToI420
Bug: libyuv:950
Change-Id: Ic9a094463af875aefd927023f730b5f35f8551de
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4154630
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-01-23 19:05:00 +00:00
Hao Chen
0809713775 Refine some functions on the Longarch platform.
Add ARGBToYMatrixRow_LSX/LASX, RGBAToYMatrixRow_LSX/LASX and
RGBToYMatrixRow_LSX/LASX functions with RgbConstants argument.

Bug: libyuv:912
Change-Id: I956e639d1f0da4a47a55b79c9d41dcd29e29bdc5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4167860
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 18:54:14 +00:00
Frank Barchard
0faf8dd0e0 Fix for DivideRow_NEON functions
- was dup of 8h but mul of 4s.  now use umull

Bug: libyuv:951
Change-Id: If6cb01f5f006c2235886b81ce120642d7e24a9bb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166563
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 00:30:05 +00:00
Frank Barchard
541d8efbaf Fix for divide row functions used by P010ToI010
Bug: libyuv:951
Change-Id: Id323656cb6f99b1be0be7aaa854d3cc15feeba69
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166562
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-17 21:40:45 +00:00
Frank Barchard
d5aa3d4a76 P010ToI010 and P012ToI012 conversion functions
- Convert 10 and 12 bit biplanar formats to planar.
- Shift 10 MSB to 10 LSB
- P010 is similar to NV12 in layout, but uses 10 MSB of 16 bit values.
- I010 is similar to I420 in layout, but uses 10 LSB of 16 bit values.

Bug: libyuv:951
Change-Id: I16a1bc64239d0fa4f41810910da448bf5720935f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166560
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-13 19:20:12 +00:00
Frank Barchard
6e4b0acb4b I422Rotate take stride for temporary buffers
- Minor variable name changes first/last to top/bottom
- Comments explaining rotate temporary buffers usage
- Add asserts for scale parameter
- Use NULL and stddef.h instead of 0
- Use void * for allocation in row.h
- Add () around size parameter in macros


Bug: libyuv:926, libyuv:949
Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 23:11:52 +00:00
Sergio Garcia Murillo
f8626a7224 Add 10 bit rotate methods.
This initial implementation is based on current unoptimized code in webrtc using just plain for loops.

Bug: libyuv:949
Change-Id: Ic87ee49c3a0b62edbaaa4255c263c1f7be4ea02b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110782
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 21:10:01 +00:00
Sergio Garcia Murillo
22a579c438 Use ScalePlaneDown2_16To8 for avoiding the 2 step process
Bug: libyuv:950
Change-Id: I5a77bca9a0230fe00abd810939e217833a14683f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4134524
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-03 21:41:21 +00:00
Sergio Garcia Murillo
f583b1b4b8 Add I410Copy and I410ToI420 methods
The I410To420 implementation does a two step approach for scaling down and 10-to-8 bit conversion using the Y plane as temporal storage.

Bug: libyuv:950
Change-Id: I3d35fad4b99e17253230456233fbd947e013c0ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110783
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-01-03 20:27:28 +00:00
Frank Barchard
3abd6f36b6 Casting for scale functions
- MT2T support for source strides added, but only works for positive values.
- Reduced casting in row_common - one cast per assignment.
- scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t

Bug: libyuv:948, b/257266635, b/262468594
Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ernest Hua <ernesthua@google.com>
2022-12-15 22:34:22 +00:00
Frank Barchard
610e0cdead MT2T Warning fixes for fuchsia
Bug: b/258474032, b/257266635
Change-Id: Ic5cbbc60e2e1463361e359a2fe3e97976c1ea929
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4081348
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-12-06 19:54:40 +00:00
Frank Barchard
ea26d7adb1 DetilePlane_16 AVX version
- fix ifdefs for DetilePlane_16 to use 16 bit versions, not 8 bit.  (no functional change)

Bug: b/258474032
Change-Id: Ic07e02d9801e21126ebee0ceb5779aa712a493ce
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4034812
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-18 23:59:06 +00:00
Frank Barchard
8713ba3f0b Add vzeroupper to AVX row functions
- move power of two macro to planar functions source
- revert row.h IS_ALIGNED change

Bug: b/258474032
Change-Id: If87bb8d55c9b9930dd3e378614f8e4faae0870e9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4035166
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-11-17 23:00:08 +00:00
Frank Barchard
2d2cee418a Add Detile_16 planar function for 10 bit MT2T format
- Neon and SSE2
- Any for odd widths

Pixel 2 little core AArch32 build
C
TestDetilePlane_16 (1275 ms)
TestDetilePlane (1203 ms)
Neon
TestDetilePlane_16 (693 ms)
TestDetilePlane (660 ms)

Bug: b/258474032
Change-Id: Idbd09c5e9324e4deef5f1d54090d4b63cc7db812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4031848
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-17 02:47:57 +00:00
Frank Barchard
fe9ced6e3c ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows
- Preserve xmm7 in ScaleRowUp2_Bilinear_12_SSSE3
- Previously xmm7 was used in ScaleRowUp2_Bilinear_12_SSSE3 without being preserved, which violates the Windows x64 calling conventions and can cause undefined behavior.


Bug: libyuv:945, 1218384
Change-Id: If18b292b588573355f9b4ba8c5b9c3fbe143d36b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3972137
Reviewed-by: Bruce Dawson <brucedawson@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-10-21 19:35:17 +00:00
Frank Barchard
3da24c3ca3 Detile vld for gcc build fix
- add {} around loaded register

Bug: libyuv:944
Change-Id: I0d916e37beb50bda0838e4867742eb7afa57e1cc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3957634
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-10-14 19:12:53 +00:00
Frank Barchard
cb35d5f90e BGRAToI420 use SSSE3 for Y but C for UV when LIBYUV_BIT_EXACT enabled
- Previously was C for both Y and UV.

Was BGRAToI420_Opt (17780 ms)
Now BGRAToI420_Opt (9546 ms)

Bug: b/253491233
Change-Id: Id103d8d5ba0fed0f7a427dd5955e1830275eff6b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3953131
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-10-14 03:09:56 +00:00
Jeremy Maitin-Shepard
c365da9c6c Use find_package(JPEG) in place of include(FindJPEG)
The former allows the package to be overridden with a local build by
`FetchContent`.

Also includes the fix to _MSC_VER conditions from:
https://aomedia.googlesource.com/aom/+/refs/heads/main/third_party/libyuv/README.libaom

Change-Id: I666e591becb3efaa7b5b68d27476319a2909b88e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3933570
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-10-03 22:19:28 +00:00
Frank Barchard
00950840d1 YUY2ToNV12 using YUY2ToY and YUY2ToNVUV
- Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps
  - Was SplitUV, memcpy Y, InterpolateUV
  - Now YUY2ToY, YUY2ToNVUV
- rollback LIBYUV_UNLIMITED_DATA

3840x2160 1000 iterations:

Pixel 2 Cortex A73
Was YUY2ToNV12_Opt (6515 ms)
Now YUY2ToNV12_Opt (3350 ms)

AB7 Mediatek P35 Cortex A53
Was YUY2ToNV12_Opt (6435 ms)
Now YUY2ToNV12_Opt (3301 ms)

Skylake AVX2 x64
Was YUY2ToNV12_Opt (1872 ms)
Now YUY2ToNV12_Opt (1657 ms)

SSE2 x64
Was YUY2ToNV12_Opt (2008 ms)
Now YUY2ToNV12_Opt (1691 ms)

Windows Skylake AVX2 32 bit x86
Was YUY2ToNV12_Opt (2161 ms)
Now YUY2ToNV12_Opt (1628 ms)

Bug: libyuv:943
Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-30 22:41:21 +00:00
Frank Barchard
b9adaef113 Enable unlimited data for YUV to RGB
- Provide LIBYUV_LIMITED_DATA macro for backwards compatiblity

Bug: b/474156256
Change-Id: I5d5d7fb640d51ae3c5ad363f2a28c8bfbd3048a5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3912081
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-23 12:51:37 +00:00
Frank Barchard
f9fda6e7d8 Fix shift amount for SSSE3 assembly for I012 format conversions
Bug: libyuv:938, libyuv:942
Change-Id: I6fb6e7e17fa941785e398bc630f465baf72fcabd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906091
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 23:07:53 +00:00
Frank Barchard
8fc02134c8 10/12 bit YUV replicate upper bits to low bits before converting to RGB
- shift high bits of 10 and 12 bit into lower bits

Bug: libyuv:941, libyuv:942,
Change-Id: I14381dbf226ef27dcce06893ea88860835639baa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906085
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 20:56:43 +00:00
Frank Barchard
e4b1ddd8fe Fix immediate offsets for row_neon build on gcc
Bug: libyuv:942
Change-Id: I7d2dc87a44cc1cc5c79c37f407583e0c907dc2de
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906088
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 20:16:13 +00:00
Frank Barchard
248172e2ba I422ToRGB24, I422ToRAW, I422ToRGB24MatrixFilter conversion functions added.
- YUV to RGB use linear for first and last row.
- add assert(yuvconstants)
- rename pointers to match row functions.
- use macros that match row functions.
- use 12 bit upsampler for conversions of 10 and 12 bits

Cortex A53 AArch32
I420ToRGB24_Opt (3627 ms)
I422ToRGB24_Opt (4099 ms)
I444ToRGB24_Opt (4186 ms)
I420ToRGB24Filter_Opt (5451 ms)
I422ToRGB24Filter_Opt (5430 ms)

AVX2
Was I420ToRGB24Filter_Opt (583 ms)
Now I420ToRGB24Filter_Opt (560 ms)

Neon Cortex A7
Was I420ToRGB24Filter_Opt (5447 ms)
Now I420ToRGB24Filter_Opt (5439 ms)

Bug: libyuv:938


Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 02:00:52 +00:00
Frank Barchard
f71c83552d I420ToRGB24MatrixFilter function added
- Implemented as 3 steps: Upsample UV to 4:4:4, I444ToARGB, ARGBToRGB24
- Fix some build warnings for missing prototypes.

Pixel 4
I420ToRGB24_Opt (743 ms)
I420ToRGB24Filter_Opt (1331 ms)

Windows with skylake xeon:
x86 32 bit
I420ToRGB24_Opt (387 ms)
I420ToRGB24Filter_Opt (571 ms)
x64 64 bit
I420ToRGB24_Opt (384 ms)
I420ToRGB24Filter_Opt (582 ms)


Bug: libyuv:938, libyuv:830
Change-Id: Ie27f70816ec084437014f8a1c630ae011ee2348c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3900298
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-16 19:46:47 +00:00
Frank Barchard
3e38ce5058 SSE2 MM21->YUY2 conversion
Add SSE2 optimization for MM21ToYUY2 conversion.

Bug: b/238137982
Change-Id: I189f712514308322f651b082b496bce9c015c4ee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832525
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-08-17 18:39:05 +00:00
Frank Barchard
65e7c9d570 MM21ToYUY2 and ABGRToJ420 conversion
MM21 to YUY2 use zip1 for performance

Cortex A510
Was MM21ToYUY2 (612 ms)
Now MM21ToYUY2 (573 ms)

Prefetches help Cortex A53
Was MM21ToYUY2 (4998 ms)
Now MM21ToYUY2 (1900 ms)

Pixel 4 Cortex A76
Was MM21ToYUY2 (215 ms)
Now MM21ToYUY2 (173 ms)

ABGRToJ420
- NEON, SSSE3 and AVX2 row functions
- J400, J420 and J422 formats.
- Added AVX2 for UV on ARGBToJ420.  Was SSSE3

Same code/performance as ARGBToJ420 but with constants re-ordered.
Pixel 4
ABGRToJ420_Opt (623 ms)
ABGRToJ422_Opt (702 ms)
ABGRToJ400_Opt (238 ms)

Skylake Xeon
With LIBYUV_BIT_EXACT which uses C for UV
ABGRToJ420_Opt (988 ms)
ABGRToJ422_Opt (1872 ms)
ABGRToJ400_Opt (186 ms)
Skylake Xeon using AVX2
ABGRToJ420_Opt (251 ms)
ABGRToJ422_Opt (245 ms)
ABGRToJ400_Opt (184 ms)
Skylake Xeon using SSSE3
ABGRToJ420_Opt (328 ms)
ABGRToJ422_Opt (362 ms)
ABGRToJ400_Opt (185 ms)

Bug: b/238137982
Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-16 22:07:38 +00:00
Frank Barchard
1c5a8bb17a AB64ToARGB fix for inplace conversion
- add tests for all single plane formats that reduce or stay same in size

Bug: b/242233673
Change-Id: Ic25d808114f11995ac56ea9c31b99f66ba36d345
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3828485
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-12 01:28:13 +00:00
Vignesh Venkatasubramanian
a5a1102a60 Add I422ToRGB565Matrix
The code already exists to use a specific matrix. This CL simply
adds a function to use a generic YUV matrix for the conversion.

Bug: b/241451603
Change-Id: I0eea7e96a891d045905a9c963b56c053097029ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3820903
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-08-09 20:15:44 +00:00
Frank Barchard
d53f1beecd RAWToJ400 require multiple of 16 pixels for NEON
- fix crash when width is not a multiple of 16
- apply clang format
- bump version

Bug: libyuv:940, b/240094327
Change-Id: Ic18e5b7b64f78f26e8b7d8440bf490a679bda200
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3812594
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-04 22:55:48 +00:00
Vignesh Venkatasubramanian
394436b289 row_neon*: Explicitly initialize pad in RgbConstants
Explicitly initialize the 'pad' field of RgbConstants to 0. This
prevents the following warning/error in some compilers:
error: missing field 'pad' initializer [-Werror,-Wmissing-field-initializers]

Bug: b/241008246
Change-Id: Id6a0beb75c5c709404290c75915049f8a3898c83
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3808044
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-04 18:19:46 +00:00
Wan-Teh Chang
9892d70c96 Fix MSVC warnings by adding casts
Fix the following MSVC warnings:
src\source\row_win.cc(117): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(136): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(155): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(174): warning C4309: 'argument': truncation of
constant value
src\source\row_common.cc(1712): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1731): warning C4244: 'initializing':
conversion from 'int16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1786): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1805): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data

Bug: libyuv:939
Change-Id: Ie87ba6e716732d1ff1ae5c236dfd9cfdac13439d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3807105
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-03 21:24:21 +00:00
Yuan Tong
98ec7c28d5 Fix SSE2 version of ScalePlaneUp2_16_Bilinear
- Define HAS_SCALEROWUP2_BILINEAR_16_SSE2: it's now fixed.
- Correct function name to ScaleRowUp2_Bilinear_16_Any_SSE2:
   this row function uses only SSE2 instructions.

Bug: libyuv:882
Change-Id: Ib1c7ac5b09997cb5b32bc54109d8c566af762433
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3800842
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-02 20:35:48 +00:00
Frank Barchard
b028453ba6 Disable bilinear 16 bit scale up for SSE2
- Undefine HAS_SCALEROWUP2_BILINEAR_16_SSE2
- Save XMM7 in ScaleRowUp2_Bilinear_16_SSE2().
- Rename HAS_SCALEROWUP2LINEAR_xxx to HAS_SCALEROWUP2_LINEAR_xxx
- DetileSplitUVRow_C() is implemented using SplitUVRow_C().
- Changes to unit_test/planar_test.cc.

Bug: libyuv:882
Change-Id: I0a8e8e5fb43bdf58ded87244e802343eacb789f2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3795063
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-01 22:54:48 +00:00
Frank Barchard
6900494d90 Merge/SplitRGB fix -mcmodel=large x86 and InterpolateRow_16To8_NEON
MergeRGB and SplitRGB use a register to point to 9 shuffle tables.

- fixes an out of registers error with -mcmodel=large

InterpolateRow_16To8_NEON improves performance for I210ToI420:

On Pixel 4 for 720p x1000 images
Was I210ToI420_Opt (608 ms)
Now I210ToI420_Opt (336 ms)

On Skylake Xeon
Was I210ToI420_Opt (259 ms)
Now I210ToI420_Opt (209 ms)


Bug: libyuv:931, libyuv:930
Change-Id: I20f8244803f06da511299bf1a2ffc7945eb35221
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717054
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-06-29 00:00:46 +00:00
Frank Barchard
fe4a50df8e Bilinear scale up msan fix
- Avoid stepping to height + 1 for bilinear filter 2nd row for last row of source
- Box filter ubsan fix for 3/4 and 3/8 scaling for 16 bit planar
- Height 1 asan fixes

Bug: libyuv:935, b/206716399
Change-Id: I56088520f2a884a37b987ee5265def175047673e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717263
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-06-22 00:11:49 +00:00
Frank Barchard
e906ba9fe9 InterpolateRow_Any test if fraction is 0 and dont memcpy 2nd row.
Bug: b/228605787
Change-Id: Ia8912e4c1599401320ee82882a2593e78bf56582
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3708833
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-06-17 18:15:09 +00:00
Frank Barchard
30f9b28048 Add I210ToI420
Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: Ib135d0b4ff17665f6a4ab60edb782a7b314219a4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3696042
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2022-06-09 08:07:50 +00:00
Frank Barchard
baef414478 Convert16To8Row_NEON use shift without rounding
Fixes chromium PaintCanvasVideoRendererTest.HighBitDepth

sqdmulh was creating a 9 bit value with rounding, and then shifted it right 1 with no rounding.  The rounding had an off by 1 impact in some tests.

Pixel 3
C           I010ToI420_Opt (749 ms)
Was sqdmulh I010ToI420_Opt (370 ms)
Now ushl    I010ToI420_Opt (324 ms)

Pixel 4
C           I010ToI420_Opt (581 ms)
Was sqdmulh I010ToI420_Opt (240 ms)
Now ushl    I010ToI420_Opt (231 ms)

Bug:  b/216321733, b/233233302
Change-Id: I26f673bb411401d1e4a8126bf22d61c649223e9b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3694143
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-06-08 19:40:30 +00:00
Frank Barchard
d011314f14 Revert "I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix"
This reverts commit 60254a1d846a93a4d7559009004cdd91bcc04d82.

Reason for revert: breaks PaintCanvasVideoRendererTest.HighBitDepth

Original change's description:
> I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix
>
> - Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit
> - Add NEON InterpolateRow_16 for fast 10 bit scaling
> - When scaling up, set step to interpolate toward height - 1 to avoid buffer overread
> - When scaling down, center the 2 rows used for source to achieve filtering.
> - CopyPlane check for 0 size and return
>
> Bug:  libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
> Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1
> Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552
> Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
> Reviewed-by: richard winterton <rrwinterton@gmail.com>

Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: Icc05bb340db0e7fe864061fb501d0a861c764116
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3692886
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2022-06-07 09:16:05 +00:00
Frank Barchard
60254a1d84 I210ToI420, InterpolatePlane_16, and ScalePlane Vertical-only asan fix
- Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit
- Add NEON InterpolateRow_16 for fast 10 bit scaling
- When scaling up, set step to interpolate toward height - 1 to avoid buffer overread
- When scaling down, center the 2 rows used for source to achieve filtering.
- CopyPlane check for 0 size and return

Bug:  libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2022-06-07 01:41:56 +00:00
Joe Downing
c0c8c40b31 Update CopyPlane to handle 0 width and height dimensions
If a width, height, and src/dst strides passed in are all 0, height is updated to 1 which means some CPU optimized functions may try to copy data when the dst rect is not valid.

Bug: b:234340482
Change-Id: I63be1c6ba05d669d67f5079d812acbec09c8f6c9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3689909
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-06-07 01:20:14 +00:00
Frank Barchard
eb2c88e499 Convert16To8 NEON
Pixel 3
Was C    I010ToI420_Opt (749 ms)
Now NEON I010ToI420_Opt (356 ms)

Pixel 4
Was C    I010ToI420_Opt (581 ms)
Now NEON I010ToI420_Opt (163 ms)

Bug: b/233233302, b/233634772
Change-Id: I60a84648a66f77d97c0a7822b29bd18b8e3a3355
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3661401
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-05-24 18:07:16 +00:00
Frank Barchard
715150b5aa Add UYVYToY function
This function reads 2 byte values and writes the 2nd byte to the destination.
It turns out this is useful for P010ToNV12 as well, so adding the planar function allows a high level to call this.
And adds UYVY support for something YUY2 already had.  Which is writing the 1st byte.

Bug: b/233233302, b/233634772
Change-Id: I10a9454cb4f5b2c4ac5532fa86feddf78284d8b8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3659055
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-05-24 01:42:31 +00:00
Frank Barchard
d62ee21e66 UVScale fix for vertical-only scaling
Bug: b/228841445
Change-Id: I0342856e1bfcea69851d718459d66926bb170219
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3595240
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas-Sanchez <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-20 01:27:33 +00:00
Frank Barchard
3c0f408607 Enable HAS_DETILESPLITUVROW_NEON
On Pixel 4
Was C
AArch64 TestDetileSplitUVPlane_Benchmark (935 ms)
AArch32 TestDetileSplitUVPlane_Benchmark (787 ms)

Now NEON
AArch64 TestDetileSplitUVPlane_Benchmark (248 ms)
AArch32 TestDetileSplitUVPlane_Benchmark (256 ms)

Bug: libyuv:915, b/228518489
Change-Id: Ib82b702c1321285738c044ad8c2a7805b16f074a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3594524
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-19 21:17:03 +00:00
Frank Barchard
eec8dd37e8 Change ScaleUVRowUp2_Biinear_16_SSE2 to SSE41
Bug: libyuv:928

xed -i scale_gcc.o:
SYM ScaleUVRowUp2_Linear_16_SSE2:
XDIS 0: LOGICAL   SSE2       660FEFED                 pxor xmm5, xmm5
XDIS 4: SSE       SSE2       660F76E4                 pcmpeqd xmm4, xmm4
XDIS 8: SSE       SSE2       660F72D41F               psrld xmm4, 0x1f
XDIS d: SSE       SSE2       660F72F401               pslld xmm4, 0x1
XDIS 12: DATAXFER  SSE2       F30F7E07                 movq xmm0, qword ptr [rdi]
XDIS 16: DATAXFER  SSE2       F30F7E4F04               movq xmm1, qword ptr [rdi+0x4]
XDIS 1b: SSE       SSE2       660F61C5                 punpcklwd xmm0, xmm5
XDIS 1f: SSE       SSE2       660F61CD                 punpcklwd xmm1, xmm5
XDIS 23: DATAXFER  SSE2       660F6FD0                 movdqa xmm2, xmm0
XDIS 27: DATAXFER  SSE2       660F6FD9                 movdqa xmm3, xmm1
XDIS 2b: SSE       SSE2       660F70D24E               pshufd xmm2, xmm2, 0x4e
XDIS 30: SSE       SSE2       660F70DB4E               pshufd xmm3, xmm3, 0x4e
XDIS 35: SSE       SSE2       660FFED4                 paddd xmm2, xmm4
XDIS 39: SSE       SSE2       660FFEDC                 paddd xmm3, xmm4
XDIS 3d: SSE       SSE2       660FFED0                 paddd xmm2, xmm0
XDIS 41: SSE       SSE2       660FFED9                 paddd xmm3, xmm1
XDIS 45: SSE       SSE2       660FFEC0                 paddd xmm0, xmm0
XDIS 49: SSE       SSE2       660FFEC9                 paddd xmm1, xmm1
XDIS 4d: SSE       SSE2       660FFEC2                 paddd xmm0, xmm2
XDIS 51: SSE       SSE2       660FFECB                 paddd xmm1, xmm3
XDIS 55: SSE       SSE2       660F72D002               psrld xmm0, 0x2
XDIS 5a: SSE       SSE2       660F72D102               psrld xmm1, 0x2
XDIS 5f: SSE       SSE4       660F382BC1               packusdw xmm0, xmm1
XDIS 64: DATAXFER  SSE2       F30F7F06                 movdqu xmmword ptr [rsi], xmm0
XDIS 68: MISC      BASE       488D7F08                 lea rdi, ptr [rdi+0x8]
XDIS 6c: MISC      BASE       488D7610                 lea rsi, ptr [rsi+0x10]
XDIS 70: BINARY    BASE       83EA04                   sub edx, 0x4
XDIS 73: COND_BR   BASE       7F9D                     jnle 0x12 <ScaleUVRowUp2_Linear_16_SSE2+0x12>
XDIS 75: RET       BASE       C3                       ret

SYM ScaleUVRowUp2_Bilinear_16_SSE2:
XDIS 0: LOGICAL   SSE2       660FEFFF                 pxor xmm7, xmm7
XDIS 4: SSE       SSE2       660F76F6                 pcmpeqd xmm6, xmm6
XDIS 8: SSE       SSE2       660F72D61F               psrld xmm6, 0x1f
XDIS d: SSE       SSE2       660F72F603               pslld xmm6, 0x3
XDIS 12: DATAXFER  SSE2       F30F7E07                 movq xmm0, qword ptr [rdi]
XDIS 16: DATAXFER  SSE2       F30F7E4F04               movq xmm1, qword ptr [rdi+0x4]
XDIS 1b: SSE       SSE2       660F61C7                 punpcklwd xmm0, xmm7
XDIS 1f: SSE       SSE2       660F61CF                 punpcklwd xmm1, xmm7
XDIS 23: DATAXFER  SSE2       660F6FD0                 movdqa xmm2, xmm0
XDIS 27: DATAXFER  SSE2       660F6FD9                 movdqa xmm3, xmm1
XDIS 2b: SSE       SSE2       660F70D24E               pshufd xmm2, xmm2, 0x4e
XDIS 30: SSE       SSE2       660F70DB4E               pshufd xmm3, xmm3, 0x4e
XDIS 35: SSE       SSE2       660FFED0                 paddd xmm2, xmm0
XDIS 39: SSE       SSE2       660FFED9                 paddd xmm3, xmm1
XDIS 3d: SSE       SSE2       660FFEC0                 paddd xmm0, xmm0
XDIS 41: SSE       SSE2       660FFEC9                 paddd xmm1, xmm1
XDIS 45: SSE       SSE2       660FFEC2                 paddd xmm0, xmm2
XDIS 49: SSE       SSE2       660FFECB                 paddd xmm1, xmm3
XDIS 4d: DATAXFER  SSE2       F30F7E1477               movq xmm2, qword ptr [rdi+rsi*2]
XDIS 52: DATAXFER  SSE2       F30F7E5C7704             movq xmm3, qword ptr [rdi+rsi*2+0x4]
XDIS 58: SSE       SSE2       660F61D7                 punpcklwd xmm2, xmm7
XDIS 5c: SSE       SSE2       660F61DF                 punpcklwd xmm3, xmm7
XDIS 60: DATAXFER  SSE2       660F6FE2                 movdqa xmm4, xmm2
XDIS 64: DATAXFER  SSE2       660F6FEB                 movdqa xmm5, xmm3
XDIS 68: SSE       SSE2       660F70E44E               pshufd xmm4, xmm4, 0x4e
XDIS 6d: SSE       SSE2       660F70ED4E               pshufd xmm5, xmm5, 0x4e
XDIS 72: SSE       SSE2       660FFEE2                 paddd xmm4, xmm2
XDIS 76: SSE       SSE2       660FFEEB                 paddd xmm5, xmm3
XDIS 7a: SSE       SSE2       660FFED2                 paddd xmm2, xmm2
XDIS 7e: SSE       SSE2       660FFEDB                 paddd xmm3, xmm3
XDIS 82: SSE       SSE2       660FFED4                 paddd xmm2, xmm4
XDIS 86: SSE       SSE2       660FFEDD                 paddd xmm3, xmm5
XDIS 8a: DATAXFER  SSE2       660F6FE0                 movdqa xmm4, xmm0
XDIS 8e: DATAXFER  SSE2       660F6FEA                 movdqa xmm5, xmm2
XDIS 92: SSE       SSE2       660FFEE0                 paddd xmm4, xmm0
XDIS 96: SSE       SSE2       660FFEEE                 paddd xmm5, xmm6
XDIS 9a: SSE       SSE2       660FFEE0                 paddd xmm4, xmm0
XDIS 9e: SSE       SSE2       660FFEE5                 paddd xmm4, xmm5
XDIS a2: SSE       SSE2       660F72D404               psrld xmm4, 0x4
XDIS a7: DATAXFER  SSE2       660F6FEA                 movdqa xmm5, xmm2
XDIS ab: SSE       SSE2       660FFEEA                 paddd xmm5, xmm2
XDIS af: SSE       SSE2       660FFEC6                 paddd xmm0, xmm6
XDIS b3: SSE       SSE2       660FFEEA                 paddd xmm5, xmm2
XDIS b7: SSE       SSE2       660FFEE8                 paddd xmm5, xmm0
XDIS bb: SSE       SSE2       660F72D504               psrld xmm5, 0x4
XDIS c0: DATAXFER  SSE2       660F6FC1                 movdqa xmm0, xmm1
XDIS c4: DATAXFER  SSE2       660F6FD3                 movdqa xmm2, xmm3
XDIS c8: SSE       SSE2       660FFEC1                 paddd xmm0, xmm1
XDIS cc: SSE       SSE2       660FFED6                 paddd xmm2, xmm6
XDIS d0: SSE       SSE2       660FFEC1                 paddd xmm0, xmm1
XDIS d4: SSE       SSE2       660FFEC2                 paddd xmm0, xmm2
XDIS d8: SSE       SSE2       660F72D004               psrld xmm0, 0x4
XDIS dd: DATAXFER  SSE2       660F6FD3                 movdqa xmm2, xmm3
XDIS e1: SSE       SSE2       660FFED3                 paddd xmm2, xmm3
XDIS e5: SSE       SSE2       660FFECE                 paddd xmm1, xmm6
XDIS e9: SSE       SSE2       660FFED3                 paddd xmm2, xmm3
XDIS ed: SSE       SSE2       660FFED1                 paddd xmm2, xmm1
XDIS f1: SSE       SSE2       660F72D204               psrld xmm2, 0x4
XDIS f6: SSE       SSE4       660F382BE0               packusdw xmm4, xmm0
XDIS fb: DATAXFER  SSE2       F30F7F22                 movdqu xmmword ptr [rdx], xmm4
XDIS ff: SSE       SSE4       660F382BEA               packusdw xmm5, xmm2
XDIS 104: DATAXFER  SSE2       F30F7F2C4A               movdqu xmmword ptr [rdx+rcx*2], xmm5
XDIS 109: MISC      BASE       488D7F08                 lea rdi, ptr [rdi+0x8]
XDIS 10d: MISC      BASE       488D5210                 lea rdx, ptr [rdx+0x10]
XDIS 111: BINARY    BASE       4183E804                 sub r8d, 0x4
XDIS 115: COND_BR   BASE       0F8FF7FEFFFF             jnle 0x12 <ScaleUVRowUp2_Bilinear_16_SSE2+0x12>
XDIS 11b: RET       BASE       C3                       ret

Change-Id: Ia20860e9c3c45368822cfd8877167ff0bf973dcc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3587602
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-15 18:46:09 +00:00
Wan-Teh Chang
18f9110516 Avoid AVX instructions in ScaleRowUp2_Linear_SSSE3
The "vpackuswb   %%xmm2,%%xmm0,%%xmm0" and "vmovdqu     %%xmm0,(%1)"
instructions in ScaleRowUp2_Linear_SSSE3() are AVX instructions. They
cause an illegal instruction exception on CPUs that do not support AVX.

Bug: libyuv:927
Bug: chromium:1312551
Change-Id: I87b2aaf041e7d185e7e8fb07172d4f37482e9d08
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3585881
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2022-04-15 00:18:39 +00:00
Frank Barchard
2ad73733d9 I422Rotate update to remove name space for ios build warning
- Remove libyuv:: from within libyuv to resolve a build warning on IOS.
- Check src_y parameter is not NULL if there is a dst_y parameter
- Apply clang-format
- Bump version

Performance on Intel Skylake Xeon
ARGBRotate90_Opt (795 ms)
I420Rotate90_Opt (283 ms)
I422Rotate90_Opt (867 ms)  <-- scales and rotates
I444Rotate90_Opt (565 ms)
NV12Rotate90_Opt (289 ms)

Performance on Pixel 4 (Cortex A76)
ARGBRotate90_Opt (4208 ms)
I420Rotate90_Opt (273 ms)
I422Rotate90_Opt (1207 ms)
I444Rotate90_Opt (718 ms)
NV12Rotate90_Opt (282 ms)


Bug: libyuv:926
Change-Id: I42e1b93a9595f6ed075918e91bed977dd3d23f6f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3576778
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-07 21:06:44 +00:00
Sergio Garcia Murillo
a77d615e10 Add tentative I422Rotate.
When doing 90 or 270 degrees rotation we need to do a rotate&scale of the UV planes, as there are no helper optimized functions to do this, we use the Y plane as temporal memory and perform each of the transforms independently:

First U plane is rotated, putting the result in the Y plane. After the rotation, the output has double the samples horizontally and half the samples vertically, so it is scaled into the final U plane. Same process is done with the V plane.

Last the Y plane that can be just rotated without scaling.

It would be great to have an optimized version for this, but maybe this is helpfull for triggering the discussions.

Bug: libyuv:926
Change-Id: I188af103c4d0e3f9522021b4bf2b63c9d5de8b93
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3568424
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-06 23:49:35 +00:00
Sergio Garcia Murillo
4589081cea Add I422 and I210 functions
Bug: webrtc:13826
Change-Id: I68235a668abecf76133f7b89472b192b1442bed4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3557217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-31 15:30:53 +00:00
Wan-Teh Chang
f4d2530846 Declare RgbConstants structs as static const
These RgbConstants structs were added in
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023. They
should be declared as static const. These four symbols were detected by
Chrome's android-binary-size trybot as "Mutable Constants".

Change-Id: I3b4d4ff4b32e261ba528c07647b9d69ac368ab5b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3553035
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2022-03-25 20:39:38 +00:00
Wan-Teh Chang
ebd9e130f0 Fix bugs in I010AlphaToARGBMatrixBilinear()
Add a missing increment of src_a and ARGBAttenuateRow() call.

Bug: libyuv:922
Change-Id: I26e04e70c6a1a231cbe54b60c249f4c2e8af112a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3549976
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2022-03-25 15:39:52 +00:00
Wan-Teh Chang
173ed374c0 Add null pointer checks for the src_a parameters
Change-Id: Icc96e18eab07080c18b6542171a340c97f059c78
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3550016
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2022-03-25 15:14:38 +00:00
Frank Barchard
124bf08fee RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24
1920x1080 to/from 1280x720 to ARGB on Intel Skylake Xeon
RGBScaleTo1920x1080_Bilinear (2625 ms)
RGBScaleFrom1920x1080_Bilinear (2115 ms)
ARGBScaleTo1920x1080_Bilinear (1668 ms)
ARGBScaleFrom1920x1080_Bilinear (1164 ms)

Bug: b/224814071
Change-Id: Ifc7611b597409771728b13c9c39e5a7e06131021
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3537341
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-19 01:44:06 +00:00
Frank Barchard
95b14b2446 RAWToJ400 faster version for ARM
- Unrolled to 16 pixels
- Take constants via structure, allowing different colorspace and channel order
- Use ADDHN to add 16.5 and take upper 8 bits of 16 bit values, narrowing to 8 bits
- clang-format applied, affecting mips code

On Cortex A510
Was RAWToJ400_Opt (1623 ms)
Now RAWToJ400_Opt (862 ms)

C   RAWToJ400_Opt (1627 ms)

Bug: b/220171611
Change-Id: I06a9baf9650ebe2802fb6ff6dfbd524e2c06ada0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-18 07:22:36 +00:00
Yuan Tong
25d0a5110b Fix FilterMode enum type
When used in C enum keyword can't be eliminated.

Bug: libyuv:872
Change-Id: Iacff5a8bd84ec7caa1f90889e48f81ffc10071ae
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3513317
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 21:25:44 +00:00
Yuan Tong
ebb27d6916 Add YUV to RGB conversion function with filter parameter
Add the following functions:

I420ToARGBMatrixFilter
I422ToARGBMatrixFilter
I010ToAR30MatrixFilter
I210ToAR30MatrixFilter
I010ToARGBMatrixFilter
I210ToARGBMatrixFilter
I420AlphaToARGBMatrixFilter
I422AlphaToARGBMatrixFilter
I010AlphaToARGBMatrixFilter
I210AlphaToARGBMatrixFilter

Bug: libyuv:872
Change-Id: Ib33b09fd7d304688c5e06c55e0a576a964665a51
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3430334
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 11:50:35 +00:00
Hao Chen
91bae707e1 Optimize functions for LASX in row_lasx.cc.
1. Optimize 18 functions in source/row_lasx.cc file.
2. Make small modifications to LSX.
3. Remove some unnecessary content.

Bug: libyuv:912
Change-Id: Ifd1d85366efb9cdb3b99491e30fa450ff1848661
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3507640
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 08:52:54 +00:00
Frank Barchard
42d76a342f RAWToJNV21 function with 2 step conversion
RAWToJ420 + J420ToNV21 on row level

Pixel 6
RAWToJNV21_Opt (320 ms)

Skylake Xeon
RAWToJNV21_Opt (302 ms)

Bug: b/220171611
Change-Id: I39dcce9cf56c576b95666bb4fb1baccf9fbc7f7a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3495876
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-01 19:33:49 +00:00
Hao Chen
2dd3ea6f39 Fix Bugs on mips platform V2.
This patch adds some deleted control macros so that these MSA
optimization functions can be called normally on mips platform.
There are also some modifications to adapt to the clang compiler.

Bug: libyuv:918
Change-Id: I6ffadc6582682b5eaeae2e0f4033d66d370b48b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3494667
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-01 13:16:31 +00:00
Frank Barchard
e77531f6f1 Fix RotatePlane by 90 on Neon when source width is not a multiple of 8
Bug: b/220888716, b/218875554, b/220205245
Change-Id: I17e118ac9b9a7013386a5f0ad27a2dd249474ae5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3483576
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-02-23 19:16:53 +00:00
Hao Chen
3b8c86d23a Fix bugs on mips platform.
This patch fixes compilation errors caused by the removal of kUVBias
and two failed test cases of LibYUVConvertTest.RGB565ToI420_Opt and
LibYUVConvertTest.ARGB1555ToI420_Opt.

Bug: libyuv:918
Change-Id: I1a66bcd7ef616aacbeca5b4015013015ccdf0f18
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3477416
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-02-22 18:24:02 +00:00
Justin Green
b4ddbaf549 Add support for MM21.
Add support for MM21 to NV12 and I420 conversion, and add SIMD
optimizations for arm, aarch64, SSE2, and SSSE3 machines.

Bug: libyuv:915, b/215425056
Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Justin Green <greenjustin@google.com>
2022-02-03 17:01:49 +00:00
Frank Barchard
804980bbab DetilePlane and unittest for NEON
Bug: libyuv:915, b/215425056
Change-Id: Iccab1ed3f6d385f02895d44faa94d198ad79d693
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3424820
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-31 20:05:55 +00:00
Frank Barchard
2c6bfc02d5 Remove MMI support
Bug: libyuv:916
Change-Id: I345b7e271ceb4b32fe91e292915e66be40812810
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3415817
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-26 08:41:33 +00:00
Hao Chen
2f87e9a713 Add optimization functions in scale_lsx.cc file.
Optimize 20 functions in source/scale_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913

Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
f8e2da48ae Add optimization functions in rotate_lsx.cc file.
Optimize two functions in source/rotate_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913
Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
dfe046d272 Add optimization functions in row_lsx.cc file.
Optimize 44 functions in source/row_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913
Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
de8ae8c679 Add optimization functions in row_lasx.cc file.
Optimize 32 functions in source/row_lasx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:912
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Change-Id: I7d3f649f753f72ca9bd052d5e0562dbc6f6ccfed
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351466
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
51de1e16f2 Add supports for loongarch LSX and LASX.
1. Add supports for LSX and LASX.
2. Three optimization functions are added in loongarch/row_lasx.cc file:
   I422ToARGBRow_LASX,I422ToRGBARow_LASX,I422AlphaToARGBRow_LASX.

Bug: libyuv:912, Bug: libyuv:913
Change-Id: I043c2704f99a5215724b5c0b7f97e6bf5f7a199b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329189
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-20 19:25:38 +00:00
Frank Barchard
90ffd5cba9 I420ToARGB for AVX512
On Skylake Xeon
AVX512  I420ToARGB_Opt (2050 ms)
AVX2    I420ToARGB_Opt (2533 ms)
SSSE3   I420ToARGB_Opt (3688 ms)

Bug: libyuv:911
Change-Id: I2214cc15dec24b06541895ca59d88990edbb2216
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3382100
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-14 09:17:33 +00:00
Frank Barchard
cdd62da670 VNNI detect
Bug: libyuv:911
Change-Id: Ic4e7720b4d5c20010470f06a7021d1a2426e765f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3381495
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-12 07:08:20 +00:00
Frank Barchard
78625492cb InterpolateRow_AVX2 use AVX2 instead of ERMS for 100%
Bug: b/210066781
Change-Id: I709e403f03bd6b9f8fe693b165b242b784076fe0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329072
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-15 03:54:18 +00:00
Frank Barchard
fdc71956bd InterpolateRow_AVX2 - extend width count to 64 bits
Bug: b/210066781
Change-Id: Ib9052d8edfce29b95ca02a6f7254d3ff35d2b64d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329070
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 04:11:18 +00:00
Frank Barchard
d7a2d5da87 J400ToARGB optimized for Exynos using ZIP+ST1
Bug: 204562143
Change-Id: I56c98198c02bd0dd1283f1c14837730c92832c39
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3328702
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 01:00:07 +00:00
Frank Barchard
000806f373 NV21ToYUV24 replace ST3 with ST1. ARGBToAR64 replace ST2 with ST1
On Samsung S8 Exynos M2
Was ST3 NV21ToYUV24_Opt (769 ms)
Now ST1 NV21ToYUV24_Opt (473 ms)
Was ST2 ARGBToAR64_Opt (1759 ms)
Now ST1 ARGBToAR64_Opt (987 ms)

Skylake Xeon, AVX2 version:
Was NV21ToYUV24_Opt (885 ms)
Now NV21ToYUV24_Opt (194 ms)

Bug: b/204562143, b/124413599
Change-Id: Icc9cb64d822cd11937789a4e04fbb773b3e33aa3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3290664
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-11-24 07:38:49 +00:00
Frank Barchard
a04e4f87fb Fix scale any mask parameter bug for NV12Scale
Bug: None
Change-Id: Ib4e174c086162ee709faf4b04c7d5d5847a7de3d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3267488
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-11-08 20:00:04 +00:00
Frank Barchard
fa043c7a64 Android420ToI420Rotate function to convert with rotation
- adapted from Android420ToI420, adding a rotation parameter
- SplitRotateUV added to rotate and split the UV channel of NV12 or NV21
- rename RotateUV functions to SplitRotateUV

Bug: b/203549508
Change-Id: I6774da5fb5908fdf1fc12393f0001f41bbda9851
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3251282
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-10-28 22:38:04 +00:00
Frank Barchard
b179f1847a Enable SIMD for exact RGB to Y conversions
Bug: libyuv:908, b/202888439
Change-Id: Icc5470b85d91b441ded9958ee04b4f32246646f0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3230489
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-10-19 07:54:50 +00:00
Frank Barchard
f0cfc1f1c8 ubsan friendly unaligned tests
- ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C.
Although current Intel, ARM, Mips and PPC can do unaligned load/store, its not guaranteed
and could crash a CPU that doesnt support it.
- unaligned tests use offset of 2 or 4, which ubsan accepts.
- unittest fills in random buffer with 2 bytes at a time instead of a short.
- row common functions for int16 types use 2 shorts instead of 1 int.

Bug: libyuv:908, b/203243873
Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-10-18 18:03:28 +00:00
Frank Barchard
55b97cb48f BIT_EXACT for unattenuate and attenuate.
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnatenuate, which mimics the AVX code.

Apply clang format to cleanup code.

Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-10-15 19:46:02 +00:00
Frank Barchard
11cbf8f976 Add LIBYUV_BIT_EXACT macro to force C to match SIMD
- C code use ARM path, so NEON and C match
- C used on Intel platforms, disabling AVX.

Bug: libyuv:908, b/202888439
Change-Id: Ie035a150a60d3cf4ee7c849a96819d43640cf020
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3223507
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-10-14 20:37:39 +00:00
Frank Barchard
daf9778a24 Fix for failed compile with armv-7a neon gcc
Bug: libyuv:907
Change-Id: I955e83c72b57ce5ba45730030b32f337be610a21
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3216739
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-10-12 18:17:50 +00:00
Frank Barchard
b92a60320f ConvertFromI420 respect destination stride for NV12 and NV21
Bug: libyuv:904
Change-Id: Ie1fd39c693e64661eb52f75492a261384db70776
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3176483
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-09-22 19:44:06 +00:00
Frank Barchard
33a68ec779 JPeg decoder remove assert when out of data
Bug: b/186665202
Change-Id: I406cc2ef8cfa2cdf987d41c4bd85d3024aedfaab
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3166710
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-09-16 23:11:14 +00:00
Frank Barchard
ed5a9c81de change ld1 to ldr for memory references to allow GCC to use an offset
Bug: chromium:819294, libyuv:903
Change-Id: I1cd19cc5a068c421d1112c9ea6090e18fb002a4c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3152821
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-09-09 23:46:46 +00:00
Stephan Hartmann
c6ed1b8f0e GCC: force memory address without offset on aarch64
With "m" GCC generates a memory address with offset which is
not allowed with ld1 on aarch64. Change constraint to "Q" to
force address without offset.

Bug: chromium:819294, libyuv:903
Change-Id: Iaae24bc6882cdef823259040a37fdbfc31f91185
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922146
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-09-09 22:20:27 +00:00
Frank Barchard
639dd4ea76 Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.
- swap U and V when crop x is odd
- document YUY2 and UYVY formats
- apply clang-format

Bug: libyuv:902
Change-Id: I045e44c907f4a9eb625d7c024b669bb308055f32
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3039549
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-07-19 22:22:22 +00:00
Frank Barchard
d19f69d9df Update Android.bp to always enable NEON
Relax Cpu unittest to allow ARM emulator to run.

Bug: libyuv:863, libyuv:877, b/178283356
Change-Id: I3c751574219fdf731a3f9d4a79934a349acba446
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2950938
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-06-10 19:31:48 +00:00
Stephan Hartmann
6ea7647b6e GCC: replace mov .8h with mov .16b
mov Vy.8h, Vx.8h isn't a valid instruction. Clang/LLVM
automatically replace it with mov Vy.16b, Vx.16b.

Bug: chromium:819294
Change-Id: I8a0cbf2e6c4efcc6c1e38812cee949bde7e99b11
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922147
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-06-01 17:44:56 +00:00
Frank Barchard
5b3351bd07 Fix ARGB1555ToI420 odd width bug in C code.
Was
[ RUN      ] LibYUVConvertTest.ARGB1555ToI420_Any
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure
Expected equality of these values:
  dst_uv_c[i * kStrideUV + j]
    Which is: '\x8B' (139)
  dst_uv_opt[i * kStrideUV + j]
    Which is: '\x92' (146)
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure

[  FAILED  ] LibYUVConvertTest.ARGB1555ToI420_Any

Now
[ RUN      ] LibYUVConvertTest.ARGB1555ToI420_Any
[       OK ] LibYUVConvertTest.ARGB1555ToI420_Any (0 ms)

Bug: libyuv:894, b/155722711
Change-Id: I12dcacd0ecfff4ede5693a2554e9bb10dc8586c1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2870484
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-05-04 19:03:22 +00:00
Frank Barchard
49ebc996aa Make 2 step transitive tests measure 2 step time.
Add tests of all macros used by libyuv public headers

When a 1 step conversion is added, a 2 step test can compare
the old 2 step method to the 1 step.  A 1 step unittest is
also added which compares C to SIMD.  Making the 2 step
conversions measure performance of the 2 steps allows the
old 2 step performance to be compared to 1 step.

All macros used in public headers are added to an ifdef test.
Showing them in a unittest allows some diagnostics when
a test is failing.

Bug: libyuv:901
Change-Id: I7ffa6ed0cb3b506fa1b7fd4b7b1b729658c3c266
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2857916
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-30 18:14:57 +00:00
Yuan Tong
99cddd8051 Fix ARM YuvConstants value
R=fbarchard@chromium.org

Bug: libyuv:901
Change-Id: Ie2f9ac214a2a7462cc613f510b64308d3b861b74
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2856225
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-29 15:57:20 +00:00
Yuan Tong
c9843de02a Optimize unlimited data for Intel
Use unsigned coefficient and signed UV value in YUVTORGB.

R=fbarchard@chromium.org

Bug: libyuv:862, libyuv:863
Change-Id: I32e58b2cee383fb98104c055beb0867a7ad05bfe
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2850016
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-27 20:35:27 +00:00
Frank Barchard
5e05f26a2b Switch win32 to row_gcc for clangcl.
Bug: libyuv:900, libyuv:848, b/178283356, b/185922513
Change-Id: I7697953753391c555a778198db36412c853fb29e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2844962
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2021-04-22 19:32:32 +00:00
Yuan Tong
8c8d907d29 Unlimited data for Windows
Port unlimited data YUVToRGB code to windows.
Disable MIPS YUVToRGB assembly for now to get correct result.

R=fbarchard@chromium.org

Bug: libyuv:862, libyuv:863
Change-Id: Ib3e99c98082badfef4eb671205a151dd1de56b67
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2839383
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-04-22 01:48:48 +00:00
Frank Barchard
5e83cac0d5 Disable win32 SIMD
Bug: libyuv:900, libyuv:848, b/178283356, b/185922513
Change-Id: Iee7d9970c7991856c8f51158cd12ec72ee9c57eb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2844779
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
2021-04-21 21:37:44 +00:00
Yuan Tong
a1814576bf Unlimited data for Intel
Use unsigned coefficients on Intel.
Make C, NEON and AVX2 match under LIBYUV_UNLIMITED_DATA.

Bug: libyuv:862, libyuv:863
Change-Id: I6c02147ea3c1875c4fc23863435aea86dcf5880a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2830180
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-04-19 20:29:10 +00:00
Yuan Tong
590c17ce40 Refactor NEON YUVToRGB, Remove subsampling
Refactor NEON YUVToRGB Assembly to support HBD data as input and output.
Work on YUV444 internally, remove subsampling in I444ToARGB.

libyuv_unittest --gtest_filter=*.NV??ToARGB_Opt:*UYVYToARGB_Opt:*YUY2ToARGB_Opt:*I4*ToARGB_Opt

Bug: libyuv:895, libyuv:862, libyuv:863
Change-Id: I05b56ea8ea56d9e523720b842fa6e4b122ed4115
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810060
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-04-15 19:13:10 +00:00
Frank Barchard
287158925b use width + 1 for odd width tests
Bug: libyuv:894, libyuv:898, libyuv:899
Change-Id: Ieba8eaeb8b06f0323824967776673e339b263220
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2809701
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-04-09 20:17:55 +00:00
Yuan Tong
2cd098f83b Fix MergeAR64Plane on odd width
R=fbarchard@chromium.org

Bug: libyuv:898
Change-Id: I031e008ea91baba1c7598efa0eda70750cbfce85
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810066
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-04-08 09:18:34 +00:00
Frank Barchard
d1bfc6ead6 gcc fix for row_gcc.cc vbroadcastss
Bug: libyuv:893
Change-Id: I5b70e6a94356878deb348cbd19c9e1e50b2a18aa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2808793
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-06 21:31:29 +00:00
Frank Barchard
60db98b6fa clang-tidy applied
Bug: libyuv:886, libyuv:889
Change-Id: I2d14d03c19402381256d3c6d988e0b7307bdffd8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2800147
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-04-01 21:42:47 +00:00
Mirko Bonadei
34bf48e160 Check if LIBYUV_UNLIMITED_DATA is defined to avoid -Wundef.
No-Try: True
Bug: None
Change-Id: I32f6da42c82628210f82ce446d4ec69e2013a2ff
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2799761
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-04-01 18:41:25 +00:00
Yuan Tong
8a13626e42 Add MergeAR30Plane, MergeAR64Plane, MergeARGB16To8Plane
These functions merge high bit depth planar RGB pixels into packed format.

Change-Id: I506935a164b069e6b2fed8bf152cb874310c0916
Bug: libyuv:886, libyuv:889
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-03-31 20:46:02 +00:00
Frank Barchard
312c02a5aa Fixes for SplitUVPlane_16 and MergeUVPlane_16
Planar functions pass depth instead of scale factor.
Row functions pass shift instead of depth.  Add assert to C.
AVX shift instruction expects a single shift value in XMM.
Neon pass shift as input (not output).
Split Neon reimplemented as left shift on shorts by negative to achieve right shift.
Add planar unitests

Bug: libyuv:888
Change-Id: I8fe62d3d777effc5321c361cd595c58b7f93807e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2782086
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-03-24 21:37:10 +00:00
Frank Barchard
d8f1bfc981 Add RAWToJ420
Add J420 output from RAW.
Optimize RGB24 and RAW To J420 on ARM by using NEON for the 2 step conversion.

Also fix sign-compare warning that was breaking Windows build

Bug: libyuv:887, b/183534734
Change-Id: I8c39334552dc0b28414e638708db413d6adf8d6e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2783382
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-03-23 23:45:54 +00:00
Frank Barchard
b046131c0b Replace MOV .4s with MOV .16b for GCC compatability
MOV Vy.4s, Vx.4s is not a valid instruction form (even though LLVM allows it).
It should be MOV Vy.16b, Vx.16b (.8b for 64-bit variants)

Bug: None
Change-Id: I3c3b42288a0ebc275962fa3adad707b351d00d4c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780155
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-03-23 18:03:06 +00:00
Yuan Tong
f37014fcff Add support for AR64 format
Add following conversions:
ARGB,ABGR <-> AR64,AB64
AR64 <-> AB64

R=fbarchard@chromium.org

Change-Id: I5ca5b40a98bffea11981e136afae4a511ba6c564
Bug: libyuv:886
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2746780
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-03-13 20:55:21 +00:00
Yuan Tong
d47031c0d4 Fix x86 windows build error
Correct rule for marking relevant functions as available.
Fix some clang-tidy issues.

R=fbarchard@chromium.org

Change-Id: I66fa0d7ae5a681356f94bfc1bc82b7f1f407d5df
Bug: libyuv:884, libyuv:885
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2738414
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-05 03:24:44 +00:00
Frank Barchard
ba033a11e3 Add 12 bit YUV to 10 bit RGB
Bug: libyuv:843
Change-Id: I0104c8fcaeed09e83d2fd654c6a5e7d41bcb74cf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2727775
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-03-05 01:09:37 +00:00
Martin Storsjö
95ff456c33 Fix the mask for odd widths for ScaleRowUp2_Linear*_Any_NEON
These NEON functions produce 16 pixels per iteration each, thus
use the mask 15, not 7.

Change-Id: I1f3eb691a9ca4af705393b2842b18b65f6878926
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2731801
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-03 16:19:17 +00:00
Yuan Tong
cdabad5bfa Add more 10 bit YUV To RGB function
The following functions are added:
planar YUV:
 I410ToAR30, I410ToARGB
planar YUVA:
 I010AlphaToARGB, I210AlphaToARGB, I410AlphaToARGB
biplanar YUV:
 P010ToARGB, P210ToARGB
 P010ToAR30, P210ToAR30

biplanar functions can also handle 12 bit and 16 bit samples.

libyuv_unittest --gtest_filter=LibYUVConvertTest.*10*ToA*:LibYUVConvertTest.*P?1?ToA*

R=fbarchard@chromium.org

Bug: libyuv:751, libyuv:844
Change-Id: I2be02244dfa23335e1e7bc241fb0613990208de5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2707003
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-03 15:48:47 +00:00
Yuan Tong
c41eabe3d4 Add full 16 bit scaling up by 2x function
R=fbarchard@chromium.org

Change-Id: I4a869aefdc16e34357a615727711594c5d8e3a80
Bug: libyuv:882
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2719842
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-03-02 19:29:02 +00:00
Yuan Tong
a8c181050c Add 10/12 bit YUV To YUV functions
The following functions (and their 12 bit variant) are added:

planar, 10->10:
 I410ToI010, I210ToI010

planar, 10->8:
 I410ToI444, I210ToI422

planar<->biplanar, 10->10:
 I010ToP010, I210ToP210, I410ToP410
 P010ToI010, P210ToI210, P410ToI410

R=fbarchard@chromium.org

Change-Id: I9aa2bafa0d6a6e1e38ce4e20cbb437e10f9b0158
Bug: libyuv:834, libyuv:873
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2709822
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-02-25 23:16:54 +00:00
Frank Barchard
08815a2976 Scale 12 functions that are scale 16 but with only low 12 bits valid
Rename yuvconstants to .c and use round from math.h

Bug: libyuv:882, b/180472591
Change-Id: I70720bf3e0833ba00df0d721f12020fba0b07a03
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2706966
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-02-19 18:04:48 +00:00
Frank Barchard
d768774299 add yuvconvstants util
miscellaneous cleanup of other code/comments

Bug: libyuv:873, libyuv:877
Change-Id: I0d8caf9a65908ff8898b25494f7c724775f84fa3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2692930
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-12 19:45:16 +00:00
Yuan Tong
d4ecb70610 Add P010ToP410 and P210ToP410
These are 16 bit bi-planar convert functions to scale UV plane to
Y plane's size using (bi)linear filter.

libyuv_unittest --gtest_filter=*ToP41*

R=fbarchard@chromium.org

Bug: libyuv:872
Change-Id: I3cb4fafe2b2c9eedd0d91cf4c619abb9ee107bc1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2690102
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-12 14:55:24 +00:00
Frank Barchard
12a4a2372c Rounding added to scaling upsampler
Bug: libyuv:872, b/178521093
Change-Id: I86749f73f5e55d5fd8b87ea6938084cbacb1cda7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2686945
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-02-10 18:51:02 +00:00
Yuan Tong
f7fc83f46d Add NV12ToNV24 and NV16ToNV24
These are bi-planar convert functions to scale UV plane to Y plane's size using (bi)linear filter.

libyuv_unittest --gtest_filter=*ToNV24*

R=fbarchard@chromium.org

Change-Id: I3d98f833feeef00af3c903ac9ad0e41bdcbcb51f
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2682152
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-09 07:38:40 +00:00
Frank Barchard
942c508448 BT.2020 Full Range yuvconstants
new color util to compute constants needed based on white point.

[ RUN      ] LibYUVColorTest.TestFullYUVV
hist	      -2	      -1	       0	       1	       2
red	       0	 1627136	13670144	 1479936	       0
green	  319285	 3456836	 9243059	 3440771	  317265
blue	       0	 1561088	14202112	 1014016	       0

Bug: libyuv:877, b/178283356
Change-Id: If432ebfab76b01302fdb416a153c4f26ca0832d6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2678859
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-02-06 00:26:55 +00:00
Yuan Tong
fc61dde1eb Add special optimization for I420ToI444 and I422ToI444
These functions use (bi)linear filter, to scale U and V planes to the size of Y plane.
This will help enhance the quality of YUV to RGB conversion.

Also added 10bit and 12bit version:
I010ToI410
I210ToI410
I012ToI412
I212ToI412

libyuv_unittest --gtest_filter=LibYUVConvertTest.I42*ToI444*:LibYUVConvertTest.I*1*ToI41*

R=fbarchard@chromium.org

Change-Id: Ie4a711a5ba28f2ff1f44c021f7a5c149022264c5
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2658097
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-02-03 10:53:02 +00:00
Frank Barchard
c28d404936 win32 build fix for I422ToRGBA
Bug:  libyuv:877, b/178713286
Change-Id: Iad55df99083b9a4bb9306e052e0e687e58570d96
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2657701
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-29 10:07:08 +00:00
Frank Barchard
39240f7149 Fix in row_gcc.cc to change subq to sub
subq is only available for x64
sub works for both 32 bit x86 and 64 bit x64

Fox in row_gcc.cc for 32 bit x86 running out of registers.

Fix in row_neon.cc for split function argb paramter name.

Bug: libyuv:877, b/178283356, b/178713286
Change-Id: If2b12a2d6168eab08005a2cdf2c17a470a924dd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2656771
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-01-28 19:34:29 +00:00
Yuan Tong
a85cc26fde Add MergeARGBPlane and SplitARGBPlane
These functions convert between planar and interleaved ARGB,
optionally fill 255 to alpha / discard alpha.

This can help handle YUV(A) with Identity matrix, which is
basically planar ARGB.

libyuv_unittest --gtest_filter=LibYUVPlanarTest.*ARGBPlane*:LibYUVPlanarTest.*XRGBPlane*

R=fbarchard@google.com

Change-Id: I522a189b434f490ba1723ce51317727e7c5eb112
Bug: libyuv:877
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2649887
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-27 19:33:51 +00:00
Frank Barchard
37480f12c6 Add BT.709 Full Range yuv constants.
MAKEYUVCONSTANTS macro to generate struct for YUV to RGB
Fix I444AlphaToARGB unit test for ARM by adjusting C version to match Neon implementation.

Bug: libyuv:879, libyuv:878, libyuv:877, libyuv:862, b/178283356
Change-Id: Iedb171fbf668316e7d45ab9e3481de6205ed31e2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2646472
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-01-26 18:36:56 +00:00
Yuan Tong
08d0dce5fc Add I422AlphaToARGB and I444AlphaToARGB
Bug: libyuv:878
Change-Id: I64c314326ac7ae5242acc64e20016e30adc6d17f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2639439
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-23 00:40:33 +00:00
Frank Barchard
93b1b332cd NV12 Bilinear upsampling bug fix
Reenable InterpolateRow_AVX2

Bug: libyuv:838, b/68638384, b/176195584
Change-Id: I990fcc204d89ee9b8f5264184558a08aa21d6a9f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2626067
Reviewed-by: Eugene Zemtsov <eugene@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-01-12 23:10:42 +00:00
Frank Barchard
1d3f901aa0 Scale bug fix with msan when scaling up in height and down in width with box filter.
runyuv3 Scale*Rotate_Box --libyuv_width=200 --libyuv_height=50

Bug: chromium:1158178, libyuv:875, b/176195584
Change-Id: Ic9a380179433bf3dffb951e7b5563491592d5aa5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2603877
Reviewed-by: Eugene Zemtsov <eugene@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2020-12-26 20:23:13 +00:00
Evan Shrubsole
dfaf7534e0 NV12 Copy, include scale_uv.h
Bug: None
Change-Id: I8148def3f1253913eb62fcc000e5f72704262a17
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2569748
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2020-12-08 18:54:16 +00:00
Frank Barchard
b7a1c5ee5d Scale by even factor low level row function
Bug: b/171884264
Change-Id: I6a94bde0aa05e681bb4590ea8beec33a61ddbfc9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2518361
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-11-03 21:25:18 +00:00
Frank Barchard
cec28e7088 PlaneScale, UVScale and ARGBScale test 3x and 4x down sample.
Intel SkylakeX
UVTest3x (1925 ms)
UVTest4x (2915 ms)
PlaneTest3x (2040 ms)
PlaneTest4x (4292 ms)
ARGBTest3x (2079 ms)
ARGBTest4x (1854 ms)

Pixel 2
ARGBTest3x (3602 ms)
ARGBTest4x (4064 ms)
PlaneTest3x (3331 ms)
PlaneTest4x (8977 ms)
UVTest3x (3473 ms)
UVTest4x (6970 ms)

Bug: b/171798872, b/171884264
Change-Id: Iebc70fed907857b6cb71a9baf2aba9861ef1e3f7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2505601
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-28 20:41:59 +00:00
Frank Barchard
5c4dc242f4 MJPGToNV12 added and build files sorted
Bug: None
Change-Id: I87aa64a14bb3f0785f984f492e56fcf2313431ce
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2502780
Reviewed-by: Evan Shrubsole <eshr@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-28 16:24:38 +00:00
Frank Barchard
a4ec5cf9c2 UVScale down use AVX2 and Neon for aarch32
Intel SkylakeX
Was SSSE3 UVScaleDownBy4_Box (2496 ms)
Now AVX2  UVScaleDownBy4_Box (1983 ms)

Was SSSE3 UVScaleDownBy2_Box (380 ms)
Now AVX2  UVScaleDownBy2_Box (360 ms)

Pixel 4 aarch32
Was UVScaleDownBy4_Box (4295 ms)
Now UVScaleDownBy4_Box (3307 ms)

Was UVScaleDownBy2_Box (1022 ms)
Now UVScaleDownBy2_Box (778 ms)

Bug: libuyv:838
Change-Id: Ic823fa15e5761c1b9a897da27341adbf1ed39883
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2470196
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-14 06:23:26 +00:00
Frank Barchard
725c64015d UVScale down by 4 use SSSE3/NEON
Intel SkylakeX
Was UVScaleDownBy4_Box (7421 ms)
Now UVScaleDownBy4_Box (2496 ms)

Pixel4
Was UVScaleDownBy4_Box (3510 ms)
Now UVScaleDownBy4_Box (2797 ms)

Bug: libuyv:838
Change-Id: Ibbde56e497b0706fbcb7b5ec4a991d40ca17f861
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2469050
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-13 22:32:25 +00:00
Frank Barchard
d730dc2f18 2x down sample for UV planes ported to SSSE3 / NEON
Bug: libuyv:838
Change-Id: Id9fb3282a3e86143d76b5e0cb557f0523a88b3c8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2465578
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-13 21:42:15 +00:00
Frank Barchard
385418a8e2 I420ToARGB prototype added to convert_from.h
Duplicate I420ToARGB prototype from convert_argb.h into convert_from.h for webrtc
Apply clang format for white spacing consistency.

Bug: libyuv:838, b/151375918
Change-Id: I0f667ca5350192710dbb135e92e73e18b46135e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2446613
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-02 21:05:10 +00:00
Frank Barchard
0b1e6ea6c9 scale neon adjust PRFM instruction to co-issue with math
Bug: libyuv:838, b/151375918
Change-Id: Ib0013fd971d700d2981b58e0aa1dd666e68fedd4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2443953
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-10-02 17:15:00 +00:00
Frank Barchard
e647902212 NV12Scale function and ScaleUV for packed UV plane bilinear scaling
Bug: libyuv:718, libyuv:838, b/168918847
Change-Id: I3300c1e7d51407b9c3201cf52b68e2e11346ff5f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2427868
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-09-29 23:49:05 +00:00
Frank Barchard
7a52fde1c4 NV12Scale function using split/merge on UV channal
Bug: libyuv:718, libyuv:838, b/168918847
Change-Id: I78b27baac50f0ce955e00cb6aaf7dfe5a0cb1e3d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2432067
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-09-28 20:13:21 +00:00
Frank Barchard
d6833cda38 ARGBSetRow_Any do memset for msan
Bug: b/169296991
Change-Id: Ia000cdbca0d0d95465e09535b67775ad3b885038
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2434383
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-09-28 19:16:12 +00:00
John Budorick
33503d9c9c Fix libyuv deps autoroll and roll chromium deps.
This includes:
  - fixing a handrolled raw exec-based DEPS parser that was failing
    to parse Str, similar to crbug.com/1106435.
  - rolling chromium forward by nearly a year. (The last roll that
    landed was crrev.com/c/1797295). This required a bunch of changes in
    order to be able to successfully sync, run gn, and compile:
    - switching the mirrors for three repositories to match chromium,
      which switched in crrev.com/c/2062580.
    - making libyuv write an empty gclient_args file
    - adding a few build_override gn arguments
    - adding nasm as a deps entry, as it's now required by libjpeg_turbo
    - android:
      - adding jdk, libunwindstack, and turbine
      - rolling the android sdk
      - rolling bazel and r8
      - rolling the cipd packages managed by third_party/android_deps
      - adding six and requests to .vpython for the test runner
      - switching to memcpy in a few places to avoid SIGBUS errors on
        arm due to unaligned reads
    - linux:
      - checking out instrumented libraries for msan (including adding
        depot_tools to deps for the hook)
    - mac:
      - adding mac_xcode_version to gclient_gn_args
    - win:
      - limit mac_toolchain to checkout_mac

Bug: 1063768, 1097306
Change-Id: Idd86fffcdac174fd2f7899243a56af4f1ed8077e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2384320
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2020-09-15 06:21:24 +00:00
Lu Wang
b45db3c4af Fix failed unittest TestARGBQuantize.
Wrong stride used in the for block.
Change the stride of x from 8 to 16.

Change-Id: Ic0cddf8413d1bd2decf5752b7a92c16f0345f0fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2355693
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2020-08-17 17:26:09 +00:00
Hao Chen
0de9bf3b18 Fix two failed case after enabling msa optimization.
Failed case: LibYUVConvertTest.TestI400 and LibYUVPlanarTest.ARGBBlend_Unattenuated.
This patch updates the I400ToARGBRow_MSA and ARGBBlendRow_MSA functions in the row_msa.cc file.

Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Change-Id: Iec1a647af79be3ca1f2724802f6698deab60eac8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2330807
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-08-12 18:12:19 +00:00
Shiyou Yin
5c6cdd0747 ARGBToJ420 MMI and MSA version match C.
In commit 6cd1ff, C version has been updated.
This patch update the MMI and MSA version to mach C version.

Change-Id: Iea811e232f9c6019a80364d165f0255a37ce41b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227755
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2020-07-22 20:17:39 +00:00
Frank Barchard
6d603ec3f5 clamp C functions use compare
Intel
Was ARGBSubtract_Opt (1760 ms)
Now ARGBSubtract_Opt (1546 ms)

ARM
Was ARGBAdd_Opt (1747 ms)
Now ARGBAdd_Opt (1260 ms)

Bug: None
Change-Id: I52436f6390b6b7313f2a8820833bb4f60ae958be
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2299639
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-07-16 22:03:34 +00:00
Frank Barchard
1837f0022e Rollback of ARGBAttentuate
ARGBAttenuate AVX2 different than NEON/C

Was
C     ARGBAttenuate_Opt (1151 ms)
SSSE3 ARGBAttenuate_Opt (455 ms)
AVX2  ARGBAttenuate_Opt (296 ms)

Now
C     ARGBAttenuate_Opt (1765 ms)
SSSE3 ARGBAttenuate_Opt (355 ms)
AVX2  ARGBAttenuate_Opt (299 ms)

BUG=b/153564664

Change-Id: I2f027339552e399b90cc5ffeffde4255e9ff175b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2294488
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2020-07-13 21:55:13 +00:00
Frank Barchard
0b793d9fac Add J420AlphaToARGB and colortests for bt.709 and rec.2020
Bug: libyuv:864, b/159753166
Change-Id: If6ba742a0e7c5baeab29e8b92569aee361af88e9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2261568
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-06-24 00:57:28 +00:00
Frank Barchard
c5e45dcae5 Optimze ABGRToI420 for AVX2
libyuv_test --gunit_filter=*ABGRToI420_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1

Was SSSE3 ABGRToI420_Opt (324 ms)
Now AVX2  ABGRToI420_Opt (253 ms)

Bug: b/155989084
Change-Id: I4f3831e29b379be758f9d3fcb244be088bb1ca3c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2229606
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-06-04 18:24:45 +00:00
Shiyou Yin
ce5b333853 ARGBToI420 MMI and MSA version match C.
In commit 0b8bb6, C version has been updated.
This patch update the MMI and MSA version to mach C version.

Change-Id: Ib28da3629a8465990c8e2185278a95af8c27a31d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227754
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2020-06-04 04:51:05 +00:00
Shiyou Yin
db63668a24 Add MirrorUVRow_MSA.
Change-Id: Ic498d1175c3f916d0101b0fd8603b5cae994138b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227753
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-06-04 04:12:24 +00:00
Frank Barchard
8869628c24 Remove unnecessary include of convert_argb
Bug: libyuv:861, b/156642185
Change-Id: I3ddbe2f7b61629ed18b6879203203a51b3700773
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2219047
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-05-28 18:58:37 +00:00
Frank Barchard
94af5319f4 Remove M420 and refactor NV12ToI420
M420 is a row biplanar variation of NV12 supported on Microsoft webcams.
The code was hardcoded to bt.601 and should be jpeg, but the format is
very old and rare.  Is a variation on NV12, so if someone needs it, it
can be re-implemented easily.

Bug: libyuv:858
Change-Id: I246167dba3c190cc76af741b8e91e58e68fde28f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2212608
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-05-26 18:48:00 +00:00
Frank Barchard
da41bca02b I400ToARGBMatrix Pass a color matrix to use different coefficients
32 bit
Neon I400ToARGB_Opt (1937 ms)
64 bit
C I400ToARGB_Opt (8957 ms)
NEON I400ToARGB_Opt (2147 ms)

x86
cI400ToARGB_Opt (1110 ms)
AVX2 I400ToARGB_Opt (213 ms)
SSE2 I400ToARGB_Opt (225 ms)

Bug: libyuv:861, b/156642185
Change-Id: I96b6f4ebba6ff9c4ed8803291ce098de6f93fa4f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2209718
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-20 20:33:12 +00:00
Frank Barchard
d426247a3b YUV to RGB Matrix functions for color space support
Make all Matrix versions of conversions public.

Bug: libyuv:861, b/156642185
Change-Id: Ida067c95dd041b612e2bab64dbface58b257038a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2202748
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
2020-05-19 16:59:29 +00:00
Frank Barchard
84da59c168 ARGBAttenuate AVX2 rewritten to match NEON/C code
Bug: 665
Change-Id: If26fb389dabbca870a0e720f5258d6c9b2cde156
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2196904
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-13 03:58:10 +00:00
Frank Barchard
d13db1b437 RGB565ToI420 C matches SIMD ARGB4444, RGB565 and ARGB1555 C versions mimic AVX and Neon
Neon move prfm after loads for all functions.  Example performance improvement
Was
I444ToARGB_Opt (3275 ms)
I444ToNV12_Opt (1509 ms)
Now
I444ToARGB_Opt (2751 ms)
I444ToNV12_Opt (1367 ms)

Bug: libyuv:447
Change-Id: I78bf797b3600084c1eceb0be44cdbc9a575de803
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2189559
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-08 19:25:24 +00:00
Frank Barchard
6cd1ffb1b8 ARGBToJ420 and ARGBAttenuate make C match SIMD
Bug: libyuv:447
Change-Id: Ie1dd4a20fb8d5c96231dcfee9f8a0ac2edfb9bd8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2185629
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-06 23:10:19 +00:00
Frank Barchard
0b8bb60f2e ARGBToI420 C version match SIMD
Bug: libyuv:447
Change-Id: Iafb28cf635b355837caf41c26baee665642f4f95
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2181779
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-05-06 22:24:55 +00:00
Frank Barchard
7a61759f78 NV12Mirror and MirrorUVPlane functions added
HalfMergeUV AVX2 version

Skylake Xeon performance for 1280x720
NV12Mirror_Any (109 ms)
NV12Mirror_Unaligned (113 ms)
NV12Mirror_Invert (107 ms)
NV12Mirror_Opt (108 ms)
NV12Mirror_NullY (19 ms)

Slightly faster than comparable I420Mirror
I420Mirror_Any (113 ms)
I420Mirror_Unaligned (110 ms)
I420Mirror_Invert (109 ms)
I420Mirror_Opt (110 ms)

BUG=libyuv:840, libyuv:858

Change-Id: I686b1b778383bfa10ecd1655e986bdc99e76d132
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2176066
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-05-04 22:32:14 +00:00
Shiyou Yin
d9681c53b3 Refine conditional compilation for MSA and MMI.
This patch is a complement for commit bed9292f2cbba2f8f9ff0f1635a8aa17a311f2f9.
1. Supplement inspection for macro HAS_***TOUV*ROW_MMI/MSA.
2. Reduce calls to function TestCpuFlag().
3. Fix a mistake in source/convert.cc: line 1105.

Change-Id: I5e7f9fe367fa0f6d1db6f7644c5b48d4ad85fedb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2169342
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-29 19:13:23 +00:00
Shiyou Yin
bed9292f2c Move init process of msa after mmi.
Some processors support both MSA and MMI.
when they are enabled together, MSA will be preferd.
This patch move MSA initialization after MMI, so that
MSA can overide MMI and be setted to effective.

Change-Id: I8a52cce83ee4ec9727d47c99b287c9580329b149
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2155944
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-28 11:01:51 +00:00
Shiyou Yin
1cd417bda9 Use 8 bit RGB to Y coefficients for Y and YJ in MMI and MSA.
1. Switch to 8 bit precision.
2. Fix an error in the implementation of MMI and MSA.

About the error:
MMI and MSA implementation for RGBtoY and RGBToYJ used different
precision according to the C implementation( The C version has been
unified in commit fce0fed542001577e6b10f4cf859e0fa1774974e). This
patch unifies the precision to 8 bit for RGBToYJ in MMI and MSA.

Change-Id: Ic6a6e424d27a2f049b0c954f03174192d2beb091
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2155608
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-20 11:30:51 +00:00
Frank Barchard
2f48ffd42b HalfMergeUVPlane function and optimized I444ToNV12 and I444ToNV21
Bug: libyuv:858
Change-Id: Ie1f03a9acaff02ee8059cf1e5c2c2e5afcde8592
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2154608
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-04-17 19:22:29 +00:00
Frank Barchard
d4c3f45eb6 libyuv r1749 upstream for I444ToNV12
Bug: libyuv:858
Change-Id: Iacf70938ace6258e5bbd397cd78414f1025474c5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2154331
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-17 09:16:46 +00:00
Shiyou Yin
ca954a3419 Add unittest TestLinuxMipsMsaMmi.
This unittest help to test MipsCpuCaps.

Change-Id: I9e0ceeed0e5243446eaafa27e8de4c5f8163b09e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2133314
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-16 19:51:27 +00:00
Frank Barchard
f813b8a810 Refine function MipsCpuCaps.
1. Refactored function MipsCpuCaps.
2. allow msa and mmi can be enabled together.

Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Change-Id: I7330d0551a6a167e4c76d37e4defcc20783f5815
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2131145
Reviewed-by: Hsiu Wang <hsiu@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-01 18:49:34 +00:00
Frank Barchard
7e05059557 Apply clang format to libyuv source
Bug: None
Change-Id: Ifd16b59d7f0dbf4402dd5741bb89d1ec06dfaac8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2131868
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Hsiu Wang <hsiu@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-04-01 18:07:34 +00:00
Frank Barchard
aabcc477bd RGB24Mirror function
Bug: b/151960427
Change-Id: I413db0011a4ed87eefc0dd166bb8e076b5aa4b1d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2116639
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-03-24 20:13:08 +00:00
Frank Barchard
b5e223ac4c Upstream all libyuv changes to version 1746 Prefetch for all arm functions - helps performance at higher resolutions Make MirrorPlane function public.
Bug: libyuv:855
Change-Id: I4020face6b52767ee78d81870314285d63e98b95
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2113650
Reviewed-by: Hsiu Wang <hsiu@google.com>
2020-03-21 20:19:44 +00:00
Frank Barchard
3db22ebc4b RAWToJ400 and RGBToJ400 use 2 step row function for Intel. RAWToJ400 Was 3996 ms, now 3309. 20.7% faster.
Call a row function for each row, based on ARGBToI400 code.
But implement row functions as 2 step conversion.  Adds the
row functions:
RAWToYJ, RGBToYJ, SSSE3 and AVX2 versions, and Any versions.
The smaller row buffer is more cache friendly on large images.

The max cache size can be configured, and is currently:
// Maximum temporary width for wrappers to process at a time, in pixels.
And the row buffer is
  SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4]);
So 8192 bytes are used for the row buffer, leaving the rest for source
and destination buffers.

blaze-bin/third_party/libyuv/libyuv_test '--gunit_filter=*R*To?400_Opt' --libyuv_width=3600 --libyuv_height=2500 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 | sortms

Was
RAWToJ400_Opt (3996 ms)
ARGBToI400_Opt (3964 ms)
RGB24ToJ400_Opt (3960 ms)
ARGBToJ400_Opt (3909 ms)
RGBAToJ400_Opt (3885 ms)

Now
ARGBToJ400_Opt (4091 ms)
ARGBToI400_Opt (3936 ms)
RGBAToJ400_Opt (3428 ms)
RGB24ToJ400_Opt (3324 ms)
RAWToJ400_Opt (3309 ms)

Bug: libyuv:854, b/147753855
Change-Id: Ieb65fbda94e812c737f4c3c74107354b73c4bcd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2016203
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2020-01-23 03:23:38 +00:00
Frank Barchard
1cea4235af RAWToJ400 for big endian RGB to grey scale.
On Pixel 3
Was
BM_ConvertToGray/1280/720/3                        2360958 ns      2334984 ns         2999
BM_ConvertToGray/1279/721/3                        2360289 ns      2334329 ns         2994
BM_ConvertGrayTensorflowCoefficients/1280/720/3    2983296 ns      2947113 ns         2259
BM_ConvertGrayTensorflowCoefficients/1279/721/3    2871205 ns      2835359 ns         2170

Now
BM_ConvertToGray/1280/720/3                        2358469 ns      2334068 ns         2997
BM_ConvertToGray/1279/721/3                        2364584 ns      2336892 ns         2995
BM_ConvertGrayTensorflowCoefficients/1280/720/3     281312 ns       278244 ns        25170
BM_ConvertGrayTensorflowCoefficients/1279/721/3     351310 ns       347229 ns        20217

BUG=libyuv:854

Change-Id: If2192affc2d3219e0fb824737d75b9374a25d709
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2003236
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2020-01-16 00:29:11 +00:00
Frank Barchard
6e6f81b803 Floating point Gaussian kernels
On SkylakeX for 720p
TestGaussPlane_F32 (657 ms)

On Pixel3
TestGaussPlane_F32 (1787 ms)

Bug: libyuv:852, b/145611468
Change-Id: I9859af1b9381621067992305727da285f82bdded
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1949667
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Marat Dukhan <maratek@google.com>
2019-12-09 04:45:59 +00:00
Frank Barchard
d82f4baf5f Upstream minor changes. Faster tests, Faster YUV Rotate180 and Mirror
Bug: libyuv:840, libyuv:849: b/144318948
Change-Id: I303c02ac2b838a09d3e623df7a69ffc085fe3cd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1914781
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-13 20:02:40 +00:00
Frank Barchard
6502179e4c I210ToAR30 support for 422 10 bit to 10 bit RGB
BUG=960620, libyuv:845, b/129864744

Change-Id: I43b152568b7f297f81624d47e56a334c127be17b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1901465
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-06 19:37:22 +00:00
Frank Barchard
1f12946068 Add U444ToABGR, J444ToABGR, H444ToABGR, H444ToARGB and ConvertToARGB support
BUG=960620, libyuv:845, b/129864744

Change-Id: I9f80cda3be8e13298c596fac514f65a23a38d3d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1900310
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-05 22:11:20 +00:00
Frank Barchard
53e014c99d BT.2020 pull in tests and upstream fixes; expose a few more methods.
This adds some missing prototypes from the BT.2020 CL as well as expands
the H444 and J444 results.

BUG=960620, libyuv:845, b/129864744

Change-Id: I8ea3959379f1bb2edb857d4eb90fb9a1f6aa4e03
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1899093
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-11-05 20:01:59 +00:00
Dale Curtis
f15793d6af Add support for BT.2020.
This pulls in the changes that Firefox made to add BT.2020 support as well
as expands them to the existing 10-bit support. So we now have the following
input formats: U420, U422, U444, U010.

BUG=960620, libyuv:845

Change-Id: If0c47853a465d0ed660f849db08e71489fe1b9c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1884468
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2019-10-29 21:06:48 +00:00
Frank Barchard
4205d7a6c9 Fix for jpeg to allow fuzz
Bug: None
Change-Id: I6eecef4f755ffb9e3eeee9f8ca7890b3445b14a5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1884878
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-10-28 23:35:13 +00:00
Hans Wennborg
53b529e362 Remove #pragma clang loop vectorize_width
Recent versions of Clang started warning when the loop doesn't get
vectorized, such as when compiling with -Oz (see bug).

To fix the build, remove the pragma and let the compiler decide on its
own when to vectorize.

Bug: chromium:1015665
Change-Id: I40a610c9e0d94cfd577a6cd2b01e6fdaa08bef7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1872580
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Hans Wennborg <hans@chromium.org>
2019-10-21 20:18:43 +00:00
Frank Barchard
22f8aad8bc RAWToRGBA for 3 channel OCR
Replace ARM64 only row function with high level function
that implements SSSE3, 32 bit Neon and C.

Compared to 2 step RAWToARGB + ARGBToRGBA on row level:
3.1x faster on ARM
6.2% faster on Intel

BUG=b/140748379

Change-Id: Ia8636d9e4fcdbe10b8c2e81610a54728e29845cd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1860914
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-10-14 22:27:37 +00:00
Frank Barchard
fce0fed542 ARGBToY use 8 bit precision instead of 7 bit.
Neon and GCC Intel optimized, but win32 and mips not optimized.

BUG=libyuv:842, b/141482243

Change-Id: Ia56fa85c8cc1db51f374bd0c89b56d21ec94afa7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1825642
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2019-10-07 23:01:10 +00:00
Frank Barchard
c85a7b3ae3 MMI Optimized functions I422ToARGB for 1080p video
Improves playback performance for 1080p video on www.youku.com

BUG=libyuv:841

Change-Id: Iabe7693fba276162af0290863f46e214ab86fb6c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1790959
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2019-09-11 21:06:21 +00:00
Frank Barchard
0bb2773a39 AVX2 versions of ABGRToNV12 and ABGRToNV21
BUG=libyuv:833

Change-Id: I9b6653e9c304b4e0805b7d3c8408ce57009c8559
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1740682
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-08-07 18:16:34 +00:00
Frank Barchard
9b63884a3e Add ABGRToNV21 and ABGRToNV12
Fix ARGBToUVJRow_AVX2 constants for win32

BUG=libyuv:833, libyuv:839

Change-Id: Id4731a573d40d7a9b46fcc31c2fee295483e1ff6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1739509
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Hirokazu Honda <hiroh@chromium.org>
2019-08-07 01:29:13 +00:00
Frank Barchard
fec9121b67 SwapUV AVX2 and SSSE3
Based on ARGBShuffle but with count adjusted and new shuffle mask

BUG=libyuv:809

Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-26 18:41:40 +00:00
Frank Barchard
f1c00932df NV21 unittest and benchmark
BUG=libyuv:809

Change-Id: I75afb5612dcd05820479848a90ad16b07a7981bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1707229
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-18 02:13:02 +00:00
Frank Barchard
f9aacffa02 Fix arm unittest failure by removing unused FloatDivToByteRow.
Apply clang-format to fix jpeg if() for lint fix.
Change comments about 4th pixel for open source compliance.
Rename UVToVU to SwapUV for consistency with MergeUV.

BUG=b/135532289, b/136515133

Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936
Reviewed-by: Chong Zhang <chz@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-07-02 20:00:30 +00:00
Chong Zhang
c6dcbdfaac Sync up Android.bp file with master
Change-Id: I708b2253902cb2d3a78cf7a334f8846dd732b7d1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1682526
Reviewed-by: Chong Zhang <chz@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Chong Zhang <chz@google.com>
2019-07-02 00:15:11 +00:00
Frank Barchard
09cfb2bbd6 Update to r1732 for more robust jpeg
Includes a rounding change for neon.
BUG=b/135532289

Change-Id: I36ffb57b55db6c64804ad169def865be1ac6d66e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1684439
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
2019-07-01 22:32:36 +00:00
Frank Barchard
af9bc4f67c RGB24ToJ420 for full range YUV
BUG=b/249563884

Change-Id: I41b45b274313ec22f5e3799000242da1ec692586
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1629527
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2019-05-29 02:40:22 +00:00
Frank Barchard
681c6c6739 Add LIBYUV_API to NV12ToABGR and I444Rotate, I444Scale
Gaussian blur low levels ported to 32 bit neon.
But they are not hooked up to anything but a unittest.

Bug:b/248041731, b/132108021, b/129908793
Change-Id: Iccebb8ffd6b719810aa11dd770a525227da4c357
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1611206
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
2019-05-14 01:18:06 +00:00
Emmanuel Weber
05f72b8602 add I444Scale and I444Rotate
Bug: b:132108021
Change-Id: Ife6abbd54c4620984e412c9244c6b65fe4c7946a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1597418
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2019-05-06 23:43:11 +00:00
Frank Barchard
413a8d8041 Add AYUVToNV12 and NV21ToNV12
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*ToNV12* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1

R=rrwinterton@gmail.com

Change-Id: Id03b4613211fb6a6e163d10daa7c692fe31e36d8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1560080
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2019-04-12 17:48:45 +00:00
Frank Barchard
5b6042fa0d add YUV24 and AYUV formats
Alternatives to RGB24 and AYUV for working with GPU.

BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*NV21To???24* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com

Change-Id: I5559c63f4bd4c847492fcb1571f7b03c58146689
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1501735
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2019-03-05 02:53:56 +00:00
Frank Barchard
12f9b5f351 Add commment for jpeg parameters.
Bug: None
Test: Try bots
Change-Id: I7b90731e828169af96b3e0b8f8821635cff57755
Reviewed-on: https://chromium-review.googlesource.com/c/1308819
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-11-01 18:18:50 +00:00
Frank Barchard
c2ae68114a Fix for AVX2 crash in I420ToRGB24
I422ToRGB24 is implemented as a C wrapper for Intel, calling
I422ToARGB and ARGBToRGB24.  The ARGBToRGB24 for AVX2 requires 32
pixels.
This CL increases the width alignment required to use I422ToRGB24_AVX2

TBR=rrwinterton0gmail.com

Bug: libyuv:822, b:118386049
Change-Id: I4454f4eece33fbd5f593655f577c9ef5c00d1f63
Tested: locally tested with app that crashed using this function.
Reviewed-on: https://chromium-review.googlesource.com/c/1299931
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-10-29 19:41:53 +00:00
Frank Barchard
b36c86fdfe Port box filter to NEON
Bug: libyuv:821
Change-Id: I4a6b9bee2c2fae199c73c9ec7ecb32bde37c1852
Tested: out/Release/libyuv_unittest --gtest_filter=*ScaleFrom1920x1080_Box --libyuv_width=160 --libyuv_height=90 --libyuv_repeat=1000
Reviewed-on: https://chromium-review.googlesource.com/c/1298598
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-10-25 18:56:29 +00:00
Frank Barchard
1fe0613c3f MJPGToNV21
Add jpeg to NV21 conversions, unittests and conversions
for I444, I422, I420 and I420 to NV21 needed for internals.

Bug: libyuv:820
Change-Id: Idf0f15f91307e80a82cd23943f6eed5508f13fe2
Tested: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*MJ*
Reviewed-on: https://chromium-review.googlesource.com/c/1297710
Reviewed-by: Johann Koenig <johannkoenig@google.com>
2018-10-24 22:01:13 +00:00
Frank Barchard
97b3990dec NV21ToRAW and NV12ToRAW functions added
RAW is a big endian style RGB buffer with R first in memory, then G and B.
Convert NV21 and NV12 to RAW format.

Performance on SkylakeX for 720p with AVX2
I420ToRAW_Opt (388 ms)
H420ToRAW_Opt (371 ms)
NV12ToRAW_Opt (341 ms)
NV21ToRAW_Opt (339 ms)

SSSE3
I420ToRAW_Opt (507 ms)
H420ToRAW_Opt (481 ms)
NV12ToRAW_Opt (498 ms)
NV21ToRAW_Opt (493 ms)

C
I420ToRAW_Opt (2287 ms)
H420ToRAW_Opt (2246 ms)
NV12ToRAW_Opt (2191 ms)
NV21ToRAW_Opt (2204 ms)

Performance on Pixel 2 for 720p
out/Release/bin/run_libyuv_unittest -v -t 7200 --gtest_filter=*NV??ToR*Opt --libyuv_repeat=1000 --libyuv_width=1280 --libyuv_height=720
LibYUVConvertTest.NV12ToRGB24_Opt (1739 ms)
LibYUVConvertTest.NV21ToRGB24_Opt (1734 ms)
LibYUVConvertTest.NV12ToRAW_Opt (1719 ms)
LibYUVConvertTest.NV21ToRAW_Opt (1691 ms)
LibYUVConvertTest.NV12ToRGB565_Opt (2152 ms)

Bug: libyuv:778, b:117522975
Test: add new NV21ToRAW and NV12ToRAW tests
Change-Id: Ieabb68a2c6d8c26743e609c5696c81bb14fb253f
Reviewed-on: https://chromium-review.googlesource.com/c/1272615
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2018-10-10 18:11:10 +00:00
Frank Barchard
20bf569a04 Fix ConvertToI420() for odd crop_y
The original src_u calculation of FOURCC_I420 shifted half width if
crop_y is odd.
This CL fixs the problem and also add a test case for it.

Bug: b:115278653
Test: pass libyuv_unittest
Change-Id: Ia9732d22e64e13de26df47726ba44ad1c5a06484
Reviewed-on: https://chromium-review.googlesource.com/c/1258743
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-10-03 19:14:01 +00:00
Frank Barchard
67eff529ad ubsan fix for 16 bit scaling
Bug: libyuv:813
Test: tested downstream for ubsan.
Change-Id: I28c1d4e815348d051f781c9b7d8197f74905cab7
Reviewed-on: https://chromium-review.googlesource.com/1173721
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-08-14 17:40:44 +00:00
lixia zhang
bf69adfd64 libyuv:loongson Correct the optimization of mmi on loongson3a platform.
When loading or storing the data, the unaligned address will greatly degrade
the optimization performance, so non-aligned access instructions are required
on the loongson platform.

Also delete the optimization function:ScaleARGBFilterCols_MMI,
because it degraded the performance.

BUG=libyuv:804
R=fbarchard@chromium.org

Change-Id: If4c15886a21cdcbac7ae8b336292e4549acf1e47
Reviewed-on: https://chromium-review.googlesource.com/1164627
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-08-11 09:27:20 +00:00
Chong Zhang
b6b1c273a2 libyuv: choose matrix for YUV to RGB565 conversion
bug: 109762970
Change-Id: Iccfdc5dded2dc7695f8a7795b2f32b6401efea0d
Reviewed-on: https://chromium-review.googlesource.com/1169687
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-08-10 19:16:34 +00:00
Martin Storsjö
9b772abf97 Restore the file mode for source files
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.

Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-08-06 18:53:32 +00:00
Frank Barchard
57de382902 MMI ifdef guards and add source to various build files.
Bug: libyuv:810,libyuv:811
Test: cmake . && make
Change-Id: I521b45ccb6e49ff70823e415efa99fc5b9daad99
Reviewed-on: https://chromium-review.googlesource.com/1162503
Reviewed-by: Johann Koenig <johannkoenig@google.com>
2018-08-03 18:37:23 +00:00
lixia zhang
21be9122aa libyuv:loongson optimize compare/row/scale/rotate files with mmi.
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).

In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.

BUG=libyuv:804

Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-07-20 22:53:04 +00:00
Frank Barchard
9ac881f4aa msa use void * for loads
the built in __msa_ld_b() expects a void * without const.
Cast pointers to void * to avoid build warning.

TBR=johannkoenig@google.com
Bug: libyuv:805
Change-Id: Iabc4820ecf4a3a7dcb0063e67ce276ae2a4f0501
Tested: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest
Reviewed-on: https://chromium-review.googlesource.com/1125400
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-07-04 00:24:19 +00:00
Frank Barchard
4d67b3e851 Add H420 and H422 to ConvertToARGB()
H420/H422 are bt.720 variants

TBR=braveyao@chromium.org
BUG=libyuv:799
TESTED=try bots tested build on all platforms

Change-Id: I007d8981d91ca0748c59403759109bbcd88f286c
Reviewed-on: https://chromium-review.googlesource.com/1115719
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-06-26 22:52:42 +00:00
Johann
a37e7bfece use unix line endings
Consistently use one style of line endings for the repository

Change-Id: Idd70e3d7f3a7a6641b268a81e51eebf9c705b67d
Reviewed-on: https://chromium-review.googlesource.com/1107877
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-06-20 23:19:59 +00:00
Johann
bf25313b83 add const to msa loads
Avoid warnings regarding loss of qualifiers:
warning: cast from type ‘const uint8_t* {aka const unsigned char*}’
to type ‘v16i8* {aka __vector(16) signed char*}’ casts away
qualifiers

BUG=libyuv:793

Change-Id: Ie0d215bc07b49285b5d06ee91ccc2c9a7979799e
Reviewed-on: https://chromium-review.googlesource.com/1107879
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-06-20 22:56:09 +00:00
Frank Barchard
083aa718b9 Add AR30 and AB30 to ConvertToARGB() and fix negative NV12 height
BUG=libyuv:799
TESTED=try bots build

Change-Id: Ib4ce8d928069445a710c1e30ea85d9dccc820b6c
Reviewed-on: https://chromium-review.googlesource.com/1097561
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-06-12 19:04:40 +00:00
Frank Barchard
196e2e72a3 Revert "Allow negative height when ConvertToI420/ARGB is called with NV12/NV21"
This reverts commit a8aa921c4614f9d6a0e8f3459648ca1ae75cdbe6.

Reason for revert: breaks a webrtc unittest on Windows.

https://bugs.chromium.org/p/webrtc/issues/detail?id=9263&can=2&start=0&num=100&q=&colspec=ID%20Pri%20M%20ReleaseBlock%20Component%20Status%20Owner%20Summary&groupby=&sort=

Original change's description:
> Allow negative height when ConvertToI420/ARGB is called with NV12/NV21
> 
> ConvertToI420 and ConvertToARGB support the use of a negative height
> parameter to flip the image vertically. When converting from NV12 or
> NV21 this parameter was misinterpreted, resulting in invalid output.
> This CL introduces the use of abs_src_height to correctly calculate
> the location of the source UV plane.
> 
> The sign of crop_height is not used, to reduce confusion ConvertToI420
> and ConvertToARGB no longer accept negative crop height.
> 
> Unit tests for Android420ToI420 are updated to fix miscalculation of
> src_stride_uv, fix incorrect pixel strides, and to test inversion.
> New unit tests are included to test inversion for ConvertToARGB,
> ConvertToI420, Android420ToARGB, and Android420ToABGR.
> For consistency the test NV12Crop is renamed ConvertToI420_NV12_Crop.
> 
> Bug: libyuv:446
> Test: out/Release/libyuv_unittest --gtest_filter=*.ConvertTo*:*.Android420To*
> Change-Id: Idc98e62671cb30272cfa7e24fafbc8b73712f7c6
> Reviewed-on: https://chromium-review.googlesource.com/994074
> Commit-Queue: Frank Barchard <fbarchard@chromium.org>
> Reviewed-by: Frank Barchard <fbarchard@chromium.org>

TBR=fbarchard@chromium.org,robert@bares.me

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: libyuv:446, chromium:9263, libyuv:801
Change-Id: I7c55b3fcb477f9754c249b9c2c54b24da2c29283
Reviewed-on: https://chromium-review.googlesource.com/1081267
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-06-01 00:19:40 +00:00
Frank Barchard
a7fb978e30 ARGBExtractAlphaRow_Any_AVX2 fix pixel count mask
Mask was set to 32, but should have been 31.
BUG=libyuv:798
TESTED=try bots tested

Change-Id: I6120928873a4a2f1efef907d8e8296ca8c20bb03
Reviewed-on: https://chromium-review.googlesource.com/1054830
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-05-11 07:13:58 +00:00
Robert Bares
a8aa921c46 Allow negative height when ConvertToI420/ARGB is called with NV12/NV21
ConvertToI420 and ConvertToARGB support the use of a negative height
parameter to flip the image vertically. When converting from NV12 or
NV21 this parameter was misinterpreted, resulting in invalid output.
This CL introduces the use of abs_src_height to correctly calculate
the location of the source UV plane.

The sign of crop_height is not used, to reduce confusion ConvertToI420
and ConvertToARGB no longer accept negative crop height.

Unit tests for Android420ToI420 are updated to fix miscalculation of
src_stride_uv, fix incorrect pixel strides, and to test inversion.
New unit tests are included to test inversion for ConvertToARGB,
ConvertToI420, Android420ToARGB, and Android420ToABGR.
For consistency the test NV12Crop is renamed ConvertToI420_NV12_Crop.

Bug: libyuv:446
Test: out/Release/libyuv_unittest --gtest_filter=*.ConvertTo*:*.Android420To*
Change-Id: Idc98e62671cb30272cfa7e24fafbc8b73712f7c6
Reviewed-on: https://chromium-review.googlesource.com/994074
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-04-19 02:41:27 +00:00
Johann
b8696fde84 add const to casts
When casting a const value, ensure the cast is const as well.

BUG=webm:1509

Change-Id: I5b597fdcc148d111e9824bc7cf918fc5f24e970f
Reviewed-on: https://chromium-review.googlesource.com/996553
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-04-13 22:52:52 +00:00
Frank Barchard
7e5e12757b use attribute to alias for punning float to int
Bug: libyuv:791
Test: g++ -Iinclude -I../libvpx/third_party/libwebm -I../libvpx/vp8 -I../libvpx/vp8 -I../libvpx/vp9 -I../libvpx/vp9 -Iinclude -m64 -DNDEBUG -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -Wall -Wdisabled-optimization -Wfloat-conversion -Wpointer-arith -Wtype-limits -Wcast-qual -Wvla -Wuninitialized -Wunused -Wextra -I. -I"../libvpx" -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -Wno-unused-parameter -c -o third_party/libyuv/source/row_common.cc.o source/row_common.cc
Change-Id: Ia006cb9212b671ae668cab5ec0b29759024a2c8a
Reviewed-on: https://chromium-review.googlesource.com/1012462
Reviewed-by: Johann Koenig <johannkoenig@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-04-13 19:20:52 +00:00
Johann
190fb79ced row_common.cc: add const to cast
When casting input for loads, include modifiers such as 'const'

Clears build warnings:

warning: cast from type 'const uint8_t* {aka const unsigned char*}' to
type 'uint32_t* {aka unsigned int*}' casts away qualifiers [-Wcast-qual]

Bug: webm:1509, libyuv:791
Test: g++ -Iinclude -I../libvpx/third_party/libwebm -I../libvpx/vp8 -I../libvpx/vp8 -I../libvpx/vp9 -I../libvpx/vp9 -Iinclude -m64 -DNDEBUG -O3 -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=0 -D_LARGEFILE_SOURCE -D_FILE_OFFSET_BITS=64 -Wall -Wdisabled-optimization -Wfloat-conversion -Wpointer-arith -Wtype-limits -Wcast-qual -Wvla -Wuninitialized -Wunused -Wextra -I. -I"../libvpx" -D__STDC_CONSTANT_MACROS -D__STDC_LIMIT_MACROS -Wno-unused-parameter -c -o third_party/libyuv/source/row_common.cc.o source/row_common.cc




Change-Id: I1e3b2fe2a4ae9dd466c3db9cde0560aceb9d1398
Reviewed-on: https://chromium-review.googlesource.com/996393
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-04-13 18:18:07 +00:00
Frank Barchard
a9626b9daf Disable AVX512 for iOS simulator xcode 9 builds.
iOS simulator has the option to build with xcode instead of clang.
GN use_xcode_clang=true enables the xcode build.
As of version Xcode 9.2, the clang version used does not support
AVX512.  The version reported is version 9, but for normal clang,
version 7 is sufficient to AVX512.
When a version of XCode does support AVX512, the version check can
be updated to allow AVX512 for newer versions of XCode.
with XCode 9.2 the following macro is set.
__APPLE_CC__ 6000

Bug: libyuv:789
Test: gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"x86\" use_xcode_clang=true"
Change-Id: I5a9a0b4a2760c7d09a4bcb464b3668979113b07e
Reviewed-on: https://chromium-review.googlesource.com/991595
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-04-03 18:45:14 +00:00
Frank Barchard
816b7b1279 Add __attribute__ ((__target__ ("avx512vbmi")))
Bug: libyuv:789
Test: builds locally on linux with clang
Change-Id: I3000494d4b0b18f59d7852bc1bc0c9e422d2d63a
Reviewed-on: https://chromium-review.googlesource.com/987331
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-30 17:17:35 +00:00
Frank Barchard
4ad33344cf Pass float parameters via vector 2 float and "w" for scalar multiply.
Scalar multiply expects a 'd' register.  The "w" (float) uses 's' for float
and wont work with the multiply in 32 bit (it does in 64 bit).
A vector 2 of float passes as 'd' register.
A vector 4 of float passes as 'q' register.
This change copies the float into the first entry of a vector 2
and passes that.  The optimizer removes the extra copy, allowing
the single float to use referenced as

Test: LibYUVPlanarTest.TestByteToFloat
Bug: libyuv:786
Change-Id: I8773c5bae043c7b84e1d1db7fdea6731aa0b1323
Reviewed-on: https://chromium-review.googlesource.com/973984
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-03-28 21:52:08 +00:00
Frank Barchard
548ec65656 Require clang 6 for AVX512 support
row.h adds CLANG_HAS_AVX512
function ifdefs in row.h for avx512
source code ifdefed function by function for
avx512 and avx2.

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: If32b51459685d0d5785c5c1e94c8f668f8e74b55
Reviewed-on: https://chromium-review.googlesource.com/982402
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-28 02:38:39 +00:00
Jay Civelli
fdad6299d6 Add a method to force the CPU flags
Adds a method that forces the CPU flags. Useful when using libyuv inside
a sandboxed process which may not have access to the file system.

Bug: libyuv:787
Change-Id: I01f71e39a7301085d9de388eba930b4cac0fd7be
Reviewed-on: https://chromium-review.googlesource.com/972338
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-03-26 19:31:00 +00:00
Frank Barchard
9d70f13c8f cpuid sandbox friendlier avoiding getenv()
Move getenv to unittest.cc to allow libyuv to be
run in sandbox for x86, x64 and aarch64

Bug: libyuv:767
Test: unittests still run and respect environment variables
Change-Id: I84cb1717977828776142b51c029774b3e6b142a3
Reviewed-on: https://chromium-review.googlesource.com/969645
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-20 01:04:30 +00:00
Frank Barchard
83aa7512c1 AVX512 VMBI version of ARGBToRGB24
Use VMBI instructions but on AVX2 registers to avoid clockrate change.

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-10 02:04:48 +00:00
Frank Barchard
004954c969 cpu disables for AVX 512 and unittest show decimal
Change unittest flags to decimal so they can be used for --libyuv_cpu_info=
Add environment variables to disable AVX 512 bits.

Bug: libyuv:784
Test: LibYUVBaseTest.TestCpuHas
Change-Id: Iea6704368fbe9f6d3395933da7993fb2a3453225
Reviewed-on: https://chromium-review.googlesource.com/957704
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-10 00:59:14 +00:00
Frank Barchard
1d509f2178 ARGBToRGB24_AVX2 version
AVX2 port of SSSE3 conversion to output 24 bit RGB

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I14f7815522d1b790ecd2bb39d9a3441e803b694a
Reviewed-on: https://chromium-review.googlesource.com/953303
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-08 02:38:21 +00:00
Frank Barchard
3009890c11 NV21ToRGB24_AVX2 and SSSE3
Use 2 step conversion for NV21ToRGB24 to leverage AVX2
low levels instead of C.

Was C
NV21ToRGB24_Opt (882 ms)

Now SSSE3
NV21ToRGB24_Opt (218 ms)

Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I58faf766bbec4cc595aab2e217f6c874dd4b4363
Reviewed-on: https://chromium-review.googlesource.com/951629
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-03-07 03:58:48 +00:00
Frank Barchard
368ac76acf clang-tidy fix for MJPEGToI420 and MJPEGToARGB
Make parameters match in the code to the header.

TBR=braveyao@chromium.org
Bug: libyuv:782
Test: try bots still build
Change-Id: Id53fa2fe988aee5e125d87bc5fe70cce6b275403
Reviewed-on: https://chromium-review.googlesource.com/938948
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-02-27 08:37:55 +00:00
Frank Barchard
85722f5d93 ByteToFloatRow_NEON to convert and scale bytes to floats
Each byte is converted to float (0.0 to 255.0) and then multiplied
by a scale parameter.

Bug: None
Test: arm 64 build passes.
Change-Id: I04736798540b8d985f60abdf0388e24a209d075b
Reviewed-on: https://chromium-review.googlesource.com/930226
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ian Field <ianfield@google.com>
2018-02-24 00:34:07 +00:00
Frank Barchard
0ea50cbc74 NV21ToRGB24_NEON conversion
32 bit thumb2 performance:
NV12ToARGB_Opt (472 ms)
NV21ToARGB_Opt (466 ms)
NV12ToRGB24_Opt (457 ms)
NV21ToRGB24_Opt (457 ms)
NV12ToRGB565_Opt (501 ms)

Bug: libyuv:778
Test: add new NV21ToRGB24 test
Change-Id: I330585789835c79ee4b4da61d164716598268df3
Reviewed-on: https://chromium-review.googlesource.com/924646
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-22 22:24:24 +00:00
Frank Barchard
5f0354bde5 clang-tidy and clang-format applied reland
row_neon.cc manually editted for clang format bugs

TBR=braveyao@chromium.org

Bug: None
Test: local arm builds still pass
Change-Id: Ida4aac2f4ee354e2c1bd354b06e76a26b3c0becc
Reviewed-on: https://chromium-review.googlesource.com/930165
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-02-21 23:30:38 +00:00
Frank Barchard
9c0663d7ce Revert "clang-tidy and clang-format applied"
This reverts commit cfff527a4738cbd125f788937c503558d225d9fa.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> clang-tidy and clang-format applied
> 
> TBR=braveyao@chromium.org
> Bug: None
> Test: local arm builds still pass
> Change-Id: Iac042fbaad940e01fc4ce228a104d3d561b80f92
> Reviewed-on: https://chromium-review.googlesource.com/929999
> Reviewed-by: Frank Barchard <fbarchard@chromium.org>

TBR=fbarchard@chromium.org,braveyao@chromium.org

Change-Id: I4ee92ceeaa3c34bce3f20bf759dd30593807ad3f
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: None
Reviewed-on: https://chromium-review.googlesource.com/930141
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-02-21 23:21:07 +00:00
Frank Barchard
cfff527a47 clang-tidy and clang-format applied
TBR=braveyao@chromium.org
Bug: None
Test: local arm builds still pass
Change-Id: Iac042fbaad940e01fc4ce228a104d3d561b80f92
Reviewed-on: https://chromium-review.googlesource.com/929999
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-02-21 22:44:53 +00:00
Frank Barchard
18c9ab106c Rotate ARGB using scale_row.h header
ARGB rotation using scaling code.  Previously it had forward
declarations of the low level row functions used.  This CL
uses the header and hooks up Any and MSA versions of the code.

Bug: libyuv:779
Test: perf record out/Release/libyuv_unittest --gtest_filter=*ARGBRotate90_Opt --libyuv_width=640 --libyuv_height=359 --libyuv_repeat=999
Change-Id: Ifacd58b26bb17a236181a404fad589fd2543b911
Reviewed-on: https://chromium-review.googlesource.com/927530
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2018-02-21 00:53:53 +00:00
Frank Barchard
3d6b5658d7 AR30ToARGB using shifts and masking to vectorize
AR30ToARGB will vectorize if the output is masked
together as an int instead of 4 byte stores.
Performance is 2x faster
Was AR30ToARGB_Opt (1585 ms)
Now AR30ToARGB_Opt (746 ms)

Bug: libyuv:777
Test:LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: Idd47ae599d5d125207bb53e618d6d7e784d4a37c
Reviewed-on: https://chromium-review.googlesource.com/923169
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-02-16 18:55:38 +00:00
Frank Barchard
9c9215b218 End swap 10 bit RGB
Bug: libyuv:777
Test: None
Change-Id: I69b81f51c50d7739cfdb3cfb0c3d315c32bd63d2
Reviewed-on: https://chromium-review.googlesource.com/923042
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-02-15 23:50:40 +00:00
Frank Barchard
6630558875 10 bit YUV to 10 bit BGR
BGR variation of 10 bit conversion using swapped U and V
and mirrored matrix to produce AB30 format instead of AR30.

Bug: libyuv:777
Test: LibYUVConvertTest.H010ToAB30_Opt
Change-Id: I96d115a5d1e12138f40cb548871e03aa3ab210eb
Reviewed-on: https://chromium-review.googlesource.com/922284
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-02-15 22:44:36 +00:00
Frank Barchard
8a00c2bb4d Tidy applied with all safe checks on all arm, mips and intel, 32 and 64 bit
Using clang-tidy 7.
warnings=-*,mpi-*,objc-*,llvm-*,hicpp-*,-hicpp-use-noexcept,llvm-*,-hicpp-deprecated-headers,-hicpp-use-auto,bugprone-*,cert-*,google-*,-google-readability-casting,misc-*,,-misc-unused-parameters,-misc-macro-parentheses,cppcoreguidelines-*,-cppcoreguidelines-pro-type-member-init,readability-*,-readability-non-const-parameter,-readability-implicit-bool-conversion,fuchsia-*,-fuchsia-multiple-inheritance,-android-cloexec-*

~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D__ARM_NEON__ -D__arm__   -D__clang__ -D__clang_major__=6 -DHAVE_JPEG
~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D__mips_msa               -D__clang__ -D__clang_major__=6 -DHAVE_JPEG
~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D__aarch64__              -D__clang__ -D__clang_major__=6 -DHAVE_JPEG
~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D_MSC_VER=1600 -D_M_IX86  -D__clang__ -D__clang_major__=6 -DHAVE_JPEG
~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D_MSC_VER=1600 -D_M_X64   -D__clang__ -D__clang_major__=6 -DHAVE_JPEG
~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D__i386__                 -D__clang__ -D__clang_major__=6 -DHAVE_JPEG
~/bin/clang-tidy -fix-errors -format-style=file -checks=$warnings $* -- -Iinclude -D__x86_64__               -D__clang__ -D__clang_major__=6 -DHAVE_JPEG

Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: Ieb0f026c5b5a1d2daf8aca18b9290927fdaaa55c
Reviewed-on: https://chromium-review.googlesource.com/907853
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2018-02-12 18:34:33 +00:00
Frank Barchard
9a765f01bc Revert "tidy applied with readability-*"
This reverts commit 7b9ff4a0355c778f2cf03bdb15029d60a1259061.

Reason for revert: ios build bots are red

Original change's description:
> tidy applied with readability-*
> 
> TBR=braveyao@chromium.org
> Bug: libyuv:750
> Test: builds and runs and passes more tidy tests
> Change-Id: I316822f7d13b370b88b92a693912e880b21f92c8
> Reviewed-on: https://chromium-review.googlesource.com/907371
> Reviewed-by: Frank Barchard <fbarchard@chromium.org>

TBR=fbarchard@chromium.org,braveyao@chromium.org

# Not skipping CQ checks because original CL landed > 1 day ago.

Bug: libyuv:750
Change-Id: I4a73ffee2b71664c6cb93f38f2b5d70ebd76953e
Reviewed-on: https://chromium-review.googlesource.com/912175
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-02-09 19:41:26 +00:00
Frank Barchard
7b9ff4a035 tidy applied with readability-*
TBR=braveyao@chromium.org
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I316822f7d13b370b88b92a693912e880b21f92c8
Reviewed-on: https://chromium-review.googlesource.com/907371
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-02-08 18:13:01 +00:00
Frank Barchard
b792e0dbc1 tidy applied with all cppcoreguidelines and google
TBR=braveyao@chromium.org
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I1400a915ee5734c38d19dab9cf1f210ca43d17fc
Reviewed-on: https://chromium-review.googlesource.com/905810
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-02-07 02:28:25 +00:00
Frank Barchard
e1f6c1c0b5 tidy applied with readability-inconsistent-declaration-parameter-name
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2018-02-07 00:24:25 +00:00
Frank Barchard
5790a765b9 I422ToUYVYRow_AVX2 use vpmovzxbd instead of vpermq
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2018-02-02 23:57:35 +00:00
Frank Barchard
7ff53f324c I422ToYUY2Row_AVX2 use vpmovzxbd instead of vpermq
I422ToYUY2Row_AVX2 optimized from 7 cycles per 32 pixels to 6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu    (%1),%%xmm2
vmovdqu    0x00(%1,%2,1),%%xmm3
lea        0x10(%1),%1
vpermq     $0xd8,%%ymm2,%%ymm2
vpermq     $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2

..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd  (%1),%%ymm2
vpmovzxbd  0x00(%1,%2,1),%%ymm3
vpslld     $0x10,%%ymm3,%%ymm3
vpor       %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.

Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt

I422ToYUY2Row_AVX2 optimization

Improve performance of AVX2 code by avoiding vpermq

Bug: libyuv:556
Test: /usr/local/google/home/fbarchard/iaca-lin64/bin/iaca.sh -reduceout -arch BDW out/Release/obj/libyuv_internal/row_gcc.o
Change-Id: Ie36732da23ecea1ffcc6b297bacc962780b59ef1
Reviewed-on: https://chromium-review.googlesource.com/898067
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-02 18:57:49 +00:00
Frank Barchard
664c735677 I420ToYUY2_AVX2 port
I420 and I422 To YUY2 and UYVY ported from SSE2 to AVX2.

Was SSE2
I420ToYUY2_Opt (135 ms)
I420ToUYVY_Opt (148 ms)
I422ToYUY2_Opt (145 ms)
I422ToUYVY_Opt (142 ms)

Now AVX2
I420ToYUY2_Opt (133 ms)
I420ToUYVY_Opt (130 ms)
I422ToYUY2_Opt (127 ms)
I422ToUYVY_Opt (137 ms)

Bug: libyuv:556
Test: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*I42?To*UY*Opt
Change-Id: Ic35f97cee02dc009fd98785589ba17c7cf50bb35
Reviewed-on: https://chromium-review.googlesource.com/892493
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-02-01 00:33:25 +00:00
Frank Barchard
ffec313dbe ABGRToAR30 used AVX2 with reversed shuffler
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.

Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-29 22:31:31 +00:00
Frank Barchard
ff8ab9baf1 AR30ToABGR for 10 to 8 bit RGB on Android
ABGR is the more common format on Android.
This CL converts 10 bit AR30, to standard 8 bit ABGR.
Unoptimized but allows better testing and feature completeness.

Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToABGR_Opt
Change-Id: I0c7e7273158be215129e0a1d355587ae15942299
Reviewed-on: https://chromium-review.googlesource.com/891694
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-29 22:21:42 +00:00
Frank Barchard
ed96b7b2c7 AVX2 port of H010ToAR30_AVX2
Was SSSE3 H010ToAR30_Opt (635 ms)
Now AVX2  H010ToAR30_Opt (448 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I17b1a0e3268c4a9836e09683dd3377fb1ce60932
Reviewed-on: https://chromium-review.googlesource.com/889906
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-27 00:14:27 +00:00
Frank Barchard
c95fd57993 AVX2 port of I010ToAR30_AVX2
Was SSSE3 I420ToAR30_Opt (635 ms)
Now AVX2  I420ToAR30_Opt (446 ms)

Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I261be19ec981136a8f453ae0d3211532a790e5c5
Reviewed-on: https://chromium-review.googlesource.com/887750
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-26 02:12:07 +00:00
Frank Barchard
3f43ecc029 Add H420ToAR30 and a test that does a histogram
[ RUN      ] LibYUVConvertTest.TestH420ToAR30
uniques: B 222, G, 222, R 222
[       OK ] LibYUVConvertTest.TestH420ToAR30 (0 ms)
[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)

Bug: libyuv: 751
Test: LibYUVConvertTest.TestH420ToAR30
Change-Id: I9b75af286124c058c24799778a58c3feb9a1a1ab
Reviewed-on: https://chromium-review.googlesource.com/884845
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-25 00:36:40 +00:00
Frank Barchard
92e22cf5b6 Lint cleanup after C99 change CL
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: git cl lint
Change-Id: I51cf8107a8db17fbc9952d610f3e4d7aac5aa743
Reviewed-on: https://chromium-review.googlesource.com/882217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-24 19:16:03 +00:00
Frank Barchard
7e389884a1 Switch to C99 types
Append _t to all sized types.
uint64 becomes uint64_t etc

Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: Ide273d7f8012313d6610415d514a956d6f3a8cac
Reviewed-on: https://chromium-review.googlesource.com/879922
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-23 19:16:05 +00:00
Frank Barchard
13771ffaad basic_types.h - remove unused macros
Removes macros that were part of standard basic_types
header but not used by libyuv itself.

TBR=braveyao@chromium.org
Bug: libyuv:774
Test: try bots still build
Change-Id: I8de6fad5a9277df0a50959881392ba212b1b5972
Reviewed-on: https://chromium-review.googlesource.com/879591
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-23 02:24:58 +00:00
Frank Barchard
8af6ea4100 I420ToAR30 in 1 step SSSE3 assembly
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToAR30_Opt
Change-Id: Ie89c3eb2526354cf11175746bc8af72be83a1e00
Reviewed-on: https://chromium-review.googlesource.com/877541
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-23 01:33:10 +00:00
Frank Barchard
09db0c4ce2 H010ToAR30 in 1 step with SSSE3 assembly
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.

[ RUN      ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[       OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[       OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN      ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[       OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-19 19:46:58 +00:00
Frank Barchard
ecab5430c2 Remove MEMOPREG x64 NaCL macros
MEMOPREG macros are deprecated in row.h

Regular expressions to remove MEMOPREG macros:

MEMOPREG(movd, 0x00, [u_buf], [v_buf], 1, xmm1)                            \
MEMOPREG\((.*), (.*), (.*), (.*), (.*), (.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6            \\n"

MEMOPREG(movdqu,0x00,1,4,1,xmm2)
MEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6           \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: If8743abd9af2e8c549d0c7d3d49733a9b0f0ca86
Reviewed-on: https://chromium-review.googlesource.com/865964
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-16 19:10:44 +00:00
Frank Barchard
b33e0f97e7 Remove MEMOPMEM x64 NaCL macros
MEMOPMEM macros are deprecated in row.h

Usage examples
    MEMOPMEM(vmovdqu,ymm0,0x00,0,1,1)          //  vmovdqu %%ymm0,(%0,%1)
    MEMOPMEM(movdqu,xmm2,0x00,1,0,1)

Regular expressions to remove MEMACCESS macros:

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    %%\2,\3(%\4,%\5,\6)\7 \\n"

MEMOPMEM\((.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    %%\2,\3(%\4,%\5,\6)            \\n"

TBR=braveyao@chromium.org
Bug: libyuv:702
Test: try bots pass
Change-Id: Id8c6963d544d16e39bb6a9a0536babfb7f554b3a
Reviewed-on: https://chromium-review.googlesource.com/865934
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-13 01:33:21 +00:00
Frank Barchard
a875ed173d Remove VMEMOPREG x64 NaCL macros
VMEMOPREG macros are deprecated in row.h

Usage examples
    VMEMOPREG(vpavgb,0x00,0,4,1,ymm0,ymm0)     // vpavgb (%0,%4,1),%%ymm0,%%ymm0
    VMEMOPREG(vpavgb,0x20,0,4,1,ymm1,ymm1)

Regular expressions to remove MEMACCESS macros:

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*)(//.*)
"\1    \2(%\3,%\4,\5),%%\6,%%\7      \\n"

VMEMOPREG\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)
"\1    \2(%\3,%\4,\5),%%\6,%%\7            \\n"

TBR=braveyao@chromium.org

Bug: libyuv:702
Test: try bots pass
Change-Id: I472446606f7fd568fdf33aaacc22d5ed78673dab
Reviewed-on: https://chromium-review.googlesource.com/865640
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 22:54:24 +00:00
Frank Barchard
030042a2ff Remove VEXTOPMEM x64 NaCL macros
VEXTOPMEM macros are deprecated in row.h

Usage examples
    VEXTOPMEM(vextractf128,1,ymm0,0x0,1,2,1) // vextractf128 $1,%%ymm0,(%1,%2,1)

Regular expressions to remove MEMACCESS macros:

VEXTOPMEM\((.*),(.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1 $\2,%\3,\4(%\5,%\6,\7)        \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I177edf9813128408e74816672dd25abb03a5e1ca
Reviewed-on: https://chromium-review.googlesource.com/865283
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 21:16:34 +00:00
Frank Barchard
5088f00165 Remove MEMACCESS x64 NaCL macros
MEMACCESS macros are deprecated in row.h

Usage examples
    "movdqu    " MEMACCESS(0) ",%%xmm0         \n"
    "movdqu    " MEMACCESS2(0x10,0) ",%%xmm1   \n"

Regular expressions to remove MEMACCESS macros:

" MEMACCESS2\((.*),(.*)\) "(.*)\\n"
\1(%\2)\3              \\n"

" MEMACCESS\((.*)\) "(.*)\\n"
(%\1)\2            \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I42f62d5dede8ef2ea643e78c204371a7659d25e6
Reviewed-on: https://chromium-review.googlesource.com/862803
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-12 20:37:41 +00:00
Frank Barchard
e3797d1765 Remove MEMOPARG x64 NaCL macros
MEMOPARG macros are deprecated in row.h

  #opcode " " #offset "(%" #base ",%" #index "," #scale "),%" #arg "\n"

Usage examples
    MEMOPARG(movzwl,0x00,1,3,1,k2)             //  movzwl  (%1,%3,1),%k2

Regular expression to remove MEMACCESS macro:

MEMOPARG\((.*),(.*),(.*),(.*),(.*),(.*)\)(.*//.*)
"\1    \2(%\3,%\4,\5),%\6                \\n"

Bug: libyuv:702
Test: try bots pass
Change-Id: I4a5ad2abf5017e651576f4c8c784be1c8dbf5a83
Reviewed-on: https://chromium-review.googlesource.com/863108
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-12 18:26:06 +00:00
Frank Barchard
c682abe597 libyuv: fix undefined mul overflow
Bug: libyuv:771
Test: build asm with ubsan and check
Change-Id: I966d0bff74eef9ddfbeb93965fbff24c1472927c
Reviewed-on: https://chromium-review.googlesource.com/860898
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 23:22:46 +00:00
Frank Barchard
3694891922 Remove MEMLEA x64 NaCL macros
Bug: libyuv:702
Test: try bots pass
Change-Id: I0ee094551734368f2179c298e7bf423ec80a929c
Reviewed-on: https://chromium-review.googlesource.com/857845
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 19:16:16 +00:00
Frank Barchard
a2142148e9 Remove x64 native_client macros.
Bug: libyuv:702
Test: try bots pass
Change-Id: I76d74b5f02fe9843418108b84742e2f714d1ab0a
Reviewed-on: https://chromium-review.googlesource.com/855656
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-10 01:27:22 +00:00
Frank Barchard
00d526d4ea H010ToARGB_AVX2 optimized conversion
AVX2 optimized 10 bit YUV to ARGB.

Bug: libyuv:751
Test: H010ToARGB unittest
Change-Id: I705630beb62714b52042c2a5dcdb8b7859e734ae
Reviewed-on: https://chromium-review.googlesource.com/852563
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-09 03:17:33 +00:00
Frank Barchard
55310f92bc Remove NACL_R14 macro
Bug: libyuv:702
Test: try bots still build
Change-Id: I05317e45c885955fcda233bdddbd11ce1d246d90
Reviewed-on: https://chromium-review.googlesource.com/854770
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-08 22:41:15 +00:00
Frank Barchard
50f9e618fa Add H010ToABGR, I010ToABGR and I010ToARGB functions
ABGR output is implemented using the same source code as ARGB, by swapping
the u and v and supplying the mirrored conversion matrix.
ABGR format (RGBA in memory) is popular on Android.

Bug: libyuv:751
Test: H010ToABGR, I010ToABGR and I010ToARGB unittests

Change-Id: I0b5103628c58dcb22a6442c03814d4d5972e0339
Reviewed-on: https://chromium-review.googlesource.com/852985
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-08 17:40:33 +00:00
Frank Barchard
9d2cd6a3ef H010ToAR30 optimized to 2 step conversion
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.

Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)

Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
2018-01-07 08:36:57 +00:00
Frank Barchard
a64658593e I210ToARGB conversion from 10 bit YUV to RGB
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.

Bug: libyuv:751
Test:  I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2018-01-05 02:43:38 +00:00
Frank Barchard
2ed2402fa0 I420ToI010 for 8 to 10 bit YUV conversion.
Convert planar 8 bit formats to planar 16 bit formats.

Includes msan fix for Convert8To16Row_Opt unittest.

I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601.  The function does no color space
 conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling.  The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.

For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN      ] LibYUVConvertTest.I420ToI010_Opt
[       OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)

Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2018-01-02 21:09:39 +00:00
Frank Barchard
140fc0a261 Remove LIBYUV_SSSE3_ONLY and ARGBSHUFFLEROW_SSE2
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.

Bug: libyuv: 769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2018-01-02 18:57:39 +00:00
Frank Barchard
768f103b8b Convert8To16 for better H010 support
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.

Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-28 22:27:24 +00:00
Frank Barchard
c67db60534 HalfFloat_SSE2 use movd from memory
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned

Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-21 19:37:50 +00:00
Frank Barchard
790054ff03 Add AR30ToARGB function
Initial AR30ToARGB function to allow converion
from AR30 to other formats if necessary and/or
for testing.
Not optimized at this point.

Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: I38ef192315240f3caa7aee0218b38d5e88a2849f
Reviewed-on: https://chromium-review.googlesource.com/833025
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-19 01:54:42 +00:00
Frank Barchard
5336217f11 H010Copy function to copy 16 bit planar formats
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToH010_Opt
Change-Id: I996d309040a14193a97d05b62ac0b3e1ad1ee74b
Reviewed-on: https://chromium-review.googlesource.com/823445
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-15 03:34:34 +00:00
Frank Barchard
3b81288ece Remove Mips DSPR2 code
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-14 18:22:16 +00:00
Frank Barchard
bb3180ae80 Add I420ToAR30 10 bit RGB
For more complete support of AR30 format, add I420ToAR30 allowing
the new RGB 10 bit format to be used from standard 8 bit I420 format.

Bug: libyuv:751
Test: I420ToAR30 unittest added
Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa
Reviewed-on: https://chromium-review.googlesource.com/823243
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 23:40:58 +00:00
Frank Barchard
c367751430 ARGBToAR30 SSSE3 use pmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2017-12-12 20:12:58 +00:00
Frank Barchard
11dd1b956f ARGBToAR30 use vpmulhuw to replicate fields
AR30 is optimized with 3 techniques
1. vpmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time.  R and B, and A and G.
3. vpshufb is used to shift and mask 2 channels of R and B

Red Blue
With the 8 bit value in the upper bits, vpmulhuw by (1024+4) will produce a 10
bit value in the low 10 bits of each 16 bit value. This is whats wanted for the
blue channel. The red needs to be shifted 4 left, so multiply by (1024+4)*16 for
red.

Alpha Green
Alpha and Green are already in the high bits so vpand can zero out the other
bits, keeping just 2 upper bits of alpha and 8 bit green. The same multiplier
could be used for Green - (1024+4) putting the 10 bit green in the lsb.  Alpha
would be a simple multiplier to shift it into position.  It wants a gap of 10
above the green.  Green is 10 bits, so there are 6 bits in the low short.  4
more are needed, so a multiplier of 4 gets the 2 bits into the upper 16 bits,
and then a shift of 4 is a multiply of 16, so (4*16) = 64.  Then shift the
result left 10 to position the A and G channels.

Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: Ie4f20dce18203bae7b75acb1fd5232db8a8a4f11
Reviewed-on: https://chromium-review.googlesource.com/820046
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-12-12 02:57:54 +00:00
Frank Barchard
0f98c3c1df Add ARGBToAR30Row_SSE2 to speed up H010ToAR30
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.

Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-09 00:11:20 +00:00
Frank Barchard
aabe380890 H010ToAR30 and H010ToARGB optimized YUV buffering
Reduce allocations of row buffers to 1 alloc/free.
Do 2 rows at a time to avoid converting U and V planes twice.

Bug: libyuv:715
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I2f3a03b4875df5e3b969112a78a1a0b28399fa2f
Reviewed-on: https://chromium-review.googlesource.com/816021
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-12-08 18:55:03 +00:00
Frank Barchard
3541e46a7e Add H010ToARGB for 10 bit YUV to ARGB
Bug: libyuv:751
Test:  LibYUVConvertTest.H010ToARGB_Opt
Change-Id: I668d3f3810e59a4fb6611503aae1c8edc7d596e7
Reviewed-on: https://chromium-review.googlesource.com/815015
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-12-07 20:17:50 +00:00
Frank Barchard
49d9b1039b NV21ToABGR for Android camera conversions
Bug: libyuv:762
Test: NV21ToABGR unittest
Change-Id: I71448ab83930339083f07eeafccf240c6cb41c48
Reviewed-on: https://chromium-review.googlesource.com/795212
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-11-30 20:29:28 +00:00
Frank Barchard
324fa32739 Convert16To8Row_SSSE3 port from AVX2
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used.  This improves performance
on low end CPUs.
Future CL will by pass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.

Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-28 19:22:39 +00:00
Lei Zhang
8445617191 Mark a bunch of kArray variables as const.
This allows the linker to move the variables from the .data section to
the .rodata section.

Bug: libyuv:254
Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1

Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0
Reviewed-on: https://chromium-review.googlesource.com/777033
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2017-11-27 23:38:44 +00:00
Frank Barchard
26173eb73e H010ToAR30 for 10 bit bt.709 YUV to 30 bit RGB
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2

Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.

Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-22 23:58:30 +00:00
Frank Barchard
a98d6cdb17 ARGBToAR30 AVX2 conversion function
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: I09c13eb53ba5f1ce1740c013dc587f8300f1d9e0
Reviewed-on: https://chromium-review.googlesource.com/780437
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-21 20:37:01 +00:00
Frank Barchard
12c904a97c H420ToRAW and H420ToRGB24 added for bt.709 support.
Bug: libyuv:760
Test: LibYUVConvertTest.H420ToRAW_Opt
Change-Id: I050385f477309d5db02bb2218088f224c83392ed
Reviewed-on: https://chromium-review.googlesource.com/775785
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-17 01:20:05 +00:00
Frank Barchard
46594be758 add ScalePlane_16 unit tests
Tests ScalePlane vs ScalePlane_16 match.

Bug: libyuv:749
Test: LibYUVScaleTest.ScalePlaneDownBy4_Box_16
Change-Id: I3f71748da404982d5d48bfb11bbd3ae95a1d021c
Reviewed-on: https://chromium-review.googlesource.com/765045
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
2017-11-16 01:40:48 +00:00
Frank Barchard
630c8ed1e0 Fix for ScaleDownBy4_Linear_16
The unittest compares the results of 8 and 16 bit scaling
and expects them to be the same.  This CL makes the 16 bit
scaling filter logic match.

Bug: libyuv:749
Test: LibYUVScaleTest.DISABLED_ScaleDownBy4_Linear_16
Change-Id: Ifb3ca4d770ef38f9f16abe9b9aeb843b779bf371
Reviewed-on: https://chromium-review.googlesource.com/772370
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-15 23:05:22 +00:00
Frank Barchard
49d1e3b036 MultiplyRow_16_AVX2 for converting 10 bit YUV
When converting from lsb 10 bit formats to msb, the values
need to be shifted to the top 10 bits.  Using a multiply
allows the different numbers of bits to be copied:
// 128 = 9 bits
// 64 = 10 bits
// 16 = 12 bits
// 1 = 16 bits
Bug: libyuv:751
Test: LibYUVPlanarTest.MultiplyRow_16_Opt
Change-Id: I9cf226053a164baa14155215cb175065b1c4f169
Reviewed-on: https://chromium-review.googlesource.com/762951
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 22:02:32 +00:00
Frank Barchard
2f58d126b9 MergeUV10Row_AVX2 use multiply to handle different bit depths
Instead of hardcoded shift, use a multiply by a parameter.
128 = 9 bits
64 = 10 bits
16 = 12 bits
1 = 16 bits

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: Id925edfdbf91243370c90641b50eb8e7625ec329
Reviewed-on: https://chromium-review.googlesource.com/762523
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-10 03:38:07 +00:00
Frank Barchard
e26b0a7e0e casting for c89 compatibility and lint cleanup
Bug: libyuv:756
Test: CFLAGS="-m32 -static -std=gnu89 -mno-sse -O2" CXXFLAGS="-m32 -x c -static -std=gnu99 -mno-sse -O2" make -f linux.mk libyuv.a
Change-Id: Ic362f93e01ccbb0bea14f361a58585e79297e7d2
Reviewed-on: https://chromium-review.googlesource.com/759423
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Patrik Höglund <phoglund@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 18:22:17 +00:00
Frank Barchard
735ace2ed3 Re-enable x86 assembly without requiring -msse2
clang does not require -msse2 or -msse for inline, except
the "x" parameter.  So change this to "m" for 32 bit.  64 bit
requires sse2 so use "x" for 64 bit.

gcc requires -msse for xmm registers in clobber list.
Reduce compiler requirement from -msse2 to -msse for enabling
assembly.

Bug: libyuv:754, libyuv:757
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255
Reviewed-on: https://chromium-review.googlesource.com/759303
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-09 00:51:06 +00:00
Frank Barchard
68f852d835 Remove DISABLE_CLANG_MSA
cleanup to remove ifdefs around functions affected by
a clang bug.

gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest

Bug: libyuv:634
Test: build for mips with clang
Change-Id: I278b368dbb2fe89082240e280267d0a27a214c78
Reviewed-on: https://chromium-review.googlesource.com/757980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-08 19:55:14 +00:00
Frank Barchard
d997ac287d Revert "Enable SSE2 code without -msse"
This reverts commit 01e994d74e4e3937ee1a3efdc048320a1e51f818.

Change-Id: Ie76710d0f4e641e071889c5125fd3be23cdcdb59
Reviewed-on: https://chromium-review.googlesource.com/758499
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 19:33:09 +00:00
Frank Barchard
01e994d74e Enable SSE2 code without -msse
Bug: libyuv:754
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I74bf8d032013694e65ea7637bc38d3253db53ff2
Reviewed-on: https://chromium-review.googlesource.com/758043
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-11-08 02:54:41 +00:00
Frank Barchard
522fd699e6 AVX512 feature detects for cnl and icl
Key instruction sets added for each microarchitecture:

AVX512BW, AVX512VL, AVX512DQ - skylake server or later
AVX512_VBMI, AVX512_IFMA - cannon lake or later
AVX512_BITALG, AVX512_VBMI2, AVX512_VPOPCNTDQ, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ - ice lake or later

Bug: libyuv:752
Test: ~/intelsde/sde -icl -- out/Release/libyuv_unittest --gtest_filter=*Cpu*
Change-Id: I9ee28904c90009d66721b9f805a440c5fc2da122
Reviewed-on: https://chromium-review.googlesource.com/755617
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-07 00:56:37 +00:00
Frank Barchard
afa98e1f08 HammingDistance_SSSE3 use movd not vmovd
vmovd is an AVX instruction.  This will crash on an older CPU with
only SSSE3 but not AVX.  Use movd instead.

Bug: libyuv:753
Test: ~/intelsde/sde -mrm -- out/Release/libyuv_unittest --gtest_filter=LibYUVCompareTest.BenchmarkHammingDistance_Opt
Change-Id: I1fb0026039d5f83d124f5d03fed7dc0d2d723e49
Reviewed-on: https://chromium-review.googlesource.com/756200
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-11-07 00:48:52 +00:00
Frank Barchard
a0c32b9e49 MergeUV10Row_AVX2 for converting H010 to P010
H010 is 10 bit planar format with 10 bits in lower bits.
P010 is 10 bit biplanar format with 10 bits in upper bits.
This function weaves the U and V channels and shifts the bits
into the upper bits.

Bug: libyuv:751
Test: LibYUVPlanarTest.MergeUV10Row_Opt
Change-Id: I4a0bac0ef1ff95aa1b8d68261ec8e8e86f2d1fbf
Reviewed-on: https://chromium-review.googlesource.com/752692
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-11-03 18:55:36 +00:00
Frank Barchard
de8e9ae10b HammingDistance_SSE42 register optimized to avoid push
Bug: libyuv:701
Test: objdump to confirm code gen
Change-Id: Ibdcb2cc6bc9bf14b4ccb874c49fc9ff664650e1a
Reviewed-on: https://chromium-review.googlesource.com/745390
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2017-10-31 21:12:32 +00:00
Frank Barchard
ffc4811863 clang-format fixes
Bug: None
Test: lint passes
Change-Id: I1fd40d3506bab1f4f9100902f633a9c9e7b96337
Reviewed-on: https://chromium-review.googlesource.com/745038
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-10-31 02:25:01 +00:00
Frank Barchard
80077a80c2 HammingDistance_X86 using popcnt assembly
popcnt has a fake dependency on the destination.
This assembly avoids the dependency by using a different
register for each popcnt.

Bug: libyuv:701
Test: LIBYUV_DISABLE_SSSE3=1 out/Release/libyuv_unittest --gtest_filter=*Ham*Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=9999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: Ie1d202e2613b7fa8a3c02acd433940e92c80eafa
Reviewed-on: https://chromium-review.googlesource.com/731826
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-23 21:15:12 +00:00
Frank Barchard
8fa02df3c0 mingw fix ifdefs to use gcc source
mingw gcc sets the macro _M_IX86 which is normally only set
by Visual C and clangcl which are Visual C style source code
style for assembly, but gcc is not Visual C compatible.
Add _MSC_VER to most ifdefs to detect that its really Visual C
or clangcl and not mingw gcc so the gcc source code will be used.

Bug: libyuv:744
Test: CXXFLAGS=-m32 CXX=~/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ make -f linux.mk
Change-Id: I3431aa486eb769b145faa8d5eb75ed639f9d6f5e
Reviewed-on: https://chromium-review.googlesource.com/722319
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-17 17:36:35 +00:00
Frank Barchard
e23b27d040 Reduce HammingDistance block size to 32k to avoid overflow
Bug: libyuv:701
Test: HammingDistance unittest with large size
Change-Id: Id41a2c27eb8922d03b3a21dab32fa2e7b015ba38
Reviewed-on: https://chromium-review.googlesource.com/708335
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-10 18:42:47 +00:00
Frank Barchard
60f433fbd9 Revert "ComputeHammingDistance reduce SIMD loop to 1 call when possible."
This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8.

Reason for revert: <INSERT REASONING HERE>

Original change's description:
> ComputeHammingDistance reduce SIMD loop to 1 call when possible.
> 
> 32 bit x86 has high overhead due to -fpic.  So this reduces the
> number of calls by 1.
> 
> TBR=kjellander@chromium.org
> Bug: libyuv:701
> Test: BenchmarkHammingDistance
> Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
> Reviewed-on: https://chromium-review.googlesource.com/701755
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: Cheng Wang <wangcheng@google.com>

TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com

Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: libyuv:701
Reviewed-on: https://chromium-review.googlesource.com/707823
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-10 01:16:15 +00:00
Frank Barchard
ec75df5894 ComputeHammingDistance reduce SIMD loop to 1 call when possible.
32 bit x86 has high overhead due to -fpic.  So this reduces the
number of calls by 1.

TBR=kjellander@chromium.org
Bug: libyuv:701
Test: BenchmarkHammingDistance
Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
Reviewed-on: https://chromium-review.googlesource.com/701755
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-10-09 22:51:23 +00:00
Frank Barchard
1734712a6f Fix odd length HammingDistance
If length of HammingDistance was not a multiple of 4,
the result was incorrect.  The old tests did not catch this
so a new test is done to count 1s.

Bug: libyuv:740
Test: LibYUVCompareTest.TestHammingDistance
Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f
Reviewed-on: https://chromium-review.googlesource.com/699614
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-04 22:21:36 +00:00
Frank Barchard
fecd741794 Port HammingDistance to SSSE3
Bug: libyuv:701
Test: BenchmarkHammingDistance_Opt
Change-Id: Ibdd5d382677ebef4f82a62e0d5c3b88614a3b6e4
Reviewed-on: https://chromium-review.googlesource.com/696290
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-10-03 19:11:05 +00:00
Frank Barchard
bde789b176 Hamming Distance SSE2 and AVX2 optimized
Bug: None
Test: None
Change-Id: Id52663f9c957aac3172fba92d888ad1b041d5cf0
Reviewed-on: https://chromium-review.googlesource.com/692981
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-10-02 22:32:54 +00:00
Frank Barchard
311add63c2 CopyRow_NEON use ldp instead of ld1 for better performance.
Under cache thrashing circumstances, ldp/stp perform better than
ld1/st1 on QC820/QC821 CPUs.  Same performance when hitting cache.

Bug: libyuv:738
Test: LibYUVPlanarTest.TestCopySamples_Opt (445 ms)
Change-Id: Ib6a0a5d5e6a1b7ef667b9bb2edb39d681cf3614c
Reviewed-on: https://chromium-review.googlesource.com/691281
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-29 01:52:29 +00:00
Frank Barchard
efbf15754a Step thru full color test by increments of 5 for better test speed.
Full color test is the slowest of the unittests, and not catching any
additional bugs at the moment.  Step thru range of 0 to 255 in steps of
5 to speed up the test.  255 is 3 * 5 * 17, so any of those primes would
hit 0 and 255 exactly.

Was LibYUVColorTest.TestFullYUV (896 ms)
Now LibYUVColorTest.TestFullYUV (212 ms)

TBR=kjellander@chromium.org
Bug: libyuv:736
Test: LibYUVColorTest.TestFullYUV
Change-Id: I5b55fb07ada0dc7bdc3c3c20569d36bf09bb3804
Reviewed-on: https://chromium-review.googlesource.com/672064
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-19 02:01:53 +00:00
Frank Barchard
00c501fe43 Cast xgetbv from int64 to int to avoid Visual C warning.
TBR=kjellander@chromium.org
Bug: libyuv:735
Test: try bots
Change-Id: I00dc06689cd0a23847865c0c8edeb538b0cc81ac
Reviewed-on: https://chromium-review.googlesource.com/669142
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-15 22:00:52 +00:00
Frank Barchard
0a3d23c898 fix clang-format-ing for row arm functions
TBR=kjellander@chromium.org
BUG=None
TEST=git cl lint

Change-Id: I45ecd7f8279981ba037dc051f521f6b6d5506f64
Reviewed-on: https://chromium-review.googlesource.com/664345
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-14 21:35:06 +00:00
Frank Barchard
753a91cbcb fix fmov build error on gcc 4.7 for neon64
TBR=kjellander@chromium.org
BUG=libyuv:732
TEST=LibYUVPlanarTest.TestScaleSumSamples_Opt

Change-Id: If80e9510ad5668b080b9384e656c0bd73cf5b4a6
Reviewed-on: https://chromium-review.googlesource.com/663764
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-12 22:46:33 +00:00
Frank Barchard
1e16cb5c38 SplitRGBPlane and MergeRGBPlane functions added
Converts packed RGB to planar and back.

TBR=kjellander@chromium.org
BUG=libyuv:728
TEST=MergeRGBPlane_Opt and SplitRGBPlane_Opt unittests added

Change-Id: Ida59af940afcb1fc4a48bbf62c714f592665c3cc
Reviewed-on: https://chromium-review.googlesource.com/658069
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-11 21:02:04 +00:00
Frank Barchard
8f5e9cd9eb ScaleRowUp2_16_C port of NEON to C
Single pass upsample with bilinear filter.
NEON version optimized - Pixel Sailfish QC821

Was TestScaleRowUp2_16 (5741 ms)
Now TestScaleRowUp2_16 (4484 ms)
C   TestScaleRowUp2_16 (6555 ms)

TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowUp2_16 (709 ms)

Change-Id: Ib04ceb53e0ab644a392c39c3396e313530161d92
Reviewed-on: https://chromium-review.googlesource.com/646701
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-09-05 21:40:39 +00:00
Manojkumar Bhosale
2621c91bf1 Add MSA optimized HammingDistance and SumSquareError functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Id0126ba5aff38817525b1efa6044f1dc2cfa1a36
Reviewed-on: https://chromium-review.googlesource.com/625739
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-09-05 21:32:33 +00:00
Frank Barchard
0acc67712f clang format / lint cleanup for arm scale functions
TBR=kjellander@chromium.org
BUG=libyuv:725
TEST=lint

Change-Id: I76f777427f9b1458faba12796fb0011d8e3228d5
Reviewed-on: https://chromium-review.googlesource.com/646586
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-31 22:41:08 +00:00
Frank Barchard
a826dd7112 ARGBScaleDown by 2 with nearest neighbor optimized
TBR=kjellander@chromium.org
BUG=libyuv:723
TEST=ScaleDownBy2_None

Change-Id: I6861e62d3a67dde916b87fdc46eb02f2b4ee9f17
Reviewed-on: https://chromium-review.googlesource.com/644149
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-30 23:22:14 +00:00
Frank Barchard
1c85f98846 Scale down by 2 linear use 'half add' to average pixels.
Use ld2 to load even and odd pixels into different registers
and hadd to half add them to each other.

Previously used paired and shift.

TBR=kjellander@chromium.org
BUG=libyuv:723
TEST=ScaleDownBy2_Linear

Change-Id: I3ec72bcf7d4c746837217496c301eb4e4ad963cf
Reviewed-on: https://chromium-review.googlesource.com/644113
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-30 22:10:32 +00:00
Frank Barchard
e200738d82 Scale Down by 2 use ld2 and urhadd
urhadd is a rounded average.  Linear filter wants to average
horizontally, so use ld2 to separate even and odd pixels.

TBR=jkellander@chromium.org
BUG=None
TEST=LibYUVScaleTest.*ScaleDownBy2*

Change-Id: Id667288a030e72ce8e1c1d6719b69c555c0db063
Reviewed-on: https://chromium-review.googlesource.com/642448
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-30 01:18:11 +00:00
Manojkumar Bhosale
b6e8e9aa97 Add MSA optimized HalfFloatRow function
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: I54a2c57d66093b887c8ba31fd7a21a102165393a
Reviewed-on: https://chromium-review.googlesource.com/628557
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-29 18:40:08 +00:00
Frank Barchard
f0a9d6d206 Gaussian reorder for benefit of A73
Roughly. instead of 4 loads and 8 multiples, use 1 load and 2 multiples
4 times over.  The original code, as with the C code from clang and gcc,
did all the loads, then all the math, then the store.  The new code
does a load, then the math, then the next load, etc.
This schedules better on current arm 64 cpus.
Number of registers also reduced, reusing the same registers.

HiSilicon ARM A73:

Now
TestGaussRow_Opt (890 ms)
TestGaussCol_Opt (571 ms)

Was
TestGaussRow_Opt (1061 ms)
TestGaussCol_Opt (595 ms)

Qualcomm 821 (Pixel):

Now
TestGaussRow_Opt (571 ms)
TestGaussCol_Opt (474 ms)

Was
TestGaussRow_Opt (751 ms)
TestGaussCol_Opt (520 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt

Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Change-Id: I5ec81191d460801f0d4a89f0384f89925ff036de
Reviewed-on: https://chromium-review.googlesource.com/634448
Commit-Queue: Frank Barchard <fbarchard@google.com>
2017-08-25 19:00:05 +00:00
Frank Barchard
ad2409443c GaussRow_NEON from int to short
[ RUN      ] LibYUVPlanarTest.TestGaussRow_Opt
 [       OK ] LibYUVPlanarTest.TestGaussRow_Opt (601 ms)
 [ RUN      ] LibYUVPlanarTest.TestGaussCol_Opt
 [       OK ] LibYUVPlanarTest.TestGaussCol_Opt (522 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt

Change-Id: I1242b98672538e889f3ab48f215d6dabc7144ea7
Reviewed-on: https://chromium-review.googlesource.com/627478
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-24 01:09:23 +00:00
Frank Barchard
1cc539f7d6 GaussCol_NEON resample from short to int
Old NEON
LibYUVPlanarTest.TestGaussCol_Opt (916 ms)

New NEON
LibYUVPlanarTest.TestGaussCol_Opt (520 ms)

C vectorized
LibYUVPlanarTest.TestGaussCol_Opt (739 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussCol_Opt

Change-Id: I863b66f700f7a71fcb08a2eabb03240fdaf8a238
Reviewed-on: https://chromium-review.googlesource.com/626938
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-22 23:07:17 +00:00
Frank Barchard
c5bad809b1 Gauss unittest, Scale comments for neon64 half size updated
[ RUN      ] LibYUVPlanarTest.TestGaussRow_Opt
[       OK ] LibYUVPlanarTest.TestGaussRow_Opt (1274 ms)
[ RUN      ] LibYUVPlanarTest.TestGaussCol_Opt
[       OK ] LibYUVPlanarTest.TestGaussCol_Opt (916 ms)

TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=LibYUVPlanarTest.TestGaussRow_Opt
Change-Id: Id480f3870c40c2b40dfb9f072cb7118ebad41afc
Reviewed-on: https://chromium-review.googlesource.com/624701
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-21 23:41:46 +00:00
Frank Barchard
0c957d183e Gaussian blur NEON optimized
TBR=kjellander@chromium.org
BUG=libyuv:719
TEST=TestGaussCol_NEON

Change-Id: I52cb6dbfd0cab4a30205c93b6a528ef49e9ab529
Reviewed-on: https://chromium-review.googlesource.com/621708
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-21 21:18:32 +00:00
Frank Barchard
8cd3e4f3f2 Add MSA optimized ScaleFilterCols, ScaleARGBCols, ScaleARGBFilterCols and ScaleRowDown34 functions
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ib139b9701fc67e24d27a6886377c0cb8b2773fda
Reviewed-on: https://chromium-review.googlesource.com/620791
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-18 17:23:27 +00:00
Frank Barchard
78e44628c6 Add MSA optimized SplitUV, Set, MirrorUV, SobelX and SobelY row functions.
TBR=kjellander@chromium.org
R=fbarchard@google.com

Bug:libyuv:634
Change-Id: Ie2342f841f1bb8469fc4631b784eddd804f5d53e
Reviewed-on: https://chromium-review.googlesource.com/616765
Reviewed-by: Frank Barchard <fbarchard@google.com>
2017-08-17 18:39:22 +00:00
Frank Barchard
bb17da97cf Test C vs NEON for ScaleDown2Box_16
TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowDown2Box_16

Change-Id: Ic74d29d6f14983ff26e8af541ef702a0f8bf3f17
Reviewed-on: https://chromium-review.googlesource.com/616189
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-16 22:18:48 +00:00
Frank Barchard
7e59ee4c75 Upsample 8x2 pixels to 16x1 with bilinear filtering
Downsample 16x2 to 8x1 with box filtering

[ RUN      ] LibYUVScaleTest.TestScaleRowUp2_16
[       OK ] LibYUVScaleTest.TestScaleRowUp2_16 (579 ms)
[ RUN      ] LibYUVScaleTest.TestScaleRowDown2Box_16
[       OK ] LibYUVScaleTest.TestScaleRowDown2Box_16 (329 ms)
[----------] 2 tests from LibYUVScaleTest (909 ms total)

TBR=kjellander@chromium.org
BUG=libyuv:718
TEST=LibYUVScaleTest.TestScaleRowUp2_16 and LibYUVScaleTest.TestScaleRowDown2Box_16

Change-Id: I457d44123f2751e5f71bf3935401fff74b8e9db2
Reviewed-on: https://chromium-review.googlesource.com/608876
Reviewed-by: Cheng Wang <wangcheng@google.com>
2017-08-15 22:28:15 +00:00