* Run on SiFive internal FPGA:
ARGBToRAW_Opt (~1.55x vs scalar)
ARGBToRGB24_Opt (~1.44x vs scalar)
RGB24ToARGB_Opt (~1.77x vs scalar)
LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10
Bug: libyuv:956
Change-Id: I26722f6848cd68684d95d9a7ee06ce0416e7985d
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4413083
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open.
Bug: libyuv:958
Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Fix -Wmemset-elt-size warning for GCC
- Use vin for inputs and vout for outputs
Bug: None
Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
The correct define can be found in scale_row.h
Change-Id: I633ed47006c7bd8014038493005c2d934489ff18
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4411353
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
under gcc 12.2.0 using -Wall:
source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const
uint16_t*, ptrdiff_t, uint16_t*, int)’:
source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to
number of elements without multiplication by element size
[-Wmemset-elt-size]
2287 | memset(temp, 0, 16 * BPP); /* for msan */
| ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’
2308 | ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15)
This increases the memset to the full buffer size, which may not be
strictly necessary.
Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
* Run on SiFive internal FPGA:
RAWToARGB_Opt (~2x vs scalar)
RAWToRGBA_Opt (~2x vs scalar)
RAWToRGB24_Opt (~1.5x vs scalar)
LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10
Change-Id: I21a13d646589ea2aa3822cb9225f5191068c285b
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408357
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
The Clang static analyzer (scan-build) in LLVM 14 warns about
array index out of bounds in scaletbl[boxwidth - minboxwidth] in
ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two
elements. It's not clear the index boxwidth - minboxwidth is either 0 or
1.
Change-Id: I072476e86950154beffe6b1a89915755118b3cbd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
- Allows code to be optimized with clang 17 -flto-thin
- Bump version number to 1864 to allow detection of fix
- Apply clang format to standardize formatting; No impact on code generated
Bug: chromium:1424089
Change-Id: Ib745836b27915a5e4cb1d7d928ee52659360612b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370052
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Fix the algorithm for unpacking the lower 2 bits of M2T2 pixels.
Bug: b:258474032
Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978
Commit-Queue: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V
AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2 MergeUVPlane_Opt (945 ms)
AVX2 MergeUVPlane_16_Opt (2167 ms)
Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2 MergeUVPlane_Opt (943 ms)
SSE2 MergeUVPlane_Opt (973 ms)
AVX2 MergeUVPlane_16_Opt (2102 ms)
Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- was dup of 8h but mul of 4s. now use umull
Bug: libyuv:951
Change-Id: If6cb01f5f006c2235886b81ce120642d7e24a9bb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166563
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Convert 10 and 12 bit biplanar formats to planar.
- Shift 10 MSB to 10 LSB
- P010 is similar to NV12 in layout, but uses 10 MSB of 16 bit values.
- I010 is similar to I420 in layout, but uses 10 LSB of 16 bit values.
Bug: libyuv:951
Change-Id: I16a1bc64239d0fa4f41810910da448bf5720935f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166560
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Minor variable name changes first/last to top/bottom
- Comments explaining rotate temporary buffers usage
- Add asserts for scale parameter
- Use NULL and stddef.h instead of 0
- Use void * for allocation in row.h
- Add () around size parameter in macros
Bug: libyuv:926, libyuv:949
Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This initial implementation is based on current unoptimized code in webrtc using just plain for loops.
Bug: libyuv:949
Change-Id: Ic87ee49c3a0b62edbaaa4255c263c1f7be4ea02b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110782
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
The I410To420 implementation does a two step approach for scaling down and 10-to-8 bit conversion using the Y plane as temporal storage.
Bug: libyuv:950
Change-Id: I3d35fad4b99e17253230456233fbd947e013c0ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110783
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- MT2T support for source strides added, but only works for positive values.
- Reduced casting in row_common - one cast per assignment.
- scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t
Bug: libyuv:948, b/257266635, b/262468594
Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ernest Hua <ernesthua@google.com>
- fix ifdefs for DetilePlane_16 to use 16 bit versions, not 8 bit. (no functional change)
Bug: b/258474032
Change-Id: Ic07e02d9801e21126ebee0ceb5779aa712a493ce
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4034812
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Preserve xmm7 in ScaleRowUp2_Bilinear_12_SSSE3
- Previously xmm7 was used in ScaleRowUp2_Bilinear_12_SSSE3 without being preserved, which violates the Windows x64 calling conventions and can cause undefined behavior.
Bug: libyuv:945, 1218384
Change-Id: If18b292b588573355f9b4ba8c5b9c3fbe143d36b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3972137
Reviewed-by: Bruce Dawson <brucedawson@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Previously was C for both Y and UV.
Was BGRAToI420_Opt (17780 ms)
Now BGRAToI420_Opt (9546 ms)
Bug: b/253491233
Change-Id: Id103d8d5ba0fed0f7a427dd5955e1830275eff6b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3953131
Reviewed-by: Wan-Teh Chang <wtc@google.com>
- Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps
- Was SplitUV, memcpy Y, InterpolateUV
- Now YUY2ToY, YUY2ToNVUV
- rollback LIBYUV_UNLIMITED_DATA
3840x2160 1000 iterations:
Pixel 2 Cortex A73
Was YUY2ToNV12_Opt (6515 ms)
Now YUY2ToNV12_Opt (3350 ms)
AB7 Mediatek P35 Cortex A53
Was YUY2ToNV12_Opt (6435 ms)
Now YUY2ToNV12_Opt (3301 ms)
Skylake AVX2 x64
Was YUY2ToNV12_Opt (1872 ms)
Now YUY2ToNV12_Opt (1657 ms)
SSE2 x64
Was YUY2ToNV12_Opt (2008 ms)
Now YUY2ToNV12_Opt (1691 ms)
Windows Skylake AVX2 32 bit x86
Was YUY2ToNV12_Opt (2161 ms)
Now YUY2ToNV12_Opt (1628 ms)
Bug: libyuv:943
Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- YUV to RGB use linear for first and last row.
- add assert(yuvconstants)
- rename pointers to match row functions.
- use macros that match row functions.
- use 12 bit upsampler for conversions of 10 and 12 bits
Cortex A53 AArch32
I420ToRGB24_Opt (3627 ms)
I422ToRGB24_Opt (4099 ms)
I444ToRGB24_Opt (4186 ms)
I420ToRGB24Filter_Opt (5451 ms)
I422ToRGB24Filter_Opt (5430 ms)
AVX2
Was I420ToRGB24Filter_Opt (583 ms)
Now I420ToRGB24Filter_Opt (560 ms)
Neon Cortex A7
Was I420ToRGB24Filter_Opt (5447 ms)
Now I420ToRGB24Filter_Opt (5439 ms)
Bug: libyuv:938
Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- add tests for all single plane formats that reduce or stay same in size
Bug: b/242233673
Change-Id: Ic25d808114f11995ac56ea9c31b99f66ba36d345
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3828485
Reviewed-by: Wan-Teh Chang <wtc@google.com>
The code already exists to use a specific matrix. This CL simply
adds a function to use a generic YUV matrix for the conversion.
Bug: b/241451603
Change-Id: I0eea7e96a891d045905a9c963b56c053097029ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3820903
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- fix crash when width is not a multiple of 16
- apply clang format
- bump version
Bug: libyuv:940, b/240094327
Change-Id: Ic18e5b7b64f78f26e8b7d8440bf490a679bda200
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3812594
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Explicitly initialize the 'pad' field of RgbConstants to 0. This
prevents the following warning/error in some compilers:
error: missing field 'pad' initializer [-Werror,-Wmissing-field-initializers]
Bug: b/241008246
Change-Id: Id6a0beb75c5c709404290c75915049f8a3898c83
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3808044
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Fix the following MSVC warnings:
src\source\row_win.cc(117): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(136): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(155): warning C4309: 'argument': truncation of
constant value
src\source\row_win.cc(174): warning C4309: 'argument': truncation of
constant value
src\source\row_common.cc(1712): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1731): warning C4244: 'initializing':
conversion from 'int16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1786): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
src\source\row_common.cc(1805): warning C4244: 'initializing':
conversion from 'uint16_t' to 'int8_t', possible loss of data
Bug: libyuv:939
Change-Id: Ie87ba6e716732d1ff1ae5c236dfd9cfdac13439d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3807105
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- Define HAS_SCALEROWUP2_BILINEAR_16_SSE2: it's now fixed.
- Correct function name to ScaleRowUp2_Bilinear_16_Any_SSE2:
this row function uses only SSE2 instructions.
Bug: libyuv:882
Change-Id: Ib1c7ac5b09997cb5b32bc54109d8c566af762433
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3800842
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
MergeRGB and SplitRGB use a register to point to 9 shuffle tables.
- fixes an out of registers error with -mcmodel=large
InterpolateRow_16To8_NEON improves performance for I210ToI420:
On Pixel 4 for 720p x1000 images
Was I210ToI420_Opt (608 ms)
Now I210ToI420_Opt (336 ms)
On Skylake Xeon
Was I210ToI420_Opt (259 ms)
Now I210ToI420_Opt (209 ms)
Bug: libyuv:931, libyuv:930
Change-Id: I20f8244803f06da511299bf1a2ffc7945eb35221
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717054
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
- Avoid stepping to height + 1 for bilinear filter 2nd row for last row of source
- Box filter ubsan fix for 3/4 and 3/8 scaling for 16 bit planar
- Height 1 asan fixes
Bug: libyuv:935, b/206716399
Change-Id: I56088520f2a884a37b987ee5265def175047673e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3717263
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Fixes chromium PaintCanvasVideoRendererTest.HighBitDepth
sqdmulh was creating a 9 bit value with rounding, and then shifted it right 1 with no rounding. The rounding had an off by 1 impact in some tests.
Pixel 3
C I010ToI420_Opt (749 ms)
Was sqdmulh I010ToI420_Opt (370 ms)
Now ushl I010ToI420_Opt (324 ms)
Pixel 4
C I010ToI420_Opt (581 ms)
Was sqdmulh I010ToI420_Opt (240 ms)
Now ushl I010ToI420_Opt (231 ms)
Bug: b/216321733, b/233233302
Change-Id: I26f673bb411401d1e4a8126bf22d61c649223e9b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3694143
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Add I210ToI420 to convert 10 bit 4:2:2 YUV to 4:2:0 8 bit
- Add NEON InterpolateRow_16 for fast 10 bit scaling
- When scaling up, set step to interpolate toward height - 1 to avoid buffer overread
- When scaling down, center the 2 rows used for source to achieve filtering.
- CopyPlane check for 0 size and return
Bug: libyuv:931, b/228605787, b/233233302, b/233634772, b/234558395, b/234340482
Change-Id: I63e8580710a57812b683c2fe40583ac5a179c4f1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3687552
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
If a width, height, and src/dst strides passed in are all 0, height is updated to 1 which means some CPU optimized functions may try to copy data when the dst rect is not valid.
Bug: b:234340482
Change-Id: I63be1c6ba05d669d67f5079d812acbec09c8f6c9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3689909
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Pixel 3
Was C I010ToI420_Opt (749 ms)
Now NEON I010ToI420_Opt (356 ms)
Pixel 4
Was C I010ToI420_Opt (581 ms)
Now NEON I010ToI420_Opt (163 ms)
Bug: b/233233302, b/233634772
Change-Id: I60a84648a66f77d97c0a7822b29bd18b8e3a3355
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3661401
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
This function reads 2 byte values and writes the 2nd byte to the destination.
It turns out this is useful for P010ToNV12 as well, so adding the planar function allows a high level to call this.
And adds UYVY support for something YUY2 already had. Which is writing the 1st byte.
Bug: b/233233302, b/233634772
Change-Id: I10a9454cb4f5b2c4ac5532fa86feddf78284d8b8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3659055
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The "vpackuswb %%xmm2,%%xmm0,%%xmm0" and "vmovdqu %%xmm0,(%1)"
instructions in ScaleRowUp2_Linear_SSSE3() are AVX instructions. They
cause an illegal instruction exception on CPUs that do not support AVX.
Bug: libyuv:927
Bug: chromium:1312551
Change-Id: I87b2aaf041e7d185e7e8fb07172d4f37482e9d08
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3585881
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
When doing 90 or 270 degrees rotation we need to do a rotate&scale of the UV planes, as there are no helper optimized functions to do this, we use the Y plane as temporal memory and perform each of the transforms independently:
First U plane is rotated, putting the result in the Y plane. After the rotation, the output has double the samples horizontally and half the samples vertically, so it is scaled into the final U plane. Same process is done with the V plane.
Last the Y plane that can be just rotated without scaling.
It would be great to have an optimized version for this, but maybe this is helpfull for triggering the discussions.
Bug: libyuv:926
Change-Id: I188af103c4d0e3f9522021b4bf2b63c9d5de8b93
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3568424
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- Unrolled to 16 pixels
- Take constants via structure, allowing different colorspace and channel order
- Use ADDHN to add 16.5 and take upper 8 bits of 16 bit values, narrowing to 8 bits
- clang-format applied, affecting mips code
On Cortex A510
Was RAWToJ400_Opt (1623 ms)
Now RAWToJ400_Opt (862 ms)
C RAWToJ400_Opt (1627 ms)
Bug: b/220171611
Change-Id: I06a9baf9650ebe2802fb6ff6dfbd524e2c06ada0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
When used in C enum keyword can't be eliminated.
Bug: libyuv:872
Change-Id: Iacff5a8bd84ec7caa1f90889e48f81ffc10071ae
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3513317
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This patch adds some deleted control macros so that these MSA
optimization functions can be called normally on mips platform.
There are also some modifications to adapt to the clang compiler.
Bug: libyuv:918
Change-Id: I6ffadc6582682b5eaeae2e0f4033d66d370b48b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3494667
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This patch fixes compilation errors caused by the removal of kUVBias
and two failed test cases of LibYUVConvertTest.RGB565ToI420_Opt and
LibYUVConvertTest.ARGB1555ToI420_Opt.
Bug: libyuv:918
Change-Id: I1a66bcd7ef616aacbeca5b4015013015ccdf0f18
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3477416
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Add support for MM21 to NV12 and I420 conversion, and add SIMD
optimizations for arm, aarch64, SSE2, and SSSE3 machines.
Bug: libyuv:915, b/215425056
Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Justin Green <greenjustin@google.com>
Optimize 20 functions in source/scale_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Optimize two functions in source/rotate_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Optimize 44 functions in source/row_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
1. Add supports for LSX and LASX.
2. Three optimization functions are added in loongarch/row_lasx.cc file:
I422ToARGBRow_LASX,I422ToRGBARow_LASX,I422AlphaToARGBRow_LASX.
Bug: libyuv:912, Bug: libyuv:913
Change-Id: I043c2704f99a5215724b5c0b7f97e6bf5f7a199b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329189
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- adapted from Android420ToI420, adding a rotation parameter
- SplitRotateUV added to rotate and split the UV channel of NV12 or NV21
- rename RotateUV functions to SplitRotateUV
Bug: b/203549508
Change-Id: I6774da5fb5908fdf1fc12393f0001f41bbda9851
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3251282
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C.
Although current Intel, ARM, Mips and PPC can do unaligned load/store, its not guaranteed
and could crash a CPU that doesnt support it.
- unaligned tests use offset of 2 or 4, which ubsan accepts.
- unittest fills in random buffer with 2 bytes at a time instead of a short.
- row common functions for int16 types use 2 shorts instead of 1 int.
Bug: libyuv:908, b/203243873
Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnatenuate, which mimics the AVX code.
Apply clang format to cleanup code.
Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- C code use ARM path, so NEON and C match
- C used on Intel platforms, disabling AVX.
Bug: libyuv:908, b/202888439
Change-Id: Ie035a150a60d3cf4ee7c849a96819d43640cf020
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3223507
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
With "m" GCC generates a memory address with offset which is
not allowed with ld1 on aarch64. Change constraint to "Q" to
force address without offset.
Bug: chromium:819294, libyuv:903
Change-Id: Iaae24bc6882cdef823259040a37fdbfc31f91185
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922146
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- swap U and V when crop x is odd
- document YUY2 and UYVY formats
- apply clang-format
Bug: libyuv:902
Change-Id: I045e44c907f4a9eb625d7c024b669bb308055f32
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3039549
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Was
[ RUN ] LibYUVConvertTest.ARGB1555ToI420_Any
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure
Expected equality of these values:
dst_uv_c[i * kStrideUV + j]
Which is: '\x8B' (139)
dst_uv_opt[i * kStrideUV + j]
Which is: '\x92' (146)
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure
[ FAILED ] LibYUVConvertTest.ARGB1555ToI420_Any
Now
[ RUN ] LibYUVConvertTest.ARGB1555ToI420_Any
[ OK ] LibYUVConvertTest.ARGB1555ToI420_Any (0 ms)
Bug: libyuv:894, b/155722711
Change-Id: I12dcacd0ecfff4ede5693a2554e9bb10dc8586c1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2870484
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Add tests of all macros used by libyuv public headers
When a 1 step conversion is added, a 2 step test can compare
the old 2 step method to the 1 step. A 1 step unittest is
also added which compares C to SIMD. Making the 2 step
conversions measure performance of the 2 steps allows the
old 2 step performance to be compared to 1 step.
All macros used in public headers are added to an ifdef test.
Showing them in a unittest allows some diagnostics when
a test is failing.
Bug: libyuv:901
Change-Id: I7ffa6ed0cb3b506fa1b7fd4b7b1b729658c3c266
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2857916
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Use unsigned coefficient and signed UV value in YUVTORGB.
R=fbarchard@chromium.org
Bug: libyuv:862, libyuv:863
Change-Id: I32e58b2cee383fb98104c055beb0867a7ad05bfe
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2850016
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Use unsigned coefficients on Intel.
Make C, NEON and AVX2 match under LIBYUV_UNLIMITED_DATA.
Bug: libyuv:862, libyuv:863
Change-Id: I6c02147ea3c1875c4fc23863435aea86dcf5880a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2830180
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Refactor NEON YUVToRGB Assembly to support HBD data as input and output.
Work on YUV444 internally, remove subsampling in I444ToARGB.
libyuv_unittest --gtest_filter=*.NV??ToARGB_Opt:*UYVYToARGB_Opt:*YUY2ToARGB_Opt:*I4*ToARGB_Opt
Bug: libyuv:895, libyuv:862, libyuv:863
Change-Id: I05b56ea8ea56d9e523720b842fa6e4b122ed4115
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810060
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
These functions merge high bit depth planar RGB pixels into packed format.
Change-Id: I506935a164b069e6b2fed8bf152cb874310c0916
Bug: libyuv:886, libyuv:889
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Planar functions pass depth instead of scale factor.
Row functions pass shift instead of depth. Add assert to C.
AVX shift instruction expects a single shift value in XMM.
Neon pass shift as input (not output).
Split Neon reimplemented as left shift on shorts by negative to achieve right shift.
Add planar unitests
Bug: libyuv:888
Change-Id: I8fe62d3d777effc5321c361cd595c58b7f93807e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2782086
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Add J420 output from RAW.
Optimize RGB24 and RAW To J420 on ARM by using NEON for the 2 step conversion.
Also fix sign-compare warning that was breaking Windows build
Bug: libyuv:887, b/183534734
Change-Id: I8c39334552dc0b28414e638708db413d6adf8d6e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2783382
Reviewed-by: Wan-Teh Chang <wtc@google.com>
MOV Vy.4s, Vx.4s is not a valid instruction form (even though LLVM allows it).
It should be MOV Vy.16b, Vx.16b (.8b for 64-bit variants)
Bug: None
Change-Id: I3c3b42288a0ebc275962fa3adad707b351d00d4c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780155
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
These NEON functions produce 16 pixels per iteration each, thus
use the mask 15, not 7.
Change-Id: I1f3eb691a9ca4af705393b2842b18b65f6878926
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2731801
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The following functions are added:
planar YUV:
I410ToAR30, I410ToARGB
planar YUVA:
I010AlphaToARGB, I210AlphaToARGB, I410AlphaToARGB
biplanar YUV:
P010ToARGB, P210ToARGB
P010ToAR30, P210ToAR30
biplanar functions can also handle 12 bit and 16 bit samples.
libyuv_unittest --gtest_filter=LibYUVConvertTest.*10*ToA*:LibYUVConvertTest.*P?1?ToA*
R=fbarchard@chromium.org
Bug: libyuv:751, libyuv:844
Change-Id: I2be02244dfa23335e1e7bc241fb0613990208de5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2707003
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Rename yuvconstants to .c and use round from math.h
Bug: libyuv:882, b/180472591
Change-Id: I70720bf3e0833ba00df0d721f12020fba0b07a03
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2706966
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
These are 16 bit bi-planar convert functions to scale UV plane to
Y plane's size using (bi)linear filter.
libyuv_unittest --gtest_filter=*ToP41*
R=fbarchard@chromium.org
Bug: libyuv:872
Change-Id: I3cb4fafe2b2c9eedd0d91cf4c619abb9ee107bc1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2690102
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
These are bi-planar convert functions to scale UV plane to Y plane's size using (bi)linear filter.
libyuv_unittest --gtest_filter=*ToNV24*
R=fbarchard@chromium.org
Change-Id: I3d98f833feeef00af3c903ac9ad0e41bdcbcb51f
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2682152
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
new color util to compute constants needed based on white point.
[ RUN ] LibYUVColorTest.TestFullYUVV
hist -2 -1 0 1 2
red 0 1627136 13670144 1479936 0
green 319285 3456836 9243059 3440771 317265
blue 0 1561088 14202112 1014016 0
Bug: libyuv:877, b/178283356
Change-Id: If432ebfab76b01302fdb416a153c4f26ca0832d6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2678859
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
These functions use (bi)linear filter, to scale U and V planes to the size of Y plane.
This will help enhance the quality of YUV to RGB conversion.
Also added 10bit and 12bit version:
I010ToI410
I210ToI410
I012ToI412
I212ToI412
libyuv_unittest --gtest_filter=LibYUVConvertTest.I42*ToI444*:LibYUVConvertTest.I*1*ToI41*
R=fbarchard@chromium.org
Change-Id: Ie4a711a5ba28f2ff1f44c021f7a5c149022264c5
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2658097
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
subq is only available for x64
sub works for both 32 bit x86 and 64 bit x64
Fox in row_gcc.cc for 32 bit x86 running out of registers.
Fix in row_neon.cc for split function argb paramter name.
Bug: libyuv:877, b/178283356, b/178713286
Change-Id: If2b12a2d6168eab08005a2cdf2c17a470a924dd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2656771
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
These functions convert between planar and interleaved ARGB,
optionally fill 255 to alpha / discard alpha.
This can help handle YUV(A) with Identity matrix, which is
basically planar ARGB.
libyuv_unittest --gtest_filter=LibYUVPlanarTest.*ARGBPlane*:LibYUVPlanarTest.*XRGBPlane*
R=fbarchard@google.com
Change-Id: I522a189b434f490ba1723ce51317727e7c5eb112
Bug: libyuv:877
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2649887
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
MAKEYUVCONSTANTS macro to generate struct for YUV to RGB
Fix I444AlphaToARGB unit test for ARM by adjusting C version to match Neon implementation.
Bug: libyuv:879, libyuv:878, libyuv:877, libyuv:862, b/178283356
Change-Id: Iedb171fbf668316e7d45ab9e3481de6205ed31e2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2646472
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Duplicate I420ToARGB prototype from convert_argb.h into convert_from.h for webrtc
Apply clang format for white spacing consistency.
Bug: libyuv:838, b/151375918
Change-Id: I0f667ca5350192710dbb135e92e73e18b46135e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2446613
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This includes:
- fixing a handrolled raw exec-based DEPS parser that was failing
to parse Str, similar to crbug.com/1106435.
- rolling chromium forward by nearly a year. (The last roll that
landed was crrev.com/c/1797295). This required a bunch of changes in
order to be able to successfully sync, run gn, and compile:
- switching the mirrors for three repositories to match chromium,
which switched in crrev.com/c/2062580.
- making libyuv write an empty gclient_args file
- adding a few build_override gn arguments
- adding nasm as a deps entry, as it's now required by libjpeg_turbo
- android:
- adding jdk, libunwindstack, and turbine
- rolling the android sdk
- rolling bazel and r8
- rolling the cipd packages managed by third_party/android_deps
- adding six and requests to .vpython for the test runner
- switching to memcpy in a few places to avoid SIGBUS errors on
arm due to unaligned reads
- linux:
- checking out instrumented libraries for msan (including adding
depot_tools to deps for the hook)
- mac:
- adding mac_xcode_version to gclient_gn_args
- win:
- limit mac_toolchain to checkout_mac
Bug: 1063768, 1097306
Change-Id: Idd86fffcdac174fd2f7899243a56af4f1ed8077e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2384320
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Wrong stride used in the for block.
Change the stride of x from 8 to 16.
Change-Id: Ic0cddf8413d1bd2decf5752b7a92c16f0345f0fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2355693
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Failed case: LibYUVConvertTest.TestI400 and LibYUVPlanarTest.ARGBBlend_Unattenuated.
This patch updates the I400ToARGBRow_MSA and ARGBBlendRow_MSA functions in the row_msa.cc file.
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Change-Id: Iec1a647af79be3ca1f2724802f6698deab60eac8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2330807
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
In commit 6cd1ff, C version has been updated.
This patch update the MMI and MSA version to mach C version.
Change-Id: Iea811e232f9c6019a80364d165f0255a37ce41b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227755
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Intel
Was ARGBSubtract_Opt (1760 ms)
Now ARGBSubtract_Opt (1546 ms)
ARM
Was ARGBAdd_Opt (1747 ms)
Now ARGBAdd_Opt (1260 ms)
Bug: None
Change-Id: I52436f6390b6b7313f2a8820833bb4f60ae958be
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2299639
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
In commit 0b8bb6, C version has been updated.
This patch update the MMI and MSA version to mach C version.
Change-Id: Ib28da3629a8465990c8e2185278a95af8c27a31d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227754
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
M420 is a row biplanar variation of NV12 supported on Microsoft webcams.
The code was hardcoded to bt.601 and should be jpeg, but the format is
very old and rare. Is a variation on NV12, so if someone needs it, it
can be re-implemented easily.
Bug: libyuv:858
Change-Id: I246167dba3c190cc76af741b8e91e58e68fde28f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2212608
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Neon move prfm after loads for all functions. Example performance improvement
Was
I444ToARGB_Opt (3275 ms)
I444ToNV12_Opt (1509 ms)
Now
I444ToARGB_Opt (2751 ms)
I444ToNV12_Opt (1367 ms)
Bug: libyuv:447
Change-Id: I78bf797b3600084c1eceb0be44cdbc9a575de803
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2189559
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
This patch is a complement for commit bed9292f2cbba2f8f9ff0f1635a8aa17a311f2f9.
1. Supplement inspection for macro HAS_***TOUV*ROW_MMI/MSA.
2. Reduce calls to function TestCpuFlag().
3. Fix a mistake in source/convert.cc: line 1105.
Change-Id: I5e7f9fe367fa0f6d1db6f7644c5b48d4ad85fedb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2169342
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Some processors support both MSA and MMI.
when they are enabled together, MSA will be preferd.
This patch move MSA initialization after MMI, so that
MSA can overide MMI and be setted to effective.
Change-Id: I8a52cce83ee4ec9727d47c99b287c9580329b149
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2155944
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
1. Switch to 8 bit precision.
2. Fix an error in the implementation of MMI and MSA.
About the error:
MMI and MSA implementation for RGBtoY and RGBToYJ used different
precision according to the C implementation( The C version has been
unified in commit fce0fed542001577e6b10f4cf859e0fa1774974e). This
patch unifies the precision to 8 bit for RGBToYJ in MMI and MSA.
Change-Id: Ic6a6e424d27a2f049b0c954f03174192d2beb091
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2155608
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This unittest help to test MipsCpuCaps.
Change-Id: I9e0ceeed0e5243446eaafa27e8de4c5f8163b09e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2133314
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
1. Refactored function MipsCpuCaps.
2. allow msa and mmi can be enabled together.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Change-Id: I7330d0551a6a167e4c76d37e4defcc20783f5815
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2131145
Reviewed-by: Hsiu Wang <hsiu@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Call a row function for each row, based on ARGBToI400 code.
But implement row functions as 2 step conversion. Adds the
row functions:
RAWToYJ, RGBToYJ, SSSE3 and AVX2 versions, and Any versions.
The smaller row buffer is more cache friendly on large images.
The max cache size can be configured, and is currently:
// Maximum temporary width for wrappers to process at a time, in pixels.
And the row buffer is
SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4]);
So 8192 bytes are used for the row buffer, leaving the rest for source
and destination buffers.
blaze-bin/third_party/libyuv/libyuv_test '--gunit_filter=*R*To?400_Opt' --libyuv_width=3600 --libyuv_height=2500 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 | sortms
Was
RAWToJ400_Opt (3996 ms)
ARGBToI400_Opt (3964 ms)
RGB24ToJ400_Opt (3960 ms)
ARGBToJ400_Opt (3909 ms)
RGBAToJ400_Opt (3885 ms)
Now
ARGBToJ400_Opt (4091 ms)
ARGBToI400_Opt (3936 ms)
RGBAToJ400_Opt (3428 ms)
RGB24ToJ400_Opt (3324 ms)
RAWToJ400_Opt (3309 ms)
Bug: libyuv:854, b/147753855
Change-Id: Ieb65fbda94e812c737f4c3c74107354b73c4bcd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2016203
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This adds some missing prototypes from the BT.2020 CL as well as expands
the H444 and J444 results.
BUG=960620, libyuv:845, b/129864744
Change-Id: I8ea3959379f1bb2edb857d4eb90fb9a1f6aa4e03
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1899093
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This pulls in the changes that Firefox made to add BT.2020 support as well
as expands them to the existing 10-bit support. So we now have the following
input formats: U420, U422, U444, U010.
BUG=960620, libyuv:845
Change-Id: If0c47853a465d0ed660f849db08e71489fe1b9c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1884468
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Recent versions of Clang started warning when the loop doesn't get
vectorized, such as when compiling with -Oz (see bug).
To fix the build, remove the pragma and let the compiler decide on its
own when to vectorize.
Bug: chromium:1015665
Change-Id: I40a610c9e0d94cfd577a6cd2b01e6fdaa08bef7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1872580
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Hans Wennborg <hans@chromium.org>
Replace ARM64 only row function with high level function
that implements SSSE3, 32 bit Neon and C.
Compared to 2 step RAWToARGB + ARGBToRGBA on row level:
3.1x faster on ARM
6.2% faster on Intel
BUG=b/140748379
Change-Id: Ia8636d9e4fcdbe10b8c2e81610a54728e29845cd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1860914
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Neon and GCC Intel optimized, but win32 and mips not optimized.
BUG=libyuv:842, b/141482243
Change-Id: Ia56fa85c8cc1db51f374bd0c89b56d21ec94afa7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1825642
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Based on ARGBShuffle but with count adjusted and new shuffle mask
BUG=libyuv:809
Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Apply clang-format to fix jpeg if() for lint fix.
Change comments about 4th pixel for open source compliance.
Rename UVToVU to SwapUV for consistency with MergeUV.
BUG=b/135532289, b/136515133
Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936
Reviewed-by: Chong Zhang <chz@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Includes a rounding change for neon.
BUG=b/135532289
Change-Id: I36ffb57b55db6c64804ad169def865be1ac6d66e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1684439
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
Gaussian blur low levels ported to 32 bit neon.
But they are not hooked up to anything but a unittest.
Bug:b/248041731, b/132108021, b/129908793
Change-Id: Iccebb8ffd6b719810aa11dd770a525227da4c357
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1611206
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
Alternatives to RGB24 and AYUV for working with GPU.
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*NV21To???24* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com
Change-Id: I5559c63f4bd4c847492fcb1571f7b03c58146689
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1501735
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
I422ToRGB24 is implemented as a C wrapper for Intel, calling
I422ToARGB and ARGBToRGB24. The ARGBToRGB24 for AVX2 requires 32
pixels.
This CL increases the width alignment required to use I422ToRGB24_AVX2
TBR=rrwinterton0gmail.com
Bug: libyuv:822, b:118386049
Change-Id: I4454f4eece33fbd5f593655f577c9ef5c00d1f63
Tested: locally tested with app that crashed using this function.
Reviewed-on: https://chromium-review.googlesource.com/c/1299931
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Add jpeg to NV21 conversions, unittests and conversions
for I444, I422, I420 and I420 to NV21 needed for internals.
Bug: libyuv:820
Change-Id: Idf0f15f91307e80a82cd23943f6eed5508f13fe2
Tested: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*MJ*
Reviewed-on: https://chromium-review.googlesource.com/c/1297710
Reviewed-by: Johann Koenig <johannkoenig@google.com>
RAW is a big endian style RGB buffer with R first in memory, then G and B.
Convert NV21 and NV12 to RAW format.
Performance on SkylakeX for 720p with AVX2
I420ToRAW_Opt (388 ms)
H420ToRAW_Opt (371 ms)
NV12ToRAW_Opt (341 ms)
NV21ToRAW_Opt (339 ms)
SSSE3
I420ToRAW_Opt (507 ms)
H420ToRAW_Opt (481 ms)
NV12ToRAW_Opt (498 ms)
NV21ToRAW_Opt (493 ms)
C
I420ToRAW_Opt (2287 ms)
H420ToRAW_Opt (2246 ms)
NV12ToRAW_Opt (2191 ms)
NV21ToRAW_Opt (2204 ms)
Performance on Pixel 2 for 720p
out/Release/bin/run_libyuv_unittest -v -t 7200 --gtest_filter=*NV??ToR*Opt --libyuv_repeat=1000 --libyuv_width=1280 --libyuv_height=720
LibYUVConvertTest.NV12ToRGB24_Opt (1739 ms)
LibYUVConvertTest.NV21ToRGB24_Opt (1734 ms)
LibYUVConvertTest.NV12ToRAW_Opt (1719 ms)
LibYUVConvertTest.NV21ToRAW_Opt (1691 ms)
LibYUVConvertTest.NV12ToRGB565_Opt (2152 ms)
Bug: libyuv:778, b:117522975
Test: add new NV21ToRAW and NV12ToRAW tests
Change-Id: Ieabb68a2c6d8c26743e609c5696c81bb14fb253f
Reviewed-on: https://chromium-review.googlesource.com/c/1272615
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
The original src_u calculation of FOURCC_I420 shifted half width if
crop_y is odd.
This CL fixs the problem and also add a test case for it.
Bug: b:115278653
Test: pass libyuv_unittest
Change-Id: Ia9732d22e64e13de26df47726ba44ad1c5a06484
Reviewed-on: https://chromium-review.googlesource.com/c/1258743
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
When loading or storing the data, the unaligned address will greatly degrade
the optimization performance, so non-aligned access instructions are required
on the loongson platform.
Also delete the optimization function:ScaleARGBFilterCols_MMI,
because it degraded the performance.
BUG=libyuv:804
R=fbarchard@chromium.org
Change-Id: If4c15886a21cdcbac7ae8b336292e4549acf1e47
Reviewed-on: https://chromium-review.googlesource.com/1164627
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.
Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).
In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.
BUG=libyuv:804
Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
the built in __msa_ld_b() expects a void * without const.
Cast pointers to void * to avoid build warning.
TBR=johannkoenig@google.com
Bug: libyuv:805
Change-Id: Iabc4820ecf4a3a7dcb0063e67ce276ae2a4f0501
Tested: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest
Reviewed-on: https://chromium-review.googlesource.com/1125400
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
H420/H422 are bt.720 variants
TBR=braveyao@chromium.org
BUG=libyuv:799
TESTED=try bots tested build on all platforms
Change-Id: I007d8981d91ca0748c59403759109bbcd88f286c
Reviewed-on: https://chromium-review.googlesource.com/1115719
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Consistently use one style of line endings for the repository
Change-Id: Idd70e3d7f3a7a6641b268a81e51eebf9c705b67d
Reviewed-on: https://chromium-review.googlesource.com/1107877
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Avoid warnings regarding loss of qualifiers:
warning: cast from type ‘const uint8_t* {aka const unsigned char*}’
to type ‘v16i8* {aka __vector(16) signed char*}’ casts away
qualifiers
BUG=libyuv:793
Change-Id: Ie0d215bc07b49285b5d06ee91ccc2c9a7979799e
Reviewed-on: https://chromium-review.googlesource.com/1107879
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This reverts commit a8aa921c4614f9d6a0e8f3459648ca1ae75cdbe6.
Reason for revert: breaks a webrtc unittest on Windows.
https://bugs.chromium.org/p/webrtc/issues/detail?id=9263&can=2&start=0&num=100&q=&colspec=ID%20Pri%20M%20ReleaseBlock%20Component%20Status%20Owner%20Summary&groupby=&sort=
Original change's description:
> Allow negative height when ConvertToI420/ARGB is called with NV12/NV21
>
> ConvertToI420 and ConvertToARGB support the use of a negative height
> parameter to flip the image vertically. When converting from NV12 or
> NV21 this parameter was misinterpreted, resulting in invalid output.
> This CL introduces the use of abs_src_height to correctly calculate
> the location of the source UV plane.
>
> The sign of crop_height is not used, to reduce confusion ConvertToI420
> and ConvertToARGB no longer accept negative crop height.
>
> Unit tests for Android420ToI420 are updated to fix miscalculation of
> src_stride_uv, fix incorrect pixel strides, and to test inversion.
> New unit tests are included to test inversion for ConvertToARGB,
> ConvertToI420, Android420ToARGB, and Android420ToABGR.
> For consistency the test NV12Crop is renamed ConvertToI420_NV12_Crop.
>
> Bug: libyuv:446
> Test: out/Release/libyuv_unittest --gtest_filter=*.ConvertTo*:*.Android420To*
> Change-Id: Idc98e62671cb30272cfa7e24fafbc8b73712f7c6
> Reviewed-on: https://chromium-review.googlesource.com/994074
> Commit-Queue: Frank Barchard <fbarchard@chromium.org>
> Reviewed-by: Frank Barchard <fbarchard@chromium.org>
TBR=fbarchard@chromium.org,robert@bares.me
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: libyuv:446, chromium:9263, libyuv:801
Change-Id: I7c55b3fcb477f9754c249b9c2c54b24da2c29283
Reviewed-on: https://chromium-review.googlesource.com/1081267
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Mask was set to 32, but should have been 31.
BUG=libyuv:798
TESTED=try bots tested
Change-Id: I6120928873a4a2f1efef907d8e8296ca8c20bb03
Reviewed-on: https://chromium-review.googlesource.com/1054830
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
ConvertToI420 and ConvertToARGB support the use of a negative height
parameter to flip the image vertically. When converting from NV12 or
NV21 this parameter was misinterpreted, resulting in invalid output.
This CL introduces the use of abs_src_height to correctly calculate
the location of the source UV plane.
The sign of crop_height is not used, to reduce confusion ConvertToI420
and ConvertToARGB no longer accept negative crop height.
Unit tests for Android420ToI420 are updated to fix miscalculation of
src_stride_uv, fix incorrect pixel strides, and to test inversion.
New unit tests are included to test inversion for ConvertToARGB,
ConvertToI420, Android420ToARGB, and Android420ToABGR.
For consistency the test NV12Crop is renamed ConvertToI420_NV12_Crop.
Bug: libyuv:446
Test: out/Release/libyuv_unittest --gtest_filter=*.ConvertTo*:*.Android420To*
Change-Id: Idc98e62671cb30272cfa7e24fafbc8b73712f7c6
Reviewed-on: https://chromium-review.googlesource.com/994074
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
When casting a const value, ensure the cast is const as well.
BUG=webm:1509
Change-Id: I5b597fdcc148d111e9824bc7cf918fc5f24e970f
Reviewed-on: https://chromium-review.googlesource.com/996553
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
iOS simulator has the option to build with xcode instead of clang.
GN use_xcode_clang=true enables the xcode build.
As of version Xcode 9.2, the clang version used does not support
AVX512. The version reported is version 9, but for normal clang,
version 7 is sufficient to AVX512.
When a version of XCode does support AVX512, the version check can
be updated to allow AVX512 for newer versions of XCode.
with XCode 9.2 the following macro is set.
__APPLE_CC__ 6000
Bug: libyuv:789
Test: gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"x86\" use_xcode_clang=true"
Change-Id: I5a9a0b4a2760c7d09a4bcb464b3668979113b07e
Reviewed-on: https://chromium-review.googlesource.com/991595
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Bug: libyuv:789
Test: builds locally on linux with clang
Change-Id: I3000494d4b0b18f59d7852bc1bc0c9e422d2d63a
Reviewed-on: https://chromium-review.googlesource.com/987331
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Scalar multiply expects a 'd' register. The "w" (float) uses 's' for float
and wont work with the multiply in 32 bit (it does in 64 bit).
A vector 2 of float passes as 'd' register.
A vector 4 of float passes as 'q' register.
This change copies the float into the first entry of a vector 2
and passes that. The optimizer removes the extra copy, allowing
the single float to use referenced as
Test: LibYUVPlanarTest.TestByteToFloat
Bug: libyuv:786
Change-Id: I8773c5bae043c7b84e1d1db7fdea6731aa0b1323
Reviewed-on: https://chromium-review.googlesource.com/973984
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
row.h adds CLANG_HAS_AVX512
function ifdefs in row.h for avx512
source code ifdefed function by function for
avx512 and avx2.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: If32b51459685d0d5785c5c1e94c8f668f8e74b55
Reviewed-on: https://chromium-review.googlesource.com/982402
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Adds a method that forces the CPU flags. Useful when using libyuv inside
a sandboxed process which may not have access to the file system.
Bug: libyuv:787
Change-Id: I01f71e39a7301085d9de388eba930b4cac0fd7be
Reviewed-on: https://chromium-review.googlesource.com/972338
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Move getenv to unittest.cc to allow libyuv to be
run in sandbox for x86, x64 and aarch64
Bug: libyuv:767
Test: unittests still run and respect environment variables
Change-Id: I84cb1717977828776142b51c029774b3e6b142a3
Reviewed-on: https://chromium-review.googlesource.com/969645
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Use VMBI instructions but on AVX2 registers to avoid clockrate change.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Change unittest flags to decimal so they can be used for --libyuv_cpu_info=
Add environment variables to disable AVX 512 bits.
Bug: libyuv:784
Test: LibYUVBaseTest.TestCpuHas
Change-Id: Iea6704368fbe9f6d3395933da7993fb2a3453225
Reviewed-on: https://chromium-review.googlesource.com/957704
Reviewed-by: richard winterton <rrwinterton@gmail.com>
AVX2 port of SSSE3 conversion to output 24 bit RGB
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I14f7815522d1b790ecd2bb39d9a3441e803b694a
Reviewed-on: https://chromium-review.googlesource.com/953303
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Use 2 step conversion for NV21ToRGB24 to leverage AVX2
low levels instead of C.
Was C
NV21ToRGB24_Opt (882 ms)
Now SSSE3
NV21ToRGB24_Opt (218 ms)
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I58faf766bbec4cc595aab2e217f6c874dd4b4363
Reviewed-on: https://chromium-review.googlesource.com/951629
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Each byte is converted to float (0.0 to 255.0) and then multiplied
by a scale parameter.
Bug: None
Test: arm 64 build passes.
Change-Id: I04736798540b8d985f60abdf0388e24a209d075b
Reviewed-on: https://chromium-review.googlesource.com/930226
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ian Field <ianfield@google.com>