1637 Commits

Author SHA1 Message Date
Frank Barchard
d62ee21e66 UVScale fix for vertical-only scaling
Bug: b/228841445
Change-Id: I0342856e1bfcea69851d718459d66926bb170219
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3595240
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas-Sanchez <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-20 01:27:33 +00:00
Frank Barchard
3c0f408607 Enable HAS_DETILESPLITUVROW_NEON
On Pixel 4
Was C
AArch64 TestDetileSplitUVPlane_Benchmark (935 ms)
AArch32 TestDetileSplitUVPlane_Benchmark (787 ms)

Now NEON
AArch64 TestDetileSplitUVPlane_Benchmark (248 ms)
AArch32 TestDetileSplitUVPlane_Benchmark (256 ms)

Bug: libyuv:915, b/228518489
Change-Id: Ib82b702c1321285738c044ad8c2a7805b16f074a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3594524
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-19 21:17:03 +00:00
Frank Barchard
eec8dd37e8 Change ScaleUVRowUp2_Biinear_16_SSE2 to SSE41
Bug: libyuv:928

xed -i scale_gcc.o:
SYM ScaleUVRowUp2_Linear_16_SSE2:
XDIS 0: LOGICAL   SSE2       660FEFED                 pxor xmm5, xmm5
XDIS 4: SSE       SSE2       660F76E4                 pcmpeqd xmm4, xmm4
XDIS 8: SSE       SSE2       660F72D41F               psrld xmm4, 0x1f
XDIS d: SSE       SSE2       660F72F401               pslld xmm4, 0x1
XDIS 12: DATAXFER  SSE2       F30F7E07                 movq xmm0, qword ptr [rdi]
XDIS 16: DATAXFER  SSE2       F30F7E4F04               movq xmm1, qword ptr [rdi+0x4]
XDIS 1b: SSE       SSE2       660F61C5                 punpcklwd xmm0, xmm5
XDIS 1f: SSE       SSE2       660F61CD                 punpcklwd xmm1, xmm5
XDIS 23: DATAXFER  SSE2       660F6FD0                 movdqa xmm2, xmm0
XDIS 27: DATAXFER  SSE2       660F6FD9                 movdqa xmm3, xmm1
XDIS 2b: SSE       SSE2       660F70D24E               pshufd xmm2, xmm2, 0x4e
XDIS 30: SSE       SSE2       660F70DB4E               pshufd xmm3, xmm3, 0x4e
XDIS 35: SSE       SSE2       660FFED4                 paddd xmm2, xmm4
XDIS 39: SSE       SSE2       660FFEDC                 paddd xmm3, xmm4
XDIS 3d: SSE       SSE2       660FFED0                 paddd xmm2, xmm0
XDIS 41: SSE       SSE2       660FFED9                 paddd xmm3, xmm1
XDIS 45: SSE       SSE2       660FFEC0                 paddd xmm0, xmm0
XDIS 49: SSE       SSE2       660FFEC9                 paddd xmm1, xmm1
XDIS 4d: SSE       SSE2       660FFEC2                 paddd xmm0, xmm2
XDIS 51: SSE       SSE2       660FFECB                 paddd xmm1, xmm3
XDIS 55: SSE       SSE2       660F72D002               psrld xmm0, 0x2
XDIS 5a: SSE       SSE2       660F72D102               psrld xmm1, 0x2
XDIS 5f: SSE       SSE4       660F382BC1               packusdw xmm0, xmm1
XDIS 64: DATAXFER  SSE2       F30F7F06                 movdqu xmmword ptr [rsi], xmm0
XDIS 68: MISC      BASE       488D7F08                 lea rdi, ptr [rdi+0x8]
XDIS 6c: MISC      BASE       488D7610                 lea rsi, ptr [rsi+0x10]
XDIS 70: BINARY    BASE       83EA04                   sub edx, 0x4
XDIS 73: COND_BR   BASE       7F9D                     jnle 0x12 <ScaleUVRowUp2_Linear_16_SSE2+0x12>
XDIS 75: RET       BASE       C3                       ret

SYM ScaleUVRowUp2_Bilinear_16_SSE2:
XDIS 0: LOGICAL   SSE2       660FEFFF                 pxor xmm7, xmm7
XDIS 4: SSE       SSE2       660F76F6                 pcmpeqd xmm6, xmm6
XDIS 8: SSE       SSE2       660F72D61F               psrld xmm6, 0x1f
XDIS d: SSE       SSE2       660F72F603               pslld xmm6, 0x3
XDIS 12: DATAXFER  SSE2       F30F7E07                 movq xmm0, qword ptr [rdi]
XDIS 16: DATAXFER  SSE2       F30F7E4F04               movq xmm1, qword ptr [rdi+0x4]
XDIS 1b: SSE       SSE2       660F61C7                 punpcklwd xmm0, xmm7
XDIS 1f: SSE       SSE2       660F61CF                 punpcklwd xmm1, xmm7
XDIS 23: DATAXFER  SSE2       660F6FD0                 movdqa xmm2, xmm0
XDIS 27: DATAXFER  SSE2       660F6FD9                 movdqa xmm3, xmm1
XDIS 2b: SSE       SSE2       660F70D24E               pshufd xmm2, xmm2, 0x4e
XDIS 30: SSE       SSE2       660F70DB4E               pshufd xmm3, xmm3, 0x4e
XDIS 35: SSE       SSE2       660FFED0                 paddd xmm2, xmm0
XDIS 39: SSE       SSE2       660FFED9                 paddd xmm3, xmm1
XDIS 3d: SSE       SSE2       660FFEC0                 paddd xmm0, xmm0
XDIS 41: SSE       SSE2       660FFEC9                 paddd xmm1, xmm1
XDIS 45: SSE       SSE2       660FFEC2                 paddd xmm0, xmm2
XDIS 49: SSE       SSE2       660FFECB                 paddd xmm1, xmm3
XDIS 4d: DATAXFER  SSE2       F30F7E1477               movq xmm2, qword ptr [rdi+rsi*2]
XDIS 52: DATAXFER  SSE2       F30F7E5C7704             movq xmm3, qword ptr [rdi+rsi*2+0x4]
XDIS 58: SSE       SSE2       660F61D7                 punpcklwd xmm2, xmm7
XDIS 5c: SSE       SSE2       660F61DF                 punpcklwd xmm3, xmm7
XDIS 60: DATAXFER  SSE2       660F6FE2                 movdqa xmm4, xmm2
XDIS 64: DATAXFER  SSE2       660F6FEB                 movdqa xmm5, xmm3
XDIS 68: SSE       SSE2       660F70E44E               pshufd xmm4, xmm4, 0x4e
XDIS 6d: SSE       SSE2       660F70ED4E               pshufd xmm5, xmm5, 0x4e
XDIS 72: SSE       SSE2       660FFEE2                 paddd xmm4, xmm2
XDIS 76: SSE       SSE2       660FFEEB                 paddd xmm5, xmm3
XDIS 7a: SSE       SSE2       660FFED2                 paddd xmm2, xmm2
XDIS 7e: SSE       SSE2       660FFEDB                 paddd xmm3, xmm3
XDIS 82: SSE       SSE2       660FFED4                 paddd xmm2, xmm4
XDIS 86: SSE       SSE2       660FFEDD                 paddd xmm3, xmm5
XDIS 8a: DATAXFER  SSE2       660F6FE0                 movdqa xmm4, xmm0
XDIS 8e: DATAXFER  SSE2       660F6FEA                 movdqa xmm5, xmm2
XDIS 92: SSE       SSE2       660FFEE0                 paddd xmm4, xmm0
XDIS 96: SSE       SSE2       660FFEEE                 paddd xmm5, xmm6
XDIS 9a: SSE       SSE2       660FFEE0                 paddd xmm4, xmm0
XDIS 9e: SSE       SSE2       660FFEE5                 paddd xmm4, xmm5
XDIS a2: SSE       SSE2       660F72D404               psrld xmm4, 0x4
XDIS a7: DATAXFER  SSE2       660F6FEA                 movdqa xmm5, xmm2
XDIS ab: SSE       SSE2       660FFEEA                 paddd xmm5, xmm2
XDIS af: SSE       SSE2       660FFEC6                 paddd xmm0, xmm6
XDIS b3: SSE       SSE2       660FFEEA                 paddd xmm5, xmm2
XDIS b7: SSE       SSE2       660FFEE8                 paddd xmm5, xmm0
XDIS bb: SSE       SSE2       660F72D504               psrld xmm5, 0x4
XDIS c0: DATAXFER  SSE2       660F6FC1                 movdqa xmm0, xmm1
XDIS c4: DATAXFER  SSE2       660F6FD3                 movdqa xmm2, xmm3
XDIS c8: SSE       SSE2       660FFEC1                 paddd xmm0, xmm1
XDIS cc: SSE       SSE2       660FFED6                 paddd xmm2, xmm6
XDIS d0: SSE       SSE2       660FFEC1                 paddd xmm0, xmm1
XDIS d4: SSE       SSE2       660FFEC2                 paddd xmm0, xmm2
XDIS d8: SSE       SSE2       660F72D004               psrld xmm0, 0x4
XDIS dd: DATAXFER  SSE2       660F6FD3                 movdqa xmm2, xmm3
XDIS e1: SSE       SSE2       660FFED3                 paddd xmm2, xmm3
XDIS e5: SSE       SSE2       660FFECE                 paddd xmm1, xmm6
XDIS e9: SSE       SSE2       660FFED3                 paddd xmm2, xmm3
XDIS ed: SSE       SSE2       660FFED1                 paddd xmm2, xmm1
XDIS f1: SSE       SSE2       660F72D204               psrld xmm2, 0x4
XDIS f6: SSE       SSE4       660F382BE0               packusdw xmm4, xmm0
XDIS fb: DATAXFER  SSE2       F30F7F22                 movdqu xmmword ptr [rdx], xmm4
XDIS ff: SSE       SSE4       660F382BEA               packusdw xmm5, xmm2
XDIS 104: DATAXFER  SSE2       F30F7F2C4A               movdqu xmmword ptr [rdx+rcx*2], xmm5
XDIS 109: MISC      BASE       488D7F08                 lea rdi, ptr [rdi+0x8]
XDIS 10d: MISC      BASE       488D5210                 lea rdx, ptr [rdx+0x10]
XDIS 111: BINARY    BASE       4183E804                 sub r8d, 0x4
XDIS 115: COND_BR   BASE       0F8FF7FEFFFF             jnle 0x12 <ScaleUVRowUp2_Bilinear_16_SSE2+0x12>
XDIS 11b: RET       BASE       C3                       ret

Change-Id: Ia20860e9c3c45368822cfd8877167ff0bf973dcc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3587602
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-15 18:46:09 +00:00
Wan-Teh Chang
18f9110516 Avoid AVX instructions in ScaleRowUp2_Linear_SSSE3
The "vpackuswb   %%xmm2,%%xmm0,%%xmm0" and "vmovdqu     %%xmm0,(%1)"
instructions in ScaleRowUp2_Linear_SSSE3() are AVX instructions. They
cause an illegal instruction exception on CPUs that do not support AVX.

Bug: libyuv:927
Bug: chromium:1312551
Change-Id: I87b2aaf041e7d185e7e8fb07172d4f37482e9d08
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3585881
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2022-04-15 00:18:39 +00:00
Frank Barchard
2ad73733d9 I422Rotate update to remove name space for ios build warning
- Remove libyuv:: from within libyuv to resolve a build warning on IOS.
- Check src_y parameter is not NULL if there is a dst_y parameter
- Apply clang-format
- Bump version

Performance on Intel Skylake Xeon
ARGBRotate90_Opt (795 ms)
I420Rotate90_Opt (283 ms)
I422Rotate90_Opt (867 ms)  <-- scales and rotates
I444Rotate90_Opt (565 ms)
NV12Rotate90_Opt (289 ms)

Performance on Pixel 4 (Cortex A76)
ARGBRotate90_Opt (4208 ms)
I420Rotate90_Opt (273 ms)
I422Rotate90_Opt (1207 ms)
I444Rotate90_Opt (718 ms)
NV12Rotate90_Opt (282 ms)


Bug: libyuv:926
Change-Id: I42e1b93a9595f6ed075918e91bed977dd3d23f6f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3576778
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-07 21:06:44 +00:00
Sergio Garcia Murillo
a77d615e10 Add tentative I422Rotate.
When doing 90 or 270 degrees rotation we need to do a rotate&scale of the UV planes, as there are no helper optimized functions to do this, we use the Y plane as temporal memory and perform each of the transforms independently:

First U plane is rotated, putting the result in the Y plane. After the rotation, the output has double the samples horizontally and half the samples vertically, so it is scaled into the final U plane. Same process is done with the V plane.

Last the Y plane that can be just rotated without scaling.

It would be great to have an optimized version for this, but maybe this is helpfull for triggering the discussions.

Bug: libyuv:926
Change-Id: I188af103c4d0e3f9522021b4bf2b63c9d5de8b93
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3568424
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-04-06 23:49:35 +00:00
Sergio Garcia Murillo
4589081cea Add I422 and I210 functions
Bug: webrtc:13826
Change-Id: I68235a668abecf76133f7b89472b192b1442bed4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3557217
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-31 15:30:53 +00:00
Wan-Teh Chang
f4d2530846 Declare RgbConstants structs as static const
These RgbConstants structs were added in
https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023. They
should be declared as static const. These four symbols were detected by
Chrome's android-binary-size trybot as "Mutable Constants".

Change-Id: I3b4d4ff4b32e261ba528c07647b9d69ac368ab5b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3553035
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2022-03-25 20:39:38 +00:00
Wan-Teh Chang
ebd9e130f0 Fix bugs in I010AlphaToARGBMatrixBilinear()
Add a missing increment of src_a and ARGBAttenuateRow() call.

Bug: libyuv:922
Change-Id: I26e04e70c6a1a231cbe54b60c249f4c2e8af112a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3549976
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2022-03-25 15:39:52 +00:00
Wan-Teh Chang
173ed374c0 Add null pointer checks for the src_a parameters
Change-Id: Icc96e18eab07080c18b6542171a340c97f059c78
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3550016
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
2022-03-25 15:14:38 +00:00
Frank Barchard
124bf08fee RGBScale function using 3 steps: RGB24ToARGB, ARGBScale, ARGBToRGB24
1920x1080 to/from 1280x720 to ARGB on Intel Skylake Xeon
RGBScaleTo1920x1080_Bilinear (2625 ms)
RGBScaleFrom1920x1080_Bilinear (2115 ms)
ARGBScaleTo1920x1080_Bilinear (1668 ms)
ARGBScaleFrom1920x1080_Bilinear (1164 ms)

Bug: b/224814071
Change-Id: Ifc7611b597409771728b13c9c39e5a7e06131021
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3537341
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-19 01:44:06 +00:00
Frank Barchard
95b14b2446 RAWToJ400 faster version for ARM
- Unrolled to 16 pixels
- Take constants via structure, allowing different colorspace and channel order
- Use ADDHN to add 16.5 and take upper 8 bits of 16 bit values, narrowing to 8 bits
- clang-format applied, affecting mips code

On Cortex A510
Was RAWToJ400_Opt (1623 ms)
Now RAWToJ400_Opt (862 ms)

C   RAWToJ400_Opt (1627 ms)

Bug: b/220171611
Change-Id: I06a9baf9650ebe2802fb6ff6dfbd524e2c06ada0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-18 07:22:36 +00:00
Yuan Tong
25d0a5110b Fix FilterMode enum type
When used in C enum keyword can't be eliminated.

Bug: libyuv:872
Change-Id: Iacff5a8bd84ec7caa1f90889e48f81ffc10071ae
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3513317
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 21:25:44 +00:00
Yuan Tong
ebb27d6916 Add YUV to RGB conversion function with filter parameter
Add the following functions:

I420ToARGBMatrixFilter
I422ToARGBMatrixFilter
I010ToAR30MatrixFilter
I210ToAR30MatrixFilter
I010ToARGBMatrixFilter
I210ToARGBMatrixFilter
I420AlphaToARGBMatrixFilter
I422AlphaToARGBMatrixFilter
I010AlphaToARGBMatrixFilter
I210AlphaToARGBMatrixFilter

Bug: libyuv:872
Change-Id: Ib33b09fd7d304688c5e06c55e0a576a964665a51
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3430334
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 11:50:35 +00:00
Hao Chen
91bae707e1 Optimize functions for LASX in row_lasx.cc.
1. Optimize 18 functions in source/row_lasx.cc file.
2. Make small modifications to LSX.
3. Remove some unnecessary content.

Bug: libyuv:912
Change-Id: Ifd1d85366efb9cdb3b99491e30fa450ff1848661
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3507640
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-09 08:52:54 +00:00
Frank Barchard
42d76a342f RAWToJNV21 function with 2 step conversion
RAWToJ420 + J420ToNV21 on row level

Pixel 6
RAWToJNV21_Opt (320 ms)

Skylake Xeon
RAWToJNV21_Opt (302 ms)

Bug: b/220171611
Change-Id: I39dcce9cf56c576b95666bb4fb1baccf9fbc7f7a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3495876
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-01 19:33:49 +00:00
Hao Chen
2dd3ea6f39 Fix Bugs on mips platform V2.
This patch adds some deleted control macros so that these MSA
optimization functions can be called normally on mips platform.
There are also some modifications to adapt to the clang compiler.

Bug: libyuv:918
Change-Id: I6ffadc6582682b5eaeae2e0f4033d66d370b48b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3494667
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-03-01 13:16:31 +00:00
Frank Barchard
e77531f6f1 Fix RotatePlane by 90 on Neon when source width is not a multiple of 8
Bug: b/220888716, b/218875554, b/220205245
Change-Id: I17e118ac9b9a7013386a5f0ad27a2dd249474ae5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3483576
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-02-23 19:16:53 +00:00
Hao Chen
3b8c86d23a Fix bugs on mips platform.
This patch fixes compilation errors caused by the removal of kUVBias
and two failed test cases of LibYUVConvertTest.RGB565ToI420_Opt and
LibYUVConvertTest.ARGB1555ToI420_Opt.

Bug: libyuv:918
Change-Id: I1a66bcd7ef616aacbeca5b4015013015ccdf0f18
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3477416
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-02-22 18:24:02 +00:00
Justin Green
b4ddbaf549 Add support for MM21.
Add support for MM21 to NV12 and I420 conversion, and add SIMD
optimizations for arm, aarch64, SSE2, and SSSE3 machines.

Bug: libyuv:915, b/215425056
Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Justin Green <greenjustin@google.com>
2022-02-03 17:01:49 +00:00
Frank Barchard
804980bbab DetilePlane and unittest for NEON
Bug: libyuv:915, b/215425056
Change-Id: Iccab1ed3f6d385f02895d44faa94d198ad79d693
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3424820
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-31 20:05:55 +00:00
Frank Barchard
2c6bfc02d5 Remove MMI support
Bug: libyuv:916
Change-Id: I345b7e271ceb4b32fe91e292915e66be40812810
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3415817
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-26 08:41:33 +00:00
Hao Chen
2f87e9a713 Add optimization functions in scale_lsx.cc file.
Optimize 20 functions in source/scale_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913

Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
f8e2da48ae Add optimization functions in rotate_lsx.cc file.
Optimize two functions in source/rotate_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913
Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
dfe046d272 Add optimization functions in row_lsx.cc file.
Optimize 44 functions in source/row_lsx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:913
Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
de8ae8c679 Add optimization functions in row_lasx.cc file.
Optimize 32 functions in source/row_lasx.cc file.
All test cases passed on loongarch platform.

Bug: libyuv:912
Signed-off-by: Hao Chen <chenhao@loongson.cn>
Change-Id: I7d3f649f753f72ca9bd052d5e0562dbc6f6ccfed
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351466
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-21 01:34:38 +00:00
Hao Chen
51de1e16f2 Add supports for loongarch LSX and LASX.
1. Add supports for LSX and LASX.
2. Three optimization functions are added in loongarch/row_lasx.cc file:
   I422ToARGBRow_LASX,I422ToRGBARow_LASX,I422AlphaToARGBRow_LASX.

Bug: libyuv:912, Bug: libyuv:913
Change-Id: I043c2704f99a5215724b5c0b7f97e6bf5f7a199b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329189
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-20 19:25:38 +00:00
Frank Barchard
90ffd5cba9 I420ToARGB for AVX512
On Skylake Xeon
AVX512  I420ToARGB_Opt (2050 ms)
AVX2    I420ToARGB_Opt (2533 ms)
SSSE3   I420ToARGB_Opt (3688 ms)

Bug: libyuv:911
Change-Id: I2214cc15dec24b06541895ca59d88990edbb2216
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3382100
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-14 09:17:33 +00:00
Frank Barchard
cdd62da670 VNNI detect
Bug: libyuv:911
Change-Id: Ic4e7720b4d5c20010470f06a7021d1a2426e765f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3381495
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-01-12 07:08:20 +00:00
Frank Barchard
78625492cb InterpolateRow_AVX2 use AVX2 instead of ERMS for 100%
Bug: b/210066781
Change-Id: I709e403f03bd6b9f8fe693b165b242b784076fe0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329072
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-15 03:54:18 +00:00
Frank Barchard
fdc71956bd InterpolateRow_AVX2 - extend width count to 64 bits
Bug: b/210066781
Change-Id: Ib9052d8edfce29b95ca02a6f7254d3ff35d2b64d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329070
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 04:11:18 +00:00
Frank Barchard
d7a2d5da87 J400ToARGB optimized for Exynos using ZIP+ST1
Bug: 204562143
Change-Id: I56c98198c02bd0dd1283f1c14837730c92832c39
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3328702
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-12-10 01:00:07 +00:00
Frank Barchard
000806f373 NV21ToYUV24 replace ST3 with ST1. ARGBToAR64 replace ST2 with ST1
On Samsung S8 Exynos M2
Was ST3 NV21ToYUV24_Opt (769 ms)
Now ST1 NV21ToYUV24_Opt (473 ms)
Was ST2 ARGBToAR64_Opt (1759 ms)
Now ST1 ARGBToAR64_Opt (987 ms)

Skylake Xeon, AVX2 version:
Was NV21ToYUV24_Opt (885 ms)
Now NV21ToYUV24_Opt (194 ms)

Bug: b/204562143, b/124413599
Change-Id: Icc9cb64d822cd11937789a4e04fbb773b3e33aa3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3290664
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-11-24 07:38:49 +00:00
Frank Barchard
a04e4f87fb Fix scale any mask parameter bug for NV12Scale
Bug: None
Change-Id: Ib4e174c086162ee709faf4b04c7d5d5847a7de3d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3267488
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-11-08 20:00:04 +00:00
Frank Barchard
fa043c7a64 Android420ToI420Rotate function to convert with rotation
- adapted from Android420ToI420, adding a rotation parameter
- SplitRotateUV added to rotate and split the UV channel of NV12 or NV21
- rename RotateUV functions to SplitRotateUV

Bug: b/203549508
Change-Id: I6774da5fb5908fdf1fc12393f0001f41bbda9851
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3251282
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-10-28 22:38:04 +00:00
Frank Barchard
b179f1847a Enable SIMD for exact RGB to Y conversions
Bug: libyuv:908, b/202888439
Change-Id: Icc5470b85d91b441ded9958ee04b4f32246646f0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3230489
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-10-19 07:54:50 +00:00
Frank Barchard
f0cfc1f1c8 ubsan friendly unaligned tests
- ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C.
Although current Intel, ARM, Mips and PPC can do unaligned load/store, its not guaranteed
and could crash a CPU that doesnt support it.
- unaligned tests use offset of 2 or 4, which ubsan accepts.
- unittest fills in random buffer with 2 bytes at a time instead of a short.
- row common functions for int16 types use 2 shorts instead of 1 int.

Bug: libyuv:908, b/203243873
Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-10-18 18:03:28 +00:00
Frank Barchard
55b97cb48f BIT_EXACT for unattenuate and attenuate.
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnatenuate, which mimics the AVX code.

Apply clang format to cleanup code.

Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-10-15 19:46:02 +00:00
Frank Barchard
11cbf8f976 Add LIBYUV_BIT_EXACT macro to force C to match SIMD
- C code use ARM path, so NEON and C match
- C used on Intel platforms, disabling AVX.

Bug: libyuv:908, b/202888439
Change-Id: Ie035a150a60d3cf4ee7c849a96819d43640cf020
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3223507
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-10-14 20:37:39 +00:00
Frank Barchard
daf9778a24 Fix for failed compile with armv-7a neon gcc
Bug: libyuv:907
Change-Id: I955e83c72b57ce5ba45730030b32f337be610a21
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3216739
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-10-12 18:17:50 +00:00
Frank Barchard
b92a60320f ConvertFromI420 respect destination stride for NV12 and NV21
Bug: libyuv:904
Change-Id: Ie1fd39c693e64661eb52f75492a261384db70776
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3176483
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-09-22 19:44:06 +00:00
Frank Barchard
33a68ec779 JPeg decoder remove assert when out of data
Bug: b/186665202
Change-Id: I406cc2ef8cfa2cdf987d41c4bd85d3024aedfaab
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3166710
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-09-16 23:11:14 +00:00
Frank Barchard
ed5a9c81de change ld1 to ldr for memory references to allow GCC to use an offset
Bug: chromium:819294, libyuv:903
Change-Id: I1cd19cc5a068c421d1112c9ea6090e18fb002a4c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3152821
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-09-09 23:46:46 +00:00
Stephan Hartmann
c6ed1b8f0e GCC: force memory address without offset on aarch64
With "m" GCC generates a memory address with offset which is
not allowed with ld1 on aarch64. Change constraint to "Q" to
force address without offset.

Bug: chromium:819294, libyuv:903
Change-Id: Iaae24bc6882cdef823259040a37fdbfc31f91185
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922146
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-09-09 22:20:27 +00:00
Frank Barchard
639dd4ea76 Fix ConvertToI420 when using YUY2 or UYVY with odd crop_x.
- swap U and V when crop x is odd
- document YUY2 and UYVY formats
- apply clang-format

Bug: libyuv:902
Change-Id: I045e44c907f4a9eb625d7c024b669bb308055f32
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3039549
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2021-07-19 22:22:22 +00:00
Frank Barchard
d19f69d9df Update Android.bp to always enable NEON
Relax Cpu unittest to allow ARM emulator to run.

Bug: libyuv:863, libyuv:877, b/178283356
Change-Id: I3c751574219fdf731a3f9d4a79934a349acba446
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2950938
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2021-06-10 19:31:48 +00:00
Stephan Hartmann
6ea7647b6e GCC: replace mov .8h with mov .16b
mov Vy.8h, Vx.8h isn't a valid instruction. Clang/LLVM
automatically replace it with mov Vy.16b, Vx.16b.

Bug: chromium:819294
Change-Id: I8a0cbf2e6c4efcc6c1e38812cee949bde7e99b11
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922147
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2021-06-01 17:44:56 +00:00
Frank Barchard
5b3351bd07 Fix ARGB1555ToI420 odd width bug in C code.
Was
[ RUN      ] LibYUVConvertTest.ARGB1555ToI420_Any
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure
Expected equality of these values:
  dst_uv_c[i * kStrideUV + j]
    Which is: '\x8B' (139)
  dst_uv_opt[i * kStrideUV + j]
    Which is: '\x92' (146)
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure

[  FAILED  ] LibYUVConvertTest.ARGB1555ToI420_Any

Now
[ RUN      ] LibYUVConvertTest.ARGB1555ToI420_Any
[       OK ] LibYUVConvertTest.ARGB1555ToI420_Any (0 ms)

Bug: libyuv:894, b/155722711
Change-Id: I12dcacd0ecfff4ede5693a2554e9bb10dc8586c1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2870484
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2021-05-04 19:03:22 +00:00
Frank Barchard
49ebc996aa Make 2 step transitive tests measure 2 step time.
Add tests of all macros used by libyuv public headers

When a 1 step conversion is added, a 2 step test can compare
the old 2 step method to the 1 step.  A 1 step unittest is
also added which compares C to SIMD.  Making the 2 step
conversions measure performance of the 2 steps allows the
old 2 step performance to be compared to 1 step.

All macros used in public headers are added to an ifdef test.
Showing them in a unittest allows some diagnostics when
a test is failing.

Bug: libyuv:901
Change-Id: I7ffa6ed0cb3b506fa1b7fd4b7b1b729658c3c266
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2857916
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-30 18:14:57 +00:00
Yuan Tong
99cddd8051 Fix ARM YuvConstants value
R=fbarchard@chromium.org

Bug: libyuv:901
Change-Id: Ie2f9ac214a2a7462cc613f510b64308d3b861b74
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2856225
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2021-04-29 15:57:20 +00:00