- Unrolled to 16 pixels
- Take constants via structure, allowing different colorspace and channel order
- Use ADDHN to add 16.5 and take upper 8 bits of 16 bit values, narrowing to 8 bits
- clang-format applied, affecting mips code
On Cortex A510
Was RAWToJ400_Opt (1623 ms)
Now RAWToJ400_Opt (862 ms)
C RAWToJ400_Opt (1627 ms)
Bug: b/220171611
Change-Id: I06a9baf9650ebe2802fb6ff6dfbd524e2c06ada0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3534023
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
When used in C enum keyword can't be eliminated.
Bug: libyuv:872
Change-Id: Iacff5a8bd84ec7caa1f90889e48f81ffc10071ae
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3513317
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This patch adds some deleted control macros so that these MSA
optimization functions can be called normally on mips platform.
There are also some modifications to adapt to the clang compiler.
Bug: libyuv:918
Change-Id: I6ffadc6582682b5eaeae2e0f4033d66d370b48b9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3494667
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This patch fixes compilation errors caused by the removal of kUVBias
and two failed test cases of LibYUVConvertTest.RGB565ToI420_Opt and
LibYUVConvertTest.ARGB1555ToI420_Opt.
Bug: libyuv:918
Change-Id: I1a66bcd7ef616aacbeca5b4015013015ccdf0f18
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3477416
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Add support for MM21 to NV12 and I420 conversion, and add SIMD
optimizations for arm, aarch64, SSE2, and SSSE3 machines.
Bug: libyuv:915, b/215425056
Change-Id: Iecb0c33287f35766a6169d4adf3b7397f1ba8b5d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3433269
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Justin Green <greenjustin@google.com>
Optimize 20 functions in source/scale_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: I85bcb3b0bfd9461bb6f93202546507352cbd624a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351469
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Optimize two functions in source/rotate_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: Idf670a1bc078f6284a499a292e0cb795f5b603b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Optimize 44 functions in source/row_lsx.cc file.
All test cases passed on loongarch platform.
Bug: libyuv:913
Change-Id: Ic80a5751314adc2e9bd435f2bbd928ab017a90f9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3351467
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
1. Add supports for LSX and LASX.
2. Three optimization functions are added in loongarch/row_lasx.cc file:
I422ToARGBRow_LASX,I422ToRGBARow_LASX,I422AlphaToARGBRow_LASX.
Bug: libyuv:912, Bug: libyuv:913
Change-Id: I043c2704f99a5215724b5c0b7f97e6bf5f7a199b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3329189
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- adapted from Android420ToI420, adding a rotation parameter
- SplitRotateUV added to rotate and split the UV channel of NV12 or NV21
- rename RotateUV functions to SplitRotateUV
Bug: b/203549508
Change-Id: I6774da5fb5908fdf1fc12393f0001f41bbda9851
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3251282
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
- ubsan complains on unaligned tests when an int16 or int32 is stored unaligned in C.
Although current Intel, ARM, Mips and PPC can do unaligned load/store, its not guaranteed
and could crash a CPU that doesnt support it.
- unaligned tests use offset of 2 or 4, which ubsan accepts.
- unittest fills in random buffer with 2 bytes at a time instead of a short.
- row common functions for int16 types use 2 shorts instead of 1 int.
Bug: libyuv:908, b/203243873
Change-Id: Idf13fa901647d7b0975f1947291caa781999a9bc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3229782
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
- reenable Intel SIMD unaffected by BIT_EXACT
- add bit exact version of ARGBAttenuate, which uses ARM version of formula.
- add bit exact version of ARGBUnatenuate, which mimics the AVX code.
Apply clang format to cleanup code.
Bug: libyuv:908, b/202888439
Change-Id: Ie842b1b3956b48f4190858e61c02998caedc2897
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3224702
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
- C code use ARM path, so NEON and C match
- C used on Intel platforms, disabling AVX.
Bug: libyuv:908, b/202888439
Change-Id: Ie035a150a60d3cf4ee7c849a96819d43640cf020
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3223507
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
With "m" GCC generates a memory address with offset which is
not allowed with ld1 on aarch64. Change constraint to "Q" to
force address without offset.
Bug: chromium:819294, libyuv:903
Change-Id: Iaae24bc6882cdef823259040a37fdbfc31f91185
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2922146
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
- swap U and V when crop x is odd
- document YUY2 and UYVY formats
- apply clang-format
Bug: libyuv:902
Change-Id: I045e44c907f4a9eb625d7c024b669bb308055f32
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3039549
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Was
[ RUN ] LibYUVConvertTest.ARGB1555ToI420_Any
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure
Expected equality of these values:
dst_uv_c[i * kStrideUV + j]
Which is: '\x8B' (139)
dst_uv_opt[i * kStrideUV + j]
Which is: '\x92' (146)
third_party/libyuv/files/unit_test/convert_test.cc:1139: Failure
[ FAILED ] LibYUVConvertTest.ARGB1555ToI420_Any
Now
[ RUN ] LibYUVConvertTest.ARGB1555ToI420_Any
[ OK ] LibYUVConvertTest.ARGB1555ToI420_Any (0 ms)
Bug: libyuv:894, b/155722711
Change-Id: I12dcacd0ecfff4ede5693a2554e9bb10dc8586c1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2870484
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Add tests of all macros used by libyuv public headers
When a 1 step conversion is added, a 2 step test can compare
the old 2 step method to the 1 step. A 1 step unittest is
also added which compares C to SIMD. Making the 2 step
conversions measure performance of the 2 steps allows the
old 2 step performance to be compared to 1 step.
All macros used in public headers are added to an ifdef test.
Showing them in a unittest allows some diagnostics when
a test is failing.
Bug: libyuv:901
Change-Id: I7ffa6ed0cb3b506fa1b7fd4b7b1b729658c3c266
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2857916
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Use unsigned coefficient and signed UV value in YUVTORGB.
R=fbarchard@chromium.org
Bug: libyuv:862, libyuv:863
Change-Id: I32e58b2cee383fb98104c055beb0867a7ad05bfe
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2850016
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Use unsigned coefficients on Intel.
Make C, NEON and AVX2 match under LIBYUV_UNLIMITED_DATA.
Bug: libyuv:862, libyuv:863
Change-Id: I6c02147ea3c1875c4fc23863435aea86dcf5880a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2830180
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Refactor NEON YUVToRGB Assembly to support HBD data as input and output.
Work on YUV444 internally, remove subsampling in I444ToARGB.
libyuv_unittest --gtest_filter=*.NV??ToARGB_Opt:*UYVYToARGB_Opt:*YUY2ToARGB_Opt:*I4*ToARGB_Opt
Bug: libyuv:895, libyuv:862, libyuv:863
Change-Id: I05b56ea8ea56d9e523720b842fa6e4b122ed4115
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2810060
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
These functions merge high bit depth planar RGB pixels into packed format.
Change-Id: I506935a164b069e6b2fed8bf152cb874310c0916
Bug: libyuv:886, libyuv:889
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780468
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Planar functions pass depth instead of scale factor.
Row functions pass shift instead of depth. Add assert to C.
AVX shift instruction expects a single shift value in XMM.
Neon pass shift as input (not output).
Split Neon reimplemented as left shift on shorts by negative to achieve right shift.
Add planar unitests
Bug: libyuv:888
Change-Id: I8fe62d3d777effc5321c361cd595c58b7f93807e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2782086
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Add J420 output from RAW.
Optimize RGB24 and RAW To J420 on ARM by using NEON for the 2 step conversion.
Also fix sign-compare warning that was breaking Windows build
Bug: libyuv:887, b/183534734
Change-Id: I8c39334552dc0b28414e638708db413d6adf8d6e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2783382
Reviewed-by: Wan-Teh Chang <wtc@google.com>
MOV Vy.4s, Vx.4s is not a valid instruction form (even though LLVM allows it).
It should be MOV Vy.16b, Vx.16b (.8b for 64-bit variants)
Bug: None
Change-Id: I3c3b42288a0ebc275962fa3adad707b351d00d4c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2780155
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
These NEON functions produce 16 pixels per iteration each, thus
use the mask 15, not 7.
Change-Id: I1f3eb691a9ca4af705393b2842b18b65f6878926
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2731801
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
The following functions are added:
planar YUV:
I410ToAR30, I410ToARGB
planar YUVA:
I010AlphaToARGB, I210AlphaToARGB, I410AlphaToARGB
biplanar YUV:
P010ToARGB, P210ToARGB
P010ToAR30, P210ToAR30
biplanar functions can also handle 12 bit and 16 bit samples.
libyuv_unittest --gtest_filter=LibYUVConvertTest.*10*ToA*:LibYUVConvertTest.*P?1?ToA*
R=fbarchard@chromium.org
Bug: libyuv:751, libyuv:844
Change-Id: I2be02244dfa23335e1e7bc241fb0613990208de5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2707003
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Rename yuvconstants to .c and use round from math.h
Bug: libyuv:882, b/180472591
Change-Id: I70720bf3e0833ba00df0d721f12020fba0b07a03
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2706966
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
These are 16 bit bi-planar convert functions to scale UV plane to
Y plane's size using (bi)linear filter.
libyuv_unittest --gtest_filter=*ToP41*
R=fbarchard@chromium.org
Bug: libyuv:872
Change-Id: I3cb4fafe2b2c9eedd0d91cf4c619abb9ee107bc1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2690102
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
These are bi-planar convert functions to scale UV plane to Y plane's size using (bi)linear filter.
libyuv_unittest --gtest_filter=*ToNV24*
R=fbarchard@chromium.org
Change-Id: I3d98f833feeef00af3c903ac9ad0e41bdcbcb51f
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2682152
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
new color util to compute constants needed based on white point.
[ RUN ] LibYUVColorTest.TestFullYUVV
hist -2 -1 0 1 2
red 0 1627136 13670144 1479936 0
green 319285 3456836 9243059 3440771 317265
blue 0 1561088 14202112 1014016 0
Bug: libyuv:877, b/178283356
Change-Id: If432ebfab76b01302fdb416a153c4f26ca0832d6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2678859
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
These functions use (bi)linear filter, to scale U and V planes to the size of Y plane.
This will help enhance the quality of YUV to RGB conversion.
Also added 10bit and 12bit version:
I010ToI410
I210ToI410
I012ToI412
I212ToI412
libyuv_unittest --gtest_filter=LibYUVConvertTest.I42*ToI444*:LibYUVConvertTest.I*1*ToI41*
R=fbarchard@chromium.org
Change-Id: Ie4a711a5ba28f2ff1f44c021f7a5c149022264c5
Bug: libyuv:872
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2658097
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
subq is only available for x64
sub works for both 32 bit x86 and 64 bit x64
Fox in row_gcc.cc for 32 bit x86 running out of registers.
Fix in row_neon.cc for split function argb paramter name.
Bug: libyuv:877, b/178283356, b/178713286
Change-Id: If2b12a2d6168eab08005a2cdf2c17a470a924dd1
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2656771
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
These functions convert between planar and interleaved ARGB,
optionally fill 255 to alpha / discard alpha.
This can help handle YUV(A) with Identity matrix, which is
basically planar ARGB.
libyuv_unittest --gtest_filter=LibYUVPlanarTest.*ARGBPlane*:LibYUVPlanarTest.*XRGBPlane*
R=fbarchard@google.com
Change-Id: I522a189b434f490ba1723ce51317727e7c5eb112
Bug: libyuv:877
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2649887
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
MAKEYUVCONSTANTS macro to generate struct for YUV to RGB
Fix I444AlphaToARGB unit test for ARM by adjusting C version to match Neon implementation.
Bug: libyuv:879, libyuv:878, libyuv:877, libyuv:862, b/178283356
Change-Id: Iedb171fbf668316e7d45ab9e3481de6205ed31e2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2646472
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Duplicate I420ToARGB prototype from convert_argb.h into convert_from.h for webrtc
Apply clang format for white spacing consistency.
Bug: libyuv:838, b/151375918
Change-Id: I0f667ca5350192710dbb135e92e73e18b46135e5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2446613
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This includes:
- fixing a handrolled raw exec-based DEPS parser that was failing
to parse Str, similar to crbug.com/1106435.
- rolling chromium forward by nearly a year. (The last roll that
landed was crrev.com/c/1797295). This required a bunch of changes in
order to be able to successfully sync, run gn, and compile:
- switching the mirrors for three repositories to match chromium,
which switched in crrev.com/c/2062580.
- making libyuv write an empty gclient_args file
- adding a few build_override gn arguments
- adding nasm as a deps entry, as it's now required by libjpeg_turbo
- android:
- adding jdk, libunwindstack, and turbine
- rolling the android sdk
- rolling bazel and r8
- rolling the cipd packages managed by third_party/android_deps
- adding six and requests to .vpython for the test runner
- switching to memcpy in a few places to avoid SIGBUS errors on
arm due to unaligned reads
- linux:
- checking out instrumented libraries for msan (including adding
depot_tools to deps for the hook)
- mac:
- adding mac_xcode_version to gclient_gn_args
- win:
- limit mac_toolchain to checkout_mac
Bug: 1063768, 1097306
Change-Id: Idd86fffcdac174fd2f7899243a56af4f1ed8077e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2384320
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Mirko Bonadei <mbonadei@chromium.org>
Wrong stride used in the for block.
Change the stride of x from 8 to 16.
Change-Id: Ic0cddf8413d1bd2decf5752b7a92c16f0345f0fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2355693
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Failed case: LibYUVConvertTest.TestI400 and LibYUVPlanarTest.ARGBBlend_Unattenuated.
This patch updates the I400ToARGBRow_MSA and ARGBBlendRow_MSA functions in the row_msa.cc file.
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Change-Id: Iec1a647af79be3ca1f2724802f6698deab60eac8
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2330807
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
In commit 6cd1ff, C version has been updated.
This patch update the MMI and MSA version to mach C version.
Change-Id: Iea811e232f9c6019a80364d165f0255a37ce41b4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227755
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Intel
Was ARGBSubtract_Opt (1760 ms)
Now ARGBSubtract_Opt (1546 ms)
ARM
Was ARGBAdd_Opt (1747 ms)
Now ARGBAdd_Opt (1260 ms)
Bug: None
Change-Id: I52436f6390b6b7313f2a8820833bb4f60ae958be
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2299639
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
In commit 0b8bb6, C version has been updated.
This patch update the MMI and MSA version to mach C version.
Change-Id: Ib28da3629a8465990c8e2185278a95af8c27a31d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2227754
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
M420 is a row biplanar variation of NV12 supported on Microsoft webcams.
The code was hardcoded to bt.601 and should be jpeg, but the format is
very old and rare. Is a variation on NV12, so if someone needs it, it
can be re-implemented easily.
Bug: libyuv:858
Change-Id: I246167dba3c190cc76af741b8e91e58e68fde28f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2212608
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Neon move prfm after loads for all functions. Example performance improvement
Was
I444ToARGB_Opt (3275 ms)
I444ToNV12_Opt (1509 ms)
Now
I444ToARGB_Opt (2751 ms)
I444ToNV12_Opt (1367 ms)
Bug: libyuv:447
Change-Id: I78bf797b3600084c1eceb0be44cdbc9a575de803
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2189559
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
This patch is a complement for commit bed9292f2cbba2f8f9ff0f1635a8aa17a311f2f9.
1. Supplement inspection for macro HAS_***TOUV*ROW_MMI/MSA.
2. Reduce calls to function TestCpuFlag().
3. Fix a mistake in source/convert.cc: line 1105.
Change-Id: I5e7f9fe367fa0f6d1db6f7644c5b48d4ad85fedb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2169342
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Some processors support both MSA and MMI.
when they are enabled together, MSA will be preferd.
This patch move MSA initialization after MMI, so that
MSA can overide MMI and be setted to effective.
Change-Id: I8a52cce83ee4ec9727d47c99b287c9580329b149
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2155944
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
1. Switch to 8 bit precision.
2. Fix an error in the implementation of MMI and MSA.
About the error:
MMI and MSA implementation for RGBtoY and RGBToYJ used different
precision according to the C implementation( The C version has been
unified in commit fce0fed542001577e6b10f4cf859e0fa1774974e). This
patch unifies the precision to 8 bit for RGBToYJ in MMI and MSA.
Change-Id: Ic6a6e424d27a2f049b0c954f03174192d2beb091
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2155608
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This unittest help to test MipsCpuCaps.
Change-Id: I9e0ceeed0e5243446eaafa27e8de4c5f8163b09e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2133314
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
1. Refactored function MipsCpuCaps.
2. allow msa and mmi can be enabled together.
Signed-off-by: Jiaxun Yang <jiaxun.yang@flygoat.com>
Signed-off-by: Shiyou Yin <yinshiyou-hf@loongson.cn>
Change-Id: I7330d0551a6a167e4c76d37e4defcc20783f5815
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2131145
Reviewed-by: Hsiu Wang <hsiu@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Call a row function for each row, based on ARGBToI400 code.
But implement row functions as 2 step conversion. Adds the
row functions:
RAWToYJ, RGBToYJ, SSSE3 and AVX2 versions, and Any versions.
The smaller row buffer is more cache friendly on large images.
The max cache size can be configured, and is currently:
// Maximum temporary width for wrappers to process at a time, in pixels.
And the row buffer is
SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4]);
So 8192 bytes are used for the row buffer, leaving the rest for source
and destination buffers.
blaze-bin/third_party/libyuv/libyuv_test '--gunit_filter=*R*To?400_Opt' --libyuv_width=3600 --libyuv_height=2500 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1 | sortms
Was
RAWToJ400_Opt (3996 ms)
ARGBToI400_Opt (3964 ms)
RGB24ToJ400_Opt (3960 ms)
ARGBToJ400_Opt (3909 ms)
RGBAToJ400_Opt (3885 ms)
Now
ARGBToJ400_Opt (4091 ms)
ARGBToI400_Opt (3936 ms)
RGBAToJ400_Opt (3428 ms)
RGB24ToJ400_Opt (3324 ms)
RAWToJ400_Opt (3309 ms)
Bug: libyuv:854, b/147753855
Change-Id: Ieb65fbda94e812c737f4c3c74107354b73c4bcd2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/2016203
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This adds some missing prototypes from the BT.2020 CL as well as expands
the H444 and J444 results.
BUG=960620, libyuv:845, b/129864744
Change-Id: I8ea3959379f1bb2edb857d4eb90fb9a1f6aa4e03
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1899093
Reviewed-by: Dale Curtis <dalecurtis@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This pulls in the changes that Firefox made to add BT.2020 support as well
as expands them to the existing 10-bit support. So we now have the following
input formats: U420, U422, U444, U010.
BUG=960620, libyuv:845
Change-Id: If0c47853a465d0ed660f849db08e71489fe1b9c2
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1884468
Commit-Queue: Dale Curtis <dalecurtis@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Recent versions of Clang started warning when the loop doesn't get
vectorized, such as when compiling with -Oz (see bug).
To fix the build, remove the pragma and let the compiler decide on its
own when to vectorize.
Bug: chromium:1015665
Change-Id: I40a610c9e0d94cfd577a6cd2b01e6fdaa08bef7d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1872580
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Hans Wennborg <hans@chromium.org>
Replace ARM64 only row function with high level function
that implements SSSE3, 32 bit Neon and C.
Compared to 2 step RAWToARGB + ARGBToRGBA on row level:
3.1x faster on ARM
6.2% faster on Intel
BUG=b/140748379
Change-Id: Ia8636d9e4fcdbe10b8c2e81610a54728e29845cd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1860914
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Neon and GCC Intel optimized, but win32 and mips not optimized.
BUG=libyuv:842, b/141482243
Change-Id: Ia56fa85c8cc1db51f374bd0c89b56d21ec94afa7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1825642
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Based on ARGBShuffle but with count adjusted and new shuffle mask
BUG=libyuv:809
Change-Id: Idd936ee6bedcf285607a68c2fc54d876b4becc01
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1711882
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Apply clang-format to fix jpeg if() for lint fix.
Change comments about 4th pixel for open source compliance.
Rename UVToVU to SwapUV for consistency with MergeUV.
BUG=b/135532289, b/136515133
Change-Id: I9ce377c57b1d4d8f8b373c4cb44cd3f836300f79
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1685936
Reviewed-by: Chong Zhang <chz@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Includes a rounding change for neon.
BUG=b/135532289
Change-Id: I36ffb57b55db6c64804ad169def865be1ac6d66e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1684439
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
Gaussian blur low levels ported to 32 bit neon.
But they are not hooked up to anything but a unittest.
Bug:b/248041731, b/132108021, b/129908793
Change-Id: Iccebb8ffd6b719810aa11dd770a525227da4c357
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1611206
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Chong Zhang <chz@google.com>
Alternatives to RGB24 and AYUV for working with GPU.
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*NV21To???24* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com
Change-Id: I5559c63f4bd4c847492fcb1571f7b03c58146689
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1501735
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
I422ToRGB24 is implemented as a C wrapper for Intel, calling
I422ToARGB and ARGBToRGB24. The ARGBToRGB24 for AVX2 requires 32
pixels.
This CL increases the width alignment required to use I422ToRGB24_AVX2
TBR=rrwinterton0gmail.com
Bug: libyuv:822, b:118386049
Change-Id: I4454f4eece33fbd5f593655f577c9ef5c00d1f63
Tested: locally tested with app that crashed using this function.
Reviewed-on: https://chromium-review.googlesource.com/c/1299931
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Add jpeg to NV21 conversions, unittests and conversions
for I444, I422, I420 and I420 to NV21 needed for internals.
Bug: libyuv:820
Change-Id: Idf0f15f91307e80a82cd23943f6eed5508f13fe2
Tested: out/Release/libyuv_unittest --sandbox_unittests --gtest_filter=*MJ*
Reviewed-on: https://chromium-review.googlesource.com/c/1297710
Reviewed-by: Johann Koenig <johannkoenig@google.com>
RAW is a big endian style RGB buffer with R first in memory, then G and B.
Convert NV21 and NV12 to RAW format.
Performance on SkylakeX for 720p with AVX2
I420ToRAW_Opt (388 ms)
H420ToRAW_Opt (371 ms)
NV12ToRAW_Opt (341 ms)
NV21ToRAW_Opt (339 ms)
SSSE3
I420ToRAW_Opt (507 ms)
H420ToRAW_Opt (481 ms)
NV12ToRAW_Opt (498 ms)
NV21ToRAW_Opt (493 ms)
C
I420ToRAW_Opt (2287 ms)
H420ToRAW_Opt (2246 ms)
NV12ToRAW_Opt (2191 ms)
NV21ToRAW_Opt (2204 ms)
Performance on Pixel 2 for 720p
out/Release/bin/run_libyuv_unittest -v -t 7200 --gtest_filter=*NV??ToR*Opt --libyuv_repeat=1000 --libyuv_width=1280 --libyuv_height=720
LibYUVConvertTest.NV12ToRGB24_Opt (1739 ms)
LibYUVConvertTest.NV21ToRGB24_Opt (1734 ms)
LibYUVConvertTest.NV12ToRAW_Opt (1719 ms)
LibYUVConvertTest.NV21ToRAW_Opt (1691 ms)
LibYUVConvertTest.NV12ToRGB565_Opt (2152 ms)
Bug: libyuv:778, b:117522975
Test: add new NV21ToRAW and NV12ToRAW tests
Change-Id: Ieabb68a2c6d8c26743e609c5696c81bb14fb253f
Reviewed-on: https://chromium-review.googlesource.com/c/1272615
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
The original src_u calculation of FOURCC_I420 shifted half width if
crop_y is odd.
This CL fixs the problem and also add a test case for it.
Bug: b:115278653
Test: pass libyuv_unittest
Change-Id: Ia9732d22e64e13de26df47726ba44ad1c5a06484
Reviewed-on: https://chromium-review.googlesource.com/c/1258743
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
When loading or storing the data, the unaligned address will greatly degrade
the optimization performance, so non-aligned access instructions are required
on the loongson platform.
Also delete the optimization function:ScaleARGBFilterCols_MMI,
because it degraded the performance.
BUG=libyuv:804
R=fbarchard@chromium.org
Change-Id: If4c15886a21cdcbac7ae8b336292e4549acf1e47
Reviewed-on: https://chromium-review.googlesource.com/1164627
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.
Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).
In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.
BUG=libyuv:804
Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
the built in __msa_ld_b() expects a void * without const.
Cast pointers to void * to avoid build warning.
TBR=johannkoenig@google.com
Bug: libyuv:805
Change-Id: Iabc4820ecf4a3a7dcb0063e67ce276ae2a4f0501
Tested: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest
Reviewed-on: https://chromium-review.googlesource.com/1125400
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
H420/H422 are bt.720 variants
TBR=braveyao@chromium.org
BUG=libyuv:799
TESTED=try bots tested build on all platforms
Change-Id: I007d8981d91ca0748c59403759109bbcd88f286c
Reviewed-on: https://chromium-review.googlesource.com/1115719
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Consistently use one style of line endings for the repository
Change-Id: Idd70e3d7f3a7a6641b268a81e51eebf9c705b67d
Reviewed-on: https://chromium-review.googlesource.com/1107877
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Avoid warnings regarding loss of qualifiers:
warning: cast from type ‘const uint8_t* {aka const unsigned char*}’
to type ‘v16i8* {aka __vector(16) signed char*}’ casts away
qualifiers
BUG=libyuv:793
Change-Id: Ie0d215bc07b49285b5d06ee91ccc2c9a7979799e
Reviewed-on: https://chromium-review.googlesource.com/1107879
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This reverts commit a8aa921c4614f9d6a0e8f3459648ca1ae75cdbe6.
Reason for revert: breaks a webrtc unittest on Windows.
https://bugs.chromium.org/p/webrtc/issues/detail?id=9263&can=2&start=0&num=100&q=&colspec=ID%20Pri%20M%20ReleaseBlock%20Component%20Status%20Owner%20Summary&groupby=&sort=
Original change's description:
> Allow negative height when ConvertToI420/ARGB is called with NV12/NV21
>
> ConvertToI420 and ConvertToARGB support the use of a negative height
> parameter to flip the image vertically. When converting from NV12 or
> NV21 this parameter was misinterpreted, resulting in invalid output.
> This CL introduces the use of abs_src_height to correctly calculate
> the location of the source UV plane.
>
> The sign of crop_height is not used, to reduce confusion ConvertToI420
> and ConvertToARGB no longer accept negative crop height.
>
> Unit tests for Android420ToI420 are updated to fix miscalculation of
> src_stride_uv, fix incorrect pixel strides, and to test inversion.
> New unit tests are included to test inversion for ConvertToARGB,
> ConvertToI420, Android420ToARGB, and Android420ToABGR.
> For consistency the test NV12Crop is renamed ConvertToI420_NV12_Crop.
>
> Bug: libyuv:446
> Test: out/Release/libyuv_unittest --gtest_filter=*.ConvertTo*:*.Android420To*
> Change-Id: Idc98e62671cb30272cfa7e24fafbc8b73712f7c6
> Reviewed-on: https://chromium-review.googlesource.com/994074
> Commit-Queue: Frank Barchard <fbarchard@chromium.org>
> Reviewed-by: Frank Barchard <fbarchard@chromium.org>
TBR=fbarchard@chromium.org,robert@bares.me
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: libyuv:446, chromium:9263, libyuv:801
Change-Id: I7c55b3fcb477f9754c249b9c2c54b24da2c29283
Reviewed-on: https://chromium-review.googlesource.com/1081267
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Mask was set to 32, but should have been 31.
BUG=libyuv:798
TESTED=try bots tested
Change-Id: I6120928873a4a2f1efef907d8e8296ca8c20bb03
Reviewed-on: https://chromium-review.googlesource.com/1054830
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
ConvertToI420 and ConvertToARGB support the use of a negative height
parameter to flip the image vertically. When converting from NV12 or
NV21 this parameter was misinterpreted, resulting in invalid output.
This CL introduces the use of abs_src_height to correctly calculate
the location of the source UV plane.
The sign of crop_height is not used, to reduce confusion ConvertToI420
and ConvertToARGB no longer accept negative crop height.
Unit tests for Android420ToI420 are updated to fix miscalculation of
src_stride_uv, fix incorrect pixel strides, and to test inversion.
New unit tests are included to test inversion for ConvertToARGB,
ConvertToI420, Android420ToARGB, and Android420ToABGR.
For consistency the test NV12Crop is renamed ConvertToI420_NV12_Crop.
Bug: libyuv:446
Test: out/Release/libyuv_unittest --gtest_filter=*.ConvertTo*:*.Android420To*
Change-Id: Idc98e62671cb30272cfa7e24fafbc8b73712f7c6
Reviewed-on: https://chromium-review.googlesource.com/994074
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
When casting a const value, ensure the cast is const as well.
BUG=webm:1509
Change-Id: I5b597fdcc148d111e9824bc7cf918fc5f24e970f
Reviewed-on: https://chromium-review.googlesource.com/996553
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
iOS simulator has the option to build with xcode instead of clang.
GN use_xcode_clang=true enables the xcode build.
As of version Xcode 9.2, the clang version used does not support
AVX512. The version reported is version 9, but for normal clang,
version 7 is sufficient to AVX512.
When a version of XCode does support AVX512, the version check can
be updated to allow AVX512 for newer versions of XCode.
with XCode 9.2 the following macro is set.
__APPLE_CC__ 6000
Bug: libyuv:789
Test: gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"x86\" use_xcode_clang=true"
Change-Id: I5a9a0b4a2760c7d09a4bcb464b3668979113b07e
Reviewed-on: https://chromium-review.googlesource.com/991595
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Bug: libyuv:789
Test: builds locally on linux with clang
Change-Id: I3000494d4b0b18f59d7852bc1bc0c9e422d2d63a
Reviewed-on: https://chromium-review.googlesource.com/987331
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Scalar multiply expects a 'd' register. The "w" (float) uses 's' for float
and wont work with the multiply in 32 bit (it does in 64 bit).
A vector 2 of float passes as 'd' register.
A vector 4 of float passes as 'q' register.
This change copies the float into the first entry of a vector 2
and passes that. The optimizer removes the extra copy, allowing
the single float to use referenced as
Test: LibYUVPlanarTest.TestByteToFloat
Bug: libyuv:786
Change-Id: I8773c5bae043c7b84e1d1db7fdea6731aa0b1323
Reviewed-on: https://chromium-review.googlesource.com/973984
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
row.h adds CLANG_HAS_AVX512
function ifdefs in row.h for avx512
source code ifdefed function by function for
avx512 and avx2.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: If32b51459685d0d5785c5c1e94c8f668f8e74b55
Reviewed-on: https://chromium-review.googlesource.com/982402
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Adds a method that forces the CPU flags. Useful when using libyuv inside
a sandboxed process which may not have access to the file system.
Bug: libyuv:787
Change-Id: I01f71e39a7301085d9de388eba930b4cac0fd7be
Reviewed-on: https://chromium-review.googlesource.com/972338
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Move getenv to unittest.cc to allow libyuv to be
run in sandbox for x86, x64 and aarch64
Bug: libyuv:767
Test: unittests still run and respect environment variables
Change-Id: I84cb1717977828776142b51c029774b3e6b142a3
Reviewed-on: https://chromium-review.googlesource.com/969645
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Use VMBI instructions but on AVX2 registers to avoid clockrate change.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Change unittest flags to decimal so they can be used for --libyuv_cpu_info=
Add environment variables to disable AVX 512 bits.
Bug: libyuv:784
Test: LibYUVBaseTest.TestCpuHas
Change-Id: Iea6704368fbe9f6d3395933da7993fb2a3453225
Reviewed-on: https://chromium-review.googlesource.com/957704
Reviewed-by: richard winterton <rrwinterton@gmail.com>
AVX2 port of SSSE3 conversion to output 24 bit RGB
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I14f7815522d1b790ecd2bb39d9a3441e803b694a
Reviewed-on: https://chromium-review.googlesource.com/953303
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Use 2 step conversion for NV21ToRGB24 to leverage AVX2
low levels instead of C.
Was C
NV21ToRGB24_Opt (882 ms)
Now SSSE3
NV21ToRGB24_Opt (218 ms)
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I58faf766bbec4cc595aab2e217f6c874dd4b4363
Reviewed-on: https://chromium-review.googlesource.com/951629
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Each byte is converted to float (0.0 to 255.0) and then multiplied
by a scale parameter.
Bug: None
Test: arm 64 build passes.
Change-Id: I04736798540b8d985f60abdf0388e24a209d075b
Reviewed-on: https://chromium-review.googlesource.com/930226
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ian Field <ianfield@google.com>
ARGB rotation using scaling code. Previously it had forward
declarations of the low level row functions used. This CL
uses the header and hooks up Any and MSA versions of the code.
Bug: libyuv:779
Test: perf record out/Release/libyuv_unittest --gtest_filter=*ARGBRotate90_Opt --libyuv_width=640 --libyuv_height=359 --libyuv_repeat=999
Change-Id: Ifacd58b26bb17a236181a404fad589fd2543b911
Reviewed-on: https://chromium-review.googlesource.com/927530
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
AR30ToARGB will vectorize if the output is masked
together as an int instead of 4 byte stores.
Performance is 2x faster
Was AR30ToARGB_Opt (1585 ms)
Now AR30ToARGB_Opt (746 ms)
Bug: libyuv:777
Test:LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: Idd47ae599d5d125207bb53e618d6d7e784d4a37c
Reviewed-on: https://chromium-review.googlesource.com/923169
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
BGR variation of 10 bit conversion using swapped U and V
and mirrored matrix to produce AB30 format instead of AR30.
Bug: libyuv:777
Test: LibYUVConvertTest.H010ToAB30_Opt
Change-Id: I96d115a5d1e12138f40cb548871e03aa3ab210eb
Reviewed-on: https://chromium-review.googlesource.com/922284
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
This reverts commit 7b9ff4a0355c778f2cf03bdb15029d60a1259061.
Reason for revert: ios build bots are red
Original change's description:
> tidy applied with readability-*
>
> TBR=braveyao@chromium.org
> Bug: libyuv:750
> Test: builds and runs and passes more tidy tests
> Change-Id: I316822f7d13b370b88b92a693912e880b21f92c8
> Reviewed-on: https://chromium-review.googlesource.com/907371
> Reviewed-by: Frank Barchard <fbarchard@chromium.org>
TBR=fbarchard@chromium.org,braveyao@chromium.org
# Not skipping CQ checks because original CL landed > 1 day ago.
Bug: libyuv:750
Change-Id: I4a73ffee2b71664c6cb93f38f2b5d70ebd76953e
Reviewed-on: https://chromium-review.googlesource.com/912175
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu (%1),%%xmm2
vmovdqu 0x00(%1,%2,1),%%xmm3
vpermq $0xd8,%%ymm2,%%ymm2
vpermq $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2
..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd (%1),%%ymm2
vpmovzxbd 0x00(%1,%2,1),%%ymm3
vpslld $0x10,%%ymm3,%%ymm3
vpor %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.
Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
ABGR is the more common format on Android.
This CL converts 10 bit AR30, to standard 8 bit ABGR.
Unoptimized but allows better testing and feature completeness.
Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToABGR_Opt
Change-Id: I0c7e7273158be215129e0a1d355587ae15942299
Reviewed-on: https://chromium-review.googlesource.com/891694
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Removes macros that were part of standard basic_types
header but not used by libyuv itself.
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: try bots still build
Change-Id: I8de6fad5a9277df0a50959881392ba212b1b5972
Reviewed-on: https://chromium-review.googlesource.com/879591
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.
[ RUN ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[ OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[ OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[ OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
ABGR output is implemented using the same source code as ARGB, by swapping
the u and v and supplying the mirrored conversion matrix.
ABGR format (RGBA in memory) is popular on Android.
Bug: libyuv:751
Test: H010ToABGR, I010ToABGR and I010ToARGB unittests
Change-Id: I0b5103628c58dcb22a6442c03814d4d5972e0339
Reviewed-on: https://chromium-review.googlesource.com/852985
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.
Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)
Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.
Bug: libyuv:751
Test: I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Convert planar 8 bit formats to planar 16 bit formats.
Includes msan fix for Convert8To16Row_Opt unittest.
I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601. The function does no color space
conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling. The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.
For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN ] LibYUVConvertTest.I420ToI010_Opt
[ OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.
Bug: libyuv: 769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.
Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned
Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Initial AR30ToARGB function to allow converion
from AR30 to other formats if necessary and/or
for testing.
Not optimized at this point.
Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: I38ef192315240f3caa7aee0218b38d5e88a2849f
Reviewed-on: https://chromium-review.googlesource.com/833025
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
For more complete support of AR30 format, add I420ToAR30 allowing
the new RGB 10 bit format to be used from standard 8 bit I420 format.
Bug: libyuv:751
Test: I420ToAR30 unittest added
Change-Id: Ia8b0857447408bd6adab485158ce5f38d6dc2faa
Reviewed-on: https://chromium-review.googlesource.com/823243
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time. R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B
Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
AR30 is optimized with 3 techniques
1. vpmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time. R and B, and A and G.
3. vpshufb is used to shift and mask 2 channels of R and B
Red Blue
With the 8 bit value in the upper bits, vpmulhuw by (1024+4) will produce a 10
bit value in the low 10 bits of each 16 bit value. This is whats wanted for the
blue channel. The red needs to be shifted 4 left, so multiply by (1024+4)*16 for
red.
Alpha Green
Alpha and Green are already in the high bits so vpand can zero out the other
bits, keeping just 2 upper bits of alpha and 8 bit green. The same multiplier
could be used for Green - (1024+4) putting the 10 bit green in the lsb. Alpha
would be a simple multiplier to shift it into position. It wants a gap of 10
above the green. Green is 10 bits, so there are 6 bits in the low short. 4
more are needed, so a multiplier of 4 gets the 2 bits into the upper 16 bits,
and then a shift of 4 is a multiply of 16, so (4*16) = 64. Then shift the
result left 10 to position the A and G channels.
Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: Ie4f20dce18203bae7b75acb1fd5232db8a8a4f11
Reviewed-on: https://chromium-review.googlesource.com/820046
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reduce allocations of row buffers to 1 alloc/free.
Do 2 rows at a time to avoid converting U and V planes twice.
Bug: libyuv:715
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I2f3a03b4875df5e3b969112a78a1a0b28399fa2f
Reviewed-on: https://chromium-review.googlesource.com/816021
Reviewed-by: Cheng Wang <wangcheng@google.com>
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used. This improves performance
on low end CPUs.
Future CL will by pass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.
Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
This allows the linker to move the variables from the .data section to
the .rodata section.
Bug: libyuv:254
Test: out/Release/libyuv_unittest --gtest_filter=* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I6998570f1af4337d7b80313d9e18e36aa20d6ec0
Reviewed-on: https://chromium-review.googlesource.com/777033
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2
Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.
Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
The unittest compares the results of 8 and 16 bit scaling
and expects them to be the same. This CL makes the 16 bit
scaling filter logic match.
Bug: libyuv:749
Test: LibYUVScaleTest.DISABLED_ScaleDownBy4_Linear_16
Change-Id: Ifb3ca4d770ef38f9f16abe9b9aeb843b779bf371
Reviewed-on: https://chromium-review.googlesource.com/772370
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@google.com>
When converting from lsb 10 bit formats to msb, the values
need to be shifted to the top 10 bits. Using a multiply
allows the different numbers of bits to be copied:
// 128 = 9 bits
// 64 = 10 bits
// 16 = 12 bits
// 1 = 16 bits
Bug: libyuv:751
Test: LibYUVPlanarTest.MultiplyRow_16_Opt
Change-Id: I9cf226053a164baa14155215cb175065b1c4f169
Reviewed-on: https://chromium-review.googlesource.com/762951
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
clang does not require -msse2 or -msse for inline, except
the "x" parameter. So change this to "m" for 32 bit. 64 bit
requires sse2 so use "x" for 64 bit.
gcc requires -msse for xmm registers in clobber list.
Reduce compiler requirement from -msse2 to -msse for enabling
assembly.
Bug: libyuv:754, libyuv:757
Test: CC=clang CXX=clang++ CFLAGS="-m32" CXXFLAGS="-m32 -mno-sse -O2" make -f linux.mk
Change-Id: I86df72cfee80b7d349561c1fd7c97ad360767255
Reviewed-on: https://chromium-review.googlesource.com/759303
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
cleanup to remove ifdefs around functions affected by
a clang bug.
gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest
Bug: libyuv:634
Test: build for mips with clang
Change-Id: I278b368dbb2fe89082240e280267d0a27a214c78
Reviewed-on: https://chromium-review.googlesource.com/757980
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
Key instruction sets added for each microarchitecture:
AVX512BW, AVX512VL, AVX512DQ - skylake server or later
AVX512_VBMI, AVX512_IFMA - cannon lake or later
AVX512_BITALG, AVX512_VBMI2, AVX512_VPOPCNTDQ, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ - ice lake or later
Bug: libyuv:752
Test: ~/intelsde/sde -icl -- out/Release/libyuv_unittest --gtest_filter=*Cpu*
Change-Id: I9ee28904c90009d66721b9f805a440c5fc2da122
Reviewed-on: https://chromium-review.googlesource.com/755617
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
vmovd is an AVX instruction. This will crash on an older CPU with
only SSSE3 but not AVX. Use movd instead.
Bug: libyuv:753
Test: ~/intelsde/sde -mrm -- out/Release/libyuv_unittest --gtest_filter=LibYUVCompareTest.BenchmarkHammingDistance_Opt
Change-Id: I1fb0026039d5f83d124f5d03fed7dc0d2d723e49
Reviewed-on: https://chromium-review.googlesource.com/756200
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>