RAW is a big endian style RGB buffer with R first in memory, then G and B.
Convert NV21 and NV12 to RAW format.
Performance on SkylakeX for 720p with AVX2
I420ToRAW_Opt (388 ms)
H420ToRAW_Opt (371 ms)
NV12ToRAW_Opt (341 ms)
NV21ToRAW_Opt (339 ms)
SSSE3
I420ToRAW_Opt (507 ms)
H420ToRAW_Opt (481 ms)
NV12ToRAW_Opt (498 ms)
NV21ToRAW_Opt (493 ms)
C
I420ToRAW_Opt (2287 ms)
H420ToRAW_Opt (2246 ms)
NV12ToRAW_Opt (2191 ms)
NV21ToRAW_Opt (2204 ms)
Performance on Pixel 2 for 720p
out/Release/bin/run_libyuv_unittest -v -t 7200 --gtest_filter=*NV??ToR*Opt --libyuv_repeat=1000 --libyuv_width=1280 --libyuv_height=720
LibYUVConvertTest.NV12ToRGB24_Opt (1739 ms)
LibYUVConvertTest.NV21ToRGB24_Opt (1734 ms)
LibYUVConvertTest.NV12ToRAW_Opt (1719 ms)
LibYUVConvertTest.NV21ToRAW_Opt (1691 ms)
LibYUVConvertTest.NV12ToRGB565_Opt (2152 ms)
Bug: libyuv:778, b:117522975
Test: add new NV21ToRAW and NV12ToRAW tests
Change-Id: Ieabb68a2c6d8c26743e609c5696c81bb14fb253f
Reviewed-on: https://chromium-review.googlesource.com/c/1272615
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
The original src_u calculation of FOURCC_I420 shifted half width if
crop_y is odd.
This CL fixs the problem and also add a test case for it.
Bug: b:115278653
Test: pass libyuv_unittest
Change-Id: Ia9732d22e64e13de26df47726ba44ad1c5a06484
Reviewed-on: https://chromium-review.googlesource.com/c/1258743
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Legacy types can cause build errors with code that defines
them differently. Disable them by default. Allow the types
to be enabled with #define LIBYUV_LEGACY_TYPES
BUG=libyuv:808
TESTED=libyuv try bots still build
Change-Id: I48928329393f44a377cec781e645570b14569668
Reviewed-on: https://chromium-review.googlesource.com/1129558
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
the built in __msa_ld_b() expects a void * without const.
Cast pointers to void * to avoid build warning.
TBR=johannkoenig@google.com
Bug: libyuv:805
Change-Id: Iabc4820ecf4a3a7dcb0063e67ce276ae2a4f0501
Tested: gn gen out/Release "--args=is_debug=false target_os=\"android\" target_cpu=\"mips64el\" mips_arch_variant=\"r6\" mips_use_msa=true is_component_build=true is_clang=true"
ninja -v -C out/Release libyuv_unittest
Reviewed-on: https://chromium-review.googlesource.com/1125400
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
H420/H422 are bt.720 variants
TBR=braveyao@chromium.org
BUG=libyuv:799
TESTED=try bots tested build on all platforms
Change-Id: I007d8981d91ca0748c59403759109bbcd88f286c
Reviewed-on: https://chromium-review.googlesource.com/1115719
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Mask was set to 32, but should have been 31.
BUG=libyuv:798
TESTED=try bots tested
Change-Id: I6120928873a4a2f1efef907d8e8296ca8c20bb03
Reviewed-on: https://chromium-review.googlesource.com/1054830
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
iOS simulator has the option to build with xcode instead of clang.
GN use_xcode_clang=true enables the xcode build.
As of version Xcode 9.2, the clang version used does not support
AVX512. The version reported is version 9, but for normal clang,
version 7 is sufficient to AVX512.
When a version of XCode does support AVX512, the version check can
be updated to allow AVX512 for newer versions of XCode.
with XCode 9.2 the following macro is set.
__APPLE_CC__ 6000
Bug: libyuv:789
Test: gn gen out/Release "--args=is_debug=false target_os=\"ios\" ios_enable_code_signing=false target_cpu=\"x86\" use_xcode_clang=true"
Change-Id: I5a9a0b4a2760c7d09a4bcb464b3668979113b07e
Reviewed-on: https://chromium-review.googlesource.com/991595
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Scalar multiply expects a 'd' register. The "w" (float) uses 's' for float
and wont work with the multiply in 32 bit (it does in 64 bit).
A vector 2 of float passes as 'd' register.
A vector 4 of float passes as 'q' register.
This change copies the float into the first entry of a vector 2
and passes that. The optimizer removes the extra copy, allowing
the single float to use referenced as
Test: LibYUVPlanarTest.TestByteToFloat
Bug: libyuv:786
Change-Id: I8773c5bae043c7b84e1d1db7fdea6731aa0b1323
Reviewed-on: https://chromium-review.googlesource.com/973984
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
row.h adds CLANG_HAS_AVX512
function ifdefs in row.h for avx512
source code ifdefed function by function for
avx512 and avx2.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: If32b51459685d0d5785c5c1e94c8f668f8e74b55
Reviewed-on: https://chromium-review.googlesource.com/982402
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Move getenv to unittest.cc to allow libyuv to be
run in sandbox for x86, x64 and aarch64
Bug: libyuv:767
Test: unittests still run and respect environment variables
Change-Id: I84cb1717977828776142b51c029774b3e6b142a3
Reviewed-on: https://chromium-review.googlesource.com/969645
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Use VMBI instructions but on AVX2 registers to avoid clockrate change.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
ARGB rotation using scaling code. Previously it had forward
declarations of the low level row functions used. This CL
uses the header and hooks up Any and MSA versions of the code.
Bug: libyuv:779
Test: perf record out/Release/libyuv_unittest --gtest_filter=*ARGBRotate90_Opt --libyuv_width=640 --libyuv_height=359 --libyuv_repeat=999
Change-Id: Ifacd58b26bb17a236181a404fad589fd2543b911
Reviewed-on: https://chromium-review.googlesource.com/927530
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
BGR variation of 10 bit conversion using swapped U and V
and mirrored matrix to produce AB30 format instead of AR30.
Bug: libyuv:777
Test: LibYUVConvertTest.H010ToAB30_Opt
Change-Id: I96d115a5d1e12138f40cb548871e03aa3ab210eb
Reviewed-on: https://chromium-review.googlesource.com/922284
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Bug: libyuv:750
Test: builds and runs and passes more tidy tests
Change-Id: I023699a7aa61ea3f5e4a21647112691ea5739281
Reviewed-on: https://chromium-review.googlesource.com/902170
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
I422ToUYVYRow_AVX2 optimized from 7 cycles per 32 pixels to 4.6 cycles.
Instead of 2 vpermq and vpunpcklbw:
vmovdqu (%1),%%xmm2
vmovdqu 0x00(%1,%2,1),%%xmm3
vpermq $0xd8,%%ymm2,%%ymm2
vpermq $0xd8,%%ymm3,%%ymm3
vpunpcklbw %%ymm3,%%ymm2,%%ymm2
..use vpmovzxbd to expand the bytes to shorts, then vpslld and vpor
vpmovzxbd (%1),%%ymm2
vpmovzxbd 0x00(%1,%2,1),%%ymm3
vpslld $0x10,%%ymm3,%%ymm3
vpor %%ymm3,%%ymm2,%%ymm2
which reduces the port 5 bottleneck by 1 cycle.
Bug: libyuv:556
Test: out/Release/libyuv_unittest --gtest_filter=*I42?To*UY*Opt
Change-Id: I53799e53cc6b090a1a695c839094c193be3eecaf
Reviewed-on: https://chromium-review.googlesource.com/899873
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Use C99 types internally but define old types for compatibility
with older API. (r1690 and earlier)
TBR=braveyao@chromium.org
Bug: libyuv:774
Test: try bots build on all platforms
Change-Id: I06f89537da3875f74e65189897e67b69af2c2ec2
Reviewed-on: https://chromium-review.googlesource.com/882501
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.
[ RUN ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[ OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[ OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[ OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
ABGR output is implemented using the same source code as ARGB, by swapping
the u and v and supplying the mirrored conversion matrix.
ABGR format (RGBA in memory) is popular on Android.
Bug: libyuv:751
Test: H010ToABGR, I010ToABGR and I010ToARGB unittests
Change-Id: I0b5103628c58dcb22a6442c03814d4d5972e0339
Reviewed-on: https://chromium-review.googlesource.com/852985
Commit-Queue: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.
Bug: libyuv:751
Test: I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Convert planar 8 bit formats to planar 16 bit formats.
Includes msan fix for Convert8To16Row_Opt unittest.
I420 is YUV bt.601 8 bits per channel with 420 subsampling.
I010 is YUV bt.601 10 bits per channel with 420 subsampling.
I is color space - bt.601. The function does no color space
conversion so H420ToI010 is aliased to this function as well.
0 = 420 subsampling. The chroma channels are half width / height.
10 = 10 bits per channel, stored in low 10 bits of 16 bit samples.
For SSSE3 version:
out/Release/libyuv_unittest --gtest_filter=*LibYUVConvertTest.I420ToI010_Opt --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
[ RUN ] LibYUVConvertTest.I420ToI010_Opt
[ OK ] LibYUVConvertTest.I420ToI010_Opt (276 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.I420ToI010_Opt
Change-Id: I072876ee4fd74a2b74f459b628838bc808f9bdd2
Reviewed-on: https://chromium-review.googlesource.com/846421
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
pshufd requires 16 byte aligned memory or a register.
Use movd to a register to avoid a segfault if memory for float
is misaligned
Bug: libyuv:759
Test: 32 bit build of LibYUVPlanarTest.TestHalfFloatPlane_16bit_denormal
Change-Id: I6fdcc4317453af5acd4700f9d46425bb2f4a205b
Reviewed-on: https://chromium-review.googlesource.com/840459
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Initial AR30ToARGB function to allow converion
from AR30 to other formats if necessary and/or
for testing.
Not optimized at this point.
Bug: libyuv:751
Test: LibYUVConvertTest.AR30ToARGB_Opt
Change-Id: I38ef192315240f3caa7aee0218b38d5e88a2849f
Reviewed-on: https://chromium-review.googlesource.com/833025
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reduce allocations of row buffers to 1 alloc/free.
Do 2 rows at a time to avoid converting U and V planes twice.
Bug: libyuv:715
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I2f3a03b4875df5e3b969112a78a1a0b28399fa2f
Reviewed-on: https://chromium-review.googlesource.com/816021
Reviewed-by: Cheng Wang <wangcheng@google.com>
This version of the H010ToAR30 provides a 3 step conversion
Convert16To8Row_AVX2
H420ToARGB_AVX2
ARGBToAR30_AVX2
Low level function added to convert 16 bit to 8 bit using multiply
to adjust 10 bit or other bit depths and then save the upper 16 bits.
Bug: libyuv:751
Test: LibYUVPlanarTest.Convert16To8Row_Opt unittest added
Change-Id: I9cc576fda8afa1003cb961d03e0e656e0b478f03
Reviewed-on: https://chromium-review.googlesource.com/783554
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Key instruction sets added for each microarchitecture:
AVX512BW, AVX512VL, AVX512DQ - skylake server or later
AVX512_VBMI, AVX512_IFMA - cannon lake or later
AVX512_BITALG, AVX512_VBMI2, AVX512_VPOPCNTDQ, AVX512_VNNI, GFNI, VAES, VPCLMULQDQ - ice lake or later
Bug: libyuv:752
Test: ~/intelsde/sde -icl -- out/Release/libyuv_unittest --gtest_filter=*Cpu*
Change-Id: I9ee28904c90009d66721b9f805a440c5fc2da122
Reviewed-on: https://chromium-review.googlesource.com/755617
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
mingw gcc sets the macro _M_IX86 which is normally only set
by Visual C and clangcl which are Visual C style source code
style for assembly, but gcc is not Visual C compatible.
Add _MSC_VER to most ifdefs to detect that its really Visual C
or clangcl and not mingw gcc so the gcc source code will be used.
Bug: libyuv:744
Test: CXXFLAGS=-m32 CXX=~/prebuilts/gcc/linux-x86/host/x86_64-w64-mingw32-4.8/bin/x86_64-w64-mingw32-g++ make -f linux.mk
Change-Id: I3431aa486eb769b145faa8d5eb75ed639f9d6f5e
Reviewed-on: https://chromium-review.googlesource.com/722319
Reviewed-by: Cheng Wang <wangcheng@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
The low level hamming distance functions have size limitations
based on counter sizes. The higher level calls the low level
in blocks that avoid overflow and then accumulators in int64.
This test compares the results of the low levels to the high
level and against a known value (all ones) to ensure the
count is correct for any specified size.
The the size is very large, the result is expected to be
different.
Bug: libyuv:701
Test: TestHammingDistance_Opt
Change-Id: I6716af7cd09ac4d88a8afa25bc845a1b62af7c93
Reviewed-on: https://chromium-review.googlesource.com/710800
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
This reverts commit ec75df5894845b8d6b1341885a78db1de83decd8.
Reason for revert: <INSERT REASONING HERE>
Original change's description:
> ComputeHammingDistance reduce SIMD loop to 1 call when possible.
>
> 32 bit x86 has high overhead due to -fpic. So this reduces the
> number of calls by 1.
>
> TBR=kjellander@chromium.org
> Bug: libyuv:701
> Test: BenchmarkHammingDistance
> Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
> Reviewed-on: https://chromium-review.googlesource.com/701755
> Reviewed-by: Frank Barchard <fbarchard@google.com>
> Reviewed-by: Cheng Wang <wangcheng@google.com>
TBR=rrwinterton@gmail.com,fbarchard@google.com,wangcheng@google.com
Change-Id: Ia61e8558a8f083c14be5f51e0e141550b6f2b5c1
No-Presubmit: true
No-Tree-Checks: true
No-Try: true
Bug: libyuv:701
Reviewed-on: https://chromium-review.googlesource.com/707823
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>
32 bit x86 has high overhead due to -fpic. So this reduces the
number of calls by 1.
TBR=kjellander@chromium.org
Bug: libyuv:701
Test: BenchmarkHammingDistance
Change-Id: I7f557ef047920db65eab362a5f93abbd274ca051
Reviewed-on: https://chromium-review.googlesource.com/701755
Reviewed-by: Frank Barchard <fbarchard@google.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
If length of HammingDistance was not a multiple of 4,
the result was incorrect. The old tests did not catch this
so a new test is done to count 1s.
Bug: libyuv:740
Test: LibYUVCompareTest.TestHammingDistance
Change-Id: I93db5437821c597f1f162ac263d4a594bb83231f
Reviewed-on: https://chromium-review.googlesource.com/699614
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Cheng Wang <wangcheng@google.com>
Reviewed-by: Frank Barchard <fbarchard@google.com>
Commit-Queue: Frank Barchard <fbarchard@google.com>