1703 Commits

Author SHA1 Message Date
Bruce Lai
f4bd840794 Fix compile error for riscv scalar & simplify cmake cross build flow
1. Fix compile error when build riscv without using vector

2. Fix run_qemu.sh misused v=true for USE_RVV=OFF case

3. [cmake] Fix warning by rename TEST to UNIT_TEST
Warning log:
CMake Warning (dev) at CMakeLists.txt:57 (if):                                                                                                                                                                                                                  [54/1931]
  Policy CMP0064 is not set: Support new TEST if() operator.  Run "cmake
  --help-policy CMP0064" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  TEST will be interpreted as an operator when the policy is set to NEW.
  Since the policy is not set the OLD behavior will be used.
This warning is for project developers.  Use -Wno-dev to suppress it.

4. [cmake] Simplify logic for cross-build

Bug: libyuv:956

Change-Id: I120402fc7d6d86403e7d974180b81f4f9c663e36
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4486239
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-04 18:09:00 +00:00
Bruce Lai
8811ad8ba1 Fix TestLinuxRVV test fail
Fail log:
[ RUN      ] LibYUVBaseTest.TestLinuxRVV
Note: testing to load "../../unit_test/testdata/riscv64.txt"
/scratch/brucel/libyuv/src/unit_test/cpu_test.cc:290: Failure
Expected equality of these values:
  kCpuHasRVV | kCpuHasRVVZVFH
    Which is: 1610612736
  RiscvCpuCaps("../../unit_test/testdata/riscv64_rvv_zvfh.txt")
    Which is: 536870912
[  FAILED  ] LibYUVBaseTest.TestLinuxRVV (17 ms)

Reason:
The root cause is "\n"  may be contained in the ext variable.
The last of extension substring contains "\n".
For instance, test case riscv64_rvv_zvfh.txt, the last substring is "zvfh\n" instead of "zvfh".
Solved this failure by removing "\n" which is at the end of line.

NOTE: We avoid using strstr() to solve the problem here.
Becasue using strstr() will violate the parsing rule, if future extension contains "zvfh"(e.g zvfhxxx).

Log after modification:
[ RUN      ] LibYUVBaseTest.TestLinuxRVV
Note: testing to load "../../unit_test/testdata/riscv64.txt"
[       OK ] LibYUVBaseTest.TestLinuxRVV (38 ms)

Change-Id: I7b7db98dbc5388cbc148423da6892b8f0be64599
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4498101
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-04 03:26:25 +00:00
Darren Hsieh
1b3c4c12d4 Add Split/Merge RGB/ARGB/XRGB Row_RVV
* Run on SiFive internal FPGA:

SplitRGBPlane_Opt (~6.87x vs scalar)

SplitARGBPlane_Opt (~10.77x vs scalar)

SplitXRGBPlane_Opt (~18.69x vs scalar)

MergeRGBPlane_Opt (~3.63x vs scalar)

MergeARGBPlane_Opt (~3.50x vs scalar)

MergeXRGBPlane_Opt (~2.90x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

- include a fix to avoid implict conversion warning between size_t & int.

Bug: libyuv:956

Change-Id: Icd79b282b04ea3981e7fd4e6d547da6708d82516
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4443411
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-28 18:34:46 +00:00
Frank Barchard
7c6a7e5737 cpuid for arm/mips/riscv initialize buffer
- change cpu printf to hex to better show flags

util/cpuid:
Cpu Flags 0x30000001
Has RISCV 0x10000000
Has RVV 0x20000000


[ RUN      ] LibYUVBaseTest.TestCpuHas
Cpu Flags 0x30000001
Has RISCV 0x10000000
Has RVV 0x20000000
Has RVVZVFH 0x0
[       OK ] LibYUVBaseTest.TestCpuHas (1 ms)
[ RUN      ] LibYUVBaseTest.TestCompilerMacros
__ATOMIC_RELAXED 0
__cplusplus 201703
__clang_major__ 9999
__clang_minor__ 0
__GNUC__ 4
__GNUC_MINOR__ 2
__riscv 1
__riscv_vector 1
__clang__ 1
__llvm__ 1
__pic__ 2
INT_TYPES_DEFINED
__has_feature

Bug: libyuv:956
Change-Id: Iee4f1f34799434390e756de1e6c2c4596d82ace5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4484957
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-27 22:46:27 +00:00
Frank Barchard
cf21b5ea5c Rename variables to match layout of ABGR
Bug: None
Change-Id: Ia1d596b6e108307fe042a03c34162b25152293d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4461967
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-26 16:57:33 +00:00
Bruce Lai
1330a79e9f Optimized AR64/AB64 <-> ARGB with RVV
* Run on SiFive internal FPGA:

ARGBToAR64_Opt (~13.7x vs scalar)
ARGBToAB64_Opt (~5.81x vs scalar)
AR64ToARGB_Opt (~15.8x vs scalar)
AB64ToARGB_Opt (~2.40x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Bug: libyuv:956

Change-Id: Ida642a5077f59d25fb7c5328f671956b2293dadd
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4442913
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-20 19:49:55 +00:00
Frank Barchard
c994782086 Enable RVV if qemu is detected
- include a fix for jpeg unittests to do at least 1 iteration
- include a fix for scale uv to only use linearup2 if filter is linear

Tested on qemu with Intel host:
[ RUN      ] LibYUVBaseTest.TestCpuHas
Cpu Flags 805306369
Has RISCV 268435456
Has RVV 536870912
Has RVVZVFH 0
Has X86 0

Bug: libyuv:956, libyuv:959, libyuv:960
Change-Id: I4a1b66f83d82ba127780f52526153d586db90111
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4429570
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Randall Bosetti <rlb@google.com>
2023-04-18 20:29:04 +00:00
Darren Hsieh
44396e6e9a Add ARGBToRAWRow_RVV, ARGBToRGB24Row_RVV, RGB24ToARGBRow_RVV
* Run on SiFive internal FPGA:

ARGBToRAW_Opt (~1.55x vs scalar)

ARGBToRGB24_Opt (~1.44x vs scalar)

RGB24ToARGB_Opt (~1.77x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Bug: libyuv:956

Change-Id: I26722f6848cd68684d95d9a7ee06ce0416e7985d
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4413083
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-13 19:33:16 +00:00
Frank Barchard
68659d0d68 UVScale down by 2 fix for C and optimize for NEON
- update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open.

Bug: libyuv:958
Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-12 22:49:20 +00:00
Frank Barchard
ee3e71c7ce Any functions use memset(vin, 0, sizeof(vin)) for GCC warning fix
- Fix -Wmemset-elt-size warning for GCC
- Use vin for inputs and vout for outputs

Bug: None
Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-04-10 20:44:10 +00:00
Darren Hsieh
724e7aee03 Fix macro define typo in scale_uv.cc
The correct define can be found in scale_row.h

Change-Id: I633ed47006c7bd8014038493005c2d934489ff18
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4411353
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-10 16:55:48 +00:00
James Zern
0200037a5a row_any,ANYDETILE: fix -Wmemset-elt-size warning
under gcc 12.2.0 using -Wall:

source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const
                       uint16_t*, ptrdiff_t, uint16_t*, int)’:
source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to
number of elements without multiplication by element size
[-Wmemset-elt-size]
 2287 |     memset(temp, 0, 16 * BPP); /* for msan */
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’
 2308 | ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15)

This increases the memset to the full buffer size, which may not be
strictly necessary.

Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-08 19:01:02 +00:00
Darren Hsieh
e8af6cb2e4 Add RAWToARGBRow_RVV,RAWToRGBARow_RVV,RAWToRGB24Row_RVV
* Run on SiFive internal FPGA:

RAWToARGB_Opt (~2x vs scalar)

RAWToRGBA_Opt (~2x vs scalar)

RAWToRGB24_Opt (~1.5x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Change-Id: I21a13d646589ea2aa3822cb9225f5191068c285b
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408357
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-07 18:45:08 +00:00
Darren Hsieh
aa47d668d8 Add riscv cpu info detection.
* Supports:

 * The standard single-letter Vector detection.

 * Vector fp16 detection.

Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Change-Id: Ia7ee1bd8ec1a990f1b2b1700805942e99c0aa87b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4401738
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-06 15:58:29 +00:00
Wan-Teh Chang
ec48e4328e Add assertions for the Clang static analyzer
The Clang static analyzer (scan-build) in LLVM 14 warns about
array index out of bounds in scaletbl[boxwidth - minboxwidth] in
ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two
elements. It's not clear the index boxwidth - minboxwidth is either 0 or
1.

Change-Id: I072476e86950154beffe6b1a89915755118b3cbd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2023-04-05 21:58:22 +00:00
Frank Barchard
464c51a035 AArch32 YUVTORGB_SETUP use load and dup to avoid modifying pointer
- Allows code to be optimized with clang 17 -flto-thin
- Bump version number to 1864 to allow detection of fix
- Apply clang format to standardize formatting; No impact on code generated

Bug: chromium:1424089
Change-Id: Ib745836b27915a5e4cb1d7d928ee52659360612b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370052
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-03-24 19:32:30 +00:00
Frank Barchard
1a971f8cc3 clang 17 -flto-thin bug fix for Neon YUVtoRGB and ARGBToRGB565Dither
- YUV to RGB AArch32 kRGBCoeffBias rewind pointer
- ARGBToRGB565Dither declare width and source pointers as modified


Bug: chromium:1424089
Change-Id: I987180652331bab16ce27d8d166399a687ee890e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370099
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-03-24 10:59:40 +00:00
Frank Barchard
3f219a3501 GCC warning fix for MT2T
- Fix redundent assignment compile warning in GCC
- Apply clang-format
- Bump version to 1863

Bug: libyuv:955
Change-Id: If2b6588cd5a7f068a1745fe7763e90caa7277101
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4344729
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2023-03-16 06:57:20 +00:00
Justin Green
76468711d5 M2T2 Unpack fixes
Fix the algorithm for unpacking the lower 2 bits of M2T2 pixels.

Bug: b:258474032
Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978
Commit-Queue: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-03-14 14:59:26 +00:00
Frank Barchard
f9b23b9cc0 Transpose 4x4 for SSE2 and AVX2
Skylake Xeon
AVX2 Transpose4x4_Opt (290 ms)
SSE2 Transpose4x4_Opt (302 ms)
C    Transpose4x4_Opt (522 ms)

AMD Zen2
AVX2 Transpose4x4_Opt (136 ms)
SSE2 Transpose4x4_Opt (137 ms)
C    Transpose4x4_Opt (431 ms)

Bug: None
Change-Id: I4997dbd5c5387c22bfd6c5960b421504e4bc8a2a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4292946
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-03-03 17:46:23 +00:00
Frank Barchard
88b050f337 MergeUV AVX512BW use assembly
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V

AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2   MergeUVPlane_Opt (945 ms)
AVX2   MergeUVPlane_16_Opt (2167 ms)

Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2   MergeUVPlane_Opt (943 ms)
SSE2   MergeUVPlane_Opt (973 ms)
AVX2   MergeUVPlane_16_Opt (2102 ms)

Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-22 21:19:08 +00:00
Frank Barchard
2bdc210be9 MergeUV_AVX512BW for I420ToNV12
On Skylake Xeon 640x360 100000 iterations
AVX512   MergeUVPlane_Opt (1196 ms)
AVX2     MergeUVPlane_Opt (1565 ms)
SSE2     MergeUVPlane_Opt (1780 ms)
Pixel 7  MergeUVPlane_Opt (1177 ms)

Bug: None
Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-13 20:14:57 +00:00
Sergio Garcia Murillo
b2528b0be9 Add support for odd width and height in I410ToI420
Bug: libyuv:950
Change-Id: Ic9a094463af875aefd927023f730b5f35f8551de
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4154630
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-01-23 19:05:00 +00:00
Hao Chen
0809713775 Refine some functions on the Longarch platform.
Add ARGBToYMatrixRow_LSX/LASX, RGBAToYMatrixRow_LSX/LASX and
RGBToYMatrixRow_LSX/LASX functions with RgbConstants argument.

Bug: libyuv:912
Change-Id: I956e639d1f0da4a47a55b79c9d41dcd29e29bdc5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4167860
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 18:54:14 +00:00
Frank Barchard
0faf8dd0e0 Fix for DivideRow_NEON functions
- was dup of 8h but mul of 4s.  now use umull

Bug: libyuv:951
Change-Id: If6cb01f5f006c2235886b81ce120642d7e24a9bb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166563
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 00:30:05 +00:00
Frank Barchard
541d8efbaf Fix for divide row functions used by P010ToI010
Bug: libyuv:951
Change-Id: Id323656cb6f99b1be0be7aaa854d3cc15feeba69
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166562
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-17 21:40:45 +00:00
Frank Barchard
d5aa3d4a76 P010ToI010 and P012ToI012 conversion functions
- Convert 10 and 12 bit biplanar formats to planar.
- Shift 10 MSB to 10 LSB
- P010 is similar to NV12 in layout, but uses 10 MSB of 16 bit values.
- I010 is similar to I420 in layout, but uses 10 LSB of 16 bit values.

Bug: libyuv:951
Change-Id: I16a1bc64239d0fa4f41810910da448bf5720935f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166560
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-13 19:20:12 +00:00
Frank Barchard
6e4b0acb4b I422Rotate take stride for temporary buffers
- Minor variable name changes first/last to top/bottom
- Comments explaining rotate temporary buffers usage
- Add asserts for scale parameter
- Use NULL and stddef.h instead of 0
- Use void * for allocation in row.h
- Add () around size parameter in macros


Bug: libyuv:926, libyuv:949
Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 23:11:52 +00:00
Sergio Garcia Murillo
f8626a7224 Add 10 bit rotate methods.
This initial implementation is based on current unoptimized code in webrtc using just plain for loops.

Bug: libyuv:949
Change-Id: Ic87ee49c3a0b62edbaaa4255c263c1f7be4ea02b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110782
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 21:10:01 +00:00
Sergio Garcia Murillo
22a579c438 Use ScalePlaneDown2_16To8 for avoiding the 2 step process
Bug: libyuv:950
Change-Id: I5a77bca9a0230fe00abd810939e217833a14683f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4134524
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-03 21:41:21 +00:00
Sergio Garcia Murillo
f583b1b4b8 Add I410Copy and I410ToI420 methods
The I410To420 implementation does a two step approach for scaling down and 10-to-8 bit conversion using the Y plane as temporal storage.

Bug: libyuv:950
Change-Id: I3d35fad4b99e17253230456233fbd947e013c0ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110783
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-01-03 20:27:28 +00:00
Frank Barchard
3abd6f36b6 Casting for scale functions
- MT2T support for source strides added, but only works for positive values.
- Reduced casting in row_common - one cast per assignment.
- scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t

Bug: libyuv:948, b/257266635, b/262468594
Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ernest Hua <ernesthua@google.com>
2022-12-15 22:34:22 +00:00
Frank Barchard
610e0cdead MT2T Warning fixes for fuchsia
Bug: b/258474032, b/257266635
Change-Id: Ic5cbbc60e2e1463361e359a2fe3e97976c1ea929
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4081348
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-12-06 19:54:40 +00:00
Frank Barchard
ea26d7adb1 DetilePlane_16 AVX version
- fix ifdefs for DetilePlane_16 to use 16 bit versions, not 8 bit.  (no functional change)

Bug: b/258474032
Change-Id: Ic07e02d9801e21126ebee0ceb5779aa712a493ce
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4034812
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-18 23:59:06 +00:00
Frank Barchard
8713ba3f0b Add vzeroupper to AVX row functions
- move power of two macro to planar functions source
- revert row.h IS_ALIGNED change

Bug: b/258474032
Change-Id: If87bb8d55c9b9930dd3e378614f8e4faae0870e9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4035166
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-11-17 23:00:08 +00:00
Frank Barchard
2d2cee418a Add Detile_16 planar function for 10 bit MT2T format
- Neon and SSE2
- Any for odd widths

Pixel 2 little core AArch32 build
C
TestDetilePlane_16 (1275 ms)
TestDetilePlane (1203 ms)
Neon
TestDetilePlane_16 (693 ms)
TestDetilePlane (660 ms)

Bug: b/258474032
Change-Id: Idbd09c5e9324e4deef5f1d54090d4b63cc7db812
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4031848
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-11-17 02:47:57 +00:00
Frank Barchard
fe9ced6e3c ScaleRowUp2_Bilinear_12_SSSE3 preserve xmm7 for Windows
- Preserve xmm7 in ScaleRowUp2_Bilinear_12_SSSE3
- Previously xmm7 was used in ScaleRowUp2_Bilinear_12_SSSE3 without being preserved, which violates the Windows x64 calling conventions and can cause undefined behavior.


Bug: libyuv:945, 1218384
Change-Id: If18b292b588573355f9b4ba8c5b9c3fbe143d36b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3972137
Reviewed-by: Bruce Dawson <brucedawson@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-10-21 19:35:17 +00:00
Frank Barchard
3da24c3ca3 Detile vld for gcc build fix
- add {} around loaded register

Bug: libyuv:944
Change-Id: I0d916e37beb50bda0838e4867742eb7afa57e1cc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3957634
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-10-14 19:12:53 +00:00
Frank Barchard
cb35d5f90e BGRAToI420 use SSSE3 for Y but C for UV when LIBYUV_BIT_EXACT enabled
- Previously was C for both Y and UV.

Was BGRAToI420_Opt (17780 ms)
Now BGRAToI420_Opt (9546 ms)

Bug: b/253491233
Change-Id: Id103d8d5ba0fed0f7a427dd5955e1830275eff6b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3953131
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-10-14 03:09:56 +00:00
Jeremy Maitin-Shepard
c365da9c6c Use find_package(JPEG) in place of include(FindJPEG)
The former allows the package to be overridden with a local build by
`FetchContent`.

Also includes the fix to _MSC_VER conditions from:
https://aomedia.googlesource.com/aom/+/refs/heads/main/third_party/libyuv/README.libaom

Change-Id: I666e591becb3efaa7b5b68d27476319a2909b88e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3933570
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-10-03 22:19:28 +00:00
Frank Barchard
00950840d1 YUY2ToNV12 using YUY2ToY and YUY2ToNVUV
- Optimized YUY2ToNV12 that reduces it from 3 steps to 2 steps
  - Was SplitUV, memcpy Y, InterpolateUV
  - Now YUY2ToY, YUY2ToNVUV
- rollback LIBYUV_UNLIMITED_DATA

3840x2160 1000 iterations:

Pixel 2 Cortex A73
Was YUY2ToNV12_Opt (6515 ms)
Now YUY2ToNV12_Opt (3350 ms)

AB7 Mediatek P35 Cortex A53
Was YUY2ToNV12_Opt (6435 ms)
Now YUY2ToNV12_Opt (3301 ms)

Skylake AVX2 x64
Was YUY2ToNV12_Opt (1872 ms)
Now YUY2ToNV12_Opt (1657 ms)

SSE2 x64
Was YUY2ToNV12_Opt (2008 ms)
Now YUY2ToNV12_Opt (1691 ms)

Windows Skylake AVX2 32 bit x86
Was YUY2ToNV12_Opt (2161 ms)
Now YUY2ToNV12_Opt (1628 ms)

Bug: libyuv:943
Change-Id: I6c2ba2ae765413426baf770b837de114f808f6d0
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3929843
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-30 22:41:21 +00:00
Frank Barchard
b9adaef113 Enable unlimited data for YUV to RGB
- Provide LIBYUV_LIMITED_DATA macro for backwards compatiblity

Bug: b/474156256
Change-Id: I5d5d7fb640d51ae3c5ad363f2a28c8bfbd3048a5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3912081
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-23 12:51:37 +00:00
Frank Barchard
f9fda6e7d8 Fix shift amount for SSSE3 assembly for I012 format conversions
Bug: libyuv:938, libyuv:942
Change-Id: I6fb6e7e17fa941785e398bc630f465baf72fcabd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906091
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 23:07:53 +00:00
Frank Barchard
8fc02134c8 10/12 bit YUV replicate upper bits to low bits before converting to RGB
- shift high bits of 10 and 12 bit into lower bits

Bug: libyuv:941, libyuv:942,
Change-Id: I14381dbf226ef27dcce06893ea88860835639baa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906085
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-20 20:56:43 +00:00
Frank Barchard
e4b1ddd8fe Fix immediate offsets for row_neon build on gcc
Bug: libyuv:942
Change-Id: I7d2dc87a44cc1cc5c79c37f407583e0c907dc2de
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906088
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 20:16:13 +00:00
Frank Barchard
248172e2ba I422ToRGB24, I422ToRAW, I422ToRGB24MatrixFilter conversion functions added.
- YUV to RGB use linear for first and last row.
- add assert(yuvconstants)
- rename pointers to match row functions.
- use macros that match row functions.
- use 12 bit upsampler for conversions of 10 and 12 bits

Cortex A53 AArch32
I420ToRGB24_Opt (3627 ms)
I422ToRGB24_Opt (4099 ms)
I444ToRGB24_Opt (4186 ms)
I420ToRGB24Filter_Opt (5451 ms)
I422ToRGB24Filter_Opt (5430 ms)

AVX2
Was I420ToRGB24Filter_Opt (583 ms)
Now I420ToRGB24Filter_Opt (560 ms)

Neon Cortex A7
Was I420ToRGB24Filter_Opt (5447 ms)
Now I420ToRGB24Filter_Opt (5439 ms)

Bug: libyuv:938


Change-Id: I1731f2dd591073ae11a756f06574103ba0f803c7
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3906082
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2022-09-20 02:00:52 +00:00
Frank Barchard
f71c83552d I420ToRGB24MatrixFilter function added
- Implemented as 3 steps: Upsample UV to 4:4:4, I444ToARGB, ARGBToRGB24
- Fix some build warnings for missing prototypes.

Pixel 4
I420ToRGB24_Opt (743 ms)
I420ToRGB24Filter_Opt (1331 ms)

Windows with skylake xeon:
x86 32 bit
I420ToRGB24_Opt (387 ms)
I420ToRGB24Filter_Opt (571 ms)
x64 64 bit
I420ToRGB24_Opt (384 ms)
I420ToRGB24Filter_Opt (582 ms)


Bug: libyuv:938, libyuv:830
Change-Id: Ie27f70816ec084437014f8a1c630ae011ee2348c
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3900298
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-09-16 19:46:47 +00:00
Frank Barchard
3e38ce5058 SSE2 MM21->YUY2 conversion
Add SSE2 optimization for MM21ToYUY2 conversion.

Bug: b/238137982
Change-Id: I189f712514308322f651b082b496bce9c015c4ee
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832525
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-08-17 18:39:05 +00:00
Frank Barchard
65e7c9d570 MM21ToYUY2 and ABGRToJ420 conversion
MM21 to YUY2 use zip1 for performance

Cortex A510
Was MM21ToYUY2 (612 ms)
Now MM21ToYUY2 (573 ms)

Prefetches help Cortex A53
Was MM21ToYUY2 (4998 ms)
Now MM21ToYUY2 (1900 ms)

Pixel 4 Cortex A76
Was MM21ToYUY2 (215 ms)
Now MM21ToYUY2 (173 ms)

ABGRToJ420
- NEON, SSSE3 and AVX2 row functions
- J400, J420 and J422 formats.
- Added AVX2 for UV on ARGBToJ420.  Was SSSE3

Same code/performance as ARGBToJ420 but with constants re-ordered.
Pixel 4
ABGRToJ420_Opt (623 ms)
ABGRToJ422_Opt (702 ms)
ABGRToJ400_Opt (238 ms)

Skylake Xeon
With LIBYUV_BIT_EXACT which uses C for UV
ABGRToJ420_Opt (988 ms)
ABGRToJ422_Opt (1872 ms)
ABGRToJ400_Opt (186 ms)
Skylake Xeon using AVX2
ABGRToJ420_Opt (251 ms)
ABGRToJ422_Opt (245 ms)
ABGRToJ400_Opt (184 ms)
Skylake Xeon using SSSE3
ABGRToJ420_Opt (328 ms)
ABGRToJ422_Opt (362 ms)
ABGRToJ400_Opt (185 ms)

Bug: b/238137982
Change-Id: I559c3fe3fb80fa2ce5be3d8218736f9cbc627666
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3832111
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2022-08-16 22:07:38 +00:00
Frank Barchard
1c5a8bb17a AB64ToARGB fix for inplace conversion
- add tests for all single plane formats that reduce or stay same in size

Bug: b/242233673
Change-Id: Ic25d808114f11995ac56ea9c31b99f66ba36d345
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/3828485
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2022-08-12 01:28:13 +00:00