1720 Commits

Author SHA1 Message Date
Darren Hsieh
d14bd701c8 [RVV] Enable CopyRow_RVV, InterpolateRow_RVV, {Merge,Split}UVRow_RVV
* Run on SiFive internal FPGA:

MergeUVPlane_Opt(~6x vs scalar)
SplitUVPlane_Opt(~6x vs scalar)
TestCopyPlane(~8x vs scalar)
ARGBInterpolate0_Opt(~10x vs scalar)
ARGBInterpolate64_Opt(~9x vs scalar)
ARGBInterpolate168_Opt(~9x vs scalar)
ARGBInterpolate192_Opt(~8.5x vs scalar)
ARGBInterpolate255_Opt(~8x vs scalar)

Bug: libyuv:956
Change-Id: I8372341865f75f42e30371ef943d5c2e4be7b79a
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4574186
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-30 09:10:35 +00:00
Frank Barchard
78d168054b Remove extraneous quote from clobber list
Bug: None
Change-Id: Ie20574d0f9c8c2f074247405b294b49c3406448d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4568770
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2023-05-30 09:03:05 +00:00
Justin Green
0e111d2c58 Wrap neon registers in {} for the neon MT2T unpack implementation. Some compilers throw a syntax error otherwise.
Change-Id: Ic169dcfe4d9bb9bf6d0dcae977d6cf510a7a60bf
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4568904
Commit-Queue: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-26 17:12:02 +00:00
Frank Barchard
22c7a51452 Fix SplitRGB clobber list to include all registers used
Bug:  None
Change-Id: Icac4becb0537903ab87495fb0e2a2b750e1eca4f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4563355
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: David Gao <davidgao@google.com>
2023-05-24 21:44:59 +00:00
Wan-Teh Chang
dcbe082070 Save boxwidth - minboxwidth in a local variable
Avoid repetitions of the expression boxwidth - minboxwidth.

Change-Id: Ib53fb6b06a926b80ff9a64cc5d499aeef0894c99
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408062
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-22 19:10:13 +00:00
Bruce Lai
de3e7fd147 Manually remove rounding value inside yb(yuvconstant) in row_rvv.cc
After libyuv:961 is completed, yb(yuvconstant) will no longer contain rounding bias +32 for fixed-point.
This CL removes rounding bias(-32) manmually in row_rvv.cc.
Hence, all fixed-point related codes' rounding mode is changed to round-to-nearest-up "0" in row_rvv.cc.

Also, replace vwmul+vnsrl w/ vmulh in I400ToARGBRow_RVV.

Bug: libyuv:956, libyuv:961
Change-Id: I10e34668a2332e38393e9d68414f07aafb6c7cf7
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4550591
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-22 18:15:27 +00:00
Wan-Teh Chang
179b0203e5 Enable {J400/I400}ToARGBRow_RVV
Run on SiFive internal FPGA*:

I400ToARGB_Opt (~8x vs scalar)
J400ToARGB_Opt (~10x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Bug: libyuv:956, libyuv:961
Change-Id: If4e21ec85c4ff79083ec16a6faae0e457129a8de
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4544972
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2023-05-20 23:29:33 +00:00
Lu Wang
8670bcf17f Optimize the following 19 functions with LSX in row_lsx.cc.
UYVYToYRow_LSX, UYVYToUVRow_LSX, UYVYToUV422Row_LSX,
ARGBToUVRow_LSX, ARGBToRGB24Row_LSX, ARGBToRAWRow_LSX,
ARGBToRGB565Row_LSX, ARGBToARGB1555Row_LSX, ARGBToARGB4444Row_LSX,
ARGBToUV444Row_LSX, ARGBMultiplyRow_LSX, ARGBAddRow_LSX,
ARGBSubtractRow_LSX, ARGBAttenuateRow_LSX, ARGBToRGB565DitherRow_LSX,
ARGBShuffleRow_LSX, ARGBShadeRow_LSX, ARGBGrayRow_LSX,
ARGBSepiaRow_LSX

Bug: libyuv:913
Change-Id: I02c0c9d68b229c4a66c96837e9b928c2f5dda1f3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4546814
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-19 18:55:58 +00:00
Frank Barchard
a37799344d ARGBToI420Alpha function to convert ARGB to I420 with Alpha
Bug: b/281866362
Change-Id: Ic1093a887fb483f134c78909cf1ee7495e7345ba
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4534100
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
2023-05-17 00:23:24 +00:00
Bruce Lai
11d4536002 Enable I{422,444}AlphaToARGBRow_RVV & ARGBAttentuateRow_RVV
Run on SiFive internal FPGA:

I444AlphaToARGB_Opt (~16x vs scalar)
I422AlphaToARGB_Opt (~10x vs scalar)
ARGBAttenuate_Opt (~3x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Change-Id: I0046eb7af8104bc8e13cee1cb91a19f90940d5b0
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4535657
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-16 19:20:49 +00:00
Frank Barchard
6a68b18a96 Bump version and apply clang format
Bug: libyuv:956
Change-Id: I2375a02583789af2a5f13f8dba6c663d5975aaa9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4522352
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-11 11:27:28 +00:00
Bruce Lai
59eae49f17 Enable ARGBToYMatrixRow_RVV/RGBAToYMatrixRow_RVV/RGBToYMatrixRow_RVV
Run on SiFive internal FPGA:

ARGBToJ400_Opt (~6x vs scalar)
RGBAToJ400_Opt (~6x vs scalar)
RGB24ToJ400_Opt (~5.5x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Change-Id: Ia3ce8cea7962fbd8618cc23e850a7913c9cabf4f
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4521783
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-11 10:17:51 +00:00
Darren Hsieh
497ea35688 Enable I444To{ARGB,RGB24}Row_RVV
Run on SiFive internal FPGA:

I444ToARGB_Opt (~16x vs scalar)
I444ToRGB24_Opt (~10x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Change-Id: Idae7dc46ef648beaa14b58ba3eb56b67b17c9b3b
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4520761
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-10 19:50:56 +00:00
Darren Hsieh
964d963afb Enable I422To{ARGB,RGBA,RGB24}Row_RVV
Run on SiFive internal FPGA:

I422ToARGB_Opt (~10x vs scalar)
I422ToRGBA_Opt (~10x vs scalar)
I420ToRGB24_Opt (~8x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

This CL manually sets rounding mode,
since we use fixed-point vector narrowing clip.
There is no definition about default value for fixed-point rounding mode.
https://github.com/riscv/riscv-v-spec/blob/master/v-spec.adoc#38-vector-fixed-point-rounding-mode-register-vxrm
The behavior could be different on differet paltforms. To avoid unexpected behavior, we set rounding mode manually.

Change-Id: I90f0dcb90c37f7da7caab8eb1df6c9c7a3c874a8
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4512373
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-10 00:29:20 +00:00
Lu Wang
1d940cc570 Optimize the following functions with LSX.
MirrorRow_LSX, MirrorUVRow_LSX, ARGBMirrorRow_LSX,
I422ToYUY2Row_LSX, I422ToUYVYRow_LSX, I422ToARGBRow_LSX,
I422ToRGBARow_LSX, I422AlphaToARGBRow_LSX, I422ToRGB24Row_LSX,
I422ToRGB565Row_LSX, I422ToARGB4444Row_LSX, I422ToARGB1555Row_LSX,
YUY2ToYRow_LSX, YUY2ToUVRow_LSX, YUY2ToUV422Row_LSX

Bug: libyuv:913
Change-Id: I46cec605001d7ddd73846eed6d0a77f936b6dc53
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4515191
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-10 00:25:48 +00:00
James Zern
b372510c56 row_win.cc: fix ARM64EC build
include intrin.h rather than emmintrin.h; fixes:
C:\...\VC\Tools\MSVC\14.35.32215\include\emmintrin.h(28,1):
fatal  error C1189: #error:  this header should only be included through

Change-Id: Ief9c81f6f1971e552c8aac301d678b64fe5bd7cc
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4513825
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-05-09 19:56:35 +00:00
shaodiwei
4c209d264d MergeUVRow_AVX2 implementation is consistent in row_win.cc and row_gcc.cc,the commit can fix memory is wrote out of bounds
Change-Id: I4b771a46fc853effc4c0fa3ae8032322a8369dc9
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4514810
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-09 18:54:25 +00:00
Bruce Lai
f4bd840794 Fix compile error for riscv scalar & simplify cmake cross build flow
1. Fix compile error when build riscv without using vector

2. Fix run_qemu.sh misused v=true for USE_RVV=OFF case

3. [cmake] Fix warning by rename TEST to UNIT_TEST
Warning log:
CMake Warning (dev) at CMakeLists.txt:57 (if):                                                                                                                                                                                                                  [54/1931]
  Policy CMP0064 is not set: Support new TEST if() operator.  Run "cmake
  --help-policy CMP0064" for policy details.  Use the cmake_policy command to
  set the policy and suppress this warning.

  TEST will be interpreted as an operator when the policy is set to NEW.
  Since the policy is not set the OLD behavior will be used.
This warning is for project developers.  Use -Wno-dev to suppress it.

4. [cmake] Simplify logic for cross-build

Bug: libyuv:956

Change-Id: I120402fc7d6d86403e7d974180b81f4f9c663e36
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4486239
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-04 18:09:00 +00:00
Bruce Lai
8811ad8ba1 Fix TestLinuxRVV test fail
Fail log:
[ RUN      ] LibYUVBaseTest.TestLinuxRVV
Note: testing to load "../../unit_test/testdata/riscv64.txt"
/scratch/brucel/libyuv/src/unit_test/cpu_test.cc:290: Failure
Expected equality of these values:
  kCpuHasRVV | kCpuHasRVVZVFH
    Which is: 1610612736
  RiscvCpuCaps("../../unit_test/testdata/riscv64_rvv_zvfh.txt")
    Which is: 536870912
[  FAILED  ] LibYUVBaseTest.TestLinuxRVV (17 ms)

Reason:
The root cause is "\n"  may be contained in the ext variable.
The last of extension substring contains "\n".
For instance, test case riscv64_rvv_zvfh.txt, the last substring is "zvfh\n" instead of "zvfh".
Solved this failure by removing "\n" which is at the end of line.

NOTE: We avoid using strstr() to solve the problem here.
Becasue using strstr() will violate the parsing rule, if future extension contains "zvfh"(e.g zvfhxxx).

Log after modification:
[ RUN      ] LibYUVBaseTest.TestLinuxRVV
Note: testing to load "../../unit_test/testdata/riscv64.txt"
[       OK ] LibYUVBaseTest.TestLinuxRVV (38 ms)

Change-Id: I7b7db98dbc5388cbc148423da6892b8f0be64599
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4498101
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-05-04 03:26:25 +00:00
Darren Hsieh
1b3c4c12d4 Add Split/Merge RGB/ARGB/XRGB Row_RVV
* Run on SiFive internal FPGA:

SplitRGBPlane_Opt (~6.87x vs scalar)

SplitARGBPlane_Opt (~10.77x vs scalar)

SplitXRGBPlane_Opt (~18.69x vs scalar)

MergeRGBPlane_Opt (~3.63x vs scalar)

MergeARGBPlane_Opt (~3.50x vs scalar)

MergeXRGBPlane_Opt (~2.90x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

- include a fix to avoid implict conversion warning between size_t & int.

Bug: libyuv:956

Change-Id: Icd79b282b04ea3981e7fd4e6d547da6708d82516
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4443411
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-28 18:34:46 +00:00
Frank Barchard
7c6a7e5737 cpuid for arm/mips/riscv initialize buffer
- change cpu printf to hex to better show flags

util/cpuid:
Cpu Flags 0x30000001
Has RISCV 0x10000000
Has RVV 0x20000000


[ RUN      ] LibYUVBaseTest.TestCpuHas
Cpu Flags 0x30000001
Has RISCV 0x10000000
Has RVV 0x20000000
Has RVVZVFH 0x0
[       OK ] LibYUVBaseTest.TestCpuHas (1 ms)
[ RUN      ] LibYUVBaseTest.TestCompilerMacros
__ATOMIC_RELAXED 0
__cplusplus 201703
__clang_major__ 9999
__clang_minor__ 0
__GNUC__ 4
__GNUC_MINOR__ 2
__riscv 1
__riscv_vector 1
__clang__ 1
__llvm__ 1
__pic__ 2
INT_TYPES_DEFINED
__has_feature

Bug: libyuv:956
Change-Id: Iee4f1f34799434390e756de1e6c2c4596d82ace5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4484957
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-27 22:46:27 +00:00
Frank Barchard
cf21b5ea5c Rename variables to match layout of ABGR
Bug: None
Change-Id: Ia1d596b6e108307fe042a03c34162b25152293d4
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4461967
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-26 16:57:33 +00:00
Bruce Lai
1330a79e9f Optimized AR64/AB64 <-> ARGB with RVV
* Run on SiFive internal FPGA:

ARGBToAR64_Opt (~13.7x vs scalar)
ARGBToAB64_Opt (~5.81x vs scalar)
AR64ToARGB_Opt (~15.8x vs scalar)
AB64ToARGB_Opt (~2.40x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Bug: libyuv:956

Change-Id: Ida642a5077f59d25fb7c5328f671956b2293dadd
Signed-off-by: Bruce Lai <bruce.lai@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4442913
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-20 19:49:55 +00:00
Frank Barchard
c994782086 Enable RVV if qemu is detected
- include a fix for jpeg unittests to do at least 1 iteration
- include a fix for scale uv to only use linearup2 if filter is linear

Tested on qemu with Intel host:
[ RUN      ] LibYUVBaseTest.TestCpuHas
Cpu Flags 805306369
Has RISCV 268435456
Has RVV 536870912
Has RVVZVFH 0
Has X86 0

Bug: libyuv:956, libyuv:959, libyuv:960
Change-Id: I4a1b66f83d82ba127780f52526153d586db90111
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4429570
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Randall Bosetti <rlb@google.com>
2023-04-18 20:29:04 +00:00
Darren Hsieh
44396e6e9a Add ARGBToRAWRow_RVV, ARGBToRGB24Row_RVV, RGB24ToARGBRow_RVV
* Run on SiFive internal FPGA:

ARGBToRAW_Opt (~1.55x vs scalar)

ARGBToRGB24_Opt (~1.44x vs scalar)

RGB24ToARGB_Opt (~1.77x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Bug: libyuv:956

Change-Id: I26722f6848cd68684d95d9a7ee06ce0416e7985d
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4413083
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-13 19:33:16 +00:00
Frank Barchard
68659d0d68 UVScale down by 2 fix for C and optimize for NEON
- update cpu_id to use "re" for fopen to avoid leaking handles if a thread is started while the file is open.

Bug: libyuv:958
Change-Id: I1af9de68fce12e440e1226fc8070634ccb1bf090
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4417176
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-12 22:49:20 +00:00
Frank Barchard
ee3e71c7ce Any functions use memset(vin, 0, sizeof(vin)) for GCC warning fix
- Fix -Wmemset-elt-size warning for GCC
- Use vin for inputs and vout for outputs

Bug: None
Change-Id: Iefd418dc884b4d062e1fdd9215319c8838c49eaa
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4412065
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-04-10 20:44:10 +00:00
Darren Hsieh
724e7aee03 Fix macro define typo in scale_uv.cc
The correct define can be found in scale_row.h

Change-Id: I633ed47006c7bd8014038493005c2d934489ff18
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4411353
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-10 16:55:48 +00:00
James Zern
0200037a5a row_any,ANYDETILE: fix -Wmemset-elt-size warning
under gcc 12.2.0 using -Wall:

source/row_any.cc: In function ‘void libyuv::DetileRow_16_Any_SSE2(const
                       uint16_t*, ptrdiff_t, uint16_t*, int)’:
source/row_any.cc:2287:11: warning: ‘memset’ used with length equal to
number of elements without multiplication by element size
[-Wmemset-elt-size]
 2287 |     memset(temp, 0, 16 * BPP); /* for msan */
      |     ~~~~~~^~~~~~~~~~~~~~~~~~~~~~~~~
source/row_any.cc:2308:1: note: in expansion of macro ‘ANYDETILE’
 2308 | ANYDETILE(DetileRow_16_Any_SSE2, DetileRow_16_SSE2, uint16_t, 2, 15)

This increases the memset to the full buffer size, which may not be
strictly necessary.

Change-Id: Iea2fc649990ee84ea9aa8020d6f6b25e012b18fb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4406599
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-08 19:01:02 +00:00
Darren Hsieh
e8af6cb2e4 Add RAWToARGBRow_RVV,RAWToRGBARow_RVV,RAWToRGB24Row_RVV
* Run on SiFive internal FPGA:

RAWToARGB_Opt (~2x vs scalar)

RAWToRGBA_Opt (~2x vs scalar)

RAWToRGB24_Opt (~1.5x vs scalar)

LIBYUV_WIDTH=1280 LIBYUV_HEIGHT=720 LIBYUV_REPEAT=10

Change-Id: I21a13d646589ea2aa3822cb9225f5191068c285b
Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4408357
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-04-07 18:45:08 +00:00
Darren Hsieh
aa47d668d8 Add riscv cpu info detection.
* Supports:

 * The standard single-letter Vector detection.

 * Vector fp16 detection.

Signed-off-by: Darren Hsieh <darren.hsieh@sifive.com>
Change-Id: Ia7ee1bd8ec1a990f1b2b1700805942e99c0aa87b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4401738
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-04-06 15:58:29 +00:00
Wan-Teh Chang
ec48e4328e Add assertions for the Clang static analyzer
The Clang static analyzer (scan-build) in LLVM 14 warns about
array index out of bounds in scaletbl[boxwidth - minboxwidth] in
ScaleAddCols2_C() and ScaleAddCols2_16_C(). The scaletbl array has two
elements. It's not clear the index boxwidth - minboxwidth is either 0 or
1.

Change-Id: I072476e86950154beffe6b1a89915755118b3cbd
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4403882
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Wan-Teh Chang <wtc@google.com>
2023-04-05 21:58:22 +00:00
Frank Barchard
464c51a035 AArch32 YUVTORGB_SETUP use load and dup to avoid modifying pointer
- Allows code to be optimized with clang 17 -flto-thin
- Bump version number to 1864 to allow detection of fix
- Apply clang format to standardize formatting; No impact on code generated

Bug: chromium:1424089
Change-Id: Ib745836b27915a5e4cb1d7d928ee52659360612b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370052
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2023-03-24 19:32:30 +00:00
Frank Barchard
1a971f8cc3 clang 17 -flto-thin bug fix for Neon YUVtoRGB and ARGBToRGB565Dither
- YUV to RGB AArch32 kRGBCoeffBias rewind pointer
- ARGBToRGB565Dither declare width and source pointers as modified


Bug: chromium:1424089
Change-Id: I987180652331bab16ce27d8d166399a687ee890e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4370099
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-03-24 10:59:40 +00:00
Frank Barchard
3f219a3501 GCC warning fix for MT2T
- Fix redundent assignment compile warning in GCC
- Apply clang-format
- Bump version to 1863

Bug: libyuv:955
Change-Id: If2b6588cd5a7f068a1745fe7763e90caa7277101
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4344729
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
2023-03-16 06:57:20 +00:00
Justin Green
76468711d5 M2T2 Unpack fixes
Fix the algorithm for unpacking the lower 2 bits of M2T2 pixels.

Bug: b:258474032
Change-Id: Iea1d63f26e3f127a70ead26bc04ea3d939e793e3
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4337978
Commit-Queue: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-03-14 14:59:26 +00:00
Frank Barchard
f9b23b9cc0 Transpose 4x4 for SSE2 and AVX2
Skylake Xeon
AVX2 Transpose4x4_Opt (290 ms)
SSE2 Transpose4x4_Opt (302 ms)
C    Transpose4x4_Opt (522 ms)

AMD Zen2
AVX2 Transpose4x4_Opt (136 ms)
SSE2 Transpose4x4_Opt (137 ms)
C    Transpose4x4_Opt (431 ms)

Bug: None
Change-Id: I4997dbd5c5387c22bfd6c5960b421504e4bc8a2a
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4292946
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-03-03 17:46:23 +00:00
Frank Barchard
88b050f337 MergeUV AVX512BW use assembly
- Convert MergeUVRow_AVX512BW to assembly
- Enable MergeUVRow_AVX512BW for Windows with clangcl
- MergeUVRow_AVX2 use vpmovzxbw and vpsllw
- MergeUVRow_16_AVX2 use vpmovzxbw and vpsllw with different shift for U and V

AMD Zen 4 640x360 100000 iterations
Was
AVX512 MergeUVPlane_Opt (884 ms)
AVX2   MergeUVPlane_Opt (945 ms)
AVX2   MergeUVPlane_16_Opt (2167 ms)

Now
AVX512 MergeUVPlane_Opt (865 ms)
AVX2   MergeUVPlane_Opt (943 ms)
SSE2   MergeUVPlane_Opt (973 ms)
AVX2   MergeUVPlane_16_Opt (2102 ms)

Bug: None
Change-Id: I658ada2a75d44c3f93be8bd3ed96f83d5fa2ab8d
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4271230
Reviewed-by: Fritz Koenig <frkoenig@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-22 21:19:08 +00:00
Frank Barchard
2bdc210be9 MergeUV_AVX512BW for I420ToNV12
On Skylake Xeon 640x360 100000 iterations
AVX512   MergeUVPlane_Opt (1196 ms)
AVX2     MergeUVPlane_Opt (1565 ms)
SSE2     MergeUVPlane_Opt (1780 ms)
Pixel 7  MergeUVPlane_Opt (1177 ms)

Bug: None
Change-Id: If47d4fa957cf27781bba5fd6a2f0bf554101a5c6
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4242247
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
2023-02-13 20:14:57 +00:00
Sergio Garcia Murillo
b2528b0be9 Add support for odd width and height in I410ToI420
Bug: libyuv:950
Change-Id: Ic9a094463af875aefd927023f730b5f35f8551de
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4154630
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-01-23 19:05:00 +00:00
Hao Chen
0809713775 Refine some functions on the Longarch platform.
Add ARGBToYMatrixRow_LSX/LASX, RGBAToYMatrixRow_LSX/LASX and
RGBToYMatrixRow_LSX/LASX functions with RgbConstants argument.

Bug: libyuv:912
Change-Id: I956e639d1f0da4a47a55b79c9d41dcd29e29bdc5
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4167860
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Mirko Bonadei <mbonadei@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 18:54:14 +00:00
Frank Barchard
0faf8dd0e0 Fix for DivideRow_NEON functions
- was dup of 8h but mul of 4s.  now use umull

Bug: libyuv:951
Change-Id: If6cb01f5f006c2235886b81ce120642d7e24a9bb
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166563
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-18 00:30:05 +00:00
Frank Barchard
541d8efbaf Fix for divide row functions used by P010ToI010
Bug: libyuv:951
Change-Id: Id323656cb6f99b1be0be7aaa854d3cc15feeba69
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166562
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-17 21:40:45 +00:00
Frank Barchard
d5aa3d4a76 P010ToI010 and P012ToI012 conversion functions
- Convert 10 and 12 bit biplanar formats to planar.
- Shift 10 MSB to 10 LSB
- P010 is similar to NV12 in layout, but uses 10 MSB of 16 bit values.
- I010 is similar to I420 in layout, but uses 10 LSB of 16 bit values.

Bug: libyuv:951
Change-Id: I16a1bc64239d0fa4f41810910da448bf5720935f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4166560
Reviewed-by: Justin Green <greenjustin@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-13 19:20:12 +00:00
Frank Barchard
6e4b0acb4b I422Rotate take stride for temporary buffers
- Minor variable name changes first/last to top/bottom
- Comments explaining rotate temporary buffers usage
- Add asserts for scale parameter
- Use NULL and stddef.h instead of 0
- Use void * for allocation in row.h
- Add () around size parameter in macros


Bug: libyuv:926, libyuv:949
Change-Id: Ib55417570926ccada0a0f8abd1753dc12e5b162e
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4136762
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 23:11:52 +00:00
Sergio Garcia Murillo
f8626a7224 Add 10 bit rotate methods.
This initial implementation is based on current unoptimized code in webrtc using just plain for loops.

Bug: libyuv:949
Change-Id: Ic87ee49c3a0b62edbaaa4255c263c1f7be4ea02b
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110782
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-04 21:10:01 +00:00
Sergio Garcia Murillo
22a579c438 Use ScalePlaneDown2_16To8 for avoiding the 2 step process
Bug: libyuv:950
Change-Id: I5a77bca9a0230fe00abd810939e217833a14683f
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4134524
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
2023-01-03 21:41:21 +00:00
Sergio Garcia Murillo
f583b1b4b8 Add I410Copy and I410ToI420 methods
The I410To420 implementation does a two step approach for scaling down and 10-to-8 bit conversion using the Y plane as temporal storage.

Bug: libyuv:950
Change-Id: I3d35fad4b99e17253230456233fbd947e013c0ec
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4110783
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
2023-01-03 20:27:28 +00:00
Frank Barchard
3abd6f36b6 Casting for scale functions
- MT2T support for source strides added, but only works for positive values.
- Reduced casting in row_common - one cast per assignment.
- scaling functions use intptr_t for intermediate calculations, then cast strides to ptrdiff_t

Bug: libyuv:948, b/257266635, b/262468594
Change-Id: I0409a0ce916b777da2a01c0ab0b56dccefed3b33
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4102203
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Wan-Teh Chang <wtc@google.com>
Reviewed-by: Justin Green <greenjustin@google.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ernest Hua <ernesthua@google.com>
2022-12-15 22:34:22 +00:00
Frank Barchard
610e0cdead MT2T Warning fixes for fuchsia
Bug: b/258474032, b/257266635
Change-Id: Ic5cbbc60e2e1463361e359a2fe3e97976c1ea929
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/4081348
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Justin Green <greenjustin@google.com>
2022-12-06 19:54:40 +00:00