BGRAToI420: use BgraConstants for a direct conversion using AVX512BW

row_win (MSVC)
Was C/SSSE3
BGRAToARGB_Opt (594 ms)
BGRAToARGB_Endswap_Opt (609 ms)
BGRAToI420_Opt (122 ms)

Now AVX2
BGRAToARGB_Opt (100 ms)
BGRAToARGB_Endswap_Opt (99 ms)
BGRAToI420_Opt (115 ms)

Clang/GCC AVX512BW
BGRAToARGB_Opt (86 ms)
BGRAToARGB_Endswap_Opt (91 ms)
BGRAToI420_Opt (110 ms)


Bug: 42280902
Change-Id: I52cb2b0cacea8f2f0b138ec3cc521185dbef8595
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7905821
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Frank Barchard 2026-06-08 11:14:55 -07:00 committed by libyuv-scoped@luci-project-accounts.iam.gserviceaccount.com
parent 95eedb9687
commit 4be798d7c5
37 changed files with 2192 additions and 1608 deletions


@ -1,44 +1,62 @@
# Gemini Project Context: libyuv Row Functions
This file provides context for the core row-processing architecture of libyuv. Use these guidelines when refactoring, reviewing, or generating code within the `row_*.cc` files.
This file provides context for the core row-processing architecture of
libyuv. Use these guidelines when refactoring, reviewing, or generating
code within the `row_*.cc` files.
## Architectural Overview
Libyuv uses a dispatch system where high-level conversion functions call optimized "Row" functions. These functions are categorized by SIMD architecture and compiler compatibility.
Libyuv uses a dispatch system where high-level conversion functions call
optimized "Row" functions. These functions are categorized by SIMD architecture
and compiler compatibility.
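The dispatch idea described above can be sketched as follows. This is a minimal illustration, not libyuv code: every `Example*` name is hypothetical, the "fast" row is plain C standing in for a SIMD kernel, and `ExampleCpuHasFastPath()` stands in for a runtime CPU check such as `TestCpuFlag()`.

```c
#include <stdint.h>

/* Hypothetical row function type: writes `width` bytes of src reversed. */
typedef void (*ExampleMirrorRowFn)(const uint8_t* src, uint8_t* dst, int width);

/* Portable reference row, standing in for a row_common.cc function. */
static void ExampleMirrorRow_C(const uint8_t* src, uint8_t* dst, int width) {
  for (int i = 0; i < width; ++i) {
    dst[i] = src[width - 1 - i];
  }
}

/* Stand-in for a SIMD row: processes 4 bytes per iteration, so it is
 * only valid when width is a multiple of 4. */
static void ExampleMirrorRow_Unrolled4(const uint8_t* src,
                                       uint8_t* dst,
                                       int width) {
  for (int i = 0; i < width; i += 4) {
    dst[i + 0] = src[width - 1 - i];
    dst[i + 1] = src[width - 2 - i];
    dst[i + 2] = src[width - 3 - i];
    dst[i + 3] = src[width - 4 - i];
  }
}

/* Hypothetical CPU probe; assumed to always succeed in this sketch. */
static int ExampleCpuHasFastPath(void) {
  return 1;
}

/* Start from the C row and upgrade only when the CPU and width allow it. */
static ExampleMirrorRowFn ExampleSelectMirrorRow(int width) {
  ExampleMirrorRowFn fn = ExampleMirrorRow_C;
  if (ExampleCpuHasFastPath() && (width % 4) == 0) {
    fn = ExampleMirrorRow_Unrolled4;
  }
  return fn;
}

/* High-level function: pick a row function once, then call it per row. */
static void ExampleMirrorPlane(const uint8_t* src, int src_stride,
                               uint8_t* dst, int dst_stride,
                               int width, int height) {
  ExampleMirrorRowFn row = ExampleSelectMirrorRow(width);
  for (int y = 0; y < height; ++y) {
    row(src, dst, width);
    src += src_stride;
    dst += dst_stride;
  }
}
```

The selection happens once per call rather than once per row, which keeps the per-row loop free of feature checks.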
## Source File Map
### x86 Architectures (32-bit and 64-bit)
* **row_gcc.cc**: **Master copy.** Contains inline assembly in GCC syntax for GCC and Clang. Supports AVX, and AVX512. AVX512 implementations are strictly for 64-bit targets.
* **row_win.cc**: Derivative of `row_gcc.cc`. Contains C++ intrinsics specifically for Visual C++ (MSVC). Can be tested with Clang using `-DLIBYUV_ENABLE_ROWWIN`.
* **row_gcc.cc**: **Master copy.** Contains inline assembly in GCC syntax for
GCC and Clang. Supports AVX, and AVX512. AVX512 implementations are strictly
for 64-bit targets.
* **row_win.cc**: Derivative of `row_gcc.cc`. Contains C++ intrinsics
specifically for Visual C++ (MSVC). Can be tested with Clang using
`-DLIBYUV_ENABLE_ROWWIN`.
* **Note**: Use either `row_gcc` or `row_win`, never both.
### ARM Architectures
* **row_neon.cc**: 32-bit ARM. Written entirely in inline assembly for GCC/Clang.
* **row_neon64.cc**: 64-bit ARM (AArch64). Written entirely in inline assembly for GCC/Clang.
* **row_neon.cc**: 32-bit ARM. Written entirely in inline assembly for
GCC/Clang.
* **row_neon64.cc**: 64-bit ARM (AArch64). Written entirely in inline assembly
for GCC/Clang.
* **row_sve.cc**: ARMv9 Scalable Vector Extensions (SVE).
* **row_sme.cc**: ARMv9 Scalable Matrix Extension (SME) and Streaming SVE (SSVE).
* **row_sme.cc**: ARMv9 Scalable Matrix Extension (SME) and Streaming SVE
(SSVE).
### Other Architectures
* **row_rvv.cc**: RISC-V Vector (RVV). Implemented using intrinsics. Optimized for SiFive X280.
* **row_rvv.cc**: RISC-V Vector (RVV). Implemented using intrinsics. Optimized
for SiFive X280.
* **row_lsx.cc / row_lasx.cc**: LoongArch MIPS-like extensions.
### Utility and Fallbacks
* **row_common.cc**: Portable C/C++ versions. This is the reference implementation.
* **row_any.cc**: Handles "remainder" pixels for widths not multiples of SIMD register size. Used for x86, NEON, and MIPS. Not required for SVE, SME, or RVV due to hardware-level masking.
* **row_common.cc**: Portable C/C++ versions. This is the reference
implementation.
* **row_any.cc**: Handles "remainder" pixels for widths not multiples of SIMD
register size. Used for x86, NEON, and MIPS. Not required for SVE, SME, or
RVV due to hardware-level masking.
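As a rough illustration of remainder handling, the sketch below shows one common way to wrap a fixed-step kernel so it accepts any width. It is an assumption-laden example, not the actual `row_any.cc` macros: the `Example*` names and the 16-byte step are hypothetical, and the "fast" row is plain C standing in for SIMD code.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical number of bytes the stand-in kernel handles per step. */
enum { kExampleStep = 16 };

/* Stand-in for a SIMD row: inverts bytes, and requires width to be a
 * multiple of kExampleStep. */
static void ExampleInvertRow_Fast(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; x += kExampleStep) {
    for (int i = 0; i < kExampleStep; ++i) {
      dst[x + i] = (uint8_t)(255 - src[x + i]);
    }
  }
}

/* "Any" wrapper: run the kernel on the aligned part, then process the
 * leftover pixels through zero-padded temporary buffers so the kernel
 * never reads or writes past the end of the caller's row. */
static void ExampleInvertRow_Any(const uint8_t* src, uint8_t* dst, int width) {
  uint8_t tmp_in[kExampleStep];
  uint8_t tmp_out[kExampleStep];
  int remainder = width % kExampleStep;
  int aligned = width - remainder;
  if (aligned > 0) {
    ExampleInvertRow_Fast(src, dst, aligned);
  }
  if (remainder > 0) {
    memset(tmp_in, 0, sizeof(tmp_in));
    memcpy(tmp_in, src + aligned, (size_t)remainder);
    ExampleInvertRow_Fast(tmp_in, tmp_out, kExampleStep);
    memcpy(dst + aligned, tmp_out, (size_t)remainder);
  }
}
```

Architectures with predicated loads and stores can mask off the tail inside the kernel itself, which is why a wrapper like this is unnecessary there.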
## Coding Guidelines
1. **AVX512 Logic**: AVX512 row functions are strictly enabled for **64-bit x86 only**.
2. **Feature Macros**: Use the `HAS_` macros in `include/libyuv/row.h` to enable or disable specific AVX512 versions.
1. **AVX512 Logic**: AVX512 row functions are strictly enabled for **64-bit x86
only**.
2. **Feature Macros**: Use the `HAS_` macros in `include/libyuv/row.h` to
enable or disable specific AVX512 versions.
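A small sketch of how the two guidelines above might combine. `HAS_EXAMPLEROW_AVX512BW` and `ExampleAvx512RowCompiledIn` are hypothetical names chosen for illustration; only `LIBYUV_DISABLE_X86` and the 64-bit target checks are taken from `row.h` as shown in this change.

```c
/* Define the hypothetical feature macro only for 64-bit x86 targets,
 * and only when x86 optimizations have not been disabled. */
#if !defined(LIBYUV_DISABLE_X86) && (defined(__x86_64__) || defined(_M_X64))
#define HAS_EXAMPLEROW_AVX512BW
#endif

/* Returns 1 when the hypothetical AVX512 row would be compiled in,
 * 0 otherwise. Callers would still need a runtime CPU check. */
static int ExampleAvx512RowCompiledIn(void) {
#if defined(HAS_EXAMPLEROW_AVX512BW)
  return 1;
#else
  return 0;
#endif
}
```

Gating at compile time keeps 32-bit builds from referencing a function that was never built; the runtime CPU check then decides whether the compiled-in version is actually used.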
## Changelist (CL) & Commit Guidelines
When generating descriptions, follow the Chromium/Google standard format. Wrap commit message text at 72 characters.
When generating descriptions, follow the Chromium/Google standard format. Wrap
commit message text at 72 characters.
### Format Example:


@ -1,6 +1,6 @@
Name: libyuv
URL: https://chromium.googlesource.com/libyuv/libyuv/
Version: 1946
Version: 1947
Revision: DEPS
License: BSD-3-Clause
License File: LICENSE


@ -24,9 +24,10 @@ extern "C" {
// This module is for Visual C 32/64 bit
#if !defined(LIBYUV_DISABLE_X86) && \
(defined(__x86_64__) || defined(__i386__) || \
defined(_M_X64) || defined(_M_X86))
#if ((defined(_MSC_VER) && !defined(__clang__)) || defined(LIBYUV_ENABLE_ROWWIN))
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
defined(_M_X86))
#if ((defined(_MSC_VER) && !defined(__clang__)) || \
defined(LIBYUV_ENABLE_ROWWIN))
#define USE_ROW_WIN
#else
#define USE_ROW_GCC
@ -122,8 +123,8 @@ extern "C" {
// The following are available on all x86 platforms, but
// require VS2012, clang 3.4 or gcc 4.7.
#if !defined(LIBYUV_DISABLE_X86) && \
(defined(__x86_64__) || defined(__i386__) || \
defined(_M_X64) || defined(_M_X86))
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
defined(_M_X86))
#define HAS_ARGBMIRRORROW_AVX2
#define HAS_RGB24MIRRORROW_AVX2
#define HAS_ARGBTOUVMATRIXROW_AVX2
@ -343,8 +344,8 @@ extern "C" {
// This module is for Visual C 32/64 bit
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_WIN) && \
(defined(__x86_64__) || defined(__i386__) || \
defined(_M_X64) || defined(_M_X86)) && \
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
defined(_M_X86)) && \
((defined(_MSC_VER) && !defined(__clang__)) || \
defined(LIBYUV_ENABLE_ROWWIN))
#define HAS_RAWTOARGBROW_AVX2
@ -352,9 +353,11 @@ extern "C" {
#define HAS_RGB565TOARGBROW_AVX2
#define HAS_ARGB1555TOARGBROW_AVX2
#define HAS_ARGB4444TOARGBROW_AVX2
#define HAS_ARGBSHUFFLEROW_AVX2
#if defined(__x86_64__) || defined(_M_X64)
#define HAS_RAWTOARGBROW_AVX512BW
#define HAS_RGB24TOARGBROW_AVX512BW
#define HAS_ARGBSHUFFLEROW_AVX512BW
#endif
#define HAS_ARGBTOYROW_AVX2
#define HAS_ARGBTOYMATRIXROW_AVX2
@ -383,7 +386,6 @@ extern "C" {
#endif
#define HAS_ARGBTORGB24ROW_AVX512VBMI
#define HAS_CONVERT16TO8ROW_AVX512BW
#define HAS_MERGEUVROW_AVX512BW
#endif
// The following are available for AVX512 clang x64 platforms:
@ -401,6 +403,11 @@ extern "C" {
#define HAS_ARGBTOUVJROW_AVX512BW
#define HAS_ARGBTOUVMATRIXROW_AVX512BW
#define HAS_J400TOARGBROW_AVX512BW
#define HAS_MERGEUVROW_AVX512BW
#define HAS_MIRRORROW_AVX512BW
#define HAS_MIRRORSPLITUVROW_AVX512BW
#define HAS_SPLITUVROW_AVX512BW
#define HAS_RGBTOUVMATRIXROW_AVX512BW
#endif
// The following are available on Neon platforms:
@ -1097,8 +1104,7 @@ struct ArgbConstants {
#define IACA_UD_BYTES __asm__ __volatile__("\n\t .byte 0x0F, 0x0B");
#else /* Visual C */
#define IACA_UD_BYTES \
{ __asm _emit 0x0F __asm _emit 0x0B }
#define IACA_UD_BYTES {__asm _emit 0x0F __asm _emit 0x0B}
#define IACA_SSC_MARK(x) \
{__asm mov ebx, x __asm _emit 0x64 __asm _emit 0x67 __asm _emit 0x90}
@ -1107,16 +1113,8 @@ struct ArgbConstants {
#define IACA_VC64_END __writegsbyte(222, 222);
#endif
#define IACA_START \
{ \
IACA_UD_BYTES \
IACA_SSC_MARK(111) \
}
#define IACA_END \
{ \
IACA_SSC_MARK(222) \
IACA_UD_BYTES \
}
#define IACA_START {IACA_UD_BYTES IACA_SSC_MARK(111)}
#define IACA_END {IACA_SSC_MARK(222) IACA_UD_BYTES}
void I210AlphaToARGBRow_NEON(const uint16_t* src_y,
const uint16_t* src_u,
@ -2194,10 +2192,26 @@ void RGB565ToYMatrixRow_C(const uint8_t* src_rgb565,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB1555ToYMatrixRow_C(const uint8_t* src_argb1555, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB1555ToUVMatrixRow_C(const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void ARGB4444ToYMatrixRow_C(const uint8_t* src_argb4444, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB4444ToUVMatrixRow_C(const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void ARGB1555ToYMatrixRow_C(const uint8_t* src_argb1555,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB1555ToUVMatrixRow_C(const uint8_t* src_argb1555,
int src_stride_argb1555,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void ARGB4444ToYMatrixRow_C(const uint8_t* src_argb4444,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB4444ToUVMatrixRow_C(const uint8_t* src_argb4444,
int src_stride_argb4444,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGB565ToUVMatrixRow_C(const uint8_t* src_rgb565,
int src_stride_rgb565,
uint8_t* dst_u,
@ -2210,8 +2224,30 @@ void ARGBToUVMatrixRow_SSSE3(const uint8_t* src_argb,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGBToUVMatrixRow_Any_AVX2(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGBToUVMatrixRow_Any_AVX2(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGBToUVMatrixRow_AVX512BW(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGBToUVMatrixRow_Any_AVX512BW(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void ARGBToUVMatrixRow_AVX2(const uint8_t* src_argb,
int src_stride_argb,
uint8_t* dst_u,
@ -2301,18 +2337,66 @@ void RGB565ToUVMatrixRow_Any_AVX2(const uint8_t* src_rgb565,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGB565ToYMatrixRow_NEON(const uint8_t* src_rgb565, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB1555ToYMatrixRow_NEON(const uint8_t* src_argb1555, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB1555ToUVMatrixRow_NEON(const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void ARGB4444ToYMatrixRow_NEON(const uint8_t* src_argb4444, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB4444ToUVMatrixRow_NEON(const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGB565ToUVMatrixRow_NEON(const uint8_t* src_rgb565, int src_stride_rgb565, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGB565ToYMatrixRow_Any_NEON(const uint8_t* src_rgb565, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB1555ToYMatrixRow_Any_NEON(const uint8_t* src_argb1555, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB1555ToUVMatrixRow_Any_NEON(const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void ARGB4444ToYMatrixRow_Any_NEON(const uint8_t* src_argb4444, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void ARGB4444ToUVMatrixRow_Any_NEON(const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGB565ToUVMatrixRow_Any_NEON(const uint8_t* src_rgb565, int src_stride_rgb565, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGB565ToYMatrixRow_NEON(const uint8_t* src_rgb565,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB1555ToYMatrixRow_NEON(const uint8_t* src_argb1555,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB1555ToUVMatrixRow_NEON(const uint8_t* src_argb1555,
int src_stride_argb1555,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void ARGB4444ToYMatrixRow_NEON(const uint8_t* src_argb4444,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB4444ToUVMatrixRow_NEON(const uint8_t* src_argb4444,
int src_stride_argb4444,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGB565ToUVMatrixRow_NEON(const uint8_t* src_rgb565,
int src_stride_rgb565,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGB565ToYMatrixRow_Any_NEON(const uint8_t* src_rgb565,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB1555ToYMatrixRow_Any_NEON(const uint8_t* src_argb1555,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB1555ToUVMatrixRow_Any_NEON(const uint8_t* src_argb1555,
int src_stride_argb1555,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void ARGB4444ToYMatrixRow_Any_NEON(const uint8_t* src_argb4444,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void ARGB4444ToUVMatrixRow_Any_NEON(const uint8_t* src_argb4444,
int src_stride_argb4444,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGB565ToUVMatrixRow_Any_NEON(const uint8_t* src_rgb565,
int src_stride_rgb565,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void ARGBToYMatrixRow_AVX2(const uint8_t* src_argb,
uint8_t* dst_y,
@ -2340,9 +2424,22 @@ void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
int width,
const struct ArgbConstants* c);
void RGBToUVMatrixRow_NEON(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGBToYMatrixRow_Any_NEON(const uint8_t* src_rgb, uint8_t* dst_y, int width, const struct ArgbConstants* c);
void RGBToUVMatrixRow_Any_NEON(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
void RGBToUVMatrixRow_NEON(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void RGBToYMatrixRow_Any_NEON(const uint8_t* src_rgb,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c);
void RGBToUVMatrixRow_Any_NEON(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c);
void ARGBToYMatrixRow_NEON_DotProd(const uint8_t* src_argb,
uint8_t* dst_y,
@ -2374,7 +2471,6 @@ void ARGBToYMatrixRow_Any_LASX(const uint8_t* src_argb,
int width,
const struct ArgbConstants* c);
void ARGBToUV444MatrixRow_SSSE3(const uint8_t* src_argb,
uint8_t* dst_u,
uint8_t* dst_v,
@ -2432,15 +2528,29 @@ void RGBAToYRow_C(const uint8_t* src_rgb, uint8_t* dst_y, int width);
void RGB565ToYRow_C(const uint8_t* src_rgb565, uint8_t* dst_y, int width);
void ARGB1555ToYRow_C(const uint8_t* src_argb1555, uint8_t* dst_y, int width);
void ARGB4444ToYRow_C(const uint8_t* src_argb4444, uint8_t* dst_y, int width);
void ARGBToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void ARGBToYJRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void ABGRToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void ABGRToYJRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void ARGBToYRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void ARGBToYJRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void ABGRToYRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void ABGRToYJRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void RGBAToYRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void RGBAToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void RGBAToYJRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void RGBAToYRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void RGBAToYJRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void BGRAToYRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void BGRAToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void BGRAToYRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void ARGBToYRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void ARGBToYJRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void ABGRToYJRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
@ -3040,12 +3150,16 @@ void ARGBToUVJ444Row_C(const uint8_t* src_argb,
uint8_t* dst_v,
int width);
void MirrorRow_AVX512BW(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_SSSE3(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_NEON(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_LSX(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_LASX(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_C(const uint8_t* src, uint8_t* dst, int width);
void MirrorRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void MirrorRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void MirrorRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void MirrorRow_Any_SSE2(const uint8_t* src, uint8_t* dst, int width);
@ -3063,6 +3177,10 @@ void MirrorUVRow_Any_NEON(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void MirrorUVRow_Any_LSX(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void MirrorUVRow_Any_LASX(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void MirrorSplitUVRow_AVX512BW(const uint8_t* src,
uint8_t* dst_u,
uint8_t* dst_v,
int width);
void MirrorSplitUVRow_AVX2(const uint8_t* src,
uint8_t* dst_u,
uint8_t* dst_v,
@ -3124,6 +3242,10 @@ void SplitUVRow_SSE2(const uint8_t* src_uv,
uint8_t* dst_u,
uint8_t* dst_v,
int width);
void SplitUVRow_AVX512BW(const uint8_t* src_uv,
uint8_t* dst_u,
uint8_t* dst_v,
int width);
void SplitUVRow_AVX2(const uint8_t* src_uv,
uint8_t* dst_u,
uint8_t* dst_v,
@ -3140,6 +3262,10 @@ void SplitUVRow_RVV(const uint8_t* src_uv,
uint8_t* dst_u,
uint8_t* dst_v,
int width);
void SplitUVRow_Any_AVX512BW(const uint8_t* src_ptr,
uint8_t* dst_u,
uint8_t* dst_v,
int width);
void SplitUVRow_Any_SSE2(const uint8_t* src_ptr,
uint8_t* dst_u,
uint8_t* dst_v,
@ -4160,8 +4286,12 @@ void RGB24ToARGBRow_SSSE3(const uint8_t* src_rgb24,
int width);
void RAWToARGBRow_SSSE3(const uint8_t* src_raw, uint8_t* dst_argb, int width);
void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width);
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width);
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, int width);
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24,
uint8_t* dst_argb,
int width);
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw,
uint8_t* dst_argb,
int width);
void RAWToRGBARow_SSSE3(const uint8_t* src_raw, uint8_t* dst_rgba, int width);
void RAWToRGB24Row_SSSE3(const uint8_t* src_raw, uint8_t* dst_rgb24, int width);
@ -4250,9 +4380,7 @@ void RGB24ToARGBRow_Any_SSSE3(const uint8_t* src_ptr,
void RAWToARGBRow_Any_SSSE3(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void RAWToARGBRow_Any_AVX2(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void RAWToARGBRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
void RGB24ToARGBRow_Any_AVX2(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
@ -4272,7 +4400,6 @@ void RAWToRGB24Row_Any_SSSE3(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);
void RGB565ToARGBRow_Any_AVX2(const uint8_t* src_ptr,
uint8_t* dst_ptr,
int width);


@ -631,8 +631,8 @@ static inline void I422ToRGB565Row_SVE_SC(
// Calculate a predicate for the final iteration to deal with the tail.
"cnth %[vl] \n"
"whilelt p1.b, wzr, %w[width] \n" //
READYUV422_SVE_2X I422TORGB_SVE_2X RGBTOARGB8_SVE_TOP_2X
RGB8TORGB565_SVE_FROM_TOP_2X
READYUV422_SVE_2X I422TORGB_SVE_2X
RGBTOARGB8_SVE_TOP_2X RGB8TORGB565_SVE_FROM_TOP_2X
// Need to permute the data on the final iteration such that the
// predicates (.b) line up with the 16-bit element data.
"trn1 z20.b, z18.b, z19.b \n"
@ -694,8 +694,8 @@ static inline void I422ToARGB1555Row_SVE_SC(
// Calculate a predicate for the final iteration to deal with the tail.
"cnth %[vl] \n"
"whilelt p1.b, wzr, %w[width] \n" //
READYUV422_SVE_2X I422TORGB_SVE_2X RGBTOARGB8_SVE_TOP_2X
RGB8TOARGB1555_SVE_FROM_TOP_2X
READYUV422_SVE_2X I422TORGB_SVE_2X
RGBTOARGB8_SVE_TOP_2X RGB8TOARGB1555_SVE_FROM_TOP_2X
"st2h {z0.h, z1.h}, p1, [%[dst]] \n"
"99: \n"
@ -753,8 +753,8 @@ static inline void I422ToARGB4444Row_SVE_SC(
// Calculate a predicate for the final iteration to deal with the tail.
"cnth %[vl] \n"
"whilelt p1.b, wzr, %w[width] \n" //
READYUV422_SVE_2X I422TORGB_SVE_2X RGBTOARGB8_SVE_TOP_2X
RGB8TOARGB4444_SVE_FROM_TOP_2X
READYUV422_SVE_2X I422TORGB_SVE_2X
RGBTOARGB8_SVE_TOP_2X RGB8TOARGB4444_SVE_FROM_TOP_2X
"st2h {z0.h, z1.h}, p1, [%[dst]] \n"
"99: \n"


@ -11,6 +11,6 @@
#ifndef INCLUDE_LIBYUV_VERSION_H_
#define INCLUDE_LIBYUV_VERSION_H_
#define LIBYUV_VERSION 1946
#define LIBYUV_VERSION 1947
#endif // INCLUDE_LIBYUV_VERSION_H_


@ -41,8 +41,9 @@ uint32_t HammingDistance_SSE42(const uint8_t* src_a,
return diff;
}
__declspec(naked) uint32_t
SumSquareError_SSE2(const uint8_t* src_a, const uint8_t* src_b, int count) {
__declspec(naked) uint32_t SumSquareError_SSE2(const uint8_t* src_a,
const uint8_t* src_b,
int count) {
__asm {
mov eax, [esp + 4] // src_a
mov edx, [esp + 8] // src_b
@ -81,8 +82,9 @@ __declspec(naked) uint32_t
#ifdef HAS_SUMSQUAREERROR_AVX2
// C4752: found Intel(R) Advanced Vector Extensions; consider using /arch:AVX.
#pragma warning(disable : 4752)
__declspec(naked) uint32_t
SumSquareError_AVX2(const uint8_t* src_a, const uint8_t* src_b, int count) {
__declspec(naked) uint32_t SumSquareError_AVX2(const uint8_t* src_a,
const uint8_t* src_b,
int count) {
__asm {
mov eax, [esp + 4] // src_a
mov edx, [esp + 8] // src_b
@ -146,8 +148,9 @@ uvec32 kHashMul3 = {
0x00000001, // 33 ^ 0
};
__declspec(naked) uint32_t
HashDjb2_SSE41(const uint8_t* src, int count, uint32_t seed) {
__declspec(naked) uint32_t HashDjb2_SSE41(const uint8_t* src,
int count,
uint32_t seed) {
__asm {
mov eax, [esp + 4] // src
mov ecx, [esp + 8] // count
@ -197,8 +200,9 @@ __declspec(naked) uint32_t
// Visual C 2012 required for AVX2.
#ifdef HAS_HASHDJB2_AVX2
__declspec(naked) uint32_t
HashDjb2_AVX2(const uint8_t* src, int count, uint32_t seed) {
__declspec(naked) uint32_t HashDjb2_AVX2(const uint8_t* src,
int count,
uint32_t seed) {
__asm {
mov eax, [esp + 4] // src
mov ecx, [esp + 8] // count


@ -13,12 +13,11 @@
#include <limits.h>
#include "libyuv/basic_types.h"
#include "libyuv/convert_from_argb.h"
#include "libyuv/cpu_id.h"
#include "libyuv/planar_functions.h"
#include "libyuv/convert_from_argb.h"
#include "libyuv/rotate.h"
#include "libyuv/row.h"
#include "libyuv/scale.h" // For ScalePlane()
#include "libyuv/scale_row.h" // For FixedDiv
#include "libyuv/scale_uv.h" // For UVScale()
@ -2034,8 +2033,8 @@ int ARGBToI420(const uint8_t* src_argb,
int width,
int height) {
return ARGBToI420Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kArgbI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kArgbI601Constants, width, height);
}
LIBYUV_API
@ -2439,8 +2438,8 @@ int BGRAToI420(const uint8_t* src_bgra,
int width,
int height) {
return ARGBToI420Matrix(src_bgra, src_stride_bgra, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kBgraI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kBgraI601Constants, width, height);
}
// Convert BGRA to I422.
@ -2456,8 +2455,8 @@ int BGRAToI422(const uint8_t* src_bgra,
int width,
int height) {
return ARGBToI422Matrix(src_bgra, src_stride_bgra, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kBgraI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kBgraI601Constants, width, height);
}
// Convert ABGR to I422.
@ -2473,8 +2472,8 @@ int ABGRToI422(const uint8_t* src_abgr,
int width,
int height) {
return ARGBToI422Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kAbgrI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kAbgrI601Constants, width, height);
}
// Convert RGBA to I422.
@ -2490,8 +2489,8 @@ int RGBAToI422(const uint8_t* src_rgba,
int width,
int height) {
return ARGBToI422Matrix(src_rgba, src_stride_rgba, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kRgbaI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kRgbaI601Constants, width, height);
}
// Convert ABGR to I420.
@ -2507,8 +2506,8 @@ int ABGRToI420(const uint8_t* src_abgr,
int width,
int height) {
return ARGBToI420Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kAbgrI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kAbgrI601Constants, width, height);
}
// Convert RGBA to I420.
@ -2524,8 +2523,8 @@ int RGBAToI420(const uint8_t* src_rgba,
int width,
int height) {
return ARGBToI420Matrix(src_rgba, src_stride_rgba, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kRgbaI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kRgbaI601Constants, width, height);
}
// Enabled if 1 pass is available
@ -2569,6 +2568,14 @@ int RGB24ToI420(const uint8_t* src_rgb24,
}
}
#endif
#if defined(HAS_RGBTOUVMATRIXROW_AVX512BW)
if (TestCpuFlag(kCpuHasAVX512BW)) {
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_AVX512BW;
if (IS_ALIGNED(width, 64)) {
RGBToUVMatrixRow = RGBToUVMatrixRow_AVX512BW;
}
}
#endif
#if defined(HAS_RGBTOUVMATRIXROW_NEON)
if (TestCpuFlag(kCpuHasNEON)) {
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_NEON;
@ -2603,9 +2610,11 @@ int RGB24ToI420(const uint8_t* src_rgb24,
}
for (y = 0; y < height - 1; y += 2) {
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width, &kArgbI601Constants);
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width,
&kArgbI601Constants);
RGBToYMatrixRow(src_rgb24, dst_y, width, &kArgbI601Constants);
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width, &kArgbI601Constants);
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width,
&kArgbI601Constants);
src_rgb24 += src_stride_rgb24 * 2;
dst_y += dst_stride_y * 2;
dst_u += dst_stride_u;
@ -2886,6 +2895,14 @@ int RAWToI420(const uint8_t* src_rgb24,
}
}
#endif
#if defined(HAS_RGBTOUVMATRIXROW_AVX512BW)
if (TestCpuFlag(kCpuHasAVX512BW)) {
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_AVX512BW;
if (IS_ALIGNED(width, 64)) {
RGBToUVMatrixRow = RGBToUVMatrixRow_AVX512BW;
}
}
#endif
#if defined(HAS_RGBTOUVMATRIXROW_NEON)
if (TestCpuFlag(kCpuHasNEON)) {
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_NEON;
@ -2920,9 +2937,11 @@ int RAWToI420(const uint8_t* src_rgb24,
}
for (y = 0; y < height - 1; y += 2) {
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width, &kArgbI601Constants);
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width,
&kArgbI601Constants);
RGBToYMatrixRow(src_rgb24, dst_y, width, &kArgbI601Constants);
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width, &kArgbI601Constants);
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width,
&kArgbI601Constants);
src_rgb24 += src_stride_rgb24 * 2;
dst_y += dst_stride_y * 2;
dst_u += dst_stride_u;
@ -3622,9 +3641,11 @@ int RGB565ToI420(const uint8_t* src_rgb565,
int y;
void (*RGB565ToUVMatrixRow)(const uint8_t* src_rgb565, int src_stride_rgb565,
uint8_t* dst_u, uint8_t* dst_v, int width,
const struct ArgbConstants* c) = RGB565ToUVMatrixRow_C;
void (*RGB565ToYMatrixRow)(const uint8_t* src_rgb565, uint8_t* dst_y, int width,
const struct ArgbConstants* c) = RGB565ToYMatrixRow_C;
const struct ArgbConstants* c) =
RGB565ToUVMatrixRow_C;
void (*RGB565ToYMatrixRow)(const uint8_t* src_rgb565, uint8_t* dst_y,
int width, const struct ArgbConstants* c) =
RGB565ToYMatrixRow_C;
#if defined(HAS_RGB565TOYMATRIXROW_AVX2)
if (TestCpuFlag(kCpuHasAVX2)) {
@ -3671,9 +3692,11 @@ int RGB565ToI420(const uint8_t* src_rgb565,
}
for (y = 0; y < height - 1; y += 2) {
RGB565ToUVMatrixRow(src_rgb565, src_stride_rgb565, dst_u, dst_v, width, &kArgbI601Constants);
RGB565ToUVMatrixRow(src_rgb565, src_stride_rgb565, dst_u, dst_v, width,
&kArgbI601Constants);
RGB565ToYMatrixRow(src_rgb565, dst_y, width, &kArgbI601Constants);
RGB565ToYMatrixRow(src_rgb565 + src_stride_rgb565, dst_y + dst_stride_y, width, &kArgbI601Constants);
RGB565ToYMatrixRow(src_rgb565 + src_stride_rgb565, dst_y + dst_stride_y,
width, &kArgbI601Constants);
src_rgb565 += src_stride_rgb565 * 2;
dst_y += dst_stride_y * 2;
dst_u += dst_stride_u;
@ -3681,7 +3704,8 @@ int RGB565ToI420(const uint8_t* src_rgb565,
}
if (height & 1) {
RGB565ToYMatrixRow(src_rgb565, dst_y, width, &kArgbI601Constants);
RGB565ToUVMatrixRow(src_rgb565, 0, dst_u, dst_v, width, &kArgbI601Constants);
RGB565ToUVMatrixRow(src_rgb565, 0, dst_u, dst_v, width,
&kArgbI601Constants);
}
return 0;
}
@ -3700,11 +3724,11 @@ int ARGB1555ToI420(const uint8_t* src_argb1555,
int y;
void (*ARGB1555ToUVMatrixRow)(
const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u,
uint8_t* dst_v, int width,
const struct ArgbConstants* c) = ARGB1555ToUVMatrixRow_C;
void (*ARGB1555ToYMatrixRow)(
const uint8_t* src_argb1555, uint8_t* dst_y, int width,
const struct ArgbConstants* c) = ARGB1555ToYMatrixRow_C;
uint8_t* dst_v, int width, const struct ArgbConstants* c) =
ARGB1555ToUVMatrixRow_C;
void (*ARGB1555ToYMatrixRow)(const uint8_t* src_argb1555, uint8_t* dst_y,
int width, const struct ArgbConstants* c) =
ARGB1555ToYMatrixRow_C;
#if defined(HAS_ARGB1555TOYMATRIXROW_AVX2)
if (TestCpuFlag(kCpuHasAVX2)) {
@ -3751,9 +3775,11 @@ int ARGB1555ToI420(const uint8_t* src_argb1555,
}
for (y = 0; y < height - 1; y += 2) {
ARGB1555ToUVMatrixRow(src_argb1555, src_stride_argb1555, dst_u, dst_v, width, &kArgbI601Constants);
ARGB1555ToUVMatrixRow(src_argb1555, src_stride_argb1555, dst_u, dst_v,
width, &kArgbI601Constants);
ARGB1555ToYMatrixRow(src_argb1555, dst_y, width, &kArgbI601Constants);
ARGB1555ToYMatrixRow(src_argb1555 + src_stride_argb1555, dst_y + dst_stride_y, width, &kArgbI601Constants);
ARGB1555ToYMatrixRow(src_argb1555 + src_stride_argb1555,
dst_y + dst_stride_y, width, &kArgbI601Constants);
src_argb1555 += src_stride_argb1555 * 2;
dst_y += dst_stride_y * 2;
dst_u += dst_stride_u;
@ -3761,7 +3787,8 @@ int ARGB1555ToI420(const uint8_t* src_argb1555,
}
if (height & 1) {
ARGB1555ToYMatrixRow(src_argb1555, dst_y, width, &kArgbI601Constants);
ARGB1555ToUVMatrixRow(src_argb1555, 0, dst_u, dst_v, width, &kArgbI601Constants);
ARGB1555ToUVMatrixRow(src_argb1555, 0, dst_u, dst_v, width,
&kArgbI601Constants);
}
return 0;
}
@ -3780,11 +3807,11 @@ int ARGB4444ToI420(const uint8_t* src_argb4444,
int y;
void (*ARGB4444ToUVMatrixRow)(
const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u,
uint8_t* dst_v, int width,
const struct ArgbConstants* c) = ARGB4444ToUVMatrixRow_C;
void (*ARGB4444ToYMatrixRow)(
const uint8_t* src_argb4444, uint8_t* dst_y, int width,
const struct ArgbConstants* c) = ARGB4444ToYMatrixRow_C;
uint8_t* dst_v, int width, const struct ArgbConstants* c) =
ARGB4444ToUVMatrixRow_C;
void (*ARGB4444ToYMatrixRow)(const uint8_t* src_argb4444, uint8_t* dst_y,
int width, const struct ArgbConstants* c) =
ARGB4444ToYMatrixRow_C;
#if defined(HAS_ARGB4444TOYMATRIXROW_AVX2)
if (TestCpuFlag(kCpuHasAVX2)) {
@ -3831,9 +3858,11 @@ int ARGB4444ToI420(const uint8_t* src_argb4444,
}
for (y = 0; y < height - 1; y += 2) {
ARGB4444ToUVMatrixRow(src_argb4444, src_stride_argb4444, dst_u, dst_v, width, &kArgbI601Constants);
ARGB4444ToUVMatrixRow(src_argb4444, src_stride_argb4444, dst_u, dst_v,
width, &kArgbI601Constants);
ARGB4444ToYMatrixRow(src_argb4444, dst_y, width, &kArgbI601Constants);
ARGB4444ToYMatrixRow(src_argb4444 + src_stride_argb4444, dst_y + dst_stride_y, width, &kArgbI601Constants);
ARGB4444ToYMatrixRow(src_argb4444 + src_stride_argb4444,
dst_y + dst_stride_y, width, &kArgbI601Constants);
src_argb4444 += src_stride_argb4444 * 2;
dst_y += dst_stride_y * 2;
dst_u += dst_stride_u;
@ -3841,7 +3870,8 @@ int ARGB4444ToI420(const uint8_t* src_argb4444,
}
if (height & 1) {
ARGB4444ToYMatrixRow(src_argb4444, dst_y, width, &kArgbI601Constants);
ARGB4444ToUVMatrixRow(src_argb4444, 0, dst_u, dst_v, width, &kArgbI601Constants);
ARGB4444ToUVMatrixRow(src_argb4444, 0, dst_u, dst_v, width,
&kArgbI601Constants);
}
return 0;
}


@ -35,8 +35,8 @@ int ARGBToI444(const uint8_t* src_argb,
int width,
int height) {
return ARGBToI444Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kArgbI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kArgbI601Constants, width, height);
}
LIBYUV_API
@ -54,10 +54,9 @@ int ARGBToI444Matrix(const uint8_t* src_argb,
int y;
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
void (*ARGBToUV444MatrixRow)(const uint8_t* src_argb, uint8_t* dst_u,
uint8_t* dst_v, int width,
const struct ArgbConstants* c) =
ARGBToUV444MatrixRow_C;
void (*ARGBToUV444MatrixRow)(
const uint8_t* src_argb, uint8_t* dst_u, uint8_t* dst_v, int width,
const struct ArgbConstants* c) = ARGBToUV444MatrixRow_C;
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
if (TestCpuFlag(kCpuHasSSSE3)) {
@ -188,8 +187,8 @@ int ARGBToI422(const uint8_t* src_argb,
int width,
int height) {
return ARGBToI422Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kArgbI601Constants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kArgbI601Constants, width, height);
}
LIBYUV_API
@ -359,8 +358,9 @@ int ARGBToNV12(const uint8_t* src_argb,
int dst_stride_uv,
int width,
int height) {
return ARGBToNV12Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_uv,
dst_stride_uv, &kArgbI601Constants, width, height);
return ARGBToNV12Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y,
dst_uv, dst_stride_uv, &kArgbI601Constants, width,
height);
}
LIBYUV_API
@ -864,7 +864,8 @@ int ARGBToYUY2Matrix(const uint8_t* src_argb,
int y;
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
uint8_t* dst_u, uint8_t* dst_v, int width,
const struct ArgbConstants* c) = ARGBToUVMatrixRow_C;
const struct ArgbConstants* c) =
ARGBToUVMatrixRow_C;
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
void (*I422ToYUY2Row)(const uint8_t* src_y, const uint8_t* src_u,
@ -976,7 +977,8 @@ int ARGBToUYVYMatrix(const uint8_t* src_argb,
int y;
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
uint8_t* dst_u, uint8_t* dst_v, int width,
const struct ArgbConstants* c) = ARGBToUVMatrixRow_C;
const struct ArgbConstants* c) =
ARGBToUVMatrixRow_C;
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
void (*I422ToUYVYRow)(const uint8_t* src_y, const uint8_t* src_u,
@ -1077,8 +1079,6 @@ int ARGBToUYVYMatrix(const uint8_t* src_argb,
return 0;
}
// Same as NV12 but U and V swapped.
LIBYUV_API
int ARGBToNV21(const uint8_t* src_argb,
@ -1089,8 +1089,9 @@ int ARGBToNV21(const uint8_t* src_argb,
int dst_stride_vu,
int width,
int height) {
return ARGBToNV21Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_vu,
dst_stride_vu, &kArgbI601Constants, width, height);
return ARGBToNV21Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y,
dst_vu, dst_stride_vu, &kArgbI601Constants, width,
height);
}
LIBYUV_API
@ -1102,8 +1103,9 @@ int ABGRToNV12(const uint8_t* src_abgr,
int dst_stride_uv,
int width,
int height) {
return ARGBToNV12Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_uv,
dst_stride_uv, &kAbgrI601Constants, width, height);
return ARGBToNV12Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y,
dst_uv, dst_stride_uv, &kAbgrI601Constants, width,
height);
}
// Same as NV12 but U and V swapped.
@ -1116,8 +1118,9 @@ int ABGRToNV21(const uint8_t* src_abgr,
int dst_stride_vu,
int width,
int height) {
return ARGBToNV21Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_vu,
dst_stride_vu, &kAbgrI601Constants, width, height);
return ARGBToNV21Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y,
dst_vu, dst_stride_vu, &kAbgrI601Constants, width,
height);
}
// Convert ARGB to YUY2.
@ -1819,8 +1822,8 @@ int ARGBToJ444(const uint8_t* src_argb,
int width,
int height) {
return ARGBToI444Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kArgbJPEGConstants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kArgbJPEGConstants, width, height);
}
// Convert ARGB to J420. (JPeg full range I420).
@ -1836,8 +1839,8 @@ int ARGBToJ420(const uint8_t* src_argb,
int width,
int height) {
return ARGBToI420Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kArgbJPEGConstants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kArgbJPEGConstants, width, height);
}
// Convert ARGB to J422. (JPeg full range I422).
@ -1853,8 +1856,8 @@ int ARGBToJ422(const uint8_t* src_argb,
int width,
int height) {
return ARGBToI422Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kArgbJPEGConstants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kArgbJPEGConstants, width, height);
}
// Convert ARGB to J400.
@ -1978,8 +1981,8 @@ int ABGRToJ420(const uint8_t* src_abgr,
int width,
int height) {
return ARGBToI420Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kAbgrJPEGConstants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kAbgrJPEGConstants, width, height);
}
// Convert ABGR to J422. (JPeg full range I422).
@ -1995,8 +1998,8 @@ int ABGRToJ422(const uint8_t* src_abgr,
int width,
int height) {
return ARGBToI422Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
dst_stride_u, dst_v, dst_stride_v, &kAbgrJPEGConstants,
width, height);
dst_stride_u, dst_v, dst_stride_v,
&kAbgrJPEGConstants, width, height);
}
// Convert ABGR to J400.
@ -2424,7 +2427,8 @@ int RAWToNV21Matrix(const uint8_t* src_raw,
ARGBToUVMatrixRow(row, row_size, row_u, row_v, width, argbconstants);
MergeUVRow(row_v, row_u, dst_vu, halfwidth);
ARGBToYMatrixRow(row, dst_y, width, argbconstants);
ARGBToYMatrixRow(row + row_size, dst_y + dst_stride_y, width, argbconstants);
ARGBToYMatrixRow(row + row_size, dst_y + dst_stride_y, width,
argbconstants);
src_raw += src_stride_raw * 2;
dst_y += dst_stride_y * 2;
dst_vu += dst_stride_vu;
@ -2482,7 +2486,6 @@ int RGB24ToNV12(const uint8_t* src_rgb24,
height);
}
#ifdef __cplusplus
} // extern "C"
} // namespace libyuv


@ -8,13 +8,13 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "libyuv/convert_from_argb.h" // For ArgbConstants
#include "libyuv/planar_functions.h"
#include <assert.h>
#include <limits.h>
#include <string.h> // for memset()
#include "libyuv/convert_from_argb.h" // For ArgbConstants
#include "libyuv/cpu_id.h"
#include "libyuv/row.h"
#include "libyuv/scale_row.h" // for ScaleRowDown2
@ -630,6 +630,14 @@ void SplitUVPlane(const uint8_t* src_uv,
}
}
#endif
#if defined(HAS_SPLITUVROW_AVX512BW)
if (TestCpuFlag(kCpuHasAVX512BW)) {
SplitUVRow = SplitUVRow_Any_AVX512BW;
if (IS_ALIGNED(width, 64)) {
SplitUVRow = SplitUVRow_AVX512BW;
}
}
#endif
#if defined(HAS_SPLITUVROW_NEON)
if (TestCpuFlag(kCpuHasNEON)) {
SplitUVRow = SplitUVRow_Any_NEON;
@ -2588,6 +2596,14 @@ void MirrorPlane(const uint8_t* src_y,
}
}
#endif
#if defined(HAS_MIRRORROW_AVX512BW)
if (TestCpuFlag(kCpuHasAVX512BW)) {
MirrorRow = MirrorRow_Any_AVX512BW;
if (IS_ALIGNED(width, 64)) {
MirrorRow = MirrorRow_AVX512BW;
}
}
#endif
#if defined(HAS_MIRRORROW_LSX)
if (TestCpuFlag(kCpuHasLSX)) {
MirrorRow = MirrorRow_Any_LSX;
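
The dispatch blocks in the hunks above follow one pattern: when the CPU flag is set, the `_Any_` variant is selected, and the full-width kernel replaces it only if `width` is a multiple of 64. A minimal sketch of that selection logic, assuming hypothetical names (`ChooseRowKernel_Sketch`, the `kKernel*` enum) that are not part of libyuv:

```c
#include <assert.h>

// Hypothetical labels for the three candidates; libyuv stores function
// pointers instead of an enum.
enum RowKernel_Sketch { kKernelC, kKernelAny, kKernelFull };

// Same test as an alignment check on a power-of-two boundary.
#define IS_ALIGNED_SKETCH(v, a) (((v) & ((a) - 1)) == 0)

// has_avx512bw stands in for TestCpuFlag(kCpuHasAVX512BW).
static enum RowKernel_Sketch ChooseRowKernel_Sketch(int has_avx512bw,
                                                    int width) {
  enum RowKernel_Sketch kernel = kKernelC;  // portable fallback
  assert(width >= 0);
  if (has_avx512bw) {
    kernel = kKernelAny;  // handles any width via a remainder path
    if (IS_ALIGNED_SKETCH(width, 64)) {
      kernel = kKernelFull;  // width is a whole number of 64-byte steps
    }
  }
  return kernel;
}
```

Under these assumptions a width of 128 picks the full kernel and a width of 100 falls back to the `_Any_` wrapper.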


@ -8,11 +8,11 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "libyuv/rotate.h"
#include <assert.h>
#include <limits.h>
#include "libyuv/rotate.h"
#include "libyuv/convert.h"
#include "libyuv/cpu_id.h"
#include "libyuv/planar_functions.h"
@ -403,6 +403,11 @@ void SplitRotateUV180(const uint8_t* src,
MirrorSplitUVRow = MirrorSplitUVRow_AVX2;
}
#endif
#if defined(HAS_MIRRORSPLITUVROW_AVX512BW)
if (TestCpuFlag(kCpuHasAVX512BW) && IS_ALIGNED(width, 32)) {
MirrorSplitUVRow = MirrorSplitUVRow_AVX512BW;
}
#endif
#if defined(HAS_MIRRORSPLITUVROW_LSX)
if (TestCpuFlag(kCpuHasLSX) && IS_ALIGNED(width, 32)) {
MirrorSplitUVRow = MirrorSplitUVRow_LSX;


@ -1919,6 +1919,9 @@ ANY11IS(InterpolateRow_16To8_Any_AVX2,
memcpy(dst_ptr + np * BPP, vout + (MASK + 1 - r) * BPP, r * BPP); \
}
#ifdef HAS_MIRRORROW_AVX512BW
ANY11M(MirrorRow_Any_AVX512BW, MirrorRow_AVX512BW, 1, 63)
#endif
#ifdef HAS_MIRRORROW_AVX2
ANY11M(MirrorRow_Any_AVX2, MirrorRow_AVX2, 1, 31)
#endif
@ -2022,6 +2025,9 @@ ANY1(ARGBSetRow_Any_LSX, ARGBSetRow_LSX, uint32_t, 4, 3)
#ifdef HAS_SPLITUVROW_SSE2
ANY12(SplitUVRow_Any_SSE2, SplitUVRow_SSE2, 0, 2, 0, 15)
#endif
#ifdef HAS_SPLITUVROW_AVX512BW
ANY12(SplitUVRow_Any_AVX512BW, SplitUVRow_AVX512BW, 0, 2, 0, 63)
#endif
#ifdef HAS_SPLITUVROW_AVX2
ANY12(SplitUVRow_Any_AVX2, SplitUVRow_AVX2, 0, 2, 0, 31)
#endif
@ -2291,6 +2297,9 @@ ANY12MS(ARGB4444ToUVMatrixRow_Any_AVX2, ARGB4444ToUVMatrixRow_AVX2, 0, 2, 31)
#ifdef HAS_ARGBTOUVMATRIXROW_AVX512BW
ANY12MS(ARGBToUVMatrixRow_Any_AVX512BW, ARGBToUVMatrixRow_AVX512BW, 0, 4, 63)
#endif
#ifdef HAS_RGBTOUVMATRIXROW_AVX512BW
ANY12MS(RGBToUVMatrixRow_Any_AVX512BW, RGBToUVMatrixRow_AVX512BW, 0, 3, 63)
#endif
#ifdef HAS_ARGBTOUVMATRIXROW_SSSE3
ANY12MS(ARGBToUVMatrixRow_Any_SSSE3, ARGBToUVMatrixRow_SSSE3, 0, 4, 7)
#endif
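
The `ANY11M(..., 1, 63)` registration above wraps a kernel that only accepts multiples of 64 bytes. The context line ending in `vout + (MASK + 1 - r) * BPP` suggests the usual shape: run the kernel on the aligned part, then push the remainder through a padded temporary block. A rough scalar sketch of that idea for a mirror, assuming one byte per pixel and hypothetical names (`MirrorBlocks_Sketch`, `MirrorAny_Sketch`); it is not the macro's actual expansion:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { kBlock = 64 };  // bytes per full-width step, matching a mask of 63

// Scalar stand-in for a kernel that requires width % kBlock == 0.
static void MirrorBlocks_Sketch(const uint8_t* src, uint8_t* dst, int width) {
  assert(width % kBlock == 0);
  for (int i = 0; i < width; ++i) {
    dst[i] = src[width - 1 - i];
  }
}

// Wrapper for arbitrary widths.
static void MirrorAny_Sketch(const uint8_t* src, uint8_t* dst, int width) {
  uint8_t vin[kBlock];
  uint8_t vout[kBlock];
  int r = width & (kBlock - 1);   // leftover bytes
  int n = width & ~(kBlock - 1);  // bytes the kernel can take directly
  memset(vin, 0, sizeof(vin));
  if (n > 0) {
    // The last n source bytes become the first n destination bytes.
    MirrorBlocks_Sketch(src + r, dst, n);
  }
  // The first r source bytes are mirrored inside a padded block; they end
  // up at the tail of vout and are copied to the tail of dst.
  memcpy(vin, src, (size_t)r);
  MirrorBlocks_Sketch(vin, vout, kBlock);
  memcpy(dst + n, vout + (kBlock - r), (size_t)r);
}
```

Because a mirror reverses the row, the remainder is taken from the start of the source and written to the end of the destination, which is why the copy out of the temporary reads from its tail.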


@ -783,7 +783,8 @@ void ARGBToYMatrixRow_C(const uint8_t* src_argb,
const struct ArgbConstants* c) {
int x;
for (x = 0; x < width; ++x) {
dst_y[0] = RGBToYMatrix(src_argb[0], src_argb[1], src_argb[2], src_argb[3], c);
dst_y[0] =
RGBToYMatrix(src_argb[0], src_argb[1], src_argb[2], src_argb[3], c);
src_argb += 4;
dst_y += 1;
}
@ -4618,8 +4619,7 @@ void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
RGB24ToARGBRow_AVX2(src_rgb, row, twidth);
RGB24ToARGBRow_AVX2(src_rgb + src_stride_rgb,
row + MAXTWIDTH * 4, twidth);
RGB24ToARGBRow_AVX2(src_rgb + src_stride_rgb, row + MAXTWIDTH * 4, twidth);
ARGBToUVMatrixRow_AVX2(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_rgb += twidth * 3;
dst_u += twidth / 2;
@ -4629,6 +4629,29 @@ void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb,
}
#endif
#if defined(HAS_ARGBTOUVMATRIXROW_AVX512BW) && \
defined(HAS_RGB24TOARGBROW_AVX512BW)
void RGBToUVMatrixRow_AVX512BW(const uint8_t* src_rgb,
int src_stride_rgb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c) {
SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4 * 2]);
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
RGB24ToARGBRow_AVX512BW(src_rgb, row, twidth);
RGB24ToARGBRow_AVX512BW(src_rgb + src_stride_rgb, row + MAXTWIDTH * 4,
twidth);
ARGBToUVMatrixRow_AVX512BW(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_rgb += twidth * 3;
dst_u += twidth / 2;
dst_v += twidth / 2;
width -= twidth;
}
}
#endif
#if defined(HAS_ARGBTOUVMATRIXROW_NEON) && defined(HAS_RGB24TOARGBROW_NEON)
void RGBToUVMatrixRow_NEON(const uint8_t* src_rgb,
int src_stride_rgb,
@ -4675,7 +4698,8 @@ void RGB565ToUVMatrixRow_C(const uint8_t* src_rgb565,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
RGB565ToARGBRow_C(src_rgb565, row, twidth);
RGB565ToARGBRow_C(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4, twidth);
RGB565ToARGBRow_C(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4,
twidth);
ARGBToUVMatrixRow_C(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_rgb565 += twidth * 2;
dst_u += twidth / 2;
@ -4712,8 +4736,8 @@ void RGB565ToUVMatrixRow_AVX2(const uint8_t* src_rgb565,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
RGB565ToARGBRow_AVX2(src_rgb565, row, twidth);
RGB565ToARGBRow_AVX2(src_rgb565 + src_stride_rgb565,
row + MAXTWIDTH * 4, twidth);
RGB565ToARGBRow_AVX2(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4,
twidth);
ARGBToUVMatrixRow_AVX2(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_rgb565 += twidth * 2;
dst_u += twidth / 2;
@ -4751,7 +4775,8 @@ void RGB565ToUVMatrixRow_NEON(const uint8_t* src_rgb565,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
RGB565ToARGBRow_NEON(src_rgb565, row, twidth);
RGB565ToARGBRow_NEON(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4, twidth);
RGB565ToARGBRow_NEON(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4,
twidth);
ARGBToUVMatrixRow_NEON(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_rgb565 += twidth * 2;
dst_u += twidth / 2;
@ -4786,7 +4811,8 @@ void ARGB1555ToUVMatrixRow_C(const uint8_t* src_argb1555,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
ARGB1555ToARGBRow_C(src_argb1555, row, twidth);
ARGB1555ToARGBRow_C(src_argb1555 + src_stride_argb1555, row + MAXTWIDTH * 4, twidth);
ARGB1555ToARGBRow_C(src_argb1555 + src_stride_argb1555, row + MAXTWIDTH * 4,
twidth);
ARGBToUVMatrixRow_C(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_argb1555 += twidth * 2;
dst_u += twidth / 2;
@ -4820,7 +4846,8 @@ void ARGB4444ToUVMatrixRow_C(const uint8_t* src_argb4444,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
ARGB4444ToARGBRow_C(src_argb4444, row, twidth);
ARGB4444ToARGBRow_C(src_argb4444 + src_stride_argb4444, row + MAXTWIDTH * 4, twidth);
ARGB4444ToARGBRow_C(src_argb4444 + src_stride_argb4444, row + MAXTWIDTH * 4,
twidth);
ARGBToUVMatrixRow_C(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_argb4444 += twidth * 2;
dst_u += twidth / 2;
@ -4956,7 +4983,8 @@ void ARGB1555ToUVMatrixRow_NEON(const uint8_t* src_argb1555,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
ARGB1555ToARGBRow_NEON(src_argb1555, row, twidth);
ARGB1555ToARGBRow_NEON(src_argb1555 + src_stride_argb1555, row + MAXTWIDTH * 4, twidth);
ARGB1555ToARGBRow_NEON(src_argb1555 + src_stride_argb1555,
row + MAXTWIDTH * 4, twidth);
ARGBToUVMatrixRow_NEON(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_argb1555 += twidth * 2;
dst_u += twidth / 2;
@ -4977,7 +5005,8 @@ void ARGB4444ToUVMatrixRow_NEON(const uint8_t* src_argb4444,
while (width > 0) {
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
ARGB4444ToARGBRow_NEON(src_argb4444, row, twidth);
ARGB4444ToARGBRow_NEON(src_argb4444 + src_stride_argb4444, row + MAXTWIDTH * 4, twidth);
ARGB4444ToARGBRow_NEON(src_argb4444 + src_stride_argb4444,
row + MAXTWIDTH * 4, twidth);
ARGBToUVMatrixRow_NEON(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
src_argb4444 += twidth * 2;
dst_u += twidth / 2;
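
The `*ToUVMatrixRow_*` wrappers in this file share one loop: convert up to `MAXTWIDTH` pixels from each of two source rows into a stack buffer, run the ARGB UV kernel on that buffer, then advance by the tile width. A simplified sketch of the tiling, assuming a tiny tile size, an even width, and a stand-in "UV" stage that only averages channel 0 over 2x2 blocks (all names here are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

enum { kTile = 4 };  // hypothetical tile width in pixels, in place of MAXTWIDTH

// Expand 3-byte pixels to 4-byte pixels with opaque alpha.
static void Expand24To32_Sketch(const uint8_t* src, uint8_t* dst, int width) {
  for (int x = 0; x < width; ++x) {
    dst[4 * x + 0] = src[3 * x + 0];
    dst[4 * x + 1] = src[3 * x + 1];
    dst[4 * x + 2] = src[3 * x + 2];
    dst[4 * x + 3] = 255;
  }
}

// Stand-in for the subsampling stage: rounded 2x2 average of channel 0.
static void Avg2x2_Sketch(const uint8_t* row, int stride, uint8_t* dst,
                          int width) {
  for (int x = 0; x < width; x += 2) {
    int sum = row[4 * x] + row[4 * (x + 1)] + row[stride + 4 * x] +
              row[stride + 4 * (x + 1)];
    dst[x / 2] = (uint8_t)((sum + 2) >> 2);
  }
}

// Tiled driver: two converted rows live side by side in one buffer.
static void TiledAvg_Sketch(const uint8_t* src, int src_stride, uint8_t* dst,
                            int width) {
  uint8_t row[kTile * 4 * 2];
  assert(width % 2 == 0);  // simplification made by this sketch only
  while (width > 0) {
    int twidth = width > kTile ? kTile : width;
    Expand24To32_Sketch(src, row, twidth);
    Expand24To32_Sketch(src + src_stride, row + kTile * 4, twidth);
    Avg2x2_Sketch(row, kTile * 4, dst, twidth);
    src += twidth * 3;   // 3 source bytes per pixel
    dst += twidth / 2;   // one output per 2 pixels
    width -= twidth;
  }
}
```

The second row always starts at a fixed offset of one full tile, so the stride passed to the second stage stays constant even when the last tile is short.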


@ -8,8 +8,8 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "libyuv/row.h"
#include "libyuv/convert_from_argb.h" // For ArgbConstants
#include "libyuv/row.h"
#ifdef __cplusplus
namespace libyuv {
@ -120,11 +120,11 @@ static const lvec8 kShuffleNV21 = {
#if defined(HAS_J400TOARGBROW_AVX2) || defined(HAS_J400TOARGBROW_AVX512BW)
alignas(64) static const uint8_t kShuffleMaskJ400ToARGB[64] = {
0u, 0u, 0u, 128u, 1u, 1u, 1u, 128u, 2u, 2u, 2u, 128u, 3u, 3u, 3u, 128u,
4u, 4u, 4u, 128u, 5u, 5u, 5u, 128u, 6u, 6u, 6u, 128u, 7u, 7u, 7u, 128u,
8u, 8u, 8u, 128u, 9u, 9u, 9u, 128u, 10u, 10u, 10u, 128u, 11u, 11u, 11u, 128u,
12u, 12u, 12u, 128u, 13u, 13u, 13u, 128u, 14u, 14u, 14u, 128u, 15u, 15u, 15u, 128u
};
0u, 0u, 0u, 128u, 1u, 1u, 1u, 128u, 2u, 2u, 2u, 128u, 3u, 3u,
3u, 128u, 4u, 4u, 4u, 128u, 5u, 5u, 5u, 128u, 6u, 6u, 6u, 128u,
7u, 7u, 7u, 128u, 8u, 8u, 8u, 128u, 9u, 9u, 9u, 128u, 10u, 10u,
10u, 128u, 11u, 11u, 11u, 128u, 12u, 12u, 12u, 128u, 13u, 13u, 13u, 128u,
14u, 14u, 14u, 128u, 15u, 15u, 15u, 128u};
#endif
#ifdef HAS_J400TOARGBROW_AVX2
@ -158,7 +158,9 @@ void J400ToARGBRow_AVX2(const uint8_t* src_y, uint8_t* dst_argb, int width) {
#endif // HAS_J400TOARGBROW_AVX2
#ifdef HAS_J400TOARGBROW_AVX512BW
void J400ToARGBRow_AVX512BW(const uint8_t* src_y, uint8_t* dst_argb, int width) {
void J400ToARGBRow_AVX512BW(const uint8_t* src_y,
uint8_t* dst_argb,
int width) {
asm volatile(
"vpternlogd $0xff,%%zmm7,%%zmm7,%%zmm7 \n" // 0xffffffff
"vpslld $0x18,%%zmm7,%%zmm7 \n" // 0xff000000
@ -229,7 +231,9 @@ void RGB24ToARGBRow_SSSE3(const uint8_t* src_rgb24,
}
#ifdef HAS_RGB24TOARGBROW_AVX2
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width) {
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24,
uint8_t* dst_argb,
int width) {
// Reference to prevent discarding of kShuffleMaskRGB24ToARGB[1] which is
// accessed via offset in assembly.
const uvec8* dummy = &kShuffleMaskRGB24ToARGB[1];
@ -358,7 +362,10 @@ void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
static const uint32_t kPermdRAWToARGB_AVX512BW[16] = {
0, 1, 2, 3, 3, 4, 5, 6, 6, 7, 8, 9, 9, 10, 11, 12};
void RGBToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, const uint32_t* shuffler, int width) {
void RGBToARGBRow_AVX512BW(const uint8_t* src_raw,
uint8_t* dst_argb,
const uint32_t* shuffler,
int width) {
asm volatile(
"vpternlogd $0xff,%%zmm6,%%zmm6,%%zmm6 \n" // 0xffffffff
"vpslld $0x18,%%zmm6,%%zmm6 \n" // 0xff000000
@ -399,14 +406,20 @@ void RGBToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, const uint
"+r"(width) // %2
: "m"(kPermdRAWToARGB_AVX512BW), // %3
"m"(*shuffler) // %4
: "memory", "cc", "rax", "k1", "xmm0", "xmm1", "xmm2", "xmm3", "xmm4", "xmm5", "xmm6");
: "memory", "cc", "rax", "k1", "xmm0", "xmm1", "xmm2", "xmm3", "xmm4",
"xmm5", "xmm6");
}
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
RGBToARGBRow_AVX512BW(src_raw, dst_argb, (const uint32_t*)&kShuffleMaskRAWToARGB, width);
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw,
uint8_t* dst_argb,
int width) {
RGBToARGBRow_AVX512BW(src_raw, dst_argb,
(const uint32_t*)&kShuffleMaskRAWToARGB, width);
}
void RGB24ToARGBRow_AVX512BW(const uint8_t* src_rgb24, uint8_t* dst_argb, int width) {
void RGB24ToARGBRow_AVX512BW(const uint8_t* src_rgb24,
uint8_t* dst_argb,
int width) {
RGBToARGBRow_AVX512BW(src_rgb24, dst_argb,
(const uint32_t*)&kShuffleMaskRGB24ToARGB[0], width);
}
@ -1452,9 +1465,7 @@ void ARGBToYMatrixRow_SSSE3(const uint8_t* src_argb,
"movdqa %%xmm4,%%xmm6 \n"
"pmaddubsw %%xmm5,%%xmm6 \n"
"phaddw %%xmm6,%%xmm6 \n"
"psubw %%xmm6,%%xmm7 \n"
LABELALIGN ""
RGBTOY(xmm7)
"psubw %%xmm6,%%xmm7 \n" LABELALIGN "" RGBTOY(xmm7)
: "+r"(src_argb), // %0
"+r"(dst_y), // %1
"+r"(width) // %2
@ -1478,10 +1489,8 @@ void ARGBToYMatrixRow_AVX2(const uint8_t* src_argb,
"vpmaddubsw %%ymm5,%%ymm4,%%ymm6 \n"
"vphaddw %%ymm6,%%ymm6,%%ymm6 \n"
"vpsubw %%ymm6,%%ymm7,%%ymm7 \n"
"vmovdqa %4,%%ymm6 \n"
LABELALIGN ""
RGBTOY_AVX2(ymm7)
"vzeroupper \n"
"vmovdqa %4,%%ymm6 \n" LABELALIGN
"" RGBTOY_AVX2(ymm7) "vzeroupper \n"
: "+r"(src_argb), // %0
"+r"(dst_y), // %1
"+r"(width) // %2
@ -1492,7 +1501,8 @@ void ARGBToYMatrixRow_AVX2(const uint8_t* src_argb,
}
#endif
#if defined(HAS_ARGBTOYROW_AVX512BW) || defined(HAS_ARGBTOUV444ROW_AVX512BW) || defined(HAS_ARGBTOUVROW_AVX512BW)
#if defined(HAS_ARGBTOYROW_AVX512BW) || \
defined(HAS_ARGBTOUV444ROW_AVX512BW) || defined(HAS_ARGBTOUVROW_AVX512BW)
static const uint32_t kPermdARGBToY_AVX512BW[16] = {0, 4, 8, 12, 1, 5, 9, 13,
2, 6, 10, 14, 3, 7, 11, 15};
#endif
@ -1518,8 +1528,7 @@ void ARGBToYMatrixRow_AVX512BW(const uint8_t* src_argb,
"vpmaddwd %%zmm16,%%zmm6,%%zmm6 \n"
"vpackssdw %%zmm6,%%zmm6,%%zmm6 \n"
"vpsubw %%zmm6,%%zmm7,%%zmm7 \n"
"vmovups %4,%%zmm6 \n"
LABELALIGN
"vmovups %4,%%zmm6 \n" LABELALIGN
"1: \n"
"vmovups (%0),%%zmm0 \n"
"vmovups 0x40(%0),%%zmm1 \n"
@ -2209,7 +2218,8 @@ void ARGBToUVMatrixRow_AVX512BW(const uint8_t* src_argb,
"vpmaddubsw %%zmm5,%%zmm0,%%zmm0 \n" // 16 V
"vpmaddwd %%zmm16,%%zmm1,%%zmm1 \n"
"vpmaddwd %%zmm16,%%zmm0,%%zmm0 \n"
"vpackssdw %%zmm0,%%zmm1,%%zmm0 \n" // mutates (U in lower, V in upper)
"vpackssdw %%zmm0,%%zmm1,%%zmm0 \n" // mutates (U in lower, V
// in upper)
"vpaddw %%zmm17,%%zmm0,%%zmm0 \n"
"vpsrlw $0x8,%%zmm0,%%zmm0 \n"
"vpackuswb %%zmm0,%%zmm0,%%zmm0 \n" // mutates
@ -4601,6 +4611,29 @@ void MirrorRow_SSSE3(const uint8_t* src, uint8_t* dst, int width) {
}
#endif // HAS_MIRRORROW_SSSE3
#ifdef HAS_MIRRORROW_AVX512BW
void MirrorRow_AVX512BW(const uint8_t* src, uint8_t* dst, int width) {
ptrdiff_t temp_width = (ptrdiff_t)(width);
asm volatile("vbroadcasti32x4 %3,%%zmm5 \n"
LABELALIGN
"1: \n"
"vmovdqu8 -0x40(%0,%2,1),%%zmm0 \n"
"vpshufb %%zmm5,%%zmm0,%%zmm0 \n"
"vshufi64x2 $0x1b,%%zmm0,%%zmm0,%%zmm0 \n"
"vmovdqu8 %%zmm0,(%1) \n"
"lea 0x40(%1),%1 \n"
"sub $0x40,%2 \n"
"jg 1b \n"
"vzeroupper \n"
: "+r"(src), // %0
"+r"(dst), // %1
"+r"(temp_width) // %2
: "m"(kShuffleMirror) // %3
: "memory", "cc", "zmm0", "zmm5");
}
#endif // HAS_MIRRORROW_AVX512BW
#ifdef HAS_MIRRORROW_AVX2
void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
ptrdiff_t temp_width = (ptrdiff_t)(width);
@ -4624,11 +4657,50 @@ void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
}
#endif // HAS_MIRRORROW_AVX2
#ifdef HAS_MIRRORSPLITUVROW_AVX2
#if defined(HAS_MIRRORSPLITUVROW_AVX2) || defined(HAS_MIRRORSPLITUVROW_AVX512BW)
// Shuffle table for reversing the bytes of UV channels.
static const uvec8 kShuffleMirrorSplitUV = {14u, 12u, 10u, 8u, 6u, 4u, 2u, 0u,
15u, 13u, 11u, 9u, 7u, 5u, 3u, 1u};
#endif
#ifdef HAS_MIRRORSPLITUVROW_AVX512BW
static const uint64_t kMirrorSplitUVPermute[8] = {6, 4, 2, 0, 7, 5, 3, 1};
void MirrorSplitUVRow_AVX512BW(const uint8_t* src,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ptrdiff_t temp_width = (ptrdiff_t)(width);
asm volatile(
"vbroadcasti32x4 %4,%%zmm1 \n"
"lea -0x40(%0,%3,2),%0 \n"
"sub %1,%2 \n"
"vmovdqu64 %5,%%zmm3 \n"
LABELALIGN
"1: \n"
"vmovdqu8 (%0),%%zmm0 \n"
"lea -0x40(%0),%0 \n"
"vpshufb %%zmm1,%%zmm0,%%zmm0 \n"
"vpermq %%zmm0,%%zmm3,%%zmm0 \n"
"vextracti64x4 $0x1,%%zmm0,%%ymm2 \n"
"vmovdqu %%ymm0,(%1) \n"
"vmovdqu %%ymm2,0x00(%1,%2,1) \n"
"lea 0x20(%1),%1 \n"
"sub $0x20,%3 \n"
"jg 1b \n"
"vzeroupper \n"
: "+r"(src), // %0
"+r"(dst_u), // %1
"+r"(dst_v), // %2
"+r"(temp_width) // %3
: "m"(kShuffleMirrorSplitUV), // %4
"m"(kMirrorSplitUVPermute) // %5
: "memory", "cc", "zmm0", "zmm1", "zmm2", "zmm3");
}
#endif // HAS_MIRRORSPLITUVROW_AVX512BW
#ifdef HAS_MIRRORSPLITUVROW_AVX2
void MirrorSplitUVRow_AVX2(const uint8_t* src,
uint8_t* dst_u,
uint8_t* dst_v,
@ -4759,13 +4831,11 @@ void RGB24MirrorRow_SSSE3(const uint8_t* src_rgb24,
#ifdef HAS_RGB24MIRRORROW_AVX2
// Shuffle first 10 pixels to last 10 mirrored. first byte zero
static const uvec8 kShuffleMirrorRGB0_AVX = {
128u, 12u, 13u, 14u, 9u, 10u, 11u, 6u, 7u, 8u, 3u, 4u, 5u, 0u, 1u, 2u
};
128u, 12u, 13u, 14u, 9u, 10u, 11u, 6u, 7u, 8u, 3u, 4u, 5u, 0u, 1u, 2u};
// Shuffle last 2 pixels to first 2 mirrored. last byte zero
static const uvec8 kShuffleMirrorRGB1_AVX = {
13u, 14u, 15u, 10u, 11u, 12u, 7u, 8u, 9u, 4u, 5u, 6u, 1u, 2u, 3u, 128u
};
13u, 14u, 15u, 10u, 11u, 12u, 7u, 8u, 9u, 4u, 5u, 6u, 1u, 2u, 3u, 128u};
void RGB24MirrorRow_AVX2(const uint8_t* src_rgb24,
uint8_t* dst_rgb24,
@ -4894,6 +4964,47 @@ void SplitUVRow_AVX2(const uint8_t* src_uv,
}
#endif // HAS_SPLITUVROW_AVX2
#ifdef HAS_SPLITUVROW_AVX512BW
static const uint64_t kSplitUVPermute[8] = {0, 2, 4, 6, 1, 3, 5, 7};
void SplitUVRow_AVX512BW(const uint8_t* src_uv,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
asm volatile(
"vpternlogd $0xff,%%zmm5,%%zmm5,%%zmm5 \n"
"vpsrlw $0x8,%%zmm5,%%zmm5 \n"
"vmovdqu64 %4,%%zmm4 \n"
"sub %1,%2 \n"
LABELALIGN
"1: \n"
"vmovdqu8 (%0),%%zmm0 \n"
"vmovdqu8 0x40(%0),%%zmm1 \n"
"lea 0x80(%0),%0 \n"
"vpsrlw $0x8,%%zmm0,%%zmm2 \n"
"vpsrlw $0x8,%%zmm1,%%zmm3 \n"
"vpandd %%zmm5,%%zmm0,%%zmm0 \n"
"vpandd %%zmm5,%%zmm1,%%zmm1 \n"
"vpackuswb %%zmm1,%%zmm0,%%zmm0 \n"
"vpackuswb %%zmm3,%%zmm2,%%zmm2 \n"
"vpermq %%zmm0,%%zmm4,%%zmm0 \n"
"vpermq %%zmm2,%%zmm4,%%zmm2 \n"
"vmovdqu8 %%zmm0,(%1) \n"
"vmovdqu8 %%zmm2,0x00(%1,%2,1) \n"
"lea 0x40(%1),%1 \n"
"sub $0x40,%3 \n"
"jg 1b \n"
"vzeroupper \n"
: "+r"(src_uv), // %0
"+r"(dst_u), // %1
"+r"(dst_v), // %2
"+r"(width) // %3
: "m"(kSplitUVPermute) // %4
: "memory", "cc", "zmm0", "zmm1", "zmm2", "zmm3", "zmm4", "zmm5");
}
#endif // HAS_SPLITUVROW_AVX512BW
#ifdef HAS_SPLITUVROW_SSE2
void SplitUVRow_SSE2(const uint8_t* src_uv,
uint8_t* dst_u,
@ -8783,10 +8894,14 @@ void InterpolateRow_16_AVX2(uint16_t* dst_ptr,
"vmovd %3,%%xmm5 \n"
"vpunpcklwd %%xmm0,%%xmm5,%%xmm5 \n"
"vpbroadcastd %%xmm5,%%ymm5 \n"
"mov $0x80008000,%%eax \n" // 0x80008000 used to bias unsigned words to signed range for vpmaddwd.
"mov $0x80008000,%%eax \n" // 0x80008000 used to bias
// unsigned words to
// signed range for
// vpmaddwd.
"vmovd %%eax,%%xmm4 \n"
"vbroadcastss %%xmm4,%%ymm4 \n"
"mov $8388736,%%eax \n" // 32768 * 256 + 128 rounding constant.
"mov $8388736,%%eax \n" // 32768 * 256 + 128
// rounding constant.
"vmovd %%eax,%%xmm3 \n"
"vbroadcastss %%xmm3,%%ymm3 \n"
@ -8811,8 +8926,7 @@ void InterpolateRow_16_AVX2(uint16_t* dst_ptr,
"jg 1b \n"
"jmp 99f \n"
"50: \n"
LABELALIGN
"50: \n" LABELALIGN
"2: \n"
"vmovdqu (%1),%%ymm0 \n"
"vpavgw (%1,%4,2),%%ymm0,%%ymm0 \n"
@ -8822,8 +8936,7 @@ void InterpolateRow_16_AVX2(uint16_t* dst_ptr,
"jg 2b \n"
"jmp 99f \n"
"100: \n"
LABELALIGN
"100: \n" LABELALIGN
"3: \n"
"vmovdqu (%1),%%ymm0 \n"
"vmovdqu %%ymm0,0x00(%1,%0,1) \n"
@ -8905,8 +9018,7 @@ void ARGBShuffleRow_AVX512BW(const uint8_t* src_argb,
uint8_t* dst_argb,
const uint8_t* shuffler,
int width) {
asm volatile(
"vbroadcasti32x4 (%3),%%zmm5 \n"
asm volatile("vbroadcasti32x4 (%3),%%zmm5 \n"
LABELALIGN
"1: \n"

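`SplitUVRow_AVX512BW` above follows `vpackuswb` with a `vpermq` through `kSplitUVPermute = {0, 2, 4, 6, 1, 3, 5, 7}`. The pack instruction works within each 128-bit lane, so its output interleaves 8-byte groups from the two inputs, and the permute puts them back in row order. A scalar model of the U path for one 128-byte step, assuming hypothetical helper names and inputs already masked to the low byte:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

// Models a per-lane pack of two 32-word vectors into 64 bytes: each
// 16-byte lane gets 8 bytes from a, then 8 bytes from b. Words are
// assumed to be <= 255, so saturation does nothing.
static void PackLanes_Sketch(const uint16_t a[32], const uint16_t b[32],
                             uint8_t out[64]) {
  for (int lane = 0; lane < 4; ++lane) {
    for (int i = 0; i < 8; ++i) {
      assert(a[lane * 8 + i] <= 255 && b[lane * 8 + i] <= 255);
      out[lane * 16 + i] = (uint8_t)a[lane * 8 + i];
      out[lane * 16 + 8 + i] = (uint8_t)b[lane * 8 + i];
    }
  }
}

// Models a 64-bit element permute: out qword q = in qword idx[q].
static void PermuteQwords_Sketch(const uint8_t in[64], const uint64_t idx[8],
                                 uint8_t out[64]) {
  for (int q = 0; q < 8; ++q) {
    memcpy(out + q * 8, in + idx[q] * 8, 8);
  }
}

// Extracts 64 U bytes from 128 interleaved UV bytes.
static void SplitU64_Sketch(const uint8_t src_uv[128], uint8_t dst_u[64]) {
  static const uint64_t kPermute[8] = {0, 2, 4, 6, 1, 3, 5, 7};
  uint16_t a[32];
  uint16_t b[32];
  uint8_t packed[64];
  for (int i = 0; i < 32; ++i) {
    a[i] = src_uv[2 * i];       // low byte of each word in the first load
    b[i] = src_uv[64 + 2 * i];  // low byte of each word in the second load
  }
  PackLanes_Sketch(a, b, packed);
  PermuteQwords_Sketch(packed, kPermute, dst_u);
}
```

Without the permute, the packed bytes would come out as U0..7, U32..39, U8..15, U40..47 and so on; the index table selects the even qwords first and the odd ones second.
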

@ -2030,7 +2030,9 @@ static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0},
128,
0};
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, 128, 0};
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0},
128,
0};
// RGB to BT.601 coefficients
// B * 0.1016 coefficient = 25
@ -2224,10 +2226,6 @@ static void RGBToYMatrixRow_LASX(const uint8_t* src_rgba,
: "memory");
}
void ARGBToUVJRow_LASX(const uint8_t* src_argb,
int src_stride_argb,
uint8_t* dst_u,


@ -2815,7 +2815,9 @@ static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0},
128,
0};
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, 128, 0};
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0},
128,
0};
// RGB to BT.601 coefficients
// B * 0.1016 coefficient = 25
@ -2995,10 +2997,6 @@ static void RGBToYMatrixRow_LSX(const uint8_t* src_rgba,
: "memory");
}
// undef for unified sources build
#undef YUVTORGB_SETUP
#undef READYUV422_D


@ -8,8 +8,8 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "libyuv/row.h"
#include "libyuv/convert_from_argb.h" // For ArgbConstants
#include "libyuv/row.h"
#ifdef __cplusplus
namespace libyuv {
@ -325,9 +325,8 @@ void I422ToRGB565Row_NEON(const uint8_t* src_y,
YUVTORGB_SETUP
"vmov.u8 d6, #255 \n"
"1: \n" //
READYUV422
"subs %[width], %[width], #8 \n" YUVTORGB RGBTORGB8
ARGBTORGB565
READYUV422 "subs %[width], %[width], #8 \n" YUVTORGB
RGBTORGB8 ARGBTORGB565
"vst1.8 {q2}, [%[dst_rgb565]]! \n" // store 8 pixels RGB565.
"bgt 1b \n"
: [src_y] "+r"(src_y), // %[src_y]
@ -1912,7 +1911,6 @@ void ARGBToUVJ444Row_NEON(const uint8_t* src_argb,
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width, &kArgbJPEGConstants);
}
// clang-format off
// 16x2 pixels -> 8x1. width is number of argb pixels. e.g. 16.
#define RGBTOUV(QB, QG, QR) \
@ -1935,7 +1933,8 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
const struct ArgbConstants* c) {
const uint8_t* src_argb_1 = src_argb + src_stride_argb;
asm volatile(
"vld1.8 {d24}, [%5] \n" // load kRGBToU (8 bytes, only 4 used)
"vld1.8 {d24}, [%5] \n" // load kRGBToU (8 bytes,
// only 4 used)
"vld1.8 {d25}, [%6] \n" // load kRGBToV
"vmovl.s8 q14, d24 \n" // U coeffs in d28
"vmovl.s8 q15, d25 \n" // V coeffs in d30
@ -1943,7 +1942,8 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
"1: \n"
"vld4.8 {d0, d2, d4, d6}, [%0]! \n" // load 8 ARGB pixels.
"vld4.8 {d1, d3, d5, d7}, [%0]! \n" // load next 8 ARGB pixels.
"vld4.8 {d1, d3, d5, d7}, [%0]! \n" // load next 8 ARGB
// pixels.
"subs %4, %4, #16 \n" // 16 processed per loop.
"vpaddl.u8 q0, q0 \n" // B 16 bytes -> 8 shorts.
"vpaddl.u8 q1, q1 \n" // G 16 bytes -> 8 shorts.
@ -1992,9 +1992,8 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
"+r"(width) // %4
: "r"(&c->kRGBToU), // %5
"r"(&c->kRGBToV) // %6
: "cc", "memory", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7",
"q8", "q9", "q11", "q12", "q14", "q15"
);
: "cc", "memory", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8",
"q9", "q11", "q12", "q14", "q15");
}
void ARGBToUVRow_NEON(const uint8_t* src_argb,
@ -2807,10 +2806,6 @@ void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
"d24", "d25");
}
// Bilinear filter 16x2 -> 16x1
void InterpolateRow_NEON(uint8_t* dst_ptr,
const uint8_t* src_ptr,


@ -8,8 +8,8 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "libyuv/row.h"
#include "libyuv/convert_from_argb.h"
#include "libyuv/row.h"
#ifdef __cplusplus
namespace libyuv {
@ -783,9 +783,8 @@ void I422ToRGB565Row_NEON(const uint8_t* src_y,
asm volatile(
YUVTORGB_SETUP
"1: \n" //
READYUV422
"subs %w[width], %w[width], #8 \n" I4XXTORGB RGBTORGB8_TOP
ARGBTORGB565_FROM_TOP
READYUV422 "subs %w[width], %w[width], #8 \n" I4XXTORGB
RGBTORGB8_TOP ARGBTORGB565_FROM_TOP
"st1 {v18.8h}, [%[dst_rgb565]], #16 \n" // store 8 pixels RGB565.
"b.gt 1b \n"
: [src_y] "+r"(src_y), // %[src_y]
@ -1036,9 +1035,8 @@ void NV12ToRGB565Row_NEON(const uint8_t* src_y,
YUVTORGB_SETUP
"ldr q2, [%[kNV12Table]] \n"
"1: \n" //
READNV12
"subs %w[width], %w[width], #8 \n" NVTORGB RGBTORGB8_TOP
ARGBTORGB565_FROM_TOP
READNV12 "subs %w[width], %w[width], #8 \n" NVTORGB
RGBTORGB8_TOP ARGBTORGB565_FROM_TOP
"st1 {v18.8h}, [%[dst_rgb565]], #16 \n" // store 8
// pixels
// RGB565.
@ -2745,8 +2743,10 @@ void ARGBToUV444MatrixRow_NEON(const uint8_t* src_argb,
"ldr q16, [%[c], #16] \n" // kRGBToU
"ldr q17, [%[c], #32] \n" // kRGBToV
"ldr s0, [%[c], #64] \n" // kAddUV
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs to 16-bit
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs to 16-bit
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs
// to 16-bit
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs
// to 16-bit
"dup v20.8h, v16.h[0] \n" // U0
"dup v21.8h, v16.h[1] \n" // U1
"dup v22.8h, v16.h[2] \n" // U2
@ -2788,13 +2788,12 @@ void ARGBToUV444MatrixRow_NEON(const uint8_t* src_argb,
"+r"(dst_v), // %2
"+r"(width) // %3
: [c] "r"(c) // %4
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
"v26", "v27", "v28");
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16",
"v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26",
"v27", "v28");
}
static void ARGBToUV444MatrixRow_NEON_I8MM(
const uint8_t* src_argb,
static void ARGBToUV444MatrixRow_NEON_I8MM(const uint8_t* src_argb,
uint8_t* dst_u,
uint8_t* dst_v,
int width,
@ -2844,8 +2843,7 @@ void ARGBToUV444Row_NEON(const uint8_t* src_argb,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width,
&kArgbI601Constants);
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width, &kArgbI601Constants);
}
void ARGBToUV444Row_NEON_I8MM(const uint8_t* src_argb,
@ -2860,8 +2858,7 @@ void ARGBToUVJ444Row_NEON(const uint8_t* src_argb,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width,
&kArgbJPEGConstants);
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width, &kArgbJPEGConstants);
}
void ARGBToUVJ444Row_NEON_I8MM(const uint8_t* src_argb,
@ -2906,8 +2903,10 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
asm volatile(
"ldr q16, [%[c], #16] \n" // kRGBToU
"ldr q17, [%[c], #32] \n" // kRGBToV
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs to 16-bit
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs to 16-bit
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs
// to 16-bit
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs
// to 16-bit
"dup v20.8h, v16.h[0] \n" // U0
"dup v21.8h, v16.h[1] \n" // U1
"dup v22.8h, v16.h[2] \n" // U2
@ -2916,10 +2915,12 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
"dup v26.8h, v17.h[1] \n" // V1
"dup v27.8h, v17.h[2] \n" // V2
"dup v28.8h, v17.h[3] \n" // V3
"movi v25.8h, #0x80, lsl #8 \n" // 128.0 in 16-bit (0x8000)
"movi v25.8h, #0x80, lsl #8 \n" // 128.0 in 16-bit
// (0x8000)
"1: \n"
"ld4 {v0.16b,v1.16b,v2.16b,v3.16b}, [%0], #64 \n" // load 16 pixels.
"ld4 {v0.16b,v1.16b,v2.16b,v3.16b}, [%0], #64 \n" // load 16
// pixels.
"subs %w4, %w4, #16 \n" // 16 processed per loop.
"uaddlp v0.8h, v0.16b \n" // B 16 bytes -> 8 shorts.
"prfm pldl1keep, [%0, 448] \n"
@ -2927,7 +2928,8 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
"uaddlp v2.8h, v2.16b \n" // R 16 bytes -> 8 shorts.
"uaddlp v18.8h, v3.16b \n" // A 16 bytes -> 8 shorts.
"ld4 {v4.16b,v5.16b,v6.16b,v7.16b}, [%1], #64 \n" // load 16 more.
"ld4 {v4.16b,v5.16b,v6.16b,v7.16b}, [%1], #64 \n" // load 16
// more.
"uadalp v0.8h, v4.16b \n" // B 16 bytes -> 8 shorts.
"prfm pldl1keep, [%1, 448] \n"
"uadalp v1.8h, v5.16b \n" // G 16 bytes -> 8 shorts.
@ -2964,10 +2966,9 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
"+r"(dst_v), // %3
"+r"(width) // %4
: [c] "r"(c) // %5
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
"v16", "v17", "v18", "v20", "v21", "v22", "v23", "v24", "v25", "v26",
"v27", "v28"
);
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16",
"v17", "v18", "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27",
"v28");
}
void ARGBToUVRow_NEON(const uint8_t* src_argb,
@ -3404,8 +3405,8 @@ void ARGBToUVMatrixRow_NEON_I8MM(const uint8_t* src_argb,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v, width,
c);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v,
width, c);
}
void ARGBToUVRow_NEON_I8MM(const uint8_t* src_argb,
@ -3413,8 +3414,8 @@ void ARGBToUVRow_NEON_I8MM(const uint8_t* src_argb,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v, width,
&kArgbI601Constants);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v,
width, &kArgbI601Constants);
}
void ABGRToUVRow_NEON_I8MM(const uint8_t* src_abgr,
@ -3422,8 +3423,8 @@ void ABGRToUVRow_NEON_I8MM(const uint8_t* src_abgr,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v, width,
&kAbgrI601Constants);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v,
width, &kAbgrI601Constants);
}
void BGRAToUVRow_NEON_I8MM(const uint8_t* src_bgra,
@ -3431,8 +3432,8 @@ void BGRAToUVRow_NEON_I8MM(const uint8_t* src_bgra,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_bgra, src_stride_bgra, dst_u, dst_v, width,
&kBgraI601Constants);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_bgra, src_stride_bgra, dst_u, dst_v,
width, &kBgraI601Constants);
}
void RGBAToUVRow_NEON_I8MM(const uint8_t* src_rgba,
@ -3440,8 +3441,8 @@ void RGBAToUVRow_NEON_I8MM(const uint8_t* src_rgba,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_rgba, src_stride_rgba, dst_u, dst_v, width,
&kRgbaI601Constants);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_rgba, src_stride_rgba, dst_u, dst_v,
width, &kRgbaI601Constants);
}
void ARGBToUVJRow_NEON_I8MM(const uint8_t* src_argb,
@ -3449,8 +3450,8 @@ void ARGBToUVJRow_NEON_I8MM(const uint8_t* src_argb,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v, width,
&kArgbJPEGConstants);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v,
width, &kArgbJPEGConstants);
}
void ABGRToUVJRow_NEON_I8MM(const uint8_t* src_abgr,
@ -3458,8 +3459,8 @@ void ABGRToUVJRow_NEON_I8MM(const uint8_t* src_abgr,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v, width,
&kAbgrJPEGConstants);
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v,
width, &kAbgrJPEGConstants);
}
void RGB565ToYRow_NEON(const uint8_t* src_rgb565, uint8_t* dst_y, int width) {
@ -3558,8 +3559,6 @@ void ARGB4444ToYRow_NEON(const uint8_t* src_argb4444,
: "cc", "memory", "v0", "v1", "v2", "v3", "v24", "v25", "v26", "v27");
}
// ARGB expects first 3 values to contain RGB and 4th value is ignored.
void ARGBToYMatrixRow_NEON(const uint8_t* src_argb,
uint8_t* dst_y,
@ -3597,9 +3596,7 @@ void ARGBToYMatrixRow_NEON(const uint8_t* src_argb,
"v19", "v20", "v21", "v22");
}
void ARGBToYMatrixRow_NEON_DotProd(
const uint8_t* src_argb,
void ARGBToYMatrixRow_NEON_DotProd(const uint8_t* src_argb,
uint8_t* dst_y,
int width,
const struct ArgbConstants* c) {
@ -3629,10 +3626,10 @@ void ARGBToYMatrixRow_NEON_DotProd(
"+r"(dst_y), // %1
"+r"(width) // %2
: "r"(c) // %3
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16", "v17", "v18", "v19");
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16",
"v17", "v18", "v19");
}
// RGB to JPeg coefficients
void ARGBToYRow_NEON(const uint8_t* src_argb, uint8_t* dst_y, int width) {
@ -3740,10 +3737,6 @@ void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
"v19", "v20", "v21");
}
// Bilinear filter 16x2 -> 16x1
void InterpolateRow_NEON(uint8_t* dst_ptr,
const uint8_t* src_ptr,

@ -1249,16 +1249,22 @@ void MergeUVRow_RVV(const uint8_t* src_u,
}
#endif
// RGB to JPeg coefficients
// B * 0.1140 coefficient = 29
// G * 0.5870 coefficient = 150
// R * 0.2990 coefficient = 77
// Add 0.5 = 0x80
static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0}, {0}, {0}, {128}, {0}};
static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0},
{0},
{0},
{128},
{0}};
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, {0}, {0}, {128}, {0}};
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0},
{0},
{0},
{128},
{0}};
// RGB to BT.601 coefficients
// B * 0.1016 coefficient = 25
@ -1266,9 +1272,17 @@ static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, {0}, {0
// R * 0.2578 coefficient = 66
// Add 16.5 = 0x1080
static const struct ArgbConstants kRgb24I601Constants = {{25, 129, 66, 0}, {0}, {0}, {0x1080}, {0}};
static const struct ArgbConstants kRgb24I601Constants = {{25, 129, 66, 0},
{0},
{0},
{0x1080},
{0}};
static const struct ArgbConstants kRawI601Constants = {{66, 129, 25, 0}, {0}, {0}, {0x1080}, {0}};
static const struct ArgbConstants kRawI601Constants = {{66, 129, 25, 0},
{0},
{0},
{0x1080},
{0}};
// ARGB expects first 3 values to contain RGB and 4th value is ignored
#ifdef HAS_ARGBTOYMATRIXROW_RVV
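
As a rough illustration of how the coefficient comments above can be read, the sketch below applies the JPEG and BT.601 luma weights in 8.8 fixed point. The helper names `RGBToYJ_Sketch` and `RGBToY_Sketch` are hypothetical and not part of libyuv. It assumes the weights are scaled by roughly 256 and that the bias (0x80 or 0x1080) is added before a right shift by 8.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration only; not libyuv API.

// Full-range (JPEG) luma: weights 29/150/77 sum to 256, and the bias
// 128 (0x80) is 0.5 in 8.8 fixed point, so the shift rounds to nearest.
static inline uint8_t RGBToYJ_Sketch(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint8_t>((29 * b + 150 * g + 77 * r + 128) >> 8);
}

// Limited-range (BT.601) luma: the bias 0x1080 is 16.5 in 8.8 fixed point,
// which adds the offset of 16 and rounds in one step.
static inline uint8_t RGBToY_Sketch(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint8_t>((25 * b + 129 * g + 66 * r + 0x1080) >> 8);
}
```

Under these assumptions black maps to 0 (JPEG) or 16 (BT.601), and white maps to 255 or 235 respectively.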

@ -1127,9 +1127,10 @@ __arm_locally_streaming void ARGBToUVMatrixRow_SME(
uint8_t* dst_v,
int width,
const struct ArgbConstants* c) {
int8_t uvconstants[8] = {
(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1], (int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1], (int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
int8_t uvconstants[8] = {(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1],
(int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1],
(int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
ARGBToUVMatrixRow_SVE_SC(src_argb, src_stride_argb, dst_u, dst_v, width,
uvconstants);
}

@ -223,9 +223,10 @@ void ARGBToUVMatrixRow_SVE2(const uint8_t* src_argb,
uint8_t* dst_v,
int width,
const struct ArgbConstants* c) {
int8_t uvconstants[8] = {
(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1], (int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1], (int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
int8_t uvconstants[8] = {(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1],
(int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1],
(int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
ARGBToUVMatrixRow_SVE_SC(src_argb, src_stride_argb, dst_u, dst_v, width,
uvconstants);
}

@ -8,19 +8,19 @@
* be found in the AUTHORS file in the root of the source tree.
*/
#include "libyuv/row.h"
#include "libyuv/convert_from_argb.h" // For ArgbConstants
#include "libyuv/row.h"
// This module is for Visual C 32/64 bit
#if !defined(LIBYUV_DISABLE_X86) && \
(defined(__x86_64__) || defined(__i386__) || \
defined(_M_X64) || defined(_M_X86)) && \
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
defined(_M_X86)) && \
((defined(_MSC_VER) && !defined(__clang__)) || \
defined(LIBYUV_ENABLE_ROWWIN))
#include <emmintrin.h>
#include <tmmintrin.h> // For _mm_maddubs_epi16
#include <immintrin.h> // For AVX2 intrinsics
#include <tmmintrin.h> // For _mm_maddubs_epi16
#ifdef __cplusplus
namespace libyuv {
@ -266,27 +266,33 @@ void BGRAToYRow_AVX2(const uint8_t* src_bgra, uint8_t* dst_y, int width) {
LIBYUV_TARGET_AVX2
void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
__m256i ymm_alpha = _mm256_set1_epi32(0xff000000);
__m128i shuf_low = _mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
__m128i shuf_high = _mm_set_epi8(-1, 13, 14, 15, -1, 10, 11, 12, -1, 7, 8, 9, -1, 4, 5, 6);
__m128i shuf_low =
_mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
__m128i shuf_high =
_mm_set_epi8(-1, 13, 14, 15, -1, 10, 11, 12, -1, 7, 8, 9, -1, 4, 5, 6);
__m256i ymm_shuf = _mm256_broadcastsi128_si256(shuf_low);
__m256i ymm_shuf2 = _mm256_broadcastsi128_si256(shuf_high);
while (width > 0) {
__m128i xmm0 = _mm_loadu_si128((const __m128i*)src_raw);
__m256i ymm0 = _mm256_castsi128_si256(xmm0);
ymm0 = _mm256_inserti128_si256(ymm0, _mm_loadu_si128((const __m128i*)(src_raw + 12)), 1);
ymm0 = _mm256_inserti128_si256(
ymm0, _mm_loadu_si128((const __m128i*)(src_raw + 12)), 1);
__m128i xmm1 = _mm_loadu_si128((const __m128i*)(src_raw + 24));
__m256i ymm1 = _mm256_castsi128_si256(xmm1);
ymm1 = _mm256_inserti128_si256(ymm1, _mm_loadu_si128((const __m128i*)(src_raw + 36)), 1);
ymm1 = _mm256_inserti128_si256(
ymm1, _mm_loadu_si128((const __m128i*)(src_raw + 36)), 1);
__m128i xmm2 = _mm_loadu_si128((const __m128i*)(src_raw + 48));
__m256i ymm2 = _mm256_castsi128_si256(xmm2);
ymm2 = _mm256_inserti128_si256(ymm2, _mm_loadu_si128((const __m128i*)(src_raw + 60)), 1);
ymm2 = _mm256_inserti128_si256(
ymm2, _mm_loadu_si128((const __m128i*)(src_raw + 60)), 1);
__m128i xmm3 = _mm_loadu_si128((const __m128i*)(src_raw + 68));
__m256i ymm3 = _mm256_castsi128_si256(xmm3);
ymm3 = _mm256_inserti128_si256(ymm3, _mm_loadu_si128((const __m128i*)(src_raw + 80)), 1);
ymm3 = _mm256_inserti128_si256(
ymm3, _mm_loadu_si128((const __m128i*)(src_raw + 80)), 1);
ymm0 = _mm256_shuffle_epi8(ymm0, ymm_shuf);
ymm1 = _mm256_shuffle_epi8(ymm1, ymm_shuf);
@ -312,10 +318,13 @@ void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
#ifdef HAS_RAWTOARGBROW_AVX512BW
LIBYUV_TARGET_AVX512BW
void RGBToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, const __m128i* shuffler, int width) {
void RGBToARGBRow_AVX512BW(const uint8_t* src_raw,
uint8_t* dst_argb,
const __m128i* shuffler,
int width) {
__m512i zmm_alpha = _mm512_set1_epi32(0xff000000);
__m512i zmm_perm = _mm512_set_epi32(
12, 11, 10, 9, 9, 8, 7, 6, 6, 5, 4, 3, 3, 2, 1, 0);
__m512i zmm_perm =
_mm512_set_epi32(12, 11, 10, 9, 9, 8, 7, 6, 6, 5, 4, 3, 3, 2, 1, 0);
__m512i zmm_shuf = _mm512_broadcast_i32x4(_mm_loadu_si128(shuffler));
while (width > 0) {
@ -351,14 +360,20 @@ void RGBToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, const __m1
}
LIBYUV_TARGET_AVX512BW
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
__m128i shuf = _mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw,
uint8_t* dst_argb,
int width) {
__m128i shuf =
_mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
RGBToARGBRow_AVX512BW(src_raw, dst_argb, &shuf, width);
}
LIBYUV_TARGET_AVX512BW
void RGB24ToARGBRow_AVX512BW(const uint8_t* src_rgb24, uint8_t* dst_argb, int width) {
__m128i shuf = _mm_set_epi8(-1, 11, 10, 9, -1, 8, 7, 6, -1, 5, 4, 3, -1, 2, 1, 0);
void RGB24ToARGBRow_AVX512BW(const uint8_t* src_rgb24,
uint8_t* dst_argb,
int width) {
__m128i shuf =
_mm_set_epi8(-1, 11, 10, 9, -1, 8, 7, 6, -1, 5, 4, 3, -1, 2, 1, 0);
RGBToARGBRow_AVX512BW(src_rgb24, dst_argb, &shuf, width);
}
#endif
@ -374,16 +389,19 @@ void ARGBToUVMatrixRow_AVX2(const uint8_t* src_argb,
__m256i ymm_u = _mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)c->kRGBToU));
__m256i ymm_v = _mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)c->kRGBToV));
__m256i ymm_0101 = _mm256_set1_epi16(0x0101);
__m256i ymm_shuf = _mm256_setr_epi8(0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15,
0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15);
__m256i ymm_shuf =
_mm256_setr_epi8(0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15, 0,
4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15);
__m256i ymm_8000 = _mm256_set1_epi16((short)0x8000);
__m256i ymm_zero = _mm256_setzero_si256();
while (width > 0) {
__m256i ymm0 = _mm256_loadu_si256((const __m256i*)src_argb);
__m256i ymm1 = _mm256_loadu_si256((const __m256i*)(src_argb + 32));
__m256i ymm2 = _mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb));
__m256i ymm3 = _mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb + 32));
__m256i ymm2 =
_mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb));
__m256i ymm3 =
_mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb + 32));
ymm0 = _mm256_shuffle_epi8(ymm0, ymm_shuf);
ymm1 = _mm256_shuffle_epi8(ymm1, ymm_shuf);
@ -455,8 +473,8 @@ void MergeUVRow_AVX2(const uint8_t* src_u,
#ifdef HAS_MIRRORROW_AVX2
LIBYUV_TARGET_AVX2
void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
__m256i ymm_shuf =
_mm256_broadcastsi128_si256(_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
src += width;
while (width > 0) {
src -= 32;
@ -473,8 +491,8 @@ void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
#ifdef HAS_MIRRORUVROW_AVX2
LIBYUV_TARGET_AVX2
void MirrorUVRow_AVX2(const uint8_t* src_uv, uint8_t* dst_uv, int width) {
__m256i ymm_shuf =
_mm256_broadcastsi128_si256(_mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1));
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
_mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1));
src_uv += width * 2;
while (width > 0) {
src_uv -= 32;
@ -494,8 +512,8 @@ void MirrorSplitUVRow_AVX2(const uint8_t* src_uv,
uint8_t* dst_u,
uint8_t* dst_v,
int width) {
__m256i ymm_shuf =
_mm256_broadcastsi128_si256(_mm_setr_epi8(14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1));
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
_mm_setr_epi8(14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1));
src_uv += width * 2;
while (width > 0) {
src_uv -= 32;
@ -516,25 +534,28 @@ LIBYUV_TARGET_AVX2
void RGB24MirrorRow_AVX2(const uint8_t* src_rgb24,
uint8_t* dst_rgb24,
int width) {
__m256i shuf0 = _mm256_setr_epi8(
-1, 12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2,
-1, 12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2);
__m128i shuf1 = _mm_setr_epi8(
13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3, -1);
__m256i shuf0 =
_mm256_setr_epi8(-1, 12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2, -1,
12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2);
__m128i shuf1 =
_mm_setr_epi8(13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3, -1);
src_rgb24 += width * 3 - 96;
while (width > 0) {
__m128i v0_lo = _mm_loadu_si128((const __m128i*)(src_rgb24 + 0));
__m128i v0_hi = _mm_loadu_si128((const __m128i*)(src_rgb24 + 15));
__m256i v0 = _mm256_inserti128_si256(_mm256_castsi128_si256(v0_lo), v0_hi, 1);
__m256i v0 =
_mm256_inserti128_si256(_mm256_castsi128_si256(v0_lo), v0_hi, 1);
__m128i v1_lo = _mm_loadu_si128((const __m128i*)(src_rgb24 + 30));
__m128i v1_hi = _mm_loadu_si128((const __m128i*)(src_rgb24 + 45));
__m256i v1 = _mm256_inserti128_si256(_mm256_castsi128_si256(v1_lo), v1_hi, 1);
__m256i v1 =
_mm256_inserti128_si256(_mm256_castsi128_si256(v1_lo), v1_hi, 1);
__m128i v2_lo = _mm_loadu_si128((const __m128i*)(src_rgb24 + 60));
__m128i v2_hi = _mm_loadu_si128((const __m128i*)(src_rgb24 + 75));
__m256i v2 = _mm256_inserti128_si256(_mm256_castsi128_si256(v2_lo), v2_hi, 1);
__m256i v2 =
_mm256_inserti128_si256(_mm256_castsi128_si256(v2_lo), v2_hi, 1);
__m128i v3 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 80));
@ -544,11 +565,14 @@ void RGB24MirrorRow_AVX2(const uint8_t* src_rgb24,
v3 = _mm_shuffle_epi8(v3, shuf1);
_mm_storeu_si128((__m128i*)(dst_rgb24 + 80), _mm256_castsi256_si128(v0));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 65), _mm256_extracti128_si256(v0, 1));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 65),
_mm256_extracti128_si256(v0, 1));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 50), _mm256_castsi256_si128(v1));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 35), _mm256_extracti128_si256(v1, 1));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 35),
_mm256_extracti128_si256(v1, 1));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 20), _mm256_castsi256_si128(v2));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 5), _mm256_extracti128_si256(v2, 1));
_mm_storeu_si128((__m128i*)(dst_rgb24 + 5),
_mm256_extracti128_si256(v2, 1));
_mm_storel_epi64((__m128i*)(dst_rgb24 + 0), v3);
src_rgb24 -= 96;
@ -629,7 +653,8 @@ void InterpolateRow_16_AVX2(uint16_t* dst_ptr,
for (i = 0; i < width; i += 16) {
__m256i row0 = _mm256_loadu_si256((const __m256i*)(src_ptr + i));
__m256i row1 = _mm256_loadu_si256((const __m256i*)(src_ptr1 + i));
_mm256_storeu_si256((__m256i*)(dst_ptr + i), _mm256_avg_epu16(row0, row1));
_mm256_storeu_si256((__m256i*)(dst_ptr + i),
_mm256_avg_epu16(row0, row1));
}
} else {
for (i = 0; i < width; i += 16) {
@ -672,21 +697,23 @@ void ARGBMirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
#ifdef HAS_J400TOARGBROW_AVX2
alignas(32) static const uint8_t kShuffleMaskJ400ToARGB_0[32] = {
0u, 0u, 0u, 128u, 1u, 1u, 1u, 128u, 2u, 2u, 2u, 128u, 3u, 3u, 3u, 128u,
4u, 4u, 4u, 128u, 5u, 5u, 5u, 128u, 6u, 6u, 6u, 128u, 7u, 7u, 7u, 128u
};
4u, 4u, 4u, 128u, 5u, 5u, 5u, 128u, 6u, 6u, 6u, 128u, 7u, 7u, 7u, 128u};
alignas(32) static const uint8_t kShuffleMaskJ400ToARGB_1[32] = {
8u, 8u, 8u, 128u, 9u, 9u, 9u, 128u, 10u, 10u, 10u, 128u, 11u, 11u, 11u, 128u,
12u, 12u, 12u, 128u, 13u, 13u, 13u, 128u, 14u, 14u, 14u, 128u, 15u, 15u, 15u, 128u
};
8u, 8u, 8u, 128u, 9u, 9u, 9u, 128u, 10u, 10u, 10u,
128u, 11u, 11u, 11u, 128u, 12u, 12u, 12u, 128u, 13u, 13u,
13u, 128u, 14u, 14u, 14u, 128u, 15u, 15u, 15u, 128u};
LIBYUV_TARGET_AVX2
void J400ToARGBRow_AVX2(const uint8_t* src_y, uint8_t* dst_argb, int width) {
__m256i ymm_mask0 = _mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_0);
__m256i ymm_mask1 = _mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_1);
__m256i ymm_mask0 =
_mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_0);
__m256i ymm_mask1 =
_mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_1);
__m256i ymm_alpha = _mm256_set1_epi32((int)0xff000000u);
while (width > 0) {
__m256i ymm0 = _mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)src_y));
__m256i ymm0 =
_mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)src_y));
__m256i ymm1 = _mm256_shuffle_epi8(ymm0, ymm_mask0);
__m256i ymm2 = _mm256_shuffle_epi8(ymm0, ymm_mask1);
@ -707,13 +734,15 @@ void J400ToARGBRow_AVX2(const uint8_t* src_y, uint8_t* dst_argb, int width) {
#ifdef HAS_RGB24TOARGBROW_AVX2
alignas(16) static const uint8_t kShuffleMaskRGB24ToARGB[2][16] = {
{0u, 1u, 2u, 128u, 3u, 4u, 5u, 128u, 6u, 7u, 8u, 128u, 9u, 10u, 11u, 128u},
{4u, 5u, 6u, 128u, 7u, 8u, 9u, 128u, 10u, 11u, 12u, 128u, 13u, 14u, 15u, 128u}
};
{4u, 5u, 6u, 128u, 7u, 8u, 9u, 128u, 10u, 11u, 12u, 128u, 13u, 14u, 15u,
128u}};
#endif
#ifdef HAS_RGB565TOARGBROW_AVX2
LIBYUV_TARGET_AVX2
void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565, uint8_t* dst_argb, int width) {
void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565,
uint8_t* dst_argb,
int width) {
__m256i ymm_scale_rb = _mm256_set1_epi32(0x01080108);
__m256i ymm_scale_g = _mm256_set1_epi32(0x20802080);
__m256i ymm_mask_b = _mm256_set1_epi16((short)0xf800);
@ -755,7 +784,9 @@ void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565, uint8_t* dst_argb, int widt
#ifdef HAS_ARGB1555TOARGBROW_AVX2
LIBYUV_TARGET_AVX2
void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555, uint8_t* dst_argb, int width) {
void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555,
uint8_t* dst_argb,
int width) {
__m256i ymm_scale_rb = _mm256_set1_epi32(0x01080108);
__m256i ymm_scale_g = _mm256_set1_epi32(0x42004200);
__m256i ymm_mask_b = _mm256_set1_epi16((short)0xf800);
@ -801,7 +832,9 @@ void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555, uint8_t* dst_argb, int
#ifdef HAS_ARGB4444TOARGBROW_AVX2
LIBYUV_TARGET_AVX2
void ARGB4444ToARGBRow_AVX2(const uint8_t* src_argb4444, uint8_t* dst_argb, int width) {
void ARGB4444ToARGBRow_AVX2(const uint8_t* src_argb4444,
uint8_t* dst_argb,
int width) {
__m256i ymm_mask = _mm256_set1_epi32(0x0f0f0f0f);
__m256i ymm_mask2 = _mm256_slli_epi32(ymm_mask, 4);
@ -841,27 +874,35 @@ void ARGB4444ToARGBRow_AVX2(const uint8_t* src_argb4444, uint8_t* dst_argb, int
#ifdef HAS_RGB24TOARGBROW_AVX2
LIBYUV_TARGET_AVX2
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width) {
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24,
uint8_t* dst_argb,
int width) {
__m256i ymm_alpha = _mm256_set1_epi32(0xff000000);
__m256i ymm_shuf = _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[0]));
__m256i ymm_shuf2 = _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[1]));
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[0]));
__m256i ymm_shuf2 = _mm256_broadcastsi128_si256(
_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[1]));
while (width > 0) {
__m128i xmm0 = _mm_loadu_si128((const __m128i*)src_rgb24);
__m256i ymm0 = _mm256_castsi128_si256(xmm0);
ymm0 = _mm256_inserti128_si256(ymm0, _mm_loadu_si128((const __m128i*)(src_rgb24 + 12)), 1);
ymm0 = _mm256_inserti128_si256(
ymm0, _mm_loadu_si128((const __m128i*)(src_rgb24 + 12)), 1);
__m128i xmm1 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 24));
__m256i ymm1 = _mm256_castsi128_si256(xmm1);
ymm1 = _mm256_inserti128_si256(ymm1, _mm_loadu_si128((const __m128i*)(src_rgb24 + 36)), 1);
ymm1 = _mm256_inserti128_si256(
ymm1, _mm_loadu_si128((const __m128i*)(src_rgb24 + 36)), 1);
__m128i xmm2 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 48));
__m256i ymm2 = _mm256_castsi128_si256(xmm2);
ymm2 = _mm256_inserti128_si256(ymm2, _mm_loadu_si128((const __m128i*)(src_rgb24 + 60)), 1);
ymm2 = _mm256_inserti128_si256(
ymm2, _mm_loadu_si128((const __m128i*)(src_rgb24 + 60)), 1);
__m128i xmm3 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 68));
__m256i ymm3 = _mm256_castsi128_si256(xmm3);
ymm3 = _mm256_inserti128_si256(ymm3, _mm_loadu_si128((const __m128i*)(src_rgb24 + 80)), 1);
ymm3 = _mm256_inserti128_si256(
ymm3, _mm_loadu_si128((const __m128i*)(src_rgb24 + 80)), 1);
ymm0 = _mm256_shuffle_epi8(ymm0, ymm_shuf);
ymm1 = _mm256_shuffle_epi8(ymm1, ymm_shuf);
@ -886,6 +927,50 @@ void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width)
}
#endif
#ifdef HAS_ARGBSHUFFLEROW_AVX2
LIBYUV_TARGET_AVX2
void ARGBShuffleRow_AVX2(const uint8_t* src_argb,
uint8_t* dst_argb,
const uint8_t* shuffler,
int width) {
__m256i control =
_mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)shuffler));
while (width >= 16) {
__m256i row = _mm256_loadu_si256((const __m256i*)src_argb);
__m256i row1 = _mm256_loadu_si256((const __m256i*)(src_argb + 32));
row = _mm256_shuffle_epi8(row, control);
row1 = _mm256_shuffle_epi8(row1, control);
_mm256_storeu_si256((__m256i*)dst_argb, row);
_mm256_storeu_si256((__m256i*)(dst_argb + 32), row1);
src_argb += 64;
dst_argb += 64;
width -= 16;
}
}
#endif
#ifdef HAS_ARGBSHUFFLEROW_AVX512BW
LIBYUV_TARGET_AVX512BW
void ARGBShuffleRow_AVX512BW(const uint8_t* src_argb,
uint8_t* dst_argb,
const uint8_t* shuffler,
int width) {
__m512i control =
_mm512_broadcast_i32x4(_mm_loadu_si128((const __m128i*)shuffler));
while (width >= 32) {
__m512i row = _mm512_loadu_si512((const __m512i*)src_argb);
__m512i row1 = _mm512_loadu_si512((const __m512i*)(src_argb + 64));
row = _mm512_shuffle_epi8(row, control);
row1 = _mm512_shuffle_epi8(row1, control);
_mm512_storeu_si512((__m512i*)dst_argb, row);
_mm512_storeu_si512((__m512i*)(dst_argb + 64), row1);
src_argb += 128;
dst_argb += 128;
width -= 32;
}
}
#endif
#endif
#ifdef __cplusplus
@ -893,4 +978,7 @@ void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width)
} // namespace libyuv
#endif
#endif // !defined(LIBYUV_DISABLE_X86) && (defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_X86)) && ((defined(_MSC_VER) && !defined(__clang__)) || defined(LIBYUV_ENABLE_ROWWIN))
#endif // !defined(LIBYUV_DISABLE_X86) && (defined(__x86_64__) ||
// defined(__i386__) || defined(_M_X64) || defined(_M_X86)) &&
// ((defined(_MSC_VER) && !defined(__clang__)) ||
// defined(LIBYUV_ENABLE_ROWWIN))
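
For readers comparing the `ARGBShuffleRow_AVX2` and `ARGBShuffleRow_AVX512BW` loops above against expected output, a scalar version may help. This is a minimal sketch, not the library's C fallback: the name `ARGBShuffleRow_Sketch` is hypothetical, and it assumes the 16-byte shuffler repeats one 4-byte pattern per pixel, so only the first four entries (values 0..3) are consulted.

```cpp
#include <cstdint>

// Hypothetical scalar reference for illustration; not the libyuv code.
// Each output pixel takes its 4 bytes from the same input pixel, in the
// order given by the first 4 entries of shuffler (assumed to be 0..3).
static void ARGBShuffleRow_Sketch(const uint8_t* src_argb,
                                  uint8_t* dst_argb,
                                  const uint8_t* shuffler,
                                  int width) {
  for (int x = 0; x < width; ++x) {
    const uint8_t* s = src_argb + x * 4;
    uint8_t* d = dst_argb + x * 4;
    // Read all four bytes before writing so src == dst also works.
    const uint8_t c0 = s[shuffler[0]];
    const uint8_t c1 = s[shuffler[1]];
    const uint8_t c2 = s[shuffler[2]];
    const uint8_t c3 = s[shuffler[3]];
    d[0] = c0;
    d[1] = c1;
    d[2] = c2;
    d[3] = c3;
  }
}
```

Unlike the SIMD loops, which step 16 or 32 pixels at a time, this sketch handles any width, so it can serve as a comparison baseline in a test.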

@ -1951,9 +1951,9 @@ int ScalePlane(const uint8_t* src,
// Reject dimensions larger than 32768 (or smaller than -32768 for height).
// This prevents FixedDiv signed integer overflows that can lead to division
// by zero/overflow crashes (SIGFPE on x86) or incorrect step calculations.
if (!src || src_width <= 0 || src_height == 0 ||
src_width > 32768 || src_height < -32768 || src_height > 32768 ||
!dst || dst_width <= 0 || dst_height <= 0) {
if (!src || src_width <= 0 || src_height == 0 || src_width > 32768 ||
src_height < -32768 || src_height > 32768 || !dst || dst_width <= 0 ||
dst_height <= 0) {
return -1;
}
// Simplify filtering when possible.
@ -2059,9 +2059,9 @@ int ScalePlane_16(const uint16_t* src,
// Reject dimensions larger than 32768 (or smaller than -32768 for height).
// This prevents FixedDiv signed integer overflows that can lead to division
// by zero/overflow crashes (SIGFPE on x86) or incorrect step calculations.
if (!src || src_width <= 0 || src_height == 0 ||
src_width > 32768 || src_height < -32768 || src_height > 32768 ||
!dst || dst_width <= 0 || dst_height <= 0) {
if (!src || src_width <= 0 || src_height == 0 || src_width > 32768 ||
src_height < -32768 || src_height > 32768 || !dst || dst_width <= 0 ||
dst_height <= 0) {
return -1;
}
// Simplify filtering when possible.
@ -2171,9 +2171,9 @@ int ScalePlane_12(const uint16_t* src,
// Reject dimensions larger than 32768 (or smaller than -32768 for height).
// This prevents FixedDiv signed integer overflows that can lead to division
// by zero/overflow crashes (SIGFPE on x86) or incorrect step calculations.
if (!src || src_width <= 0 || src_height == 0 ||
src_width > 32768 || src_height < -32768 || src_height > 32768 ||
!dst || dst_width <= 0 || dst_height <= 0) {
if (!src || src_width <= 0 || src_height == 0 || src_width > 32768 ||
src_height < -32768 || src_height > 32768 || !dst || dst_width <= 0 ||
dst_height <= 0) {
return -1;
}
// Simplify filtering when possible.
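
The guard repeated in `ScalePlane`, `ScalePlane_16` and `ScalePlane_12` above can be read as a single predicate on the dimensions. The sketch below restates it with the pointer checks left out; the name `ScaleDimensionsOk_Sketch` is hypothetical and not part of libyuv.

```cpp
// Hypothetical predicate mirroring the dimension checks above (pointer
// checks omitted); the name is illustrative and not libyuv API.
static bool ScaleDimensionsOk_Sketch(int src_width,
                                     int src_height,
                                     int dst_width,
                                     int dst_height) {
  // Source width must be in 1..32768.
  if (src_width <= 0 || src_width > 32768) {
    return false;
  }
  // Source height may be negative but not zero, with magnitude <= 32768.
  if (src_height == 0 || src_height < -32768 || src_height > 32768) {
    return false;
  }
  // Destination dimensions must be strictly positive.
  return dst_width > 0 && dst_height > 0;
}
```

Note that a negative source height passes the check as long as its magnitude stays within the limit, while zero is always rejected.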

@ -793,9 +793,9 @@ void ScaleFilterCols64_C(uint8_t* dst_ptr,
// Same as 8 bit arm blender but return is cast to uint16_t
#define BLENDER(a, b, f) \
(uint16_t)( \
(int)(a) + \
(int)((((int64_t)((f)) * ((int64_t)(b) - (int)(a))) + 0x8000) >> 16))
(uint16_t)((int)(a) + \
(int)((((int64_t)((f)) * ((int64_t)(b) - (int)(a))) + 0x8000) >> \
16))
void ScaleFilterCols_16_C(uint16_t* dst_ptr,
const uint16_t* src_ptr,
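
The `BLENDER` macro above is easier to follow when written as a function. The sketch below is an equivalent under stated assumptions: `Blend16_Sketch` is a hypothetical name, `f` is taken to be a 16-bit fraction in the range 0..0xffff, and right-shifting a negative 64-bit value is assumed to be an arithmetic shift (guaranteed from C++20, and the usual behavior on mainstream compilers before that).

```cpp
#include <cstdint>

// Hypothetical function form of the BLENDER macro, for illustration only.
// Computes a + f * (b - a) with f as a 16.16 fraction; adding 0x8000
// rounds to nearest before the low 16 bits are dropped.
static inline uint16_t Blend16_Sketch(uint16_t a, uint16_t b, int f) {
  const int64_t delta = static_cast<int64_t>(b) - static_cast<int64_t>(a);
  const int64_t scaled = (static_cast<int64_t>(f) * delta + 0x8000) >> 16;
  return static_cast<uint16_t>(static_cast<int>(a) +
                               static_cast<int>(scaled));
}
```

The 64-bit intermediate matters because `f * (b - a)` can exceed the range of a 32-bit int when both samples and the fraction use their full 16 bits.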

@ -464,8 +464,7 @@ static void YUVFToRGBReference(int y, int u, int v, int* r, int* g, int* b) {
static void YUVUToRGBReference(int y, int u, int v, int* r, int* g, int* b) {
double y1 = (y - 16) * 1.164384;
*r = RoundToByte(y1 - (v - 128) * -1.67867);
*g = RoundToByte(y1 - (u - 128) * 0.187326 -
(v - 128) * 0.65042);
*g = RoundToByte(y1 - (u - 128) * 0.187326 - (v - 128) * 0.65042);
*b = RoundToByte(y1 - (u - 128) * -2.14177);
}

@ -82,15 +82,19 @@ namespace libyuv {
(kHeight + (TILE_HEIGHT - 1)) & ~(TILE_HEIGHT - 1); \
const int kSrcHalfPaddedWidth = SUBSAMPLE(kPaddedWidth, SRC_SUBSAMP_X); \
const int kSrcHalfPaddedHeight = SUBSAMPLE(kPaddedHeight, SRC_SUBSAMP_Y); \
align_buffer_page_end(src_y, kPaddedWidth* kPaddedHeight* SRC_BPC + OFF); \
align_buffer_page_end(src_y, \
kPaddedWidth * kPaddedHeight * SRC_BPC + OFF); \
align_buffer_page_end( \
src_uv, kSrcHalfPaddedWidth* kSrcHalfPaddedHeight* SRC_BPC * 2 + OFF); \
src_uv, \
kSrcHalfPaddedWidth * kSrcHalfPaddedHeight * SRC_BPC * 2 + OFF); \
align_buffer_page_end(dst_y_c, kWidth * kHeight * DST_BPC); \
align_buffer_page_end(dst_u_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
align_buffer_page_end(dst_v_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
align_buffer_page_end(dst_y_opt, kWidth * kHeight * DST_BPC); \
align_buffer_page_end(dst_u_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
align_buffer_page_end(dst_v_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
align_buffer_page_end(dst_u_opt, \
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
align_buffer_page_end(dst_v_opt, \
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
SRC_T* src_y_p = reinterpret_cast<SRC_T*>(src_y + OFF); \
SRC_T* src_uv_p = reinterpret_cast<SRC_T*>(src_uv + OFF); \
for (int i = 0; i < kPaddedWidth * kPaddedHeight; ++i) { \
@ -389,8 +393,8 @@ TESTPLANARTOB(I444, 1, 1, ARGB, 4, 4, 1)
const int kStrideB = kWidth * BPP_B; \
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
align_buffer_page_end(src_uv, \
kStrideUV* SUBSAMPLE(kHeight, SUBSAMP_Y) * 2 + OFF); \
align_buffer_page_end( \
src_uv, kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y) * 2 + OFF); \
align_buffer_page_end(dst_argb_c, kStrideB * kHeight); \
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight); \
for (int i = 0; i < kHeight; ++i) \
@ -508,7 +512,8 @@ TESTBPTOB(NV12, 2, 2, RGB565, RGB565, 2)
(kWidth * EPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
align_buffer_page_end(src_argb, \
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
align_buffer_page_end(dst_argb_c, kStrideB* kHeightB*(int)sizeof(TYPE_B)); \
align_buffer_page_end(dst_argb_c, \
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
align_buffer_page_end(dst_argb_opt, \
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
@ -544,7 +549,8 @@ TESTBPTOB(NV12, 2, 2, RGB565, RGB565, 2)
(kWidth * EPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
const int kStrideB = \
(kWidth * EPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
align_buffer_page_end(src_argb, kStrideA* kHeightA*(int)sizeof(TYPE_A)); \
align_buffer_page_end(src_argb, \
kStrideA * kHeightA * (int)sizeof(TYPE_A)); \
align_buffer_page_end(dst_argb_c, \
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
align_buffer_page_end(dst_argb_opt, \
@ -886,7 +892,8 @@ TESTATOBD(ARGB, 4, 4, 1, RGB565, 2, 2, 1)
(kWidth * EPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
align_buffer_page_end(src_argb, \
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
align_buffer_page_end(dst_argb_c, kStrideA* kHeightA*(int)sizeof(TYPE_A)); \
align_buffer_page_end(dst_argb_c, \
kStrideA * kHeightA * (int)sizeof(TYPE_A)); \
align_buffer_page_end(dst_argb_opt, \
kStrideA * kHeightA * (int)sizeof(TYPE_A)); \
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
@ -2834,13 +2841,20 @@ TEST_F(LibYUVConvertTest, TestARGBToUVMatrixRow_Opt) {
int src_stride = (height == 1) ? 0 : kMaxWidth * 4;
ARGBToUVMatrixRow_C(&orig_argb_pixels[0], src_stride, &dest_u_c[0], &dest_v_c[0], width, &kArgbI601Constants);
ARGBToUVMatrixRow_Any_NEON(&orig_argb_pixels[0], src_stride, &dest_u_opt[0], &dest_v_opt[0], width, &kArgbI601Constants);
ARGBToUVMatrixRow_C(&orig_argb_pixels[0], src_stride, &dest_u_c[0],
&dest_v_c[0], width, &kArgbI601Constants);
ARGBToUVMatrixRow_Any_NEON(&orig_argb_pixels[0], src_stride,
&dest_u_opt[0], &dest_v_opt[0], width,
&kArgbI601Constants);
int half_width = (width + 1) / 2;
for (int i = 0; i < half_width; ++i) {
ASSERT_EQ(dest_u_c[i], dest_u_opt[i]) << "u mismatch at " << i << " width " << width << " height " << height;
ASSERT_EQ(dest_v_c[i], dest_v_opt[i]) << "v mismatch at " << i << " width " << width << " height " << height;
ASSERT_EQ(dest_u_c[i], dest_u_opt[i])
<< "u mismatch at " << i << " width " << width << " height "
<< height;
ASSERT_EQ(dest_v_c[i], dest_v_opt[i])
<< "v mismatch at " << i << " width " << width << " height "
<< height;
}
}
}
@ -2909,7 +2923,6 @@ TEST_F(LibYUVConvertTest, TestI400LargeSize) {
#endif // !defined(LEAN_TESTS)
#define TESTATOBPI(FMT_A, TYPE_A, BPP_A, STRIDE_A, HEIGHT_A, FMT_B, SUBSAMP_X, \
SUBSAMP_Y, W1280, N, NEG, OFF) \
TEST_F(LibYUVConvertTest, FMT_A##To##FMT_B##N) { \


@ -87,8 +87,10 @@ namespace libyuv {
align_buffer_page_end(dst_u_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
align_buffer_page_end(dst_v_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
align_buffer_page_end(dst_y_opt, kWidth * kHeight * DST_BPC); \
align_buffer_page_end(dst_u_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
align_buffer_page_end(dst_v_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
align_buffer_page_end(dst_u_opt, \
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
align_buffer_page_end(dst_v_opt, \
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
MemRandomize(src_y + OFF, kWidth * kHeight * SRC_BPC); \
MemRandomize(src_u + OFF, kSrcHalfWidth * kSrcHalfHeight * SRC_BPC); \
MemRandomize(src_v + OFF, kSrcHalfWidth * kSrcHalfHeight * SRC_BPC); \
@ -478,7 +480,8 @@ TESTPLANARTOBP(I212, uint16_t, 2, 2, 1, P212, uint16_t, 2, 2, 1, 12)
(kHeight + (TILE_HEIGHT - 1)) & ~(TILE_HEIGHT - 1); \
const int kSrcHalfPaddedWidth = SUBSAMPLE(kPaddedWidth, SRC_SUBSAMP_X); \
const int kSrcHalfPaddedHeight = SUBSAMPLE(kPaddedHeight, SRC_SUBSAMP_Y); \
align_buffer_page_end(src_y, kPaddedWidth* kPaddedHeight* SRC_BPC + OFF); \
align_buffer_page_end(src_y, \
kPaddedWidth * kPaddedHeight * SRC_BPC + OFF); \
align_buffer_page_end( \
src_uv, \
2 * kSrcHalfPaddedWidth * kSrcHalfPaddedHeight * SRC_BPC + OFF); \
@ -2287,12 +2290,13 @@ TEST_F(LibYUVConvertTest, TestARGBToI420Matrix) {
dst_v, kWidth / 2, &kArgbU2020Constants, kWidth, kHeight);
// Reference BT.709 (limited range)
// Y = round(0.2126 * 219 / 255 * R + 0.7152 * 219 / 255 * G + 0.0722 * 219 / 255 * B + 16)
// Y = round(0.1826 * R + 0.6142 * G + 0.0620 * B + 16)
// 47 * 255 + 157 * 255 + 16 * 255 + 4224 = 11985 + 40035 + 4080 + 4224 = 60324
// 60324 / 256 = 235.64 -> 235. Correct.
// Y = round(0.2126 * 219 / 255 * R + 0.7152 * 219 / 255 * G +
//           0.0722 * 219 / 255 * B + 16)
// Y = round(0.1826 * R + 0.6142 * G + 0.0620 * B + 16)
// 47 * 255 + 157 * 255 + 16 * 255 + 4224
//   = 11985 + 40035 + 4080 + 4224 = 60324
// 60324 / 256 = 235.64 -> 235. Correct.
for (int i = 0; i < kWidth * kHeight * 4; ++i) src_argb[i] = 255;
for (int i = 0; i < kWidth * kHeight * 4; ++i)
src_argb[i] = 255;
ARGBToI420Matrix(src_argb, kWidth * 4, dst_y, kWidth, dst_u, kWidth / 2,
dst_v, kWidth / 2, &kArgbH709Constants, kWidth, kHeight);
ASSERT_EQ(dst_y[0], 235);
@ -2423,6 +2427,132 @@ TEST_F(LibYUVConvertTest, TestARGBToI444Matrix) {
free_aligned_buffer_page_end(ref_v);
}
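Reviewer note: the BT.709 reference comment in TestARGBToI420Matrix can be checked with a few lines of integer arithmetic. The sketch below is an illustration only. `LimitedRangeLumaH709` is a hypothetical helper, not a libyuv function, and it assumes the 8.8 fixed-point weights quoted in that comment (47, 157, 16) with the bias 4224, which equals 16 * 256 + 128, i.e. the +16 offset plus one half for rounding.

```cpp
#include <cstdint>

// Hypothetical helper (not part of libyuv). Computes limited-range luma in
// 8.8 fixed point using the weights and bias quoted in the test comment:
// 47 ~ 0.1826 * 256, 157 ~ 0.6142 * 256, 16 ~ 0.0620 * 256, 4224 = 16.5 * 256.
// The largest possible sum is 60324, so it fits in 16 bits before the shift.
constexpr uint8_t LimitedRangeLumaH709(uint8_t r, uint8_t g, uint8_t b) {
  return static_cast<uint8_t>((47 * r + 157 * g + 16 * b + 4224) >> 8);
}
```

With these assumed weights, full white (255, 255, 255) gives 60324 >> 8 = 235 and full black gives 4224 >> 8 = 16, which matches the limited-range endpoints the test asserts on.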
template <typename ConvertToYUV, typename ConvertToARGB>
static void TestRGBToI420(ConvertToYUV convert_to_yuv,
ConvertToARGB convert_to_argb,
int width,
int height,
int disable_cpu_flags,
int benchmark_cpu_info) {
align_buffer_page_end(src_rgb, width * height * 4);
align_buffer_page_end(dst_y, width * height);
align_buffer_page_end(dst_u, (width + 1) / 2 * (height + 1) / 2);
align_buffer_page_end(dst_v, (width + 1) / 2 * (height + 1) / 2);
align_buffer_page_end(tmp_argb, width * height * 4);
align_buffer_page_end(ref_y, width * height);
align_buffer_page_end(ref_u, (width + 1) / 2 * (height + 1) / 2);
align_buffer_page_end(ref_v, (width + 1) / 2 * (height + 1) / 2);
MemRandomize(src_rgb, width * height * 4);
{
SCOPED_TRACE("C_Version");
MaskCpuFlags(disable_cpu_flags);
// Clear buffers
memset(dst_y, 0, width * height);
memset(dst_u, 0, (width + 1) / 2 * (height + 1) / 2);
memset(dst_v, 0, (width + 1) / 2 * (height + 1) / 2);
memset(ref_y, 0, width * height);
memset(ref_u, 0, (width + 1) / 2 * (height + 1) / 2);
memset(ref_v, 0, (width + 1) / 2 * (height + 1) / 2);
memset(tmp_argb, 0, width * height * 4);
int r1 =
convert_to_yuv(src_rgb, width * 4, dst_y, width, dst_u, (width + 1) / 2,
dst_v, (width + 1) / 2, width, height);
ASSERT_EQ(r1, 0);
int r2 =
convert_to_argb(src_rgb, width * 4, tmp_argb, width * 4, width, height);
ASSERT_EQ(r2, 0);
int r3 = ARGBToI420(tmp_argb, width * 4, ref_y, width, ref_u,
(width + 1) / 2, ref_v, (width + 1) / 2, width, height);
ASSERT_EQ(r3, 0);
for (int i = 0; i < width * height; ++i) {
ASSERT_EQ(dst_y[i], ref_y[i]);
}
for (int i = 0; i < (width + 1) / 2 * (height + 1) / 2; ++i) {
ASSERT_EQ(dst_u[i], ref_u[i]);
ASSERT_EQ(dst_v[i], ref_v[i]);
}
}
{
SCOPED_TRACE("SIMD_Version");
MaskCpuFlags(benchmark_cpu_info);
// Clear buffers
memset(dst_y, 0, width * height);
memset(dst_u, 0, (width + 1) / 2 * (height + 1) / 2);
memset(dst_v, 0, (width + 1) / 2 * (height + 1) / 2);
memset(ref_y, 0, width * height);
memset(ref_u, 0, (width + 1) / 2 * (height + 1) / 2);
memset(ref_v, 0, (width + 1) / 2 * (height + 1) / 2);
memset(tmp_argb, 0, width * height * 4);
int r1 =
convert_to_yuv(src_rgb, width * 4, dst_y, width, dst_u, (width + 1) / 2,
dst_v, (width + 1) / 2, width, height);
ASSERT_EQ(r1, 0);
int r2 =
convert_to_argb(src_rgb, width * 4, tmp_argb, width * 4, width, height);
ASSERT_EQ(r2, 0);
int r3 = ARGBToI420(tmp_argb, width * 4, ref_y, width, ref_u,
(width + 1) / 2, ref_v, (width + 1) / 2, width, height);
ASSERT_EQ(r3, 0);
for (int i = 0; i < width * height; ++i) {
ASSERT_EQ(dst_y[i], ref_y[i]);
}
for (int i = 0; i < (width + 1) / 2 * (height + 1) / 2; ++i) {
ASSERT_EQ(dst_u[i], ref_u[i]);
ASSERT_EQ(dst_v[i], ref_v[i]);
}
}
free_aligned_buffer_page_end(src_rgb);
free_aligned_buffer_page_end(dst_y);
free_aligned_buffer_page_end(dst_u);
free_aligned_buffer_page_end(dst_v);
free_aligned_buffer_page_end(tmp_argb);
free_aligned_buffer_page_end(ref_y);
free_aligned_buffer_page_end(ref_u);
free_aligned_buffer_page_end(ref_v);
}
TEST_F(LibYUVConvertTest, BGRAToI420_Check) {
TestRGBToI420(BGRAToI420, BGRAToARGB, 16, 16, disable_cpu_flags_,
benchmark_cpu_info_);
TestRGBToI420(BGRAToI420, BGRAToARGB, 17, 17, disable_cpu_flags_,
benchmark_cpu_info_);
TestRGBToI420(BGRAToI420, BGRAToARGB, 1280, 720, disable_cpu_flags_,
benchmark_cpu_info_);
}
TEST_F(LibYUVConvertTest, RGBAToI420_Check) {
TestRGBToI420(RGBAToI420, RGBAToARGB, 16, 16, disable_cpu_flags_,
benchmark_cpu_info_);
TestRGBToI420(RGBAToI420, RGBAToARGB, 17, 17, disable_cpu_flags_,
benchmark_cpu_info_);
TestRGBToI420(RGBAToI420, RGBAToARGB, 1280, 720, disable_cpu_flags_,
benchmark_cpu_info_);
}
TEST_F(LibYUVConvertTest, ABGRToI420_Check) {
TestRGBToI420(ABGRToI420, ABGRToARGB, 16, 16, disable_cpu_flags_,
benchmark_cpu_info_);
TestRGBToI420(ABGRToI420, ABGRToARGB, 17, 17, disable_cpu_flags_,
benchmark_cpu_info_);
TestRGBToI420(ABGRToI420, ABGRToARGB, 1280, 720, disable_cpu_flags_,
benchmark_cpu_info_);
}
#endif // !defined(LEAN_TESTS)
} // namespace libyuv
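Reviewer note: the chroma plane size in TestRGBToI420 is written as `(width + 1) / 2 * (height + 1) / 2`. Because `/` and `*` associate left to right, this is not half width times half height. The sketch below compares the two forms; both helpers are hypothetical names introduced for illustration and are not part of libyuv.

```cpp
// Hypothetical helpers for illustration; neither is a libyuv function.

// The expression as written in the test. It parses as
// (((width + 1) / 2) * (height + 1)) / 2.
constexpr int HalfPlaneSizeAsWritten(int width, int height) {
  return (width + 1) / 2 * (height + 1) / 2;
}

// Half width times half height, each rounded up separately.
constexpr int HalfPlaneSizeExact(int width, int height) {
  return ((width + 1) / 2) * ((height + 1) / 2);
}

// The as-written form is never smaller than the exact plane size.
static_assert(HalfPlaneSizeAsWritten(16, 16) >= HalfPlaneSizeExact(16, 16),
              "as-written size covers the exact plane");
```

For 16x16 the as-written form yields 68 against an exact size of 64, while for 17x17 both yield 81. Since the test uses the same expression for allocation, memset, and the comparison loops, the extra bytes are zero on both sides and compare equal, so the test outcome is unaffected. Parenthesizing each half would make the sizes match the plane exactly.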


@ -516,9 +516,9 @@ TEST_F(LibYUVScaleTest, YUVToRGBScaleUp) {
TEST_F(LibYUVScaleTest, YUVToRGBScaleDown) {
int diff = 0;
YUVToARGBTestFilter(
benchmark_width_ * 3 / 2, benchmark_height_ * 3 / 2, benchmark_width_,
benchmark_height_, libyuv::kFilterBilinear, benchmark_iterations_, 10,
YUVToARGBTestFilter(benchmark_width_ * 3 / 2, benchmark_height_ * 3 / 2,
benchmark_width_, benchmark_height_,
libyuv::kFilterBilinear, benchmark_iterations_, 10,
&diff);
ASSERT_LE(diff, 10);
}


@ -88,7 +88,8 @@ static inline bool SizeValid(int src_width,
reinterpret_cast<uint8_t*>(malloc(((size) * 2 + 4095 + 63) & ~4095)); \
if (var##_mem) \
var = reinterpret_cast<uint16_t*>( \
(intptr_t)(var##_mem + (((size)*2 + 4095 + 63) & ~4095) - (size)*2) & \
(intptr_t)(var##_mem + (((size) * 2 + 4095 + 63) & ~4095) - \
(size) * 2) & \
~63)
#define free_aligned_buffer_page_end_16(var) \