mirror of
https://chromium.googlesource.com/libyuv/libyuv
synced 2026-06-15 00:16:08 +08:00
BGRAToI420 use BgraConstants for a direct conversion using AVX512BW
row win (msvc) Was C/SSSE3 BGRAToARGB_Opt (594 ms) BGRAToARGB_Endswap_Opt (609 ms) BGRAToI420_Opt (122 ms) Now AVX2 BGRAToARGB_Opt (100 ms) BGRAToARGB_Endswap_Opt (99 ms) BGRAToI420_Opt (115 ms) Clang/GCC AVX512BW BGRAToARGB_Opt (86 ms) BGRAToARGB_Endswap_Opt (91 ms) BGRAToI420_Opt (110 ms) Bug: 42280902 Change-Id: I52cb2b0cacea8f2f0b138ec3cc521185dbef8595 Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/7905821 Commit-Queue: Frank Barchard <fbarchard@google.com> Reviewed-by: richard winterton <rrwinterton@gmail.com>
This commit is contained in:
parent
95eedb9687
commit
4be798d7c5
44
GEMINI.md
44
GEMINI.md
@ -1,44 +1,62 @@
|
||||
# Gemini Project Context: libyuv Row Functions
|
||||
|
||||
This file provides context for the core row-processing architecture of libyuv. Use these guidelines when refactoring, reviewing, or generating code within the `row_*.cc` files.
|
||||
This file provides context for the core row-processing architecture of
|
||||
libyuv. Use these guidelines when refactoring, reviewing, or generating
|
||||
code within the `row_*.cc` files.
|
||||
|
||||
## Architectural Overview
|
||||
|
||||
Libyuv uses a dispatch system where high-level conversion functions call optimized "Row" functions. These functions are categorized by SIMD architecture and compiler compatibility.
|
||||
Libyuv uses a dispatch system where high-level conversion functions call
|
||||
optimized "Row" functions. These functions are categorized by SIMD architecture
|
||||
and compiler compatibility.
|
||||
|
||||
## Source File Map
|
||||
|
||||
### x86 Architectures (32-bit and 64-bit)
|
||||
|
||||
* **row_gcc.cc**: **Master copy.** Contains inline assembly in GCC syntax for GCC and Clang. Supports AVX, and AVX512. AVX512 implementations are strictly for 64-bit targets.
|
||||
* **row_win.cc**: Derivative of `row_gcc.cc`. Contains C++ intrinsics specifically for Visual C++ (MSVC). Can be tested with Clang using `-DLIBYUV_ENABLE_ROWWIN`.
|
||||
* **row_gcc.cc**: **Master copy.** Contains inline assembly in GCC syntax for
|
||||
GCC and Clang. Supports AVX, and AVX512. AVX512 implementations are strictly
|
||||
for 64-bit targets.
|
||||
* **row_win.cc**: Derivative of `row_gcc.cc`. Contains C++ intrinsics
|
||||
specifically for Visual C++ (MSVC). Can be tested with Clang using
|
||||
`-DLIBYUV_ENABLE_ROWWIN`.
|
||||
* **Note**: Use either `row_gcc` or `row_win`, never both.
|
||||
|
||||
### ARM Architectures
|
||||
|
||||
* **row_neon.cc**: 32-bit ARM. Written entirely in inline assembly for GCC/Clang.
|
||||
* **row_neon64.cc**: 64-bit ARM (AArch64). Written entirely in inline assembly for GCC/Clang.
|
||||
* **row_neon.cc**: 32-bit ARM. Written entirely in inline assembly for
|
||||
GCC/Clang.
|
||||
* **row_neon64.cc**: 64-bit ARM (AArch64). Written entirely in inline assembly
|
||||
for GCC/Clang.
|
||||
* **row_sve.cc**: ARMv9 Scalable Vector Extensions (SVE).
|
||||
* **row_sme.cc**: ARMv9 Scalable Matrix Extension (SME) and Streaming SVE (SSVE).
|
||||
* **row_sme.cc**: ARMv9 Scalable Matrix Extension (SME) and Streaming SVE
|
||||
(SSVE).
|
||||
|
||||
### Other Architectures
|
||||
|
||||
* **row_rvv.cc**: RISC-V Vector (RVV). Implemented using intrinsics. Optimized for SiFive X280.
|
||||
* **row_rvv.cc**: RISC-V Vector (RVV). Implemented using intrinsics. Optimized
|
||||
for SiFive X280.
|
||||
* **row_lsx.cc / row_lasx.cc**: Loongarch MIPS-like extensions.
|
||||
|
||||
### Utility and Fallbacks
|
||||
|
||||
* **row_common.cc**: Portable C/C++ versions. This is the reference implementation.
|
||||
* **row_any.cc**: Handles "remainder" pixels for widths not multiples of SIMD register size. Used for x86, NEON, and MIPS. Not required for SVE, SME, or RVV due to hardware-level masking.
|
||||
* **row_common.cc**: Portable C/C++ versions. This is the reference
|
||||
implementation.
|
||||
* **row_any.cc**: Handles "remainder" pixels for widths not multiples of SIMD
|
||||
register size. Used for x86, NEON, and MIPS. Not required for SVE, SME, or
|
||||
RVV due to hardware-level masking.
|
||||
|
||||
## Coding Guidelines
|
||||
|
||||
1. **AVX512 Logic**: AVX512 row functions are strictly enabled for **64-bit x86 only**.
|
||||
2. **Feature Macros**: Use the `HAS_` macros in `include/libyuv/row.h` to enable or disable specific AVX512 versions.
|
||||
1. **AVX512 Logic**: AVX512 row functions are strictly enabled for **64-bit x86
|
||||
only**.
|
||||
2. **Feature Macros**: Use the `HAS_` macros in `include/libyuv/row.h` to
|
||||
enable or disable specific AVX512 versions.
|
||||
|
||||
## Changelist (CL) & Commit Guidelines
|
||||
|
||||
When generating descriptions, follow the Chromium/Google standard format. Wrap commit message text at 72 characters
|
||||
When generating descriptions, follow the Chromium/Google standard format. Wrap
|
||||
commit message text at 72 characters
|
||||
|
||||
### Format Example:
|
||||
|
||||
|
||||
@ -1,6 +1,6 @@
|
||||
Name: libyuv
|
||||
URL: https://chromium.googlesource.com/libyuv/libyuv/
|
||||
Version: 1946
|
||||
Version: 1947
|
||||
Revision: DEPS
|
||||
License: BSD-3-Clause
|
||||
License File: LICENSE
|
||||
|
||||
@ -23,10 +23,11 @@ extern "C" {
|
||||
#endif
|
||||
|
||||
// This module is for Visual C 32/64 bit
|
||||
#if !defined(LIBYUV_DISABLE_X86) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || \
|
||||
defined(_M_X64) || defined(_M_X86))
|
||||
#if ((defined(_MSC_VER) && !defined(__clang__)) || defined(LIBYUV_ENABLE_ROWWIN))
|
||||
#if !defined(LIBYUV_DISABLE_X86) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
|
||||
defined(_M_X86))
|
||||
#if ((defined(_MSC_VER) && !defined(__clang__)) || \
|
||||
defined(LIBYUV_ENABLE_ROWWIN))
|
||||
#define USE_ROW_WIN
|
||||
#else
|
||||
#define USE_ROW_GCC
|
||||
@ -121,9 +122,9 @@ extern "C" {
|
||||
|
||||
// The following are available on all x86 platforms, but
|
||||
// require VS2012, clang 3.4 or gcc 4.7.
|
||||
#if !defined(LIBYUV_DISABLE_X86) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || \
|
||||
defined(_M_X64) || defined(_M_X86))
|
||||
#if !defined(LIBYUV_DISABLE_X86) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
|
||||
defined(_M_X86))
|
||||
#define HAS_ARGBMIRRORROW_AVX2
|
||||
#define HAS_RGB24MIRRORROW_AVX2
|
||||
#define HAS_ARGBTOUVMATRIXROW_AVX2
|
||||
@ -139,7 +140,7 @@ extern "C" {
|
||||
#define HAS_INTERPOLATEROW_AVX2
|
||||
#endif
|
||||
|
||||
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_GCC) && \
|
||||
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_GCC) && \
|
||||
(defined(VISUALC_HAS_AVX2) || defined(CLANG_HAS_AVX2) || \
|
||||
defined(GCC_HAS_AVX2))
|
||||
#define HAS_ARGBCOPYALPHAROW_AVX2
|
||||
@ -183,7 +184,7 @@ extern "C" {
|
||||
// The following are available for gcc/clang x86 platforms:
|
||||
// TODO(fbarchard): Port to Visual C
|
||||
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_GCC) && \
|
||||
(defined(__x86_64__) || defined(__i386__)) && \
|
||||
(defined(__x86_64__) || defined(__i386__)) && \
|
||||
!defined(LIBYUV_ENABLE_ROWWIN)
|
||||
#define HAS_AB64TOARGBROW_SSSE3
|
||||
#define HAS_ABGRTOAR30ROW_SSSE3
|
||||
@ -259,8 +260,8 @@ extern "C" {
|
||||
// The following are available for AVX2 gcc/clang x86 platforms:
|
||||
// TODO(fbarchard): Port to Visual C
|
||||
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_GCC) && \
|
||||
(defined(__x86_64__) || defined(__i386__)) && \
|
||||
(defined(CLANG_HAS_AVX2) || defined(GCC_HAS_AVX2)) && \
|
||||
(defined(__x86_64__) || defined(__i386__)) && \
|
||||
(defined(CLANG_HAS_AVX2) || defined(GCC_HAS_AVX2)) && \
|
||||
!defined(LIBYUV_ENABLE_ROWWIN)
|
||||
#define HAS_AB64TOARGBROW_AVX2
|
||||
#define HAS_ABGRTOAR30ROW_AVX2
|
||||
@ -342,19 +343,21 @@ extern "C" {
|
||||
#endif
|
||||
|
||||
// This module is for Visual C 32/64 bit
|
||||
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_WIN) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || \
|
||||
defined(_M_X64) || defined(_M_X86)) && \
|
||||
((defined(_MSC_VER) && !defined(__clang__)) || \
|
||||
#if !defined(LIBYUV_DISABLE_X86) && defined(USE_ROW_WIN) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
|
||||
defined(_M_X86)) && \
|
||||
((defined(_MSC_VER) && !defined(__clang__)) || \
|
||||
defined(LIBYUV_ENABLE_ROWWIN))
|
||||
#define HAS_RAWTOARGBROW_AVX2
|
||||
#define HAS_RGB24TOARGBROW_AVX2
|
||||
#define HAS_RGB565TOARGBROW_AVX2
|
||||
#define HAS_ARGB1555TOARGBROW_AVX2
|
||||
#define HAS_ARGB4444TOARGBROW_AVX2
|
||||
#define HAS_ARGBSHUFFLEROW_AVX2
|
||||
#if defined(__x86_64__) || defined(_M_X64)
|
||||
#define HAS_RAWTOARGBROW_AVX512BW
|
||||
#define HAS_RGB24TOARGBROW_AVX512BW
|
||||
#define HAS_ARGBSHUFFLEROW_AVX512BW
|
||||
#endif
|
||||
#define HAS_ARGBTOYROW_AVX2
|
||||
#define HAS_ARGBTOYMATRIXROW_AVX2
|
||||
@ -383,7 +386,6 @@ extern "C" {
|
||||
#endif
|
||||
#define HAS_ARGBTORGB24ROW_AVX512VBMI
|
||||
#define HAS_CONVERT16TO8ROW_AVX512BW
|
||||
#define HAS_MERGEUVROW_AVX512BW
|
||||
#endif
|
||||
|
||||
// The following are available for AVX512 clang x64 platforms:
|
||||
@ -401,6 +403,11 @@ extern "C" {
|
||||
#define HAS_ARGBTOUVJROW_AVX512BW
|
||||
#define HAS_ARGBTOUVMATRIXROW_AVX512BW
|
||||
#define HAS_J400TOARGBROW_AVX512BW
|
||||
#define HAS_MERGEUVROW_AVX512BW
|
||||
#define HAS_MIRRORROW_AVX512BW
|
||||
#define HAS_MIRRORSPLITUVROW_AVX512BW
|
||||
#define HAS_SPLITUVROW_AVX512BW
|
||||
#define HAS_RGBTOUVMATRIXROW_AVX512BW
|
||||
#endif
|
||||
|
||||
// The following are available on Neon platforms:
|
||||
@ -1041,7 +1048,7 @@ struct ArgbConstants {
|
||||
|
||||
#endif
|
||||
|
||||
#define IS_ALIGNED(p, a) (!((uintptr_t)(p) & ((a)-1)))
|
||||
#define IS_ALIGNED(p, a) (!((uintptr_t)(p) & ((a) - 1)))
|
||||
|
||||
#define align_buffer_64(var, size) \
|
||||
size_t var##_mem_size = (size); /* NOLINT */ \
|
||||
@ -1097,26 +1104,17 @@ struct ArgbConstants {
|
||||
#define IACA_UD_BYTES __asm__ __volatile__("\n\t .byte 0x0F, 0x0B");
|
||||
|
||||
#else /* Visual C */
|
||||
#define IACA_UD_BYTES \
|
||||
{ __asm _emit 0x0F __asm _emit 0x0B }
|
||||
#define IACA_UD_BYTES {__asm _emit 0x0F __asm _emit 0x0B}
|
||||
|
||||
#define IACA_SSC_MARK(x) \
|
||||
{ __asm mov ebx, x __asm _emit 0x64 __asm _emit 0x67 __asm _emit 0x90 }
|
||||
{__asm mov ebx, x __asm _emit 0x64 __asm _emit 0x67 __asm _emit 0x90}
|
||||
|
||||
#define IACA_VC64_START __writegsbyte(111, 111);
|
||||
#define IACA_VC64_END __writegsbyte(222, 222);
|
||||
#endif
|
||||
|
||||
#define IACA_START \
|
||||
{ \
|
||||
IACA_UD_BYTES \
|
||||
IACA_SSC_MARK(111) \
|
||||
}
|
||||
#define IACA_END \
|
||||
{ \
|
||||
IACA_SSC_MARK(222) \
|
||||
IACA_UD_BYTES \
|
||||
}
|
||||
#define IACA_START {IACA_UD_BYTES IACA_SSC_MARK(111)}
|
||||
#define IACA_END {IACA_SSC_MARK(222) IACA_UD_BYTES}
|
||||
|
||||
void I210AlphaToARGBRow_NEON(const uint16_t* src_y,
|
||||
const uint16_t* src_u,
|
||||
@ -1828,9 +1826,9 @@ void ARGBToUV444MatrixRow_NEON(const uint8_t* src_argb,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGBToYMatrixRow_NEON(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGBToUV444MatrixRow_Any_NEON(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
@ -2194,10 +2192,26 @@ void RGB565ToYMatrixRow_C(const uint8_t* src_rgb565,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB1555ToYMatrixRow_C(const uint8_t* src_argb1555, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB1555ToUVMatrixRow_C(const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void ARGB4444ToYMatrixRow_C(const uint8_t* src_argb4444, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB4444ToUVMatrixRow_C(const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void ARGB1555ToYMatrixRow_C(const uint8_t* src_argb1555,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB1555ToUVMatrixRow_C(const uint8_t* src_argb1555,
|
||||
int src_stride_argb1555,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB4444ToYMatrixRow_C(const uint8_t* src_argb4444,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB4444ToUVMatrixRow_C(const uint8_t* src_argb4444,
|
||||
int src_stride_argb4444,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGB565ToUVMatrixRow_C(const uint8_t* src_rgb565,
|
||||
int src_stride_rgb565,
|
||||
uint8_t* dst_u,
|
||||
@ -2210,8 +2224,30 @@ void ARGBToUVMatrixRow_SSSE3(const uint8_t* src_argb,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_Any_AVX2(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_Any_AVX2(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_AVX512BW(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_Any_AVX512BW(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGBToUVMatrixRow_AVX2(const uint8_t* src_argb,
|
||||
int src_stride_argb,
|
||||
uint8_t* dst_u,
|
||||
@ -2301,18 +2337,66 @@ void RGB565ToUVMatrixRow_Any_AVX2(const uint8_t* src_rgb565,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGB565ToYMatrixRow_NEON(const uint8_t* src_rgb565, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB1555ToYMatrixRow_NEON(const uint8_t* src_argb1555, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB1555ToUVMatrixRow_NEON(const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void ARGB4444ToYMatrixRow_NEON(const uint8_t* src_argb4444, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB4444ToUVMatrixRow_NEON(const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGB565ToUVMatrixRow_NEON(const uint8_t* src_rgb565, int src_stride_rgb565, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGB565ToYMatrixRow_Any_NEON(const uint8_t* src_rgb565, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB1555ToYMatrixRow_Any_NEON(const uint8_t* src_argb1555, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB1555ToUVMatrixRow_Any_NEON(const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void ARGB4444ToYMatrixRow_Any_NEON(const uint8_t* src_argb4444, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void ARGB4444ToUVMatrixRow_Any_NEON(const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGB565ToUVMatrixRow_Any_NEON(const uint8_t* src_rgb565, int src_stride_rgb565, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGB565ToYMatrixRow_NEON(const uint8_t* src_rgb565,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB1555ToYMatrixRow_NEON(const uint8_t* src_argb1555,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB1555ToUVMatrixRow_NEON(const uint8_t* src_argb1555,
|
||||
int src_stride_argb1555,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB4444ToYMatrixRow_NEON(const uint8_t* src_argb4444,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB4444ToUVMatrixRow_NEON(const uint8_t* src_argb4444,
|
||||
int src_stride_argb4444,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGB565ToUVMatrixRow_NEON(const uint8_t* src_rgb565,
|
||||
int src_stride_rgb565,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGB565ToYMatrixRow_Any_NEON(const uint8_t* src_rgb565,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB1555ToYMatrixRow_Any_NEON(const uint8_t* src_argb1555,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB1555ToUVMatrixRow_Any_NEON(const uint8_t* src_argb1555,
|
||||
int src_stride_argb1555,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB4444ToYMatrixRow_Any_NEON(const uint8_t* src_argb4444,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void ARGB4444ToUVMatrixRow_Any_NEON(const uint8_t* src_argb4444,
|
||||
int src_stride_argb4444,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGB565ToUVMatrixRow_Any_NEON(const uint8_t* src_rgb565,
|
||||
int src_stride_rgb565,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
|
||||
void ARGBToYMatrixRow_AVX2(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
@ -2340,9 +2424,22 @@ void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
|
||||
void RGBToUVMatrixRow_NEON(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGBToYMatrixRow_Any_NEON(const uint8_t* src_rgb, uint8_t* dst_y, int width, const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_Any_NEON(const uint8_t* src_rgb, int src_stride_rgb, uint8_t* dst_u, uint8_t* dst_v, int width, const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGBToYMatrixRow_Any_NEON(const uint8_t* src_rgb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
void RGBToUVMatrixRow_Any_NEON(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
|
||||
void ARGBToYMatrixRow_NEON_DotProd(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
@ -2374,7 +2471,6 @@ void ARGBToYMatrixRow_Any_LASX(const uint8_t* src_argb,
|
||||
int width,
|
||||
const struct ArgbConstants* c);
|
||||
|
||||
|
||||
void ARGBToUV444MatrixRow_SSSE3(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
@ -2432,15 +2528,29 @@ void RGBAToYRow_C(const uint8_t* src_rgb, uint8_t* dst_y, int width);
|
||||
void RGB565ToYRow_C(const uint8_t* src_rgb565, uint8_t* dst_y, int width);
|
||||
void ARGB1555ToYRow_C(const uint8_t* src_argb1555, uint8_t* dst_y, int width);
|
||||
void ARGB4444ToYRow_C(const uint8_t* src_argb4444, uint8_t* dst_y, int width);
|
||||
void ARGBToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void ARGBToYJRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void ABGRToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void ABGRToYJRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void ARGBToYRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void ARGBToYJRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void ABGRToYRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void ABGRToYJRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void RGBAToYRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void RGBAToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void RGBAToYJRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void RGBAToYRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void RGBAToYJRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void BGRAToYRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void BGRAToYRow_Any_AVX512BW(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void BGRAToYRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void ARGBToYRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void ARGBToYJRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void ABGRToYJRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
@ -3040,12 +3150,16 @@ void ARGBToUVJ444Row_C(const uint8_t* src_argb,
|
||||
uint8_t* dst_v,
|
||||
int width);
|
||||
|
||||
void MirrorRow_AVX512BW(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_SSSE3(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_NEON(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_LSX(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_LASX(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_C(const uint8_t* src, uint8_t* dst, int width);
|
||||
void MirrorRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void MirrorRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void MirrorRow_Any_SSSE3(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void MirrorRow_Any_SSE2(const uint8_t* src, uint8_t* dst, int width);
|
||||
@ -3063,6 +3177,10 @@ void MirrorUVRow_Any_NEON(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void MirrorUVRow_Any_LSX(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void MirrorUVRow_Any_LASX(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
|
||||
void MirrorSplitUVRow_AVX512BW(const uint8_t* src,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width);
|
||||
void MirrorSplitUVRow_AVX2(const uint8_t* src,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
@ -3124,6 +3242,10 @@ void SplitUVRow_SSE2(const uint8_t* src_uv,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width);
|
||||
void SplitUVRow_AVX512BW(const uint8_t* src_uv,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width);
|
||||
void SplitUVRow_AVX2(const uint8_t* src_uv,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
@ -3140,6 +3262,10 @@ void SplitUVRow_RVV(const uint8_t* src_uv,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width);
|
||||
void SplitUVRow_Any_AVX512BW(const uint8_t* src_ptr,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width);
|
||||
void SplitUVRow_Any_SSE2(const uint8_t* src_ptr,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
@ -4160,8 +4286,12 @@ void RGB24ToARGBRow_SSSE3(const uint8_t* src_rgb24,
|
||||
int width);
|
||||
void RAWToARGBRow_SSSE3(const uint8_t* src_raw, uint8_t* dst_argb, int width);
|
||||
void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width);
|
||||
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width);
|
||||
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, int width);
|
||||
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24,
|
||||
uint8_t* dst_argb,
|
||||
int width);
|
||||
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw,
|
||||
uint8_t* dst_argb,
|
||||
int width);
|
||||
void RAWToRGBARow_SSSE3(const uint8_t* src_raw, uint8_t* dst_rgba, int width);
|
||||
void RAWToRGB24Row_SSSE3(const uint8_t* src_raw, uint8_t* dst_rgb24, int width);
|
||||
|
||||
@ -4250,9 +4380,7 @@ void RGB24ToARGBRow_Any_SSSE3(const uint8_t* src_ptr,
|
||||
void RAWToARGBRow_Any_SSSE3(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void RAWToARGBRow_Any_AVX2(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
void RAWToARGBRow_Any_AVX2(const uint8_t* src_ptr, uint8_t* dst_ptr, int width);
|
||||
void RGB24ToARGBRow_Any_AVX2(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
@ -4272,7 +4400,6 @@ void RAWToRGB24Row_Any_SSSE3(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
|
||||
|
||||
void RGB565ToARGBRow_Any_AVX2(const uint8_t* src_ptr,
|
||||
uint8_t* dst_ptr,
|
||||
int width);
|
||||
|
||||
@ -631,8 +631,8 @@ static inline void I422ToRGB565Row_SVE_SC(
|
||||
// Calculate a predicate for the final iteration to deal with the tail.
|
||||
"cnth %[vl] \n"
|
||||
"whilelt p1.b, wzr, %w[width] \n" //
|
||||
READYUV422_SVE_2X I422TORGB_SVE_2X RGBTOARGB8_SVE_TOP_2X
|
||||
RGB8TORGB565_SVE_FROM_TOP_2X
|
||||
READYUV422_SVE_2X I422TORGB_SVE_2X
|
||||
RGBTOARGB8_SVE_TOP_2X RGB8TORGB565_SVE_FROM_TOP_2X
|
||||
// Need to permute the data on the final iteration such that the
|
||||
// predicates (.b) line up with the 16-bit element data.
|
||||
"trn1 z20.b, z18.b, z19.b \n"
|
||||
@ -694,8 +694,8 @@ static inline void I422ToARGB1555Row_SVE_SC(
|
||||
// Calculate a predicate for the final iteration to deal with the tail.
|
||||
"cnth %[vl] \n"
|
||||
"whilelt p1.b, wzr, %w[width] \n" //
|
||||
READYUV422_SVE_2X I422TORGB_SVE_2X RGBTOARGB8_SVE_TOP_2X
|
||||
RGB8TOARGB1555_SVE_FROM_TOP_2X
|
||||
READYUV422_SVE_2X I422TORGB_SVE_2X
|
||||
RGBTOARGB8_SVE_TOP_2X RGB8TOARGB1555_SVE_FROM_TOP_2X
|
||||
"st2h {z0.h, z1.h}, p1, [%[dst]] \n"
|
||||
|
||||
"99: \n"
|
||||
@ -753,8 +753,8 @@ static inline void I422ToARGB4444Row_SVE_SC(
|
||||
// Calculate a predicate for the final iteration to deal with the tail.
|
||||
"cnth %[vl] \n"
|
||||
"whilelt p1.b, wzr, %w[width] \n" //
|
||||
READYUV422_SVE_2X I422TORGB_SVE_2X RGBTOARGB8_SVE_TOP_2X
|
||||
RGB8TOARGB4444_SVE_FROM_TOP_2X
|
||||
READYUV422_SVE_2X I422TORGB_SVE_2X
|
||||
RGBTOARGB8_SVE_TOP_2X RGB8TOARGB4444_SVE_FROM_TOP_2X
|
||||
"st2h {z0.h, z1.h}, p1, [%[dst]] \n"
|
||||
|
||||
"99: \n"
|
||||
|
||||
@ -11,6 +11,6 @@
|
||||
#ifndef INCLUDE_LIBYUV_VERSION_H_
|
||||
#define INCLUDE_LIBYUV_VERSION_H_
|
||||
|
||||
#define LIBYUV_VERSION 1946
|
||||
#define LIBYUV_VERSION 1947
|
||||
|
||||
#endif // INCLUDE_LIBYUV_VERSION_H_
|
||||
|
||||
@ -116,7 +116,7 @@ uint32_t HashDjb2_NEON(const uint8_t* src, int count, uint32_t seed) {
|
||||
uint32_t hash = seed;
|
||||
const uint32_t c16 = 0x92d9e201; // 33^16
|
||||
uint32_t tmp, tmp2;
|
||||
asm("ld1 {v16.4s, v17.4s, v18.4s, v19.4s}, [%[kIdx]] \n"
|
||||
asm("ld1 {v16.4s, v17.4s, v18.4s, v19.4s}, [%[kIdx]] \n"
|
||||
"ld1 {v4.4s, v5.4s, v6.4s, v7.4s}, [%[kMuls]] \n"
|
||||
|
||||
// count is always a multiple of 16.
|
||||
|
||||
@ -41,8 +41,9 @@ uint32_t HammingDistance_SSE42(const uint8_t* src_a,
|
||||
return diff;
|
||||
}
|
||||
|
||||
__declspec(naked) uint32_t
|
||||
SumSquareError_SSE2(const uint8_t* src_a, const uint8_t* src_b, int count) {
|
||||
__declspec(naked) uint32_t SumSquareError_SSE2(const uint8_t* src_a,
|
||||
const uint8_t* src_b,
|
||||
int count) {
|
||||
__asm {
|
||||
mov eax, [esp + 4] // src_a
|
||||
mov edx, [esp + 8] // src_b
|
||||
@ -81,8 +82,9 @@ __declspec(naked) uint32_t
|
||||
#ifdef HAS_SUMSQUAREERROR_AVX2
|
||||
// C4752: found Intel(R) Advanced Vector Extensions; consider using /arch:AVX.
|
||||
#pragma warning(disable : 4752)
|
||||
__declspec(naked) uint32_t
|
||||
SumSquareError_AVX2(const uint8_t* src_a, const uint8_t* src_b, int count) {
|
||||
__declspec(naked) uint32_t SumSquareError_AVX2(const uint8_t* src_a,
|
||||
const uint8_t* src_b,
|
||||
int count) {
|
||||
__asm {
|
||||
mov eax, [esp + 4] // src_a
|
||||
mov edx, [esp + 8] // src_b
|
||||
@ -146,8 +148,9 @@ uvec32 kHashMul3 = {
|
||||
0x00000001, // 33 ^ 0
|
||||
};
|
||||
|
||||
__declspec(naked) uint32_t
|
||||
HashDjb2_SSE41(const uint8_t* src, int count, uint32_t seed) {
|
||||
__declspec(naked) uint32_t HashDjb2_SSE41(const uint8_t* src,
|
||||
int count,
|
||||
uint32_t seed) {
|
||||
__asm {
|
||||
mov eax, [esp + 4] // src
|
||||
mov ecx, [esp + 8] // count
|
||||
@ -197,8 +200,9 @@ __declspec(naked) uint32_t
|
||||
|
||||
// Visual C 2012 required for AVX2.
|
||||
#ifdef HAS_HASHDJB2_AVX2
|
||||
__declspec(naked) uint32_t
|
||||
HashDjb2_AVX2(const uint8_t* src, int count, uint32_t seed) {
|
||||
__declspec(naked) uint32_t HashDjb2_AVX2(const uint8_t* src,
|
||||
int count,
|
||||
uint32_t seed) {
|
||||
__asm {
|
||||
mov eax, [esp + 4] // src
|
||||
mov ecx, [esp + 8] // count
|
||||
|
||||
@ -13,12 +13,11 @@
|
||||
#include <limits.h>
|
||||
|
||||
#include "libyuv/basic_types.h"
|
||||
#include "libyuv/convert_from_argb.h"
|
||||
#include "libyuv/cpu_id.h"
|
||||
#include "libyuv/planar_functions.h"
|
||||
#include "libyuv/convert_from_argb.h"
|
||||
#include "libyuv/rotate.h"
|
||||
#include "libyuv/row.h"
|
||||
|
||||
#include "libyuv/scale.h" // For ScalePlane()
|
||||
#include "libyuv/scale_row.h" // For FixedDiv
|
||||
#include "libyuv/scale_uv.h" // For UVScale()
|
||||
@ -2034,8 +2033,8 @@ int ARGBToI420(const uint8_t* src_argb,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI420Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kArgbI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kArgbI601Constants, width, height);
|
||||
}
|
||||
|
||||
LIBYUV_API
|
||||
@ -2056,7 +2055,7 @@ int ARGBToI420Matrix(const uint8_t* src_argb,
|
||||
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUVMatrixRow_C;
|
||||
ARGBToUVMatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -2121,34 +2120,34 @@ ARGBToUVMatrixRow_C;
|
||||
#endif
|
||||
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON_I8MM)
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SVE2)
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SME)
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -2439,8 +2438,8 @@ int BGRAToI420(const uint8_t* src_bgra,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI420Matrix(src_bgra, src_stride_bgra, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kBgraI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kBgraI601Constants, width, height);
|
||||
}
|
||||
|
||||
// Convert BGRA to I422.
|
||||
@ -2456,8 +2455,8 @@ int BGRAToI422(const uint8_t* src_bgra,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI422Matrix(src_bgra, src_stride_bgra, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kBgraI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kBgraI601Constants, width, height);
|
||||
}
|
||||
|
||||
// Convert ABGR to I422.
|
||||
@ -2473,8 +2472,8 @@ int ABGRToI422(const uint8_t* src_abgr,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI422Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kAbgrI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kAbgrI601Constants, width, height);
|
||||
}
|
||||
|
||||
// Convert RGBA to I422.
|
||||
@ -2490,8 +2489,8 @@ int RGBAToI422(const uint8_t* src_rgba,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI422Matrix(src_rgba, src_stride_rgba, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kRgbaI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kRgbaI601Constants, width, height);
|
||||
}
|
||||
|
||||
// Convert ABGR to I420.
|
||||
@ -2507,8 +2506,8 @@ int ABGRToI420(const uint8_t* src_abgr,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI420Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kAbgrI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kAbgrI601Constants, width, height);
|
||||
}
|
||||
|
||||
// Convert RGBA to I420.
|
||||
@ -2524,8 +2523,8 @@ int RGBAToI420(const uint8_t* src_rgba,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI420Matrix(src_rgba, src_stride_rgba, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kRgbaI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kRgbaI601Constants, width, height);
|
||||
}
|
||||
|
||||
// Enabled if 1 pass is available
|
||||
@ -2569,6 +2568,14 @@ int RGB24ToI420(const uint8_t* src_rgb24,
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_RGBTOUVMATRIXROW_AVX512BW)
|
||||
if (TestCpuFlag(kCpuHasAVX512BW)) {
|
||||
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_AVX512BW;
|
||||
if (IS_ALIGNED(width, 64)) {
|
||||
RGBToUVMatrixRow = RGBToUVMatrixRow_AVX512BW;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_RGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_NEON;
|
||||
@ -2603,9 +2610,11 @@ int RGB24ToI420(const uint8_t* src_rgb24,
|
||||
}
|
||||
|
||||
for (y = 0; y < height - 1; y += 2) {
|
||||
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
RGBToYMatrixRow(src_rgb24, dst_y, width, &kArgbI601Constants);
|
||||
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width,
|
||||
&kArgbI601Constants);
|
||||
src_rgb24 += src_stride_rgb24 * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_u += dst_stride_u;
|
||||
@ -2854,15 +2863,15 @@ int RGB24ToJ420(const uint8_t* src_rgb24,
|
||||
// Convert RAW to I420.
|
||||
LIBYUV_API
|
||||
int RAWToI420(const uint8_t* src_rgb24,
|
||||
int src_stride_rgb24,
|
||||
uint8_t* dst_y,
|
||||
int dst_stride_y,
|
||||
uint8_t* dst_u,
|
||||
int dst_stride_u,
|
||||
uint8_t* dst_v,
|
||||
int dst_stride_v,
|
||||
int width,
|
||||
int height) {
|
||||
int src_stride_rgb24,
|
||||
uint8_t* dst_y,
|
||||
int dst_stride_y,
|
||||
uint8_t* dst_u,
|
||||
int dst_stride_u,
|
||||
uint8_t* dst_v,
|
||||
int dst_stride_v,
|
||||
int width,
|
||||
int height) {
|
||||
int y;
|
||||
void (*RGBToUVMatrixRow)(const uint8_t* src_rgb, int src_stride_rgb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
@ -2886,6 +2895,14 @@ int RAWToI420(const uint8_t* src_rgb24,
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_RGBTOUVMATRIXROW_AVX512BW)
|
||||
if (TestCpuFlag(kCpuHasAVX512BW)) {
|
||||
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_AVX512BW;
|
||||
if (IS_ALIGNED(width, 64)) {
|
||||
RGBToUVMatrixRow = RGBToUVMatrixRow_AVX512BW;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_RGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
RGBToUVMatrixRow = RGBToUVMatrixRow_Any_NEON;
|
||||
@ -2920,9 +2937,11 @@ int RAWToI420(const uint8_t* src_rgb24,
|
||||
}
|
||||
|
||||
for (y = 0; y < height - 1; y += 2) {
|
||||
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
RGBToUVMatrixRow(src_rgb24, src_stride_rgb24, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
RGBToYMatrixRow(src_rgb24, dst_y, width, &kArgbI601Constants);
|
||||
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
RGBToYMatrixRow(src_rgb24 + src_stride_rgb24, dst_y + dst_stride_y, width,
|
||||
&kArgbI601Constants);
|
||||
src_rgb24 += src_stride_rgb24 * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_u += dst_stride_u;
|
||||
@ -3622,9 +3641,11 @@ int RGB565ToI420(const uint8_t* src_rgb565,
|
||||
int y;
|
||||
void (*RGB565ToUVMatrixRow)(const uint8_t* src_rgb565, int src_stride_rgb565,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) = RGB565ToUVMatrixRow_C;
|
||||
void (*RGB565ToYMatrixRow)(const uint8_t* src_rgb565, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = RGB565ToYMatrixRow_C;
|
||||
const struct ArgbConstants* c) =
|
||||
RGB565ToUVMatrixRow_C;
|
||||
void (*RGB565ToYMatrixRow)(const uint8_t* src_rgb565, uint8_t* dst_y,
|
||||
int width, const struct ArgbConstants* c) =
|
||||
RGB565ToYMatrixRow_C;
|
||||
|
||||
#if defined(HAS_RGB565TOYMATRIXROW_AVX2)
|
||||
if (TestCpuFlag(kCpuHasAVX2)) {
|
||||
@ -3671,9 +3692,11 @@ int RGB565ToI420(const uint8_t* src_rgb565,
|
||||
}
|
||||
|
||||
for (y = 0; y < height - 1; y += 2) {
|
||||
RGB565ToUVMatrixRow(src_rgb565, src_stride_rgb565, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
RGB565ToUVMatrixRow(src_rgb565, src_stride_rgb565, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
RGB565ToYMatrixRow(src_rgb565, dst_y, width, &kArgbI601Constants);
|
||||
RGB565ToYMatrixRow(src_rgb565 + src_stride_rgb565, dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
RGB565ToYMatrixRow(src_rgb565 + src_stride_rgb565, dst_y + dst_stride_y,
|
||||
width, &kArgbI601Constants);
|
||||
src_rgb565 += src_stride_rgb565 * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_u += dst_stride_u;
|
||||
@ -3681,30 +3704,31 @@ int RGB565ToI420(const uint8_t* src_rgb565,
|
||||
}
|
||||
if (height & 1) {
|
||||
RGB565ToYMatrixRow(src_rgb565, dst_y, width, &kArgbI601Constants);
|
||||
RGB565ToUVMatrixRow(src_rgb565, 0, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
RGB565ToUVMatrixRow(src_rgb565, 0, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
// Convert ARGB1555 to I420.
|
||||
LIBYUV_API
|
||||
int ARGB1555ToI420(const uint8_t* src_argb1555,
|
||||
int src_stride_argb1555,
|
||||
uint8_t* dst_y,
|
||||
int dst_stride_y,
|
||||
uint8_t* dst_u,
|
||||
int dst_stride_u,
|
||||
uint8_t* dst_v,
|
||||
int dst_stride_v,
|
||||
int width,
|
||||
int height) {
|
||||
int src_stride_argb1555,
|
||||
uint8_t* dst_y,
|
||||
int dst_stride_y,
|
||||
uint8_t* dst_u,
|
||||
int dst_stride_u,
|
||||
uint8_t* dst_v,
|
||||
int dst_stride_v,
|
||||
int width,
|
||||
int height) {
|
||||
int y;
|
||||
void (*ARGB1555ToUVMatrixRow)(
|
||||
const uint8_t* src_argb1555, int src_stride_argb1555, uint8_t* dst_u,
|
||||
uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) = ARGB1555ToUVMatrixRow_C;
|
||||
void (*ARGB1555ToYMatrixRow)(
|
||||
const uint8_t* src_argb1555, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = ARGB1555ToYMatrixRow_C;
|
||||
uint8_t* dst_v, int width, const struct ArgbConstants* c) =
|
||||
ARGB1555ToUVMatrixRow_C;
|
||||
void (*ARGB1555ToYMatrixRow)(const uint8_t* src_argb1555, uint8_t* dst_y,
|
||||
int width, const struct ArgbConstants* c) =
|
||||
ARGB1555ToYMatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGB1555TOYMATRIXROW_AVX2)
|
||||
if (TestCpuFlag(kCpuHasAVX2)) {
|
||||
@ -3751,9 +3775,11 @@ int ARGB1555ToI420(const uint8_t* src_argb1555,
|
||||
}
|
||||
|
||||
for (y = 0; y < height - 1; y += 2) {
|
||||
ARGB1555ToUVMatrixRow(src_argb1555, src_stride_argb1555, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
ARGB1555ToUVMatrixRow(src_argb1555, src_stride_argb1555, dst_u, dst_v,
|
||||
width, &kArgbI601Constants);
|
||||
ARGB1555ToYMatrixRow(src_argb1555, dst_y, width, &kArgbI601Constants);
|
||||
ARGB1555ToYMatrixRow(src_argb1555 + src_stride_argb1555, dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
ARGB1555ToYMatrixRow(src_argb1555 + src_stride_argb1555,
|
||||
dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
src_argb1555 += src_stride_argb1555 * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_u += dst_stride_u;
|
||||
@ -3761,30 +3787,31 @@ int ARGB1555ToI420(const uint8_t* src_argb1555,
|
||||
}
|
||||
if (height & 1) {
|
||||
ARGB1555ToYMatrixRow(src_argb1555, dst_y, width, &kArgbI601Constants);
|
||||
ARGB1555ToUVMatrixRow(src_argb1555, 0, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
ARGB1555ToUVMatrixRow(src_argb1555, 0, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
// Convert ARGB4444 to I420.
|
||||
LIBYUV_API
|
||||
int ARGB4444ToI420(const uint8_t* src_argb4444,
|
||||
int src_stride_argb4444,
|
||||
uint8_t* dst_y,
|
||||
int dst_stride_y,
|
||||
uint8_t* dst_u,
|
||||
int dst_stride_u,
|
||||
uint8_t* dst_v,
|
||||
int dst_stride_v,
|
||||
int width,
|
||||
int height) {
|
||||
int src_stride_argb4444,
|
||||
uint8_t* dst_y,
|
||||
int dst_stride_y,
|
||||
uint8_t* dst_u,
|
||||
int dst_stride_u,
|
||||
uint8_t* dst_v,
|
||||
int dst_stride_v,
|
||||
int width,
|
||||
int height) {
|
||||
int y;
|
||||
void (*ARGB4444ToUVMatrixRow)(
|
||||
const uint8_t* src_argb4444, int src_stride_argb4444, uint8_t* dst_u,
|
||||
uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) = ARGB4444ToUVMatrixRow_C;
|
||||
void (*ARGB4444ToYMatrixRow)(
|
||||
const uint8_t* src_argb4444, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = ARGB4444ToYMatrixRow_C;
|
||||
uint8_t* dst_v, int width, const struct ArgbConstants* c) =
|
||||
ARGB4444ToUVMatrixRow_C;
|
||||
void (*ARGB4444ToYMatrixRow)(const uint8_t* src_argb4444, uint8_t* dst_y,
|
||||
int width, const struct ArgbConstants* c) =
|
||||
ARGB4444ToYMatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGB4444TOYMATRIXROW_AVX2)
|
||||
if (TestCpuFlag(kCpuHasAVX2)) {
|
||||
@ -3831,9 +3858,11 @@ int ARGB4444ToI420(const uint8_t* src_argb4444,
|
||||
}
|
||||
|
||||
for (y = 0; y < height - 1; y += 2) {
|
||||
ARGB4444ToUVMatrixRow(src_argb4444, src_stride_argb4444, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
ARGB4444ToUVMatrixRow(src_argb4444, src_stride_argb4444, dst_u, dst_v,
|
||||
width, &kArgbI601Constants);
|
||||
ARGB4444ToYMatrixRow(src_argb4444, dst_y, width, &kArgbI601Constants);
|
||||
ARGB4444ToYMatrixRow(src_argb4444 + src_stride_argb4444, dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
ARGB4444ToYMatrixRow(src_argb4444 + src_stride_argb4444,
|
||||
dst_y + dst_stride_y, width, &kArgbI601Constants);
|
||||
src_argb4444 += src_stride_argb4444 * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_u += dst_stride_u;
|
||||
@ -3841,7 +3870,8 @@ int ARGB4444ToI420(const uint8_t* src_argb4444,
|
||||
}
|
||||
if (height & 1) {
|
||||
ARGB4444ToYMatrixRow(src_argb4444, dst_y, width, &kArgbI601Constants);
|
||||
ARGB4444ToUVMatrixRow(src_argb4444, 0, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
ARGB4444ToUVMatrixRow(src_argb4444, 0, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
}
|
||||
return 0;
|
||||
}
|
||||
@ -3993,7 +4023,7 @@ int RGB24ToJ400(const uint8_t* src_rgb24,
|
||||
RGB24ToARGBRow = RGB24ToARGBRow_RVV;
|
||||
}
|
||||
#endif
|
||||
{
|
||||
{
|
||||
// Allocate 1 row of ARGB.
|
||||
const int row_size = (width * 4 + 31) & ~31;
|
||||
align_buffer_64(row, row_size);
|
||||
|
||||
@ -3720,7 +3720,7 @@ int RGB24ToARGB(const uint8_t* src_rgb24,
|
||||
RGB24ToARGBRow = RGB24ToARGBRow_RVV;
|
||||
}
|
||||
#endif
|
||||
for (y = 0; y < height; ++y) {
|
||||
for (y = 0; y < height; ++y) {
|
||||
RGB24ToARGBRow(src_rgb24, dst_argb, width);
|
||||
src_rgb24 += src_stride_rgb24;
|
||||
dst_argb += dst_stride_argb;
|
||||
|
||||
@ -35,8 +35,8 @@ int ARGBToI444(const uint8_t* src_argb,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI444Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kArgbI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kArgbI601Constants, width, height);
|
||||
}
|
||||
|
||||
LIBYUV_API
|
||||
@ -54,10 +54,9 @@ int ARGBToI444Matrix(const uint8_t* src_argb,
|
||||
int y;
|
||||
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
|
||||
void (*ARGBToUV444MatrixRow)(const uint8_t* src_argb, uint8_t* dst_u,
|
||||
uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUV444MatrixRow_C;
|
||||
void (*ARGBToUV444MatrixRow)(
|
||||
const uint8_t* src_argb, uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) = ARGBToUV444MatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -188,8 +187,8 @@ int ARGBToI422(const uint8_t* src_argb,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI422Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kArgbI601Constants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kArgbI601Constants, width, height);
|
||||
}
|
||||
|
||||
LIBYUV_API
|
||||
@ -210,7 +209,7 @@ int ARGBToI422Matrix(const uint8_t* src_argb,
|
||||
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUVMatrixRow_C;
|
||||
ARGBToUVMatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -275,34 +274,34 @@ ARGBToUVMatrixRow_C;
|
||||
#endif
|
||||
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON_I8MM)
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SVE2)
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SME)
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -359,8 +358,9 @@ int ARGBToNV12(const uint8_t* src_argb,
|
||||
int dst_stride_uv,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToNV12Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_uv,
|
||||
dst_stride_uv, &kArgbI601Constants, width, height);
|
||||
return ARGBToNV12Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y,
|
||||
dst_uv, dst_stride_uv, &kArgbI601Constants, width,
|
||||
height);
|
||||
}
|
||||
|
||||
LIBYUV_API
|
||||
@ -380,7 +380,7 @@ int ARGBToNV12Matrix(const uint8_t* src_argb,
|
||||
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUVMatrixRow_C;
|
||||
ARGBToUVMatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -445,34 +445,34 @@ ARGBToUVMatrixRow_C;
|
||||
#endif
|
||||
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON_I8MM)
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SVE2)
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SME)
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -565,7 +565,7 @@ ARGBToUVMatrixRow_C;
|
||||
MergeUVRow(row_u, row_v, dst_uv, halfwidth);
|
||||
ARGBToYMatrixRow(src_argb, dst_y, width, argbconstants);
|
||||
ARGBToYMatrixRow(src_argb + src_stride_argb, dst_y + dst_stride_y, width,
|
||||
argbconstants);
|
||||
argbconstants);
|
||||
src_argb += src_stride_argb * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_uv += dst_stride_uv;
|
||||
@ -595,7 +595,7 @@ int ARGBToNV21Matrix(const uint8_t* src_argb,
|
||||
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUVMatrixRow_C;
|
||||
ARGBToUVMatrixRow_C;
|
||||
|
||||
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -660,34 +660,34 @@ ARGBToUVMatrixRow_C;
|
||||
#endif
|
||||
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON_I8MM)
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SVE2)
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SME)
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -780,7 +780,7 @@ ARGBToUVMatrixRow_C;
|
||||
MergeUVRow(row_u, row_v, dst_vu, halfwidth);
|
||||
ARGBToYMatrixRow(src_argb, dst_y, width, argbconstants);
|
||||
ARGBToYMatrixRow(src_argb + src_stride_argb, dst_y + dst_stride_y, width,
|
||||
argbconstants);
|
||||
argbconstants);
|
||||
src_argb += src_stride_argb * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_vu += dst_stride_uv;
|
||||
@ -864,7 +864,8 @@ int ARGBToYUY2Matrix(const uint8_t* src_argb,
|
||||
int y;
|
||||
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) = ARGBToUVMatrixRow_C;
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUVMatrixRow_C;
|
||||
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
|
||||
void (*I422ToYUY2Row)(const uint8_t* src_y, const uint8_t* src_u,
|
||||
@ -976,7 +977,8 @@ int ARGBToUYVYMatrix(const uint8_t* src_argb,
|
||||
int y;
|
||||
void (*ARGBToUVMatrixRow)(const uint8_t* src_argb, int src_stride_argb,
|
||||
uint8_t* dst_u, uint8_t* dst_v, int width,
|
||||
const struct ArgbConstants* c) = ARGBToUVMatrixRow_C;
|
||||
const struct ArgbConstants* c) =
|
||||
ARGBToUVMatrixRow_C;
|
||||
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
|
||||
void (*I422ToUYVYRow)(const uint8_t* src_y, const uint8_t* src_u,
|
||||
@ -1077,8 +1079,6 @@ int ARGBToUYVYMatrix(const uint8_t* src_argb,
|
||||
return 0;
|
||||
}
|
||||
|
||||
|
||||
|
||||
// Same as NV12 but U and V swapped.
|
||||
LIBYUV_API
|
||||
int ARGBToNV21(const uint8_t* src_argb,
|
||||
@ -1089,8 +1089,9 @@ int ARGBToNV21(const uint8_t* src_argb,
|
||||
int dst_stride_vu,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToNV21Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_vu,
|
||||
dst_stride_vu, &kArgbI601Constants, width, height);
|
||||
return ARGBToNV21Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y,
|
||||
dst_vu, dst_stride_vu, &kArgbI601Constants, width,
|
||||
height);
|
||||
}
|
||||
|
||||
LIBYUV_API
|
||||
@ -1102,8 +1103,9 @@ int ABGRToNV12(const uint8_t* src_abgr,
|
||||
int dst_stride_uv,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToNV12Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_uv,
|
||||
dst_stride_uv, &kAbgrI601Constants, width, height);
|
||||
return ARGBToNV12Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y,
|
||||
dst_uv, dst_stride_uv, &kAbgrI601Constants, width,
|
||||
height);
|
||||
}
|
||||
|
||||
// Same as NV12 but U and V swapped.
|
||||
@ -1116,8 +1118,9 @@ int ABGRToNV21(const uint8_t* src_abgr,
|
||||
int dst_stride_vu,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToNV21Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_vu,
|
||||
dst_stride_vu, &kAbgrI601Constants, width, height);
|
||||
return ARGBToNV21Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y,
|
||||
dst_vu, dst_stride_vu, &kAbgrI601Constants, width,
|
||||
height);
|
||||
}
|
||||
|
||||
// Convert ARGB to YUY2.
|
||||
@ -1819,8 +1822,8 @@ int ARGBToJ444(const uint8_t* src_argb,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI444Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kArgbJPEGConstants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kArgbJPEGConstants, width, height);
|
||||
}
|
||||
|
||||
// Convert ARGB to J420. (JPeg full range I420).
|
||||
@ -1836,8 +1839,8 @@ int ARGBToJ420(const uint8_t* src_argb,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI420Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kArgbJPEGConstants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kArgbJPEGConstants, width, height);
|
||||
}
|
||||
|
||||
// Convert ARGB to J422. (JPeg full range I422).
|
||||
@ -1853,8 +1856,8 @@ int ARGBToJ422(const uint8_t* src_argb,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI422Matrix(src_argb, src_stride_argb, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kArgbJPEGConstants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kArgbJPEGConstants, width, height);
|
||||
}
|
||||
|
||||
// Convert ARGB to J400.
|
||||
@ -1978,8 +1981,8 @@ int ABGRToJ420(const uint8_t* src_abgr,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI420Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kAbgrJPEGConstants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kAbgrJPEGConstants, width, height);
|
||||
}
|
||||
|
||||
// Convert ABGR to J422. (JPeg full range I422).
|
||||
@ -1995,8 +1998,8 @@ int ABGRToJ422(const uint8_t* src_abgr,
|
||||
int width,
|
||||
int height) {
|
||||
return ARGBToI422Matrix(src_abgr, src_stride_abgr, dst_y, dst_stride_y, dst_u,
|
||||
dst_stride_u, dst_v, dst_stride_v, &kAbgrJPEGConstants,
|
||||
width, height);
|
||||
dst_stride_u, dst_v, dst_stride_v,
|
||||
&kAbgrJPEGConstants, width, height);
|
||||
}
|
||||
|
||||
// Convert ABGR to J400.
|
||||
@ -2165,7 +2168,7 @@ int RAWToNV21Matrix(const uint8_t* src_raw,
|
||||
void (*ARGBToYMatrixRow)(const uint8_t* src_argb, uint8_t* dst_y, int width,
|
||||
const struct ArgbConstants* c) = ARGBToYMatrixRow_C;
|
||||
void (*MergeUVRow)(const uint8_t* src_uj, const uint8_t* src_vj,
|
||||
uint8_t* dst_vu, int width) = MergeUVRow_C;
|
||||
uint8_t* dst_vu, int width) = MergeUVRow_C;
|
||||
#if defined(HAS_ARGBTOYMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
ARGBToYMatrixRow = ARGBToYMatrixRow_Any_SSSE3;
|
||||
@ -2298,34 +2301,34 @@ int RAWToNV21Matrix(const uint8_t* src_raw,
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON_I8MM)
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasNEON) && TestCpuFlag(kCpuHasNeonI8MM)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_Any_NEON_I8MM;
|
||||
if (IS_ALIGNED(width, 16)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_NEON_I8MM;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SVE2)
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSVE2)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SVE2;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SME)
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
if (TestCpuFlag(kCpuHasSME)) {
|
||||
if (IS_ALIGNED(width, 2)) {
|
||||
ARGBToUVMatrixRow = ARGBToUVMatrixRow_SME;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_SSSE3)
|
||||
if (TestCpuFlag(kCpuHasSSSE3)) {
|
||||
@ -2424,7 +2427,8 @@ int RAWToNV21Matrix(const uint8_t* src_raw,
|
||||
ARGBToUVMatrixRow(row, row_size, row_u, row_v, width, argbconstants);
|
||||
MergeUVRow(row_v, row_u, dst_vu, halfwidth);
|
||||
ARGBToYMatrixRow(row, dst_y, width, argbconstants);
|
||||
ARGBToYMatrixRow(row + row_size, dst_y + dst_stride_y, width, argbconstants);
|
||||
ARGBToYMatrixRow(row + row_size, dst_y + dst_stride_y, width,
|
||||
argbconstants);
|
||||
src_raw += src_stride_raw * 2;
|
||||
dst_y += dst_stride_y * 2;
|
||||
dst_vu += dst_stride_vu;
|
||||
@ -2482,7 +2486,6 @@ int RGB24ToNV12(const uint8_t* src_rgb24,
|
||||
height);
|
||||
}
|
||||
|
||||
|
||||
#ifdef __cplusplus
|
||||
} // extern "C"
|
||||
} // namespace libyuv
|
||||
|
||||
@ -8,13 +8,13 @@
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*/
|
||||
|
||||
#include "libyuv/convert_from_argb.h" // For ArgbConstants
|
||||
#include "libyuv/planar_functions.h"
|
||||
|
||||
#include <assert.h>
|
||||
#include <limits.h>
|
||||
#include <string.h> // for memset()
|
||||
|
||||
#include "libyuv/convert_from_argb.h" // For ArgbConstants
|
||||
#include "libyuv/cpu_id.h"
|
||||
#include "libyuv/row.h"
|
||||
#include "libyuv/scale_row.h" // for ScaleRowDown2
|
||||
@ -630,6 +630,14 @@ void SplitUVPlane(const uint8_t* src_uv,
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_SPLITUVROW_AVX512BW)
|
||||
if (TestCpuFlag(kCpuHasAVX512BW)) {
|
||||
SplitUVRow = SplitUVRow_Any_AVX512BW;
|
||||
if (IS_ALIGNED(width, 64)) {
|
||||
SplitUVRow = SplitUVRow_AVX512BW;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_SPLITUVROW_NEON)
|
||||
if (TestCpuFlag(kCpuHasNEON)) {
|
||||
SplitUVRow = SplitUVRow_Any_NEON;
|
||||
@ -1087,7 +1095,7 @@ int NV21ToNV12(const uint8_t* src_y,
|
||||
}
|
||||
|
||||
// Test if tile_height is a power of 2 (16 or 32)
|
||||
#define IS_POWEROFTWO(x) (!((x) & ((x)-1)))
|
||||
#define IS_POWEROFTWO(x) (!((x) & ((x) - 1)))
|
||||
|
||||
// Detile a plane of data
|
||||
// tile width is 16 and assumed.
|
||||
@ -2588,6 +2596,14 @@ void MirrorPlane(const uint8_t* src_y,
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_MIRRORROW_AVX512BW)
|
||||
if (TestCpuFlag(kCpuHasAVX512BW)) {
|
||||
MirrorRow = MirrorRow_Any_AVX512BW;
|
||||
if (IS_ALIGNED(width, 64)) {
|
||||
MirrorRow = MirrorRow_AVX512BW;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_MIRRORROW_LSX)
|
||||
if (TestCpuFlag(kCpuHasLSX)) {
|
||||
MirrorRow = MirrorRow_Any_LSX;
|
||||
|
||||
@ -8,11 +8,11 @@
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*/
|
||||
|
||||
#include "libyuv/rotate.h"
|
||||
|
||||
#include <assert.h>
|
||||
#include <limits.h>
|
||||
|
||||
#include "libyuv/rotate.h"
|
||||
|
||||
#include "libyuv/convert.h"
|
||||
#include "libyuv/cpu_id.h"
|
||||
#include "libyuv/planar_functions.h"
|
||||
@ -403,6 +403,11 @@ void SplitRotateUV180(const uint8_t* src,
|
||||
MirrorSplitUVRow = MirrorSplitUVRow_AVX2;
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_MIRRORSPLITUVROW_AVX512BW)
|
||||
if (TestCpuFlag(kCpuHasAVX512BW) && IS_ALIGNED(width, 32)) {
|
||||
MirrorSplitUVRow = MirrorSplitUVRow_AVX512BW;
|
||||
}
|
||||
#endif
|
||||
#if defined(HAS_MIRRORSPLITUVROW_LSX)
|
||||
if (TestCpuFlag(kCpuHasLSX) && IS_ALIGNED(width, 32)) {
|
||||
MirrorSplitUVRow = MirrorSplitUVRow_LSX;
|
||||
|
||||
@ -64,7 +64,7 @@ __declspec(naked) void TransposeWx8_SSSE3(const uint8_t* src,
|
||||
mov eax, ebp
|
||||
movdqa xmm7, xmm6
|
||||
palignr xmm7, xmm7, 8
|
||||
// Second round of bit swap.
|
||||
// Second round of bit swap.
|
||||
punpcklwd xmm0, xmm2
|
||||
punpcklwd xmm1, xmm3
|
||||
movdqa xmm2, xmm0
|
||||
@ -77,8 +77,8 @@ __declspec(naked) void TransposeWx8_SSSE3(const uint8_t* src,
|
||||
movdqa xmm7, xmm5
|
||||
palignr xmm6, xmm6, 8
|
||||
palignr xmm7, xmm7, 8
|
||||
// Third round of bit swap.
|
||||
// Write to the destination pointer.
|
||||
// Third round of bit swap.
|
||||
// Write to the destination pointer.
|
||||
punpckldq xmm0, xmm4
|
||||
movq qword ptr [edx], xmm0
|
||||
movdqa xmm4, xmm0
|
||||
@ -173,7 +173,7 @@ __declspec(naked) void TransposeUVWx8_SSE2(const uint8_t* src,
|
||||
movdqa xmm7, xmm5
|
||||
lea eax, [eax + 8 * edi + 16]
|
||||
neg edi
|
||||
// Second round of bit swap.
|
||||
// Second round of bit swap.
|
||||
movdqa xmm5, xmm0
|
||||
punpcklwd xmm0, xmm2
|
||||
punpckhwd xmm5, xmm2
|
||||
@ -193,8 +193,8 @@ __declspec(naked) void TransposeUVWx8_SSE2(const uint8_t* src,
|
||||
punpckhwd xmm6, xmm7
|
||||
movdqa xmm7, xmm6
|
||||
|
||||
// Third round of bit swap.
|
||||
// Write to the destination pointer.
|
||||
// Third round of bit swap.
|
||||
// Write to the destination pointer.
|
||||
movdqa xmm6, xmm0
|
||||
punpckldq xmm0, xmm4
|
||||
punpckhdq xmm6, xmm4
|
||||
|
||||
@ -1919,6 +1919,9 @@ ANY11IS(InterpolateRow_16To8_Any_AVX2,
|
||||
memcpy(dst_ptr + np * BPP, vout + (MASK + 1 - r) * BPP, r * BPP); \
|
||||
}
|
||||
|
||||
#ifdef HAS_MIRRORROW_AVX512BW
|
||||
ANY11M(MirrorRow_Any_AVX512BW, MirrorRow_AVX512BW, 1, 63)
|
||||
#endif
|
||||
#ifdef HAS_MIRRORROW_AVX2
|
||||
ANY11M(MirrorRow_Any_AVX2, MirrorRow_AVX2, 1, 31)
|
||||
#endif
|
||||
@ -2022,6 +2025,9 @@ ANY1(ARGBSetRow_Any_LSX, ARGBSetRow_LSX, uint32_t, 4, 3)
|
||||
#ifdef HAS_SPLITUVROW_SSE2
|
||||
ANY12(SplitUVRow_Any_SSE2, SplitUVRow_SSE2, 0, 2, 0, 15)
|
||||
#endif
|
||||
#ifdef HAS_SPLITUVROW_AVX512BW
|
||||
ANY12(SplitUVRow_Any_AVX512BW, SplitUVRow_AVX512BW, 0, 2, 0, 63)
|
||||
#endif
|
||||
#ifdef HAS_SPLITUVROW_AVX2
|
||||
ANY12(SplitUVRow_Any_AVX2, SplitUVRow_AVX2, 0, 2, 0, 31)
|
||||
#endif
|
||||
@ -2193,7 +2199,7 @@ ANY14(SplitARGBRow_Any_NEON, SplitARGBRow_NEON, 4, 15)
|
||||
uint8_t* dst_v, int width) { \
|
||||
SIMD_ALIGNED(uint8_t vin[256 * 2]); \
|
||||
SIMD_ALIGNED(uint8_t vout[256 * 2]); \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
memset(vout, 0, sizeof(vout)); /* for msan */ \
|
||||
int r = width & MASK; \
|
||||
int n = width & ~MASK; \
|
||||
@ -2215,29 +2221,29 @@ ANY14(SplitARGBRow_Any_NEON, SplitARGBRow_NEON, 4, 15)
|
||||
memcpy(dst_v + (np >> 1), vout + 256, SS(r, 1)); \
|
||||
}
|
||||
|
||||
#define ANY12M(NAMEANY, ANY_SIMD, BPP, MASK) \
|
||||
void NAMEANY(const uint8_t* src_ptr, uint8_t* dst_u, uint8_t* dst_v, \
|
||||
int width, const struct ArgbConstants* c) { \
|
||||
SIMD_ALIGNED(uint8_t vin[256]); \
|
||||
SIMD_ALIGNED(uint8_t vout[256 * 2]); \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
int r = width & MASK; \
|
||||
int n = width & ~MASK; \
|
||||
if (n > 0) { \
|
||||
ANY_SIMD(src_ptr, dst_u, dst_v, n, c); \
|
||||
} \
|
||||
memcpy(vin, src_ptr + (ptrdiff_t)n * BPP, (ptrdiff_t)r * BPP); \
|
||||
ANY_SIMD(vin, vout, vout + 256, MASK + 1, c); \
|
||||
memcpy(dst_u + (ptrdiff_t)n, vout, (ptrdiff_t)r); \
|
||||
memcpy(dst_v + (ptrdiff_t)n, vout + 256, (ptrdiff_t)r); \
|
||||
#define ANY12M(NAMEANY, ANY_SIMD, BPP, MASK) \
|
||||
void NAMEANY(const uint8_t* src_ptr, uint8_t* dst_u, uint8_t* dst_v, \
|
||||
int width, const struct ArgbConstants* c) { \
|
||||
SIMD_ALIGNED(uint8_t vin[256]); \
|
||||
SIMD_ALIGNED(uint8_t vout[256 * 2]); \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
int r = width & MASK; \
|
||||
int n = width & ~MASK; \
|
||||
if (n > 0) { \
|
||||
ANY_SIMD(src_ptr, dst_u, dst_v, n, c); \
|
||||
} \
|
||||
memcpy(vin, src_ptr + (ptrdiff_t)n * BPP, (ptrdiff_t)r * BPP); \
|
||||
ANY_SIMD(vin, vout, vout + 256, MASK + 1, c); \
|
||||
memcpy(dst_u + (ptrdiff_t)n, vout, (ptrdiff_t)r); \
|
||||
memcpy(dst_v + (ptrdiff_t)n, vout + 256, (ptrdiff_t)r); \
|
||||
}
|
||||
|
||||
#define ANY12MS(NAMEANY, ANY_SIMD, UVSHIFT, BPP, MASK) \
|
||||
void NAMEANY(const uint8_t* src_ptr, int src_stride, uint8_t* dst_u, \
|
||||
uint8_t* dst_v, int width, const struct ArgbConstants* c) { \
|
||||
void NAMEANY(const uint8_t* src_ptr, int src_stride, uint8_t* dst_u, \
|
||||
uint8_t* dst_v, int width, const struct ArgbConstants* c) { \
|
||||
SIMD_ALIGNED(uint8_t vin[256 * 2]); \
|
||||
SIMD_ALIGNED(uint8_t vout[256 * 2]); \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
memset(vout, 0, sizeof(vout)); /* for msan */ \
|
||||
int r = width & MASK; \
|
||||
int n = width & ~MASK; \
|
||||
@ -2291,6 +2297,9 @@ ANY12MS(ARGB4444ToUVMatrixRow_Any_AVX2, ARGB4444ToUVMatrixRow_AVX2, 0, 2, 31)
|
||||
#ifdef HAS_ARGBTOUVMATRIXROW_AVX512BW
|
||||
ANY12MS(ARGBToUVMatrixRow_Any_AVX512BW, ARGBToUVMatrixRow_AVX512BW, 0, 4, 63)
|
||||
#endif
|
||||
#ifdef HAS_RGBTOUVMATRIXROW_AVX512BW
|
||||
ANY12MS(RGBToUVMatrixRow_Any_AVX512BW, RGBToUVMatrixRow_AVX512BW, 0, 3, 63)
|
||||
#endif
|
||||
#ifdef HAS_ARGBTOUVMATRIXROW_SSSE3
|
||||
ANY12MS(ARGBToUVMatrixRow_Any_SSSE3, ARGBToUVMatrixRow_SSSE3, 0, 4, 7)
|
||||
#endif
|
||||
@ -2307,20 +2316,20 @@ ANY12M(ARGBToUV444MatrixRow_Any_SSSE3, ARGBToUV444MatrixRow_SSSE3, 4, 15)
|
||||
ANY12M(ARGBToUV444MatrixRow_Any_NEON, ARGBToUV444MatrixRow_NEON, 4, 7)
|
||||
#endif
|
||||
|
||||
#define ANY11MC(NAMEANY, ANY_SIMD, BPP, MASK) \
|
||||
void NAMEANY(const uint8_t* src_ptr, uint8_t* dst_ptr, int width, \
|
||||
const struct ArgbConstants* c) { \
|
||||
SIMD_ALIGNED(uint8_t vin[256]); \
|
||||
SIMD_ALIGNED(uint8_t vout[256]); \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
int r = width & MASK; \
|
||||
int n = width & ~MASK; \
|
||||
if (n > 0) { \
|
||||
ANY_SIMD(src_ptr, dst_ptr, n, c); \
|
||||
} \
|
||||
memcpy(vin, src_ptr + (ptrdiff_t)n * BPP, (ptrdiff_t)r * BPP); \
|
||||
ANY_SIMD(vin, vout, MASK + 1, c); \
|
||||
memcpy(dst_ptr + (ptrdiff_t)n, vout, (ptrdiff_t)r); \
|
||||
#define ANY11MC(NAMEANY, ANY_SIMD, BPP, MASK) \
|
||||
void NAMEANY(const uint8_t* src_ptr, uint8_t* dst_ptr, int width, \
|
||||
const struct ArgbConstants* c) { \
|
||||
SIMD_ALIGNED(uint8_t vin[256]); \
|
||||
SIMD_ALIGNED(uint8_t vout[256]); \
|
||||
memset(vin, 0, sizeof(vin)); /* for msan */ \
|
||||
int r = width & MASK; \
|
||||
int n = width & ~MASK; \
|
||||
if (n > 0) { \
|
||||
ANY_SIMD(src_ptr, dst_ptr, n, c); \
|
||||
} \
|
||||
memcpy(vin, src_ptr + (ptrdiff_t)n * BPP, (ptrdiff_t)r * BPP); \
|
||||
ANY_SIMD(vin, vout, MASK + 1, c); \
|
||||
memcpy(dst_ptr + (ptrdiff_t)n, vout, (ptrdiff_t)r); \
|
||||
}
|
||||
|
||||
#ifdef HAS_ARGBTOYROW_SSSE3
|
||||
|
||||
@ -14,7 +14,7 @@
|
||||
#include <string.h> // For memcpy and memset.
|
||||
|
||||
#include "libyuv/basic_types.h"
|
||||
#include "libyuv/convert_argb.h" // For kYuvI601Constants
|
||||
#include "libyuv/convert_argb.h" // For kYuvI601Constants
|
||||
#include "libyuv/convert_from_argb.h" // For ArgbConstants
|
||||
|
||||
#ifdef __cplusplus
|
||||
@ -764,7 +764,7 @@ static __inline uint8_t RGBToUMatrix(uint8_t b0,
|
||||
uint8_t b3,
|
||||
const struct ArgbConstants* c) {
|
||||
return (c->kAddUV[0] - (c->kRGBToU[0] * b0 + c->kRGBToU[1] * b1 +
|
||||
c->kRGBToU[2] * b2 + c->kRGBToU[3] * b3)) >>
|
||||
c->kRGBToU[2] * b2 + c->kRGBToU[3] * b3)) >>
|
||||
8;
|
||||
}
|
||||
static __inline uint8_t RGBToVMatrix(uint8_t b0,
|
||||
@ -773,7 +773,7 @@ static __inline uint8_t RGBToVMatrix(uint8_t b0,
|
||||
uint8_t b3,
|
||||
const struct ArgbConstants* c) {
|
||||
return (c->kAddUV[0] - (c->kRGBToV[0] * b0 + c->kRGBToV[1] * b1 +
|
||||
c->kRGBToV[2] * b2 + c->kRGBToV[3] * b3)) >>
|
||||
c->kRGBToV[2] * b2 + c->kRGBToV[3] * b3)) >>
|
||||
8;
|
||||
}
|
||||
|
||||
@ -783,7 +783,8 @@ void ARGBToYMatrixRow_C(const uint8_t* src_argb,
|
||||
const struct ArgbConstants* c) {
|
||||
int x;
|
||||
for (x = 0; x < width; ++x) {
|
||||
dst_y[0] = RGBToYMatrix(src_argb[0], src_argb[1], src_argb[2], src_argb[3], c);
|
||||
dst_y[0] =
|
||||
RGBToYMatrix(src_argb[0], src_argb[1], src_argb[2], src_argb[3], c);
|
||||
src_argb += 4;
|
||||
dst_y += 1;
|
||||
}
|
||||
@ -1513,18 +1514,18 @@ void J400ToARGBRow_C(const uint8_t* src_y, uint8_t* dst_argb, int width) {
|
||||
const struct YuvConstants SIMD_ALIGNED(kYvu##name##Constants) = \
|
||||
YUVCONSTANTSBODY(YG, YB, VR, VG, UG, UB);
|
||||
|
||||
#define MAKEARGBCONSTANTS(name, RY, GY, BY, RU, GU, BU, RV, GV, BV, AY, AUV) \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kArgb##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(BY, GY, RY, 0, -(BU), -(GU), -(RU), 0, -(BV), -(GV), \
|
||||
-(RV), 0, AY, AUV); \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kAbgr##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(RY, GY, BY, 0, -(RU), -(GU), -(BU), 0, -(RV), -(GV), \
|
||||
-(BV), 0, AY, AUV); \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kRgba##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(0, BY, GY, RY, 0, -(BU), -(GU), -(RU), 0, -(BV), \
|
||||
-(GV), -(RV), AY, AUV); \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kBgra##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(0, RY, GY, BY, 0, -(RU), -(GU), -(BU), 0, -(RV), \
|
||||
#define MAKEARGBCONSTANTS(name, RY, GY, BY, RU, GU, BU, RV, GV, BV, AY, AUV) \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kArgb##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(BY, GY, RY, 0, -(BU), -(GU), -(RU), 0, -(BV), -(GV), \
|
||||
-(RV), 0, AY, AUV); \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kAbgr##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(RY, GY, BY, 0, -(RU), -(GU), -(BU), 0, -(RV), -(GV), \
|
||||
-(BV), 0, AY, AUV); \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kRgba##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(0, BY, GY, RY, 0, -(BU), -(GU), -(RU), 0, -(BV), \
|
||||
-(GV), -(RV), AY, AUV); \
|
||||
extern const struct ArgbConstants SIMD_ALIGNED(kBgra##name##Constants) = \
|
||||
ARGBCONSTANTSBODY(0, RY, GY, BY, 0, -(RU), -(GU), -(BU), 0, -(RV), \
|
||||
-(GV), -(BV), AY, AUV);
|
||||
|
||||
// BT.601 limited range RGB to YUV coefficients
|
||||
@ -3467,7 +3468,7 @@ void ARGBBlendRow_C(const uint8_t* src_argb,
|
||||
}
|
||||
#undef BLEND
|
||||
|
||||
#define UBLEND(f, b, a) (((a)*f) + ((255 - a) * b) + 255) >> 8
|
||||
#define UBLEND(f, b, a) (((a) * f) + ((255 - a) * b) + 255) >> 8
|
||||
void BlendPlaneRow_C(const uint8_t* src0,
|
||||
const uint8_t* src1,
|
||||
const uint8_t* alpha,
|
||||
@ -4618,8 +4619,7 @@ void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
RGB24ToARGBRow_AVX2(src_rgb, row, twidth);
|
||||
RGB24ToARGBRow_AVX2(src_rgb + src_stride_rgb,
|
||||
row + MAXTWIDTH * 4, twidth);
|
||||
RGB24ToARGBRow_AVX2(src_rgb + src_stride_rgb, row + MAXTWIDTH * 4, twidth);
|
||||
ARGBToUVMatrixRow_AVX2(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_rgb += twidth * 3;
|
||||
dst_u += twidth / 2;
|
||||
@ -4629,6 +4629,29 @@ void RGBToUVMatrixRow_AVX2(const uint8_t* src_rgb,
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_AVX512BW) && \
|
||||
defined(HAS_RGB24TOARGBROW_AVX512BW)
|
||||
void RGBToUVMatrixRow_AVX512BW(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
SIMD_ALIGNED(uint8_t row[MAXTWIDTH * 4 * 2]);
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
RGB24ToARGBRow_AVX512BW(src_rgb, row, twidth);
|
||||
RGB24ToARGBRow_AVX512BW(src_rgb + src_stride_rgb, row + MAXTWIDTH * 4,
|
||||
twidth);
|
||||
ARGBToUVMatrixRow_AVX512BW(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_rgb += twidth * 3;
|
||||
dst_u += twidth / 2;
|
||||
dst_v += twidth / 2;
|
||||
width -= twidth;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
#if defined(HAS_ARGBTOUVMATRIXROW_NEON) && defined(HAS_RGB24TOARGBROW_NEON)
|
||||
void RGBToUVMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
int src_stride_rgb,
|
||||
@ -4675,7 +4698,8 @@ void RGB565ToUVMatrixRow_C(const uint8_t* src_rgb565,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
RGB565ToARGBRow_C(src_rgb565, row, twidth);
|
||||
RGB565ToARGBRow_C(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4, twidth);
|
||||
RGB565ToARGBRow_C(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4,
|
||||
twidth);
|
||||
ARGBToUVMatrixRow_C(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_rgb565 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
@ -4712,8 +4736,8 @@ void RGB565ToUVMatrixRow_AVX2(const uint8_t* src_rgb565,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
RGB565ToARGBRow_AVX2(src_rgb565, row, twidth);
|
||||
RGB565ToARGBRow_AVX2(src_rgb565 + src_stride_rgb565,
|
||||
row + MAXTWIDTH * 4, twidth);
|
||||
RGB565ToARGBRow_AVX2(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4,
|
||||
twidth);
|
||||
ARGBToUVMatrixRow_AVX2(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_rgb565 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
@ -4751,7 +4775,8 @@ void RGB565ToUVMatrixRow_NEON(const uint8_t* src_rgb565,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
RGB565ToARGBRow_NEON(src_rgb565, row, twidth);
|
||||
RGB565ToARGBRow_NEON(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4, twidth);
|
||||
RGB565ToARGBRow_NEON(src_rgb565 + src_stride_rgb565, row + MAXTWIDTH * 4,
|
||||
twidth);
|
||||
ARGBToUVMatrixRow_NEON(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_rgb565 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
@ -4786,7 +4811,8 @@ void ARGB1555ToUVMatrixRow_C(const uint8_t* src_argb1555,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
ARGB1555ToARGBRow_C(src_argb1555, row, twidth);
|
||||
ARGB1555ToARGBRow_C(src_argb1555 + src_stride_argb1555, row + MAXTWIDTH * 4, twidth);
|
||||
ARGB1555ToARGBRow_C(src_argb1555 + src_stride_argb1555, row + MAXTWIDTH * 4,
|
||||
twidth);
|
||||
ARGBToUVMatrixRow_C(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_argb1555 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
@ -4820,7 +4846,8 @@ void ARGB4444ToUVMatrixRow_C(const uint8_t* src_argb4444,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
ARGB4444ToARGBRow_C(src_argb4444, row, twidth);
|
||||
ARGB4444ToARGBRow_C(src_argb4444 + src_stride_argb4444, row + MAXTWIDTH * 4, twidth);
|
||||
ARGB4444ToARGBRow_C(src_argb4444 + src_stride_argb4444, row + MAXTWIDTH * 4,
|
||||
twidth);
|
||||
ARGBToUVMatrixRow_C(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_argb4444 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
@ -4956,7 +4983,8 @@ void ARGB1555ToUVMatrixRow_NEON(const uint8_t* src_argb1555,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
ARGB1555ToARGBRow_NEON(src_argb1555, row, twidth);
|
||||
ARGB1555ToARGBRow_NEON(src_argb1555 + src_stride_argb1555, row + MAXTWIDTH * 4, twidth);
|
||||
ARGB1555ToARGBRow_NEON(src_argb1555 + src_stride_argb1555,
|
||||
row + MAXTWIDTH * 4, twidth);
|
||||
ARGBToUVMatrixRow_NEON(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_argb1555 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
@ -4977,7 +5005,8 @@ void ARGB4444ToUVMatrixRow_NEON(const uint8_t* src_argb4444,
|
||||
while (width > 0) {
|
||||
int twidth = width > MAXTWIDTH ? MAXTWIDTH : width;
|
||||
ARGB4444ToARGBRow_NEON(src_argb4444, row, twidth);
|
||||
ARGB4444ToARGBRow_NEON(src_argb4444 + src_stride_argb4444, row + MAXTWIDTH * 4, twidth);
|
||||
ARGB4444ToARGBRow_NEON(src_argb4444 + src_stride_argb4444,
|
||||
row + MAXTWIDTH * 4, twidth);
|
||||
ARGBToUVMatrixRow_NEON(row, MAXTWIDTH * 4, dst_u, dst_v, twidth, c);
|
||||
src_argb4444 += twidth * 2;
|
||||
dst_u += twidth / 2;
|
||||
|
||||
1244
source/row_gcc.cc
1244
source/row_gcc.cc
File diff suppressed because it is too large
Load Diff
@ -2027,10 +2027,12 @@ struct ArgbConstants {
|
||||
// R * 0.2990 coefficient = 77
|
||||
// Add 0.5 = 0x80
|
||||
static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0},
|
||||
128,
|
||||
0};
|
||||
128,
|
||||
0};
|
||||
|
||||
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, 128, 0};
|
||||
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0},
|
||||
128,
|
||||
0};
|
||||
|
||||
// RGB to BT.601 coefficients
|
||||
// B * 0.1016 coefficient = 25
|
||||
@ -2039,19 +2041,19 @@ static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, 128, 0}
|
||||
// Add 16.5 = 0x1080
|
||||
|
||||
static const struct ArgbConstants kRgb24I601Constants = {{25, 129, 66, 0},
|
||||
0x1080,
|
||||
0};
|
||||
0x1080,
|
||||
0};
|
||||
|
||||
static const struct ArgbConstants kRawI601Constants = {{66, 129, 25, 0},
|
||||
0x1080,
|
||||
0};
|
||||
0x1080,
|
||||
0};
|
||||
#endif // ArgbConstants
|
||||
|
||||
// ARGB expects first 3 values to contain RGB and 4th value is ignored.
|
||||
void ARGBToYMatrixRow_LASX(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
int32_t shuff[8] = {0, 4, 1, 5, 2, 6, 3, 7};
|
||||
asm volatile(
|
||||
"xvldrepl.b $xr0, %3, 0 \n\t" // load rgbconstants
|
||||
@ -2216,18 +2218,14 @@ static void RGBToYMatrixRow_LASX(const uint8_t* src_rgba,
|
||||
"xvst $xr10, %1, 0 \n\t"
|
||||
"addi.d %1, %1, 32 \n\t"
|
||||
"bnez %2, 1b \n\t"
|
||||
: "+&r"(src_rgba), // %0
|
||||
"+&r"(dst_y), // %1
|
||||
"+&r"(width) // %2
|
||||
: "r"(c), // %3
|
||||
"r"(shuff) // %4
|
||||
: "+&r"(src_rgba), // %0
|
||||
"+&r"(dst_y), // %1
|
||||
"+&r"(width) // %2
|
||||
: "r"(c), // %3
|
||||
"r"(shuff) // %4
|
||||
: "memory");
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
void ARGBToUVJRow_LASX(const uint8_t* src_argb,
|
||||
int src_stride_argb,
|
||||
uint8_t* dst_u,
|
||||
|
||||
@ -2812,10 +2812,12 @@ struct ArgbConstants {
|
||||
// R * 0.2990 coefficient = 77
|
||||
// Add 0.5 = 0x80
|
||||
static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0},
|
||||
128,
|
||||
0};
|
||||
128,
|
||||
0};
|
||||
|
||||
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, 128, 0};
|
||||
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0},
|
||||
128,
|
||||
0};
|
||||
|
||||
// RGB to BT.601 coefficients
|
||||
// B * 0.1016 coefficient = 25
|
||||
@ -2824,19 +2826,19 @@ static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, 128, 0}
|
||||
// Add 16.5 = 0x1080
|
||||
|
||||
static const struct ArgbConstants kRgb24I601Constants = {{25, 129, 66, 0},
|
||||
0x1080,
|
||||
0};
|
||||
0x1080,
|
||||
0};
|
||||
|
||||
static const struct ArgbConstants kRawI601Constants = {{66, 129, 25, 0},
|
||||
0x1080,
|
||||
0};
|
||||
0x1080,
|
||||
0};
|
||||
#endif // ArgbConstants
|
||||
|
||||
// ARGB expects first 3 values to contain RGB and 4th value is ignored.
|
||||
void ARGBToYMatrixRow_LSX(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"vldrepl.b $vr0, %3, 0 \n\t" // load rgbconstants
|
||||
"vldrepl.b $vr1, %3, 1 \n\t" // load rgbconstants
|
||||
@ -2987,18 +2989,14 @@ static void RGBToYMatrixRow_LSX(const uint8_t* src_rgba,
|
||||
"vst $vr10, %1, 0 \n\t"
|
||||
"addi.d %1, %1, 16 \n\t"
|
||||
"bnez %2, 1b \n\t"
|
||||
: "+&r"(src_rgba), // %0
|
||||
"+&r"(dst_y), // %1
|
||||
"+&r"(width) // %2
|
||||
: "r"(c), // %3
|
||||
"r"(shuff) // %4
|
||||
: "+&r"(src_rgba), // %0
|
||||
"+&r"(dst_y), // %1
|
||||
"+&r"(width) // %2
|
||||
: "r"(c), // %3
|
||||
"r"(shuff) // %4
|
||||
: "memory");
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
// undef for unified sources build
|
||||
#undef YUVTORGB_SETUP
|
||||
#undef READYUV422_D
|
||||
|
||||
@ -8,8 +8,8 @@
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*/
|
||||
|
||||
#include "libyuv/row.h"
|
||||
#include "libyuv/convert_from_argb.h" // For ArgbConstants
|
||||
#include "libyuv/row.h"
|
||||
|
||||
#ifdef __cplusplus
|
||||
namespace libyuv {
|
||||
@ -272,7 +272,7 @@ void I422ToRGBARow_NEON(const uint8_t* src_y,
|
||||
"subs %[width], %[width], #8 \n" //
|
||||
YUVTORGB //
|
||||
RGBTORGB8 //
|
||||
STORERGBA //
|
||||
STORERGBA //
|
||||
"bgt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
[src_u] "+r"(src_u), // %[src_u]
|
||||
@ -325,9 +325,8 @@ void I422ToRGB565Row_NEON(const uint8_t* src_y,
|
||||
YUVTORGB_SETUP
|
||||
"vmov.u8 d6, #255 \n"
|
||||
"1: \n" //
|
||||
READYUV422
|
||||
"subs %[width], %[width], #8 \n" YUVTORGB RGBTORGB8
|
||||
ARGBTORGB565
|
||||
READYUV422 "subs %[width], %[width], #8 \n" YUVTORGB
|
||||
RGBTORGB8 ARGBTORGB565
|
||||
"vst1.8 {q2}, [%[dst_rgb565]]! \n" // store 8 pixels RGB565.
|
||||
"bgt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
@ -1887,13 +1886,13 @@ void ARGBToUV444MatrixRow_NEON(const uint8_t* src_argb,
|
||||
"vst1.8 {d0}, [%1]! \n" // store 8 pixels U.
|
||||
"vst1.8 {d1}, [%2]! \n" // store 8 pixels V.
|
||||
"bgt 1b \n"
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_u), // %1
|
||||
"+r"(dst_v), // %2
|
||||
"+r"(width) // %3
|
||||
: "r"(&c->kRGBToU), // %4
|
||||
"r"(&c->kRGBToV), // %5
|
||||
"r"(&c->kAddUV) // %6
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_u), // %1
|
||||
"+r"(dst_v), // %2
|
||||
"+r"(width) // %3
|
||||
: "r"(&c->kRGBToU), // %4
|
||||
"r"(&c->kRGBToV), // %5
|
||||
"r"(&c->kAddUV) // %6
|
||||
: "cc", "memory", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8",
|
||||
"q10", "q11", "q12");
|
||||
}
|
||||
@ -1912,7 +1911,6 @@ void ARGBToUVJ444Row_NEON(const uint8_t* src_argb,
|
||||
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width, &kArgbJPEGConstants);
|
||||
}
|
||||
|
||||
|
||||
// clang-format off
|
||||
// 16x2 pixels -> 8x1. width is number of argb pixels. e.g. 16.
|
||||
#define RGBTOUV(QB, QG, QR) \
|
||||
@ -1934,8 +1932,9 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
const uint8_t* src_argb_1 = src_argb + src_stride_argb;
|
||||
asm volatile (
|
||||
"vld1.8 {d24}, [%5] \n" // load kRGBToU (8 bytes, only 4 used)
|
||||
asm volatile(
|
||||
"vld1.8 {d24}, [%5] \n" // load kRGBToU (8 bytes,
|
||||
// only 4 used)
|
||||
"vld1.8 {d25}, [%6] \n" // load kRGBToV
|
||||
"vmovl.s8 q14, d24 \n" // U coeffs in d28
|
||||
"vmovl.s8 q15, d25 \n" // V coeffs in d30
|
||||
@ -1943,7 +1942,8 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
|
||||
|
||||
"1: \n"
|
||||
"vld4.8 {d0, d2, d4, d6}, [%0]! \n" // load 8 ARGB pixels.
|
||||
"vld4.8 {d1, d3, d5, d7}, [%0]! \n" // load next 8 ARGB pixels.
|
||||
"vld4.8 {d1, d3, d5, d7}, [%0]! \n" // load next 8 ARGB
|
||||
// pixels.
|
||||
"subs %4, %4, #16 \n" // 16 processed per loop.
|
||||
"vpaddl.u8 q0, q0 \n" // B 16 bytes -> 8 shorts.
|
||||
"vpaddl.u8 q1, q1 \n" // G 16 bytes -> 8 shorts.
|
||||
@ -1985,16 +1985,15 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
|
||||
"vst1.8 {d0}, [%2]! \n" // store 8 pixels U.
|
||||
"vst1.8 {d1}, [%3]! \n" // store 8 pixels V.
|
||||
"bgt 1b \n"
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(src_argb_1), // %1
|
||||
"+r"(dst_u), // %2
|
||||
"+r"(dst_v), // %3
|
||||
"+r"(width) // %4
|
||||
: "r"(&c->kRGBToU), // %5
|
||||
"r"(&c->kRGBToV) // %6
|
||||
: "cc", "memory", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7",
|
||||
"q8", "q9", "q11", "q12", "q14", "q15"
|
||||
);
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(src_argb_1), // %1
|
||||
"+r"(dst_u), // %2
|
||||
"+r"(dst_v), // %3
|
||||
"+r"(width) // %4
|
||||
: "r"(&c->kRGBToU), // %5
|
||||
"r"(&c->kRGBToV) // %6
|
||||
: "cc", "memory", "q0", "q1", "q2", "q3", "q4", "q5", "q6", "q7", "q8",
|
||||
"q9", "q11", "q12", "q14", "q15");
|
||||
}
|
||||
|
||||
void ARGBToUVRow_NEON(const uint8_t* src_argb,
|
||||
@ -2704,9 +2703,9 @@ void AB64ToARGBRow_NEON(const uint16_t* src_ab64,
|
||||
|
||||
// ARGB expects first 3 values to contain RGB and 4th value is ignored.
|
||||
void ARGBToYMatrixRow_NEON(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"vld1.8 {d24}, [%3] \n" // load kRGBToY
|
||||
"vld1.16 {d25[0]}, [%4] \n" // load kAddY[0]
|
||||
@ -2773,9 +2772,9 @@ void BGRAToYJRow_NEON(const uint8_t* src_bgra, uint8_t* dst_yj, int width) {
|
||||
}
|
||||
|
||||
void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"vld1.8 {d24}, [%3] \n" // load kRGBToY
|
||||
"vld1.16 {d25[0]}, [%4] \n" // load kAddY[0]
|
||||
@ -2807,10 +2806,6 @@ void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
"d24", "d25");
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
// Bilinear filter 16x2 -> 16x1
|
||||
void InterpolateRow_NEON(uint8_t* dst_ptr,
|
||||
const uint8_t* src_ptr,
|
||||
|
||||
@ -8,8 +8,8 @@
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*/
|
||||
|
||||
#include "libyuv/row.h"
|
||||
#include "libyuv/convert_from_argb.h"
|
||||
#include "libyuv/row.h"
|
||||
|
||||
#ifdef __cplusplus
|
||||
namespace libyuv {
|
||||
@ -292,12 +292,12 @@ void I210ToAR30Row_NEON(const uint16_t* src_y,
|
||||
uint16_t limit = 0x3ff0;
|
||||
uint16_t alpha = 0xc000;
|
||||
asm volatile(YUVTORGB_SETUP
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"dup v23.8h, %w[alpha] \n"
|
||||
"1: \n" //
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"dup v23.8h, %w[alpha] \n"
|
||||
"1: \n" //
|
||||
READYUV210
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
[src_u] "+r"(src_u), // %[src_u]
|
||||
[src_v] "+r"(src_v), // %[src_v]
|
||||
@ -321,12 +321,12 @@ void I410ToAR30Row_NEON(const uint16_t* src_y,
|
||||
uint16_t limit = 0x3ff0;
|
||||
uint16_t alpha = 0xc000;
|
||||
asm volatile(YUVTORGB_SETUP
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"dup v23.8h, %w[alpha] \n"
|
||||
"1: \n" //
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"dup v23.8h, %w[alpha] \n"
|
||||
"1: \n" //
|
||||
READYUV410
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
[src_u] "+r"(src_u), // %[src_u]
|
||||
[src_v] "+r"(src_v), // %[src_v]
|
||||
@ -349,12 +349,12 @@ void I212ToAR30Row_NEON(const uint16_t* src_y,
|
||||
const vec16* rgb_coeff = &yuvconstants->kRGBCoeffBias;
|
||||
const uint16_t limit = 0x3ff0;
|
||||
asm volatile(YUVTORGB_SETUP
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"movi v23.8h, #0xc0, lsl #8 \n" // A
|
||||
"1: \n" //
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"movi v23.8h, #0xc0, lsl #8 \n" // A
|
||||
"1: \n" //
|
||||
READYUV212
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
[src_u] "+r"(src_u), // %[src_u]
|
||||
[src_v] "+r"(src_v), // %[src_v]
|
||||
@ -531,13 +531,13 @@ void P210ToAR30Row_NEON(const uint16_t* src_y,
|
||||
const vec16* rgb_coeff = &yuvconstants->kRGBCoeffBias;
|
||||
const uint16_t limit = 0x3ff0;
|
||||
asm volatile(YUVTORGB_SETUP
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"movi v23.8h, #0xc0, lsl #8 \n" // A
|
||||
"ldr q2, [%[kIndices]] \n"
|
||||
"1: \n" //
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"movi v23.8h, #0xc0, lsl #8 \n" // A
|
||||
"ldr q2, [%[kIndices]] \n"
|
||||
"1: \n" //
|
||||
READYUVP210
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
[src_uv] "+r"(src_uv), // %[src_uv]
|
||||
[dst_ar30] "+r"(dst_ar30), // %[dst_ar30]
|
||||
@ -558,13 +558,13 @@ void P410ToAR30Row_NEON(const uint16_t* src_y,
|
||||
const vec16* rgb_coeff = &yuvconstants->kRGBCoeffBias;
|
||||
uint16_t limit = 0x3ff0;
|
||||
asm volatile(YUVTORGB_SETUP
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"movi v23.8h, #0xc0, lsl #8 \n" // A
|
||||
"ldr q2, [%[kIndices]] \n"
|
||||
"1: \n" //
|
||||
"dup v22.8h, %w[limit] \n"
|
||||
"movi v23.8h, #0xc0, lsl #8 \n" // A
|
||||
"ldr q2, [%[kIndices]] \n"
|
||||
"1: \n" //
|
||||
READYUVP410
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB STOREAR30
|
||||
"b.gt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
[src_uv] "+r"(src_uv), // %[src_uv]
|
||||
[dst_ar30] "+r"(dst_ar30), // %[dst_ar30]
|
||||
@ -783,9 +783,8 @@ void I422ToRGB565Row_NEON(const uint8_t* src_y,
|
||||
asm volatile(
|
||||
YUVTORGB_SETUP
|
||||
"1: \n" //
|
||||
READYUV422
|
||||
"subs %w[width], %w[width], #8 \n" I4XXTORGB RGBTORGB8_TOP
|
||||
ARGBTORGB565_FROM_TOP
|
||||
READYUV422 "subs %w[width], %w[width], #8 \n" I4XXTORGB
|
||||
RGBTORGB8_TOP ARGBTORGB565_FROM_TOP
|
||||
"st1 {v18.8h}, [%[dst_rgb565]], #16 \n" // store 8 pixels RGB565.
|
||||
"b.gt 1b \n"
|
||||
: [src_y] "+r"(src_y), // %[src_y]
|
||||
@ -1036,9 +1035,8 @@ void NV12ToRGB565Row_NEON(const uint8_t* src_y,
|
||||
YUVTORGB_SETUP
|
||||
"ldr q2, [%[kNV12Table]] \n"
|
||||
"1: \n" //
|
||||
READNV12
|
||||
"subs %w[width], %w[width], #8 \n" NVTORGB RGBTORGB8_TOP
|
||||
ARGBTORGB565_FROM_TOP
|
||||
READNV12 "subs %w[width], %w[width], #8 \n" NVTORGB
|
||||
RGBTORGB8_TOP ARGBTORGB565_FROM_TOP
|
||||
"st1 {v18.8h}, [%[dst_rgb565]], #16 \n" // store 8
|
||||
// pixels
|
||||
// RGB565.
|
||||
@ -2742,20 +2740,22 @@ void ARGBToUV444MatrixRow_NEON(const uint8_t* src_argb,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"ldr q16, [%[c], #16] \n" // kRGBToU
|
||||
"ldr q17, [%[c], #32] \n" // kRGBToV
|
||||
"ldr s0, [%[c], #64] \n" // kAddUV
|
||||
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs to 16-bit
|
||||
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs to 16-bit
|
||||
"dup v20.8h, v16.h[0] \n" // U0
|
||||
"dup v21.8h, v16.h[1] \n" // U1
|
||||
"dup v22.8h, v16.h[2] \n" // U2
|
||||
"dup v23.8h, v16.h[3] \n" // U3
|
||||
"dup v24.8h, v17.h[0] \n" // V0
|
||||
"dup v26.8h, v17.h[1] \n" // V1
|
||||
"dup v27.8h, v17.h[2] \n" // V2
|
||||
"dup v28.8h, v17.h[3] \n" // V3
|
||||
"dup v25.8h, v0.h[0] \n" // kAddUV
|
||||
"ldr q16, [%[c], #16] \n" // kRGBToU
|
||||
"ldr q17, [%[c], #32] \n" // kRGBToV
|
||||
"ldr s0, [%[c], #64] \n" // kAddUV
|
||||
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs
|
||||
// to 16-bit
|
||||
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs
|
||||
// to 16-bit
|
||||
"dup v20.8h, v16.h[0] \n" // U0
|
||||
"dup v21.8h, v16.h[1] \n" // U1
|
||||
"dup v22.8h, v16.h[2] \n" // U2
|
||||
"dup v23.8h, v16.h[3] \n" // U3
|
||||
"dup v24.8h, v17.h[0] \n" // V0
|
||||
"dup v26.8h, v17.h[1] \n" // V1
|
||||
"dup v27.8h, v17.h[2] \n" // V2
|
||||
"dup v28.8h, v17.h[3] \n" // V3
|
||||
"dup v25.8h, v0.h[0] \n" // kAddUV
|
||||
"1: \n"
|
||||
"ld4 {v0.8b,v1.8b,v2.8b,v3.8b}, [%0], #32 \n" // load 8 ARGB
|
||||
"subs %w3, %w3, #8 \n" // 8 processed per loop.
|
||||
@ -2783,27 +2783,26 @@ void ARGBToUV444MatrixRow_NEON(const uint8_t* src_argb,
|
||||
"st1 {v0.8b}, [%1], #8 \n"
|
||||
"st1 {v1.8b}, [%2], #8 \n"
|
||||
"b.gt 1b \n"
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_u), // %1
|
||||
"+r"(dst_v), // %2
|
||||
"+r"(width) // %3
|
||||
: [c] "r"(c) // %4
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
|
||||
"v16", "v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25",
|
||||
"v26", "v27", "v28");
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_u), // %1
|
||||
"+r"(dst_v), // %2
|
||||
"+r"(width) // %3
|
||||
: [c] "r"(c) // %4
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16",
|
||||
"v17", "v18", "v19", "v20", "v21", "v22", "v23", "v24", "v25", "v26",
|
||||
"v27", "v28");
|
||||
}
|
||||
|
||||
static void ARGBToUV444MatrixRow_NEON_I8MM(
|
||||
const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
static void ARGBToUV444MatrixRow_NEON_I8MM(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"ldr q16, [%[c], #16] \n" // kRGBToU
|
||||
"ldr q17, [%[c], #32] \n" // kRGBToV
|
||||
"ldr s0, [%[c], #64] \n" // kAddUV
|
||||
"dup v29.8h, v0.h[0] \n" // 128.0
|
||||
"ldr q16, [%[c], #16] \n" // kRGBToU
|
||||
"ldr q17, [%[c], #32] \n" // kRGBToV
|
||||
"ldr s0, [%[c], #64] \n" // kAddUV
|
||||
"dup v29.8h, v0.h[0] \n" // 128.0
|
||||
"1: \n"
|
||||
"ldp q0, q1, [%[src]], #32 \n"
|
||||
"subs %w[width], %w[width], #8 \n" // 8 processed per loop.
|
||||
@ -2823,11 +2822,11 @@ static void ARGBToUV444MatrixRow_NEON_I8MM(
|
||||
"str d0, [%[dst_u]], #8 \n" // store 8 pixels U.
|
||||
"str d1, [%[dst_v]], #8 \n" // store 8 pixels V.
|
||||
"b.gt 1b \n"
|
||||
: [src] "+r"(src_argb), // %[src]
|
||||
[dst_u] "+r"(dst_u), // %[dst_u]
|
||||
[dst_v] "+r"(dst_v), // %[dst_v]
|
||||
[width] "+r"(width) // %[width]
|
||||
: [c] "r"(c) // %[c]
|
||||
: [src] "+r"(src_argb), // %[src]
|
||||
[dst_u] "+r"(dst_u), // %[dst_u]
|
||||
[dst_v] "+r"(dst_v), // %[dst_v]
|
||||
[width] "+r"(width) // %[width]
|
||||
: [c] "r"(c) // %[c]
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v16", "v17",
|
||||
"v29");
|
||||
}
|
||||
@ -2844,8 +2843,7 @@ void ARGBToUV444Row_NEON(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width, &kArgbI601Constants);
|
||||
}
|
||||
|
||||
void ARGBToUV444Row_NEON_I8MM(const uint8_t* src_argb,
|
||||
@ -2860,8 +2858,7 @@ void ARGBToUVJ444Row_NEON(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width,
|
||||
&kArgbJPEGConstants);
|
||||
ARGBToUV444MatrixRow_NEON(src_argb, dst_u, dst_v, width, &kArgbJPEGConstants);
|
||||
}
|
||||
|
||||
void ARGBToUVJ444Row_NEON_I8MM(const uint8_t* src_argb,
|
||||
@ -2903,23 +2900,27 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
const uint8_t* src_argb_1 = src_argb + src_stride_argb;
|
||||
asm volatile (
|
||||
"ldr q16, [%[c], #16] \n" // kRGBToU
|
||||
"ldr q17, [%[c], #32] \n" // kRGBToV
|
||||
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs to 16-bit
|
||||
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs to 16-bit
|
||||
"dup v20.8h, v16.h[0] \n" // U0
|
||||
"dup v21.8h, v16.h[1] \n" // U1
|
||||
"dup v22.8h, v16.h[2] \n" // U2
|
||||
"dup v23.8h, v16.h[3] \n" // U3
|
||||
"dup v24.8h, v17.h[0] \n" // V0
|
||||
"dup v26.8h, v17.h[1] \n" // V1
|
||||
"dup v27.8h, v17.h[2] \n" // V2
|
||||
"dup v28.8h, v17.h[3] \n" // V3
|
||||
"movi v25.8h, #0x80, lsl #8 \n" // 128.0 in 16-bit (0x8000)
|
||||
asm volatile(
|
||||
"ldr q16, [%[c], #16] \n" // kRGBToU
|
||||
"ldr q17, [%[c], #32] \n" // kRGBToV
|
||||
"sxtl v16.8h, v16.8b \n" // sign extend U coeffs
|
||||
// to 16-bit
|
||||
"sxtl v17.8h, v17.8b \n" // sign extend V coeffs
|
||||
// to 16-bit
|
||||
"dup v20.8h, v16.h[0] \n" // U0
|
||||
"dup v21.8h, v16.h[1] \n" // U1
|
||||
"dup v22.8h, v16.h[2] \n" // U2
|
||||
"dup v23.8h, v16.h[3] \n" // U3
|
||||
"dup v24.8h, v17.h[0] \n" // V0
|
||||
"dup v26.8h, v17.h[1] \n" // V1
|
||||
"dup v27.8h, v17.h[2] \n" // V2
|
||||
"dup v28.8h, v17.h[3] \n" // V3
|
||||
"movi v25.8h, #0x80, lsl #8 \n" // 128.0 in 16-bit
|
||||
// (0x8000)
|
||||
|
||||
"1: \n"
|
||||
"ld4 {v0.16b,v1.16b,v2.16b,v3.16b}, [%0], #64 \n" // load 16 pixels.
|
||||
"ld4 {v0.16b,v1.16b,v2.16b,v3.16b}, [%0], #64 \n" // load 16
|
||||
// pixels.
|
||||
"subs %w4, %w4, #16 \n" // 16 processed per loop.
|
||||
"uaddlp v0.8h, v0.16b \n" // B 16 bytes -> 8 shorts.
|
||||
"prfm pldl1keep, [%0, 448] \n"
|
||||
@ -2927,7 +2928,8 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
|
||||
"uaddlp v2.8h, v2.16b \n" // R 16 bytes -> 8 shorts.
|
||||
"uaddlp v18.8h, v3.16b \n" // A 16 bytes -> 8 shorts.
|
||||
|
||||
"ld4 {v4.16b,v5.16b,v6.16b,v7.16b}, [%1], #64 \n" // load 16 more.
|
||||
"ld4 {v4.16b,v5.16b,v6.16b,v7.16b}, [%1], #64 \n" // load 16
|
||||
// more.
|
||||
"uadalp v0.8h, v4.16b \n" // B 16 bytes -> 8 shorts.
|
||||
"prfm pldl1keep, [%1, 448] \n"
|
||||
"uadalp v1.8h, v5.16b \n" // G 16 bytes -> 8 shorts.
|
||||
@ -2940,34 +2942,33 @@ void ARGBToUVMatrixRow_NEON(const uint8_t* src_argb,
|
||||
"urshr v18.8h, v18.8h, #2 \n"
|
||||
|
||||
// U = B*U0 + G*U1 + R*U2 + A*U3
|
||||
"mul v3.8h, v0.8h, v20.8h \n"
|
||||
"mla v3.8h, v1.8h, v21.8h \n"
|
||||
"mla v3.8h, v2.8h, v22.8h \n"
|
||||
"mla v3.8h, v18.8h, v23.8h \n"
|
||||
"mul v3.8h, v0.8h, v20.8h \n"
|
||||
"mla v3.8h, v1.8h, v21.8h \n"
|
||||
"mla v3.8h, v2.8h, v22.8h \n"
|
||||
"mla v3.8h, v18.8h, v23.8h \n"
|
||||
|
||||
// V = B*V0 + G*V1 + R*V2 + A*V3
|
||||
"mul v4.8h, v0.8h, v24.8h \n"
|
||||
"mla v4.8h, v1.8h, v26.8h \n"
|
||||
"mla v4.8h, v2.8h, v27.8h \n"
|
||||
"mla v4.8h, v18.8h, v28.8h \n"
|
||||
"mul v4.8h, v0.8h, v24.8h \n"
|
||||
"mla v4.8h, v1.8h, v26.8h \n"
|
||||
"mla v4.8h, v2.8h, v27.8h \n"
|
||||
"mla v4.8h, v18.8h, v28.8h \n"
|
||||
|
||||
// U = (128.0 - U) >> 8, V = (128.0 - V) >> 8
|
||||
"subhn v0.8b, v25.8h, v3.8h \n"
|
||||
"subhn v1.8b, v25.8h, v4.8h \n"
|
||||
"subhn v0.8b, v25.8h, v3.8h \n"
|
||||
"subhn v1.8b, v25.8h, v4.8h \n"
|
||||
|
||||
"st1 {v0.8b}, [%2], #8 \n" // store 8 pixels U.
|
||||
"st1 {v1.8b}, [%3], #8 \n" // store 8 pixels V.
|
||||
"b.gt 1b \n"
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(src_argb_1), // %1
|
||||
"+r"(dst_u), // %2
|
||||
"+r"(dst_v), // %3
|
||||
"+r"(width) // %4
|
||||
: [c] "r"(c) // %5
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7",
|
||||
"v16", "v17", "v18", "v20", "v21", "v22", "v23", "v24", "v25", "v26",
|
||||
"v27", "v28"
|
||||
);
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(src_argb_1), // %1
|
||||
"+r"(dst_u), // %2
|
||||
"+r"(dst_v), // %3
|
||||
"+r"(width) // %4
|
||||
: [c] "r"(c) // %5
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16",
|
||||
"v17", "v18", "v20", "v21", "v22", "v23", "v24", "v25", "v26", "v27",
|
||||
"v28");
|
||||
}
|
||||
|
||||
void ARGBToUVRow_NEON(const uint8_t* src_argb,
|
||||
@ -3330,11 +3331,11 @@ void ARGB4444ToUVRow_NEON(const uint8_t* src_argb4444,
|
||||
|
||||
// Process any of ARGB, ABGR, BGRA, RGBA, by adjusting the ArgbConstants layout.
|
||||
static void ARGBToUVMatrixRow_NEON_I8MM_Impl(const uint8_t* src,
|
||||
int src_stride,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
int src_stride,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
const uint8_t* src1 = src + src_stride;
|
||||
asm volatile(
|
||||
"movi v23.8h, #0x80, lsl #8 \n" // 128.0 (0x8000 in
|
||||
@ -3388,12 +3389,12 @@ static void ARGBToUVMatrixRow_NEON_I8MM_Impl(const uint8_t* src,
|
||||
"str d0, [%[dst_u]], #8 \n" // store 8 pixels U
|
||||
"str d1, [%[dst_v]], #8 \n" // store 8 pixels V
|
||||
"b.gt 1b \n"
|
||||
: [src] "+r"(src), // %[src]
|
||||
[src1] "+r"(src1), // %[src1]
|
||||
[dst_u] "+r"(dst_u), // %[dst_u]
|
||||
[dst_v] "+r"(dst_v), // %[dst_v]
|
||||
[width] "+r"(width) // %[width]
|
||||
: [c] "r"(c) // %[c]
|
||||
: [src] "+r"(src), // %[src]
|
||||
[src1] "+r"(src1), // %[src1]
|
||||
[dst_u] "+r"(dst_u), // %[dst_u]
|
||||
[dst_v] "+r"(dst_v), // %[dst_v]
|
||||
[width] "+r"(width) // %[width]
|
||||
: [c] "r"(c) // %[c]
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v23",
|
||||
"v24", "v25");
|
||||
}
|
||||
@ -3404,8 +3405,8 @@ void ARGBToUVMatrixRow_NEON_I8MM(const uint8_t* src_argb,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v, width,
|
||||
c);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v,
|
||||
width, c);
|
||||
}
|
||||
|
||||
void ARGBToUVRow_NEON_I8MM(const uint8_t* src_argb,
|
||||
@ -3413,8 +3414,8 @@ void ARGBToUVRow_NEON_I8MM(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v, width,
|
||||
&kArgbI601Constants);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v,
|
||||
width, &kArgbI601Constants);
|
||||
}
|
||||
|
||||
void ABGRToUVRow_NEON_I8MM(const uint8_t* src_abgr,
|
||||
@ -3422,8 +3423,8 @@ void ABGRToUVRow_NEON_I8MM(const uint8_t* src_abgr,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v, width,
|
||||
&kAbgrI601Constants);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v,
|
||||
width, &kAbgrI601Constants);
|
||||
}
|
||||
|
||||
void BGRAToUVRow_NEON_I8MM(const uint8_t* src_bgra,
|
||||
@ -3431,8 +3432,8 @@ void BGRAToUVRow_NEON_I8MM(const uint8_t* src_bgra,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_bgra, src_stride_bgra, dst_u, dst_v, width,
|
||||
&kBgraI601Constants);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_bgra, src_stride_bgra, dst_u, dst_v,
|
||||
width, &kBgraI601Constants);
|
||||
}
|
||||
|
||||
void RGBAToUVRow_NEON_I8MM(const uint8_t* src_rgba,
|
||||
@ -3440,8 +3441,8 @@ void RGBAToUVRow_NEON_I8MM(const uint8_t* src_rgba,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_rgba, src_stride_rgba, dst_u, dst_v, width,
|
||||
&kRgbaI601Constants);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_rgba, src_stride_rgba, dst_u, dst_v,
|
||||
width, &kRgbaI601Constants);
|
||||
}
|
||||
|
||||
void ARGBToUVJRow_NEON_I8MM(const uint8_t* src_argb,
|
||||
@ -3449,8 +3450,8 @@ void ARGBToUVJRow_NEON_I8MM(const uint8_t* src_argb,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v, width,
|
||||
&kArgbJPEGConstants);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_argb, src_stride_argb, dst_u, dst_v,
|
||||
width, &kArgbJPEGConstants);
|
||||
}
|
||||
|
||||
void ABGRToUVJRow_NEON_I8MM(const uint8_t* src_abgr,
|
||||
@ -3458,8 +3459,8 @@ void ABGRToUVJRow_NEON_I8MM(const uint8_t* src_abgr,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v, width,
|
||||
&kAbgrJPEGConstants);
|
||||
ARGBToUVMatrixRow_NEON_I8MM_Impl(src_abgr, src_stride_abgr, dst_u, dst_v,
|
||||
width, &kAbgrJPEGConstants);
|
||||
}
|
||||
|
||||
void RGB565ToYRow_NEON(const uint8_t* src_rgb565, uint8_t* dst_y, int width) {
|
||||
@ -3558,13 +3559,11 @@ void ARGB4444ToYRow_NEON(const uint8_t* src_argb4444,
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v24", "v25", "v26", "v27");
|
||||
}
|
||||
|
||||
|
||||
|
||||
// ARGB expects first 3 values to contain RGB and 4th value is ignored.
|
||||
void ARGBToYMatrixRow_NEON(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"ldr s16, [%3] \n" // load 4 coeffs
|
||||
"ldr s17, [%3, #48] \n" // load kAddY[0]
|
||||
@ -3589,20 +3588,18 @@ void ARGBToYMatrixRow_NEON(const uint8_t* src_argb,
|
||||
"addhn v1.8b, v1.8h, v22.8h \n"
|
||||
"st1 {v0.8b, v1.8b}, [%1], #16 \n" // store 16 pixels Y.
|
||||
"b.gt 1b \n"
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_y), // %1
|
||||
"+r"(width) // %2
|
||||
: "r"(c) // %3
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_y), // %1
|
||||
"+r"(width) // %2
|
||||
: "r"(c) // %3
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v16", "v17", "v18",
|
||||
"v19", "v20", "v21", "v22");
|
||||
}
|
||||
|
||||
|
||||
void ARGBToYMatrixRow_NEON_DotProd(
|
||||
const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
void ARGBToYMatrixRow_NEON_DotProd(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"ldr s16, [%3] \n" // load 4 coeffs
|
||||
"ldr s17, [%3, #48] \n" // load kAddY[0]
|
||||
@ -3625,14 +3622,14 @@ void ARGBToYMatrixRow_NEON_DotProd(
|
||||
"addhn v1.8b, v1.8h, v19.8h \n"
|
||||
"st1 {v0.8b, v1.8b}, [%1], #16 \n" // store 16 pixels Y.
|
||||
"b.gt 1b \n"
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_y), // %1
|
||||
"+r"(width) // %2
|
||||
: "r"(c) // %3
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16", "v17", "v18", "v19");
|
||||
: "+r"(src_argb), // %0
|
||||
"+r"(dst_y), // %1
|
||||
"+r"(width) // %2
|
||||
: "r"(c) // %3
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v5", "v6", "v7", "v16",
|
||||
"v17", "v18", "v19");
|
||||
}
|
||||
|
||||
|
||||
// RGB to JPeg coefficients
|
||||
|
||||
void ARGBToYRow_NEON(const uint8_t* src_argb, uint8_t* dst_y, int width) {
|
||||
@ -3708,9 +3705,9 @@ void BGRAToYRow_NEON_DotProd(const uint8_t* src_bgra,
|
||||
}
|
||||
|
||||
void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
asm volatile(
|
||||
"ldr s16, [%3] \n" // load 4 coeffs
|
||||
"ldr s17, [%3, #48] \n" // load kAddY[0]
|
||||
@ -3732,18 +3729,14 @@ void RGBToYMatrixRow_NEON(const uint8_t* src_rgb,
|
||||
"addhn v1.8b, v1.8h, v21.8h \n"
|
||||
"st1 {v0.8b, v1.8b}, [%1], #16 \n" // store 16 pixels Y.
|
||||
"b.gt 1b \n"
|
||||
: "+r"(src_rgb), // %0
|
||||
"+r"(dst_y), // %1
|
||||
"+r"(width) // %2
|
||||
: "r"(c) // %3
|
||||
: "+r"(src_rgb), // %0
|
||||
"+r"(dst_y), // %1
|
||||
"+r"(width) // %2
|
||||
: "r"(c) // %3
|
||||
: "cc", "memory", "v0", "v1", "v2", "v3", "v4", "v16", "v17", "v18",
|
||||
"v19", "v20", "v21");
|
||||
}
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
// Bilinear filter 16x2 -> 16x1
|
||||
void InterpolateRow_NEON(uint8_t* dst_ptr,
|
||||
const uint8_t* src_ptr,
|
||||
|
||||
@ -1249,16 +1249,22 @@ void MergeUVRow_RVV(const uint8_t* src_u,
|
||||
}
|
||||
#endif
|
||||
|
||||
|
||||
|
||||
// RGB to JPeg coefficients
|
||||
// B * 0.1140 coefficient = 29
|
||||
// G * 0.5870 coefficient = 150
|
||||
// R * 0.2990 coefficient = 77
|
||||
// Add 0.5 = 0x80
|
||||
static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0}, {0}, {0}, {128}, {0}};
|
||||
static const struct ArgbConstants kRgb24JPEGConstants = {{29, 150, 77, 0},
|
||||
{0},
|
||||
{0},
|
||||
{128},
|
||||
{0}};
|
||||
|
||||
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, {0}, {0}, {128}, {0}};
|
||||
static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0},
|
||||
{0},
|
||||
{0},
|
||||
{128},
|
||||
{0}};
|
||||
|
||||
// RGB to BT.601 coefficients
|
||||
// B * 0.1016 coefficient = 25
|
||||
@ -1266,16 +1272,24 @@ static const struct ArgbConstants kRawJPEGConstants = {{77, 150, 29, 0}, {0}, {0
|
||||
// R * 0.2578 coefficient = 66
|
||||
// Add 16.5 = 0x1080
|
||||
|
||||
static const struct ArgbConstants kRgb24I601Constants = {{25, 129, 66, 0}, {0}, {0}, {0x1080}, {0}};
|
||||
static const struct ArgbConstants kRgb24I601Constants = {{25, 129, 66, 0},
|
||||
{0},
|
||||
{0},
|
||||
{0x1080},
|
||||
{0}};
|
||||
|
||||
static const struct ArgbConstants kRawI601Constants = {{66, 129, 25, 0}, {0}, {0}, {0x1080}, {0}};
|
||||
static const struct ArgbConstants kRawI601Constants = {{66, 129, 25, 0},
|
||||
{0},
|
||||
{0},
|
||||
{0x1080},
|
||||
{0}};
|
||||
|
||||
// ARGB expects first 3 values to contain RGB and 4th value is ignored
|
||||
#ifdef HAS_ARGBTOYMATRIXROW_RVV
|
||||
void ARGBToYMatrixRow_RVV(const uint8_t* src_argb,
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
uint8_t* dst_y,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
assert(width != 0);
|
||||
size_t w = (size_t)width;
|
||||
vuint8m2_t v_by, v_gy, v_ry; // vectors are to store RGBToY constant
|
||||
|
||||
@ -1127,9 +1127,10 @@ __arm_locally_streaming void ARGBToUVMatrixRow_SME(
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
int8_t uvconstants[8] = {
|
||||
(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1], (int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
|
||||
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1], (int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
|
||||
int8_t uvconstants[8] = {(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1],
|
||||
(int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
|
||||
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1],
|
||||
(int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
|
||||
ARGBToUVMatrixRow_SVE_SC(src_argb, src_stride_argb, dst_u, dst_v, width,
|
||||
uvconstants);
|
||||
}
|
||||
|
||||
@ -223,9 +223,10 @@ void ARGBToUVMatrixRow_SVE2(const uint8_t* src_argb,
|
||||
uint8_t* dst_v,
|
||||
int width,
|
||||
const struct ArgbConstants* c) {
|
||||
int8_t uvconstants[8] = {
|
||||
(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1], (int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
|
||||
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1], (int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
|
||||
int8_t uvconstants[8] = {(int8_t)c->kRGBToU[0], (int8_t)c->kRGBToU[1],
|
||||
(int8_t)c->kRGBToU[2], (int8_t)c->kRGBToU[3],
|
||||
(int8_t)c->kRGBToV[0], (int8_t)c->kRGBToV[1],
|
||||
(int8_t)c->kRGBToV[2], (int8_t)c->kRGBToV[3]};
|
||||
ARGBToUVMatrixRow_SVE_SC(src_argb, src_stride_argb, dst_u, dst_v, width,
|
||||
uvconstants);
|
||||
}
|
||||
|
||||
@ -8,19 +8,19 @@
|
||||
* be found in the AUTHORS file in the root of the source tree.
|
||||
*/
|
||||
|
||||
#include "libyuv/row.h"
|
||||
#include "libyuv/convert_from_argb.h" // For ArgbConstants
|
||||
#include "libyuv/row.h"
|
||||
|
||||
// This module is for Visual C 32/64 bit
|
||||
#if !defined(LIBYUV_DISABLE_X86) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || \
|
||||
defined(_M_X64) || defined(_M_X86)) && \
|
||||
((defined(_MSC_VER) && !defined(__clang__)) || \
|
||||
#if !defined(LIBYUV_DISABLE_X86) && \
|
||||
(defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || \
|
||||
defined(_M_X86)) && \
|
||||
((defined(_MSC_VER) && !defined(__clang__)) || \
|
||||
defined(LIBYUV_ENABLE_ROWWIN))
|
||||
|
||||
#include <emmintrin.h>
|
||||
#include <tmmintrin.h> // For _mm_maddubs_epi16
|
||||
#include <immintrin.h> // For AVX2 intrinsics
|
||||
#include <tmmintrin.h> // For _mm_maddubs_epi16
|
||||
|
||||
#ifdef __cplusplus
|
||||
namespace libyuv {
|
||||
@ -266,27 +266,33 @@ void BGRAToYRow_AVX2(const uint8_t* src_bgra, uint8_t* dst_y, int width) {
|
||||
LIBYUV_TARGET_AVX2
|
||||
void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
|
||||
__m256i ymm_alpha = _mm256_set1_epi32(0xff000000);
|
||||
__m128i shuf_low = _mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
|
||||
__m128i shuf_high = _mm_set_epi8(-1, 13, 14, 15, -1, 10, 11, 12, -1, 7, 8, 9, -1, 4, 5, 6);
|
||||
__m128i shuf_low =
|
||||
_mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
|
||||
__m128i shuf_high =
|
||||
_mm_set_epi8(-1, 13, 14, 15, -1, 10, 11, 12, -1, 7, 8, 9, -1, 4, 5, 6);
|
||||
__m256i ymm_shuf = _mm256_broadcastsi128_si256(shuf_low);
|
||||
__m256i ymm_shuf2 = _mm256_broadcastsi128_si256(shuf_high);
|
||||
|
||||
while (width > 0) {
|
||||
__m128i xmm0 = _mm_loadu_si128((const __m128i*)src_raw);
|
||||
__m256i ymm0 = _mm256_castsi128_si256(xmm0);
|
||||
ymm0 = _mm256_inserti128_si256(ymm0, _mm_loadu_si128((const __m128i*)(src_raw + 12)), 1);
|
||||
ymm0 = _mm256_inserti128_si256(
|
||||
ymm0, _mm_loadu_si128((const __m128i*)(src_raw + 12)), 1);
|
||||
|
||||
__m128i xmm1 = _mm_loadu_si128((const __m128i*)(src_raw + 24));
|
||||
__m256i ymm1 = _mm256_castsi128_si256(xmm1);
|
||||
ymm1 = _mm256_inserti128_si256(ymm1, _mm_loadu_si128((const __m128i*)(src_raw + 36)), 1);
|
||||
ymm1 = _mm256_inserti128_si256(
|
||||
ymm1, _mm_loadu_si128((const __m128i*)(src_raw + 36)), 1);
|
||||
|
||||
__m128i xmm2 = _mm_loadu_si128((const __m128i*)(src_raw + 48));
|
||||
__m256i ymm2 = _mm256_castsi128_si256(xmm2);
|
||||
ymm2 = _mm256_inserti128_si256(ymm2, _mm_loadu_si128((const __m128i*)(src_raw + 60)), 1);
|
||||
ymm2 = _mm256_inserti128_si256(
|
||||
ymm2, _mm_loadu_si128((const __m128i*)(src_raw + 60)), 1);
|
||||
|
||||
__m128i xmm3 = _mm_loadu_si128((const __m128i*)(src_raw + 68));
|
||||
__m256i ymm3 = _mm256_castsi128_si256(xmm3);
|
||||
ymm3 = _mm256_inserti128_si256(ymm3, _mm_loadu_si128((const __m128i*)(src_raw + 80)), 1);
|
||||
ymm3 = _mm256_inserti128_si256(
|
||||
ymm3, _mm_loadu_si128((const __m128i*)(src_raw + 80)), 1);
|
||||
|
||||
ymm0 = _mm256_shuffle_epi8(ymm0, ymm_shuf);
|
||||
ymm1 = _mm256_shuffle_epi8(ymm1, ymm_shuf);
|
||||
@ -312,10 +318,13 @@ void RAWToARGBRow_AVX2(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
|
||||
|
||||
#ifdef HAS_RAWTOARGBROW_AVX512BW
|
||||
LIBYUV_TARGET_AVX512BW
|
||||
void RGBToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, const __m128i* shuffler, int width) {
|
||||
void RGBToARGBRow_AVX512BW(const uint8_t* src_raw,
|
||||
uint8_t* dst_argb,
|
||||
const __m128i* shuffler,
|
||||
int width) {
|
||||
__m512i zmm_alpha = _mm512_set1_epi32(0xff000000);
|
||||
__m512i zmm_perm = _mm512_set_epi32(
|
||||
12, 11, 10, 9, 9, 8, 7, 6, 6, 5, 4, 3, 3, 2, 1, 0);
|
||||
__m512i zmm_perm =
|
||||
_mm512_set_epi32(12, 11, 10, 9, 9, 8, 7, 6, 6, 5, 4, 3, 3, 2, 1, 0);
|
||||
__m512i zmm_shuf = _mm512_broadcast_i32x4(_mm_loadu_si128(shuffler));
|
||||
|
||||
while (width > 0) {
|
||||
@ -351,14 +360,20 @@ void RGBToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, const __m1
|
||||
}
|
||||
|
||||
LIBYUV_TARGET_AVX512BW
|
||||
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw, uint8_t* dst_argb, int width) {
|
||||
__m128i shuf = _mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
|
||||
void RAWToARGBRow_AVX512BW(const uint8_t* src_raw,
|
||||
uint8_t* dst_argb,
|
||||
int width) {
|
||||
__m128i shuf =
|
||||
_mm_set_epi8(-1, 9, 10, 11, -1, 6, 7, 8, -1, 3, 4, 5, -1, 0, 1, 2);
|
||||
RGBToARGBRow_AVX512BW(src_raw, dst_argb, &shuf, width);
|
||||
}
|
||||
|
||||
LIBYUV_TARGET_AVX512BW
|
||||
void RGB24ToARGBRow_AVX512BW(const uint8_t* src_rgb24, uint8_t* dst_argb, int width) {
|
||||
__m128i shuf = _mm_set_epi8(-1, 11, 10, 9, -1, 8, 7, 6, -1, 5, 4, 3, -1, 2, 1, 0);
|
||||
void RGB24ToARGBRow_AVX512BW(const uint8_t* src_rgb24,
|
||||
uint8_t* dst_argb,
|
||||
int width) {
|
||||
__m128i shuf =
|
||||
_mm_set_epi8(-1, 11, 10, 9, -1, 8, 7, 6, -1, 5, 4, 3, -1, 2, 1, 0);
|
||||
RGBToARGBRow_AVX512BW(src_rgb24, dst_argb, &shuf, width);
|
||||
}
|
||||
#endif
|
||||
@ -374,16 +389,19 @@ void ARGBToUVMatrixRow_AVX2(const uint8_t* src_argb,
|
||||
__m256i ymm_u = _mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)c->kRGBToU));
|
||||
__m256i ymm_v = _mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)c->kRGBToV));
|
||||
__m256i ymm_0101 = _mm256_set1_epi16(0x0101);
|
||||
__m256i ymm_shuf = _mm256_setr_epi8(0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15,
|
||||
0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15);
|
||||
__m256i ymm_shuf =
|
||||
_mm256_setr_epi8(0, 4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15, 0,
|
||||
4, 1, 5, 2, 6, 3, 7, 8, 12, 9, 13, 10, 14, 11, 15);
|
||||
__m256i ymm_8000 = _mm256_set1_epi16((short)0x8000);
|
||||
__m256i ymm_zero = _mm256_setzero_si256();
|
||||
|
||||
while (width > 0) {
|
||||
__m256i ymm0 = _mm256_loadu_si256((const __m256i*)src_argb);
|
||||
__m256i ymm1 = _mm256_loadu_si256((const __m256i*)(src_argb + 32));
|
||||
__m256i ymm2 = _mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb));
|
||||
__m256i ymm3 = _mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb + 32));
|
||||
__m256i ymm2 =
|
||||
_mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb));
|
||||
__m256i ymm3 =
|
||||
_mm256_loadu_si256((const __m256i*)(src_argb + src_stride_argb + 32));
|
||||
|
||||
ymm0 = _mm256_shuffle_epi8(ymm0, ymm_shuf);
|
||||
ymm1 = _mm256_shuffle_epi8(ymm1, ymm_shuf);
|
||||
@ -455,8 +473,8 @@ void MergeUVRow_AVX2(const uint8_t* src_u,
|
||||
#ifdef HAS_MIRRORROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
|
||||
__m256i ymm_shuf =
|
||||
_mm256_broadcastsi128_si256(_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
|
||||
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
|
||||
_mm_setr_epi8(15, 14, 13, 12, 11, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0));
|
||||
src += width;
|
||||
while (width > 0) {
|
||||
src -= 32;
|
||||
@ -473,8 +491,8 @@ void MirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
|
||||
#ifdef HAS_MIRRORUVROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void MirrorUVRow_AVX2(const uint8_t* src_uv, uint8_t* dst_uv, int width) {
|
||||
__m256i ymm_shuf =
|
||||
_mm256_broadcastsi128_si256(_mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1));
|
||||
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
|
||||
_mm_setr_epi8(14, 15, 12, 13, 10, 11, 8, 9, 6, 7, 4, 5, 2, 3, 0, 1));
|
||||
src_uv += width * 2;
|
||||
while (width > 0) {
|
||||
src_uv -= 32;
|
||||
@ -494,8 +512,8 @@ void MirrorSplitUVRow_AVX2(const uint8_t* src_uv,
|
||||
uint8_t* dst_u,
|
||||
uint8_t* dst_v,
|
||||
int width) {
|
||||
__m256i ymm_shuf =
|
||||
_mm256_broadcastsi128_si256(_mm_setr_epi8(14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1));
|
||||
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
|
||||
_mm_setr_epi8(14, 12, 10, 8, 6, 4, 2, 0, 15, 13, 11, 9, 7, 5, 3, 1));
|
||||
src_uv += width * 2;
|
||||
while (width > 0) {
|
||||
src_uv -= 32;
|
||||
@ -516,25 +534,28 @@ LIBYUV_TARGET_AVX2
|
||||
void RGB24MirrorRow_AVX2(const uint8_t* src_rgb24,
|
||||
uint8_t* dst_rgb24,
|
||||
int width) {
|
||||
__m256i shuf0 = _mm256_setr_epi8(
|
||||
-1, 12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2,
|
||||
-1, 12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2);
|
||||
__m128i shuf1 = _mm_setr_epi8(
|
||||
13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3, -1);
|
||||
__m256i shuf0 =
|
||||
_mm256_setr_epi8(-1, 12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2, -1,
|
||||
12, 13, 14, 9, 10, 11, 6, 7, 8, 3, 4, 5, 0, 1, 2);
|
||||
__m128i shuf1 =
|
||||
_mm_setr_epi8(13, 14, 15, 10, 11, 12, 7, 8, 9, 4, 5, 6, 1, 2, 3, -1);
|
||||
|
||||
src_rgb24 += width * 3 - 96;
|
||||
while (width > 0) {
|
||||
__m128i v0_lo = _mm_loadu_si128((const __m128i*)(src_rgb24 + 0));
|
||||
__m128i v0_hi = _mm_loadu_si128((const __m128i*)(src_rgb24 + 15));
|
||||
__m256i v0 = _mm256_inserti128_si256(_mm256_castsi128_si256(v0_lo), v0_hi, 1);
|
||||
__m256i v0 =
|
||||
_mm256_inserti128_si256(_mm256_castsi128_si256(v0_lo), v0_hi, 1);
|
||||
|
||||
__m128i v1_lo = _mm_loadu_si128((const __m128i*)(src_rgb24 + 30));
|
||||
__m128i v1_hi = _mm_loadu_si128((const __m128i*)(src_rgb24 + 45));
|
||||
__m256i v1 = _mm256_inserti128_si256(_mm256_castsi128_si256(v1_lo), v1_hi, 1);
|
||||
__m256i v1 =
|
||||
_mm256_inserti128_si256(_mm256_castsi128_si256(v1_lo), v1_hi, 1);
|
||||
|
||||
__m128i v2_lo = _mm_loadu_si128((const __m128i*)(src_rgb24 + 60));
|
||||
__m128i v2_hi = _mm_loadu_si128((const __m128i*)(src_rgb24 + 75));
|
||||
__m256i v2 = _mm256_inserti128_si256(_mm256_castsi128_si256(v2_lo), v2_hi, 1);
|
||||
__m256i v2 =
|
||||
_mm256_inserti128_si256(_mm256_castsi128_si256(v2_lo), v2_hi, 1);
|
||||
|
||||
__m128i v3 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 80));
|
||||
|
||||
@ -544,11 +565,14 @@ void RGB24MirrorRow_AVX2(const uint8_t* src_rgb24,
|
||||
v3 = _mm_shuffle_epi8(v3, shuf1);
|
||||
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 80), _mm256_castsi256_si128(v0));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 65), _mm256_extracti128_si256(v0, 1));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 65),
|
||||
_mm256_extracti128_si256(v0, 1));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 50), _mm256_castsi256_si128(v1));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 35), _mm256_extracti128_si256(v1, 1));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 35),
|
||||
_mm256_extracti128_si256(v1, 1));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 20), _mm256_castsi256_si128(v2));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 5), _mm256_extracti128_si256(v2, 1));
|
||||
_mm_storeu_si128((__m128i*)(dst_rgb24 + 5),
|
||||
_mm256_extracti128_si256(v2, 1));
|
||||
_mm_storel_epi64((__m128i*)(dst_rgb24 + 0), v3);
|
||||
|
||||
src_rgb24 -= 96;
|
||||
@ -629,7 +653,8 @@ void InterpolateRow_16_AVX2(uint16_t* dst_ptr,
|
||||
for (i = 0; i < width; i += 16) {
|
||||
__m256i row0 = _mm256_loadu_si256((const __m256i*)(src_ptr + i));
|
||||
__m256i row1 = _mm256_loadu_si256((const __m256i*)(src_ptr1 + i));
|
||||
_mm256_storeu_si256((__m256i*)(dst_ptr + i), _mm256_avg_epu16(row0, row1));
|
||||
_mm256_storeu_si256((__m256i*)(dst_ptr + i),
|
||||
_mm256_avg_epu16(row0, row1));
|
||||
}
|
||||
} else {
|
||||
for (i = 0; i < width; i += 16) {
|
||||
@ -672,21 +697,23 @@ void ARGBMirrorRow_AVX2(const uint8_t* src, uint8_t* dst, int width) {
|
||||
#ifdef HAS_J400TOARGBROW_AVX2
|
||||
alignas(32) static const uint8_t kShuffleMaskJ400ToARGB_0[32] = {
|
||||
0u, 0u, 0u, 128u, 1u, 1u, 1u, 128u, 2u, 2u, 2u, 128u, 3u, 3u, 3u, 128u,
|
||||
4u, 4u, 4u, 128u, 5u, 5u, 5u, 128u, 6u, 6u, 6u, 128u, 7u, 7u, 7u, 128u
|
||||
};
|
||||
4u, 4u, 4u, 128u, 5u, 5u, 5u, 128u, 6u, 6u, 6u, 128u, 7u, 7u, 7u, 128u};
|
||||
alignas(32) static const uint8_t kShuffleMaskJ400ToARGB_1[32] = {
|
||||
8u, 8u, 8u, 128u, 9u, 9u, 9u, 128u, 10u, 10u, 10u, 128u, 11u, 11u, 11u, 128u,
|
||||
12u, 12u, 12u, 128u, 13u, 13u, 13u, 128u, 14u, 14u, 14u, 128u, 15u, 15u, 15u, 128u
|
||||
};
|
||||
8u, 8u, 8u, 128u, 9u, 9u, 9u, 128u, 10u, 10u, 10u,
|
||||
128u, 11u, 11u, 11u, 128u, 12u, 12u, 12u, 128u, 13u, 13u,
|
||||
13u, 128u, 14u, 14u, 14u, 128u, 15u, 15u, 15u, 128u};
|
||||
|
||||
LIBYUV_TARGET_AVX2
|
||||
void J400ToARGBRow_AVX2(const uint8_t* src_y, uint8_t* dst_argb, int width) {
|
||||
__m256i ymm_mask0 = _mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_0);
|
||||
__m256i ymm_mask1 = _mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_1);
|
||||
__m256i ymm_mask0 =
|
||||
_mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_0);
|
||||
__m256i ymm_mask1 =
|
||||
_mm256_load_si256((const __m256i*)kShuffleMaskJ400ToARGB_1);
|
||||
__m256i ymm_alpha = _mm256_set1_epi32((int)0xff000000u);
|
||||
|
||||
while (width > 0) {
|
||||
__m256i ymm0 = _mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)src_y));
|
||||
__m256i ymm0 =
|
||||
_mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)src_y));
|
||||
|
||||
__m256i ymm1 = _mm256_shuffle_epi8(ymm0, ymm_mask0);
|
||||
__m256i ymm2 = _mm256_shuffle_epi8(ymm0, ymm_mask1);
|
||||
@ -707,13 +734,15 @@ void J400ToARGBRow_AVX2(const uint8_t* src_y, uint8_t* dst_argb, int width) {
|
||||
#ifdef HAS_RGB24TOARGBROW_AVX2
|
||||
alignas(16) static const uint8_t kShuffleMaskRGB24ToARGB[2][16] = {
|
||||
{0u, 1u, 2u, 128u, 3u, 4u, 5u, 128u, 6u, 7u, 8u, 128u, 9u, 10u, 11u, 128u},
|
||||
{4u, 5u, 6u, 128u, 7u, 8u, 9u, 128u, 10u, 11u, 12u, 128u, 13u, 14u, 15u, 128u}
|
||||
};
|
||||
{4u, 5u, 6u, 128u, 7u, 8u, 9u, 128u, 10u, 11u, 12u, 128u, 13u, 14u, 15u,
|
||||
128u}};
|
||||
#endif
|
||||
|
||||
#ifdef HAS_RGB565TOARGBROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565, uint8_t* dst_argb, int width) {
|
||||
void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565,
|
||||
uint8_t* dst_argb,
|
||||
int width) {
|
||||
__m256i ymm_scale_rb = _mm256_set1_epi32(0x01080108);
|
||||
__m256i ymm_scale_g = _mm256_set1_epi32(0x20802080);
|
||||
__m256i ymm_mask_b = _mm256_set1_epi16((short)0xf800);
|
||||
@ -730,11 +759,11 @@ void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565, uint8_t* dst_argb, int widt
|
||||
ymm1 = _mm256_mulhi_epu16(ymm1, ymm_scale_rb);
|
||||
ymm2 = _mm256_mulhi_epu16(ymm2, ymm_scale_rb);
|
||||
ymm1 = _mm256_slli_epi16(ymm1, 8);
|
||||
ymm1 = _mm256_or_si256(ymm1, ymm2); // RB
|
||||
ymm1 = _mm256_or_si256(ymm1, ymm2); // RB
|
||||
|
||||
ymm0 = _mm256_and_si256(ymm0, ymm_mask_g);
|
||||
ymm0 = _mm256_mulhi_epu16(ymm0, ymm_scale_g);
|
||||
ymm0 = _mm256_or_si256(ymm0, ymm_mask_a); // GA
|
||||
ymm0 = _mm256_or_si256(ymm0, ymm_mask_a); // GA
|
||||
|
||||
ymm2 = _mm256_unpacklo_epi8(ymm1, ymm0);
|
||||
ymm1 = _mm256_unpackhi_epi8(ymm1, ymm0);
|
||||
@ -755,7 +784,9 @@ void RGB565ToARGBRow_AVX2(const uint8_t* src_rgb565, uint8_t* dst_argb, int widt
|
||||
|
||||
#ifdef HAS_ARGB1555TOARGBROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555, uint8_t* dst_argb, int width) {
|
||||
void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555,
|
||||
uint8_t* dst_argb,
|
||||
int width) {
|
||||
__m256i ymm_scale_rb = _mm256_set1_epi32(0x01080108);
|
||||
__m256i ymm_scale_g = _mm256_set1_epi32(0x42004200);
|
||||
__m256i ymm_mask_b = _mm256_set1_epi16((short)0xf800);
|
||||
@ -773,14 +804,14 @@ void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555, uint8_t* dst_argb, int
|
||||
ymm2 = _mm256_mulhi_epu16(ymm2, ymm_scale_rb);
|
||||
ymm1 = _mm256_mulhi_epu16(ymm1, ymm_scale_rb);
|
||||
ymm1 = _mm256_slli_epi16(ymm1, 8);
|
||||
ymm1 = _mm256_or_si256(ymm1, ymm2); // RB
|
||||
ymm1 = _mm256_or_si256(ymm1, ymm2); // RB
|
||||
|
||||
ymm2 = ymm0;
|
||||
ymm0 = _mm256_and_si256(ymm0, ymm_mask_g);
|
||||
ymm2 = _mm256_srai_epi16(ymm2, 8);
|
||||
ymm0 = _mm256_mulhi_epu16(ymm0, ymm_scale_g);
|
||||
ymm2 = _mm256_and_si256(ymm2, ymm_mask_a);
|
||||
ymm0 = _mm256_or_si256(ymm0, ymm2); // GA
|
||||
ymm0 = _mm256_or_si256(ymm0, ymm2); // GA
|
||||
|
||||
ymm2 = _mm256_unpacklo_epi8(ymm1, ymm0);
|
||||
ymm1 = _mm256_unpackhi_epi8(ymm1, ymm0);
|
||||
@ -801,7 +832,9 @@ void ARGB1555ToARGBRow_AVX2(const uint8_t* src_argb1555, uint8_t* dst_argb, int
|
||||
|
||||
#ifdef HAS_ARGB4444TOARGBROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void ARGB4444ToARGBRow_AVX2(const uint8_t* src_argb4444, uint8_t* dst_argb, int width) {
|
||||
void ARGB4444ToARGBRow_AVX2(const uint8_t* src_argb4444,
|
||||
uint8_t* dst_argb,
|
||||
int width) {
|
||||
__m256i ymm_mask = _mm256_set1_epi32(0x0f0f0f0f);
|
||||
__m256i ymm_mask2 = _mm256_slli_epi32(ymm_mask, 4);
|
||||
|
||||
@ -841,27 +874,35 @@ void ARGB4444ToARGBRow_AVX2(const uint8_t* src_argb4444, uint8_t* dst_argb, int
|
||||
|
||||
#ifdef HAS_RGB24TOARGBROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width) {
|
||||
void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24,
|
||||
uint8_t* dst_argb,
|
||||
int width) {
|
||||
__m256i ymm_alpha = _mm256_set1_epi32(0xff000000);
|
||||
__m256i ymm_shuf = _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[0]));
|
||||
__m256i ymm_shuf2 = _mm256_broadcastsi128_si256(_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[1]));
|
||||
__m256i ymm_shuf = _mm256_broadcastsi128_si256(
|
||||
_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[0]));
|
||||
__m256i ymm_shuf2 = _mm256_broadcastsi128_si256(
|
||||
_mm_load_si128((const __m128i*)kShuffleMaskRGB24ToARGB[1]));
|
||||
|
||||
while (width > 0) {
|
||||
__m128i xmm0 = _mm_loadu_si128((const __m128i*)src_rgb24);
|
||||
__m256i ymm0 = _mm256_castsi128_si256(xmm0);
|
||||
ymm0 = _mm256_inserti128_si256(ymm0, _mm_loadu_si128((const __m128i*)(src_rgb24 + 12)), 1);
|
||||
ymm0 = _mm256_inserti128_si256(
|
||||
ymm0, _mm_loadu_si128((const __m128i*)(src_rgb24 + 12)), 1);
|
||||
|
||||
__m128i xmm1 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 24));
|
||||
__m256i ymm1 = _mm256_castsi128_si256(xmm1);
|
||||
ymm1 = _mm256_inserti128_si256(ymm1, _mm_loadu_si128((const __m128i*)(src_rgb24 + 36)), 1);
|
||||
ymm1 = _mm256_inserti128_si256(
|
||||
ymm1, _mm_loadu_si128((const __m128i*)(src_rgb24 + 36)), 1);
|
||||
|
||||
__m128i xmm2 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 48));
|
||||
__m256i ymm2 = _mm256_castsi128_si256(xmm2);
|
||||
ymm2 = _mm256_inserti128_si256(ymm2, _mm_loadu_si128((const __m128i*)(src_rgb24 + 60)), 1);
|
||||
ymm2 = _mm256_inserti128_si256(
|
||||
ymm2, _mm_loadu_si128((const __m128i*)(src_rgb24 + 60)), 1);
|
||||
|
||||
__m128i xmm3 = _mm_loadu_si128((const __m128i*)(src_rgb24 + 68));
|
||||
__m256i ymm3 = _mm256_castsi128_si256(xmm3);
|
||||
ymm3 = _mm256_inserti128_si256(ymm3, _mm_loadu_si128((const __m128i*)(src_rgb24 + 80)), 1);
|
||||
ymm3 = _mm256_inserti128_si256(
|
||||
ymm3, _mm_loadu_si128((const __m128i*)(src_rgb24 + 80)), 1);
|
||||
|
||||
ymm0 = _mm256_shuffle_epi8(ymm0, ymm_shuf);
|
||||
ymm1 = _mm256_shuffle_epi8(ymm1, ymm_shuf);
|
||||
@ -886,6 +927,50 @@ void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width)
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef HAS_ARGBSHUFFLEROW_AVX2
|
||||
LIBYUV_TARGET_AVX2
|
||||
void ARGBShuffleRow_AVX2(const uint8_t* src_argb,
|
||||
uint8_t* dst_argb,
|
||||
const uint8_t* shuffler,
|
||||
int width) {
|
||||
__m256i control =
|
||||
_mm256_broadcastsi128_si256(_mm_loadu_si128((const __m128i*)shuffler));
|
||||
while (width >= 16) {
|
||||
__m256i row = _mm256_loadu_si256((const __m256i*)src_argb);
|
||||
__m256i row1 = _mm256_loadu_si256((const __m256i*)(src_argb + 32));
|
||||
row = _mm256_shuffle_epi8(row, control);
|
||||
row1 = _mm256_shuffle_epi8(row1, control);
|
||||
_mm256_storeu_si256((__m256i*)dst_argb, row);
|
||||
_mm256_storeu_si256((__m256i*)(dst_argb + 32), row1);
|
||||
src_argb += 64;
|
||||
dst_argb += 64;
|
||||
width -= 16;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
#ifdef HAS_ARGBSHUFFLEROW_AVX512BW
|
||||
LIBYUV_TARGET_AVX512BW
|
||||
void ARGBShuffleRow_AVX512BW(const uint8_t* src_argb,
|
||||
uint8_t* dst_argb,
|
||||
const uint8_t* shuffler,
|
||||
int width) {
|
||||
__m512i control =
|
||||
_mm512_broadcast_i32x4(_mm_loadu_si128((const __m128i*)shuffler));
|
||||
while (width >= 32) {
|
||||
__m512i row = _mm512_loadu_si512((const __m512i*)src_argb);
|
||||
__m512i row1 = _mm512_loadu_si512((const __m512i*)(src_argb + 64));
|
||||
row = _mm512_shuffle_epi8(row, control);
|
||||
row1 = _mm512_shuffle_epi8(row1, control);
|
||||
_mm512_storeu_si512((__m512i*)dst_argb, row);
|
||||
_mm512_storeu_si512((__m512i*)(dst_argb + 64), row1);
|
||||
src_argb += 128;
|
||||
dst_argb += 128;
|
||||
width -= 32;
|
||||
}
|
||||
}
|
||||
#endif
|
||||
|
||||
#endif
|
||||
|
||||
#ifdef __cplusplus
|
||||
@ -893,4 +978,7 @@ void RGB24ToARGBRow_AVX2(const uint8_t* src_rgb24, uint8_t* dst_argb, int width)
|
||||
} // namespace libyuv
|
||||
#endif
|
||||
|
||||
#endif // !defined(LIBYUV_DISABLE_X86) && (defined(__x86_64__) || defined(__i386__) || defined(_M_X64) || defined(_M_X86)) && ((defined(_MSC_VER) && !defined(__clang__)) || defined(LIBYUV_ENABLE_ROWWIN))
|
||||
#endif // !defined(LIBYUV_DISABLE_X86) && (defined(__x86_64__) ||
|
||||
// defined(__i386__) || defined(_M_X64) || defined(_M_X86)) &&
|
||||
// ((defined(_MSC_VER) && !defined(__clang__)) ||
|
||||
// defined(LIBYUV_ENABLE_ROWWIN))
|
||||
|
||||
@ -1951,9 +1951,9 @@ int ScalePlane(const uint8_t* src,
|
||||
// Reject dimensions larger than 32768 (or smaller than -32768 for height).
|
||||
// This prevents FixedDiv signed integer overflows that can lead to division
|
||||
// by zero/overflow crashes (SIGFPE on x86) or incorrect step calculations.
|
||||
if (!src || src_width <= 0 || src_height == 0 ||
|
||||
src_width > 32768 || src_height < -32768 || src_height > 32768 ||
|
||||
!dst || dst_width <= 0 || dst_height <= 0) {
|
||||
if (!src || src_width <= 0 || src_height == 0 || src_width > 32768 ||
|
||||
src_height < -32768 || src_height > 32768 || !dst || dst_width <= 0 ||
|
||||
dst_height <= 0) {
|
||||
return -1;
|
||||
}
|
||||
// Simplify filtering when possible.
|
||||
@ -2059,9 +2059,9 @@ int ScalePlane_16(const uint16_t* src,
|
||||
// Reject dimensions larger than 32768 (or smaller than -32768 for height).
|
||||
// This prevents FixedDiv signed integer overflows that can lead to division
|
||||
// by zero/overflow crashes (SIGFPE on x86) or incorrect step calculations.
|
||||
if (!src || src_width <= 0 || src_height == 0 ||
|
||||
src_width > 32768 || src_height < -32768 || src_height > 32768 ||
|
||||
!dst || dst_width <= 0 || dst_height <= 0) {
|
||||
if (!src || src_width <= 0 || src_height == 0 || src_width > 32768 ||
|
||||
src_height < -32768 || src_height > 32768 || !dst || dst_width <= 0 ||
|
||||
dst_height <= 0) {
|
||||
return -1;
|
||||
}
|
||||
// Simplify filtering when possible.
|
||||
@ -2171,9 +2171,9 @@ int ScalePlane_12(const uint16_t* src,
|
||||
// Reject dimensions larger than 32768 (or smaller than -32768 for height).
|
||||
// This prevents FixedDiv signed integer overflows that can lead to division
|
||||
// by zero/overflow crashes (SIGFPE on x86) or incorrect step calculations.
|
||||
if (!src || src_width <= 0 || src_height == 0 ||
|
||||
src_width > 32768 || src_height < -32768 || src_height > 32768 ||
|
||||
!dst || dst_width <= 0 || dst_height <= 0) {
|
||||
if (!src || src_width <= 0 || src_height == 0 || src_width > 32768 ||
|
||||
src_height < -32768 || src_height > 32768 || !dst || dst_width <= 0 ||
|
||||
dst_height <= 0) {
|
||||
return -1;
|
||||
}
|
||||
// Simplify filtering when possible.
|
||||
|
||||
@ -792,10 +792,10 @@ void ScaleFilterCols64_C(uint8_t* dst_ptr,
|
||||
#undef BLENDER
|
||||
|
||||
// Same as 8 bit arm blender but return is cast to uint16_t
|
||||
#define BLENDER(a, b, f) \
|
||||
(uint16_t)( \
|
||||
(int)(a) + \
|
||||
(int)((((int64_t)((f)) * ((int64_t)(b) - (int)(a))) + 0x8000) >> 16))
|
||||
#define BLENDER(a, b, f) \
|
||||
(uint16_t)((int)(a) + \
|
||||
(int)((((int64_t)((f)) * ((int64_t)(b) - (int)(a))) + 0x8000) >> \
|
||||
16))
|
||||
|
||||
void ScaleFilterCols_16_C(uint16_t* dst_ptr,
|
||||
const uint16_t* src_ptr,
|
||||
@ -1196,7 +1196,7 @@ void ScaleARGBColsUp2_C(uint8_t* dst_argb,
|
||||
|
||||
// TODO(fbarchard): Replace 0x7f ^ f with 128-f. bug=607.
|
||||
// Mimics SSSE3 blender
|
||||
#define BLENDER1(a, b, f) ((a) * (0x7f ^ f) + (b)*f) >> 7
|
||||
#define BLENDER1(a, b, f) ((a) * (0x7f ^ f) + (b) * f) >> 7
|
||||
#define BLENDERC(a, b, f, s) \
|
||||
(uint32_t)(BLENDER1(((a) >> s) & 255, ((b) >> s) & 255, f) << s)
|
||||
#define BLENDER(a, b, f) \
|
||||
|
||||
@ -1759,25 +1759,25 @@ void ScaleRowUp2_Bilinear_16_AVX2(const uint16_t* src_ptr,
|
||||
void ScaleAddRow_SSE2(const uint8_t* src_ptr,
|
||||
uint16_t* dst_ptr,
|
||||
int src_width) {
|
||||
asm volatile("pxor %%xmm5,%%xmm5 \n"
|
||||
asm volatile("pxor %%xmm5,%%xmm5 \n"
|
||||
|
||||
// 16 pixel loop.
|
||||
LABELALIGN
|
||||
"1: \n"
|
||||
"movdqu (%0),%%xmm3 \n"
|
||||
"lea 0x10(%0),%0 \n" // src_ptr += 16
|
||||
"movdqu (%1),%%xmm0 \n"
|
||||
"movdqu 0x10(%1),%%xmm1 \n"
|
||||
"movdqa %%xmm3,%%xmm2 \n"
|
||||
"punpcklbw %%xmm5,%%xmm2 \n"
|
||||
"punpckhbw %%xmm5,%%xmm3 \n"
|
||||
"paddusw %%xmm2,%%xmm0 \n"
|
||||
"paddusw %%xmm3,%%xmm1 \n"
|
||||
"movdqu %%xmm0,(%1) \n"
|
||||
"movdqu %%xmm1,0x10(%1) \n"
|
||||
"lea 0x20(%1),%1 \n"
|
||||
"sub $0x10,%2 \n"
|
||||
"jg 1b \n"
|
||||
"1: \n"
|
||||
"movdqu (%0),%%xmm3 \n"
|
||||
"lea 0x10(%0),%0 \n" // src_ptr += 16
|
||||
"movdqu (%1),%%xmm0 \n"
|
||||
"movdqu 0x10(%1),%%xmm1 \n"
|
||||
"movdqa %%xmm3,%%xmm2 \n"
|
||||
"punpcklbw %%xmm5,%%xmm2 \n"
|
||||
"punpckhbw %%xmm5,%%xmm3 \n"
|
||||
"paddusw %%xmm2,%%xmm0 \n"
|
||||
"paddusw %%xmm3,%%xmm1 \n"
|
||||
"movdqu %%xmm0,(%1) \n"
|
||||
"movdqu %%xmm1,0x10(%1) \n"
|
||||
"lea 0x20(%1),%1 \n"
|
||||
"sub $0x10,%2 \n"
|
||||
"jg 1b \n"
|
||||
: "+r"(src_ptr), // %0
|
||||
"+r"(dst_ptr), // %1
|
||||
"+r"(src_width) // %2
|
||||
@ -1790,23 +1790,23 @@ void ScaleAddRow_SSE2(const uint8_t* src_ptr,
|
||||
void ScaleAddRow_AVX2(const uint8_t* src_ptr,
|
||||
uint16_t* dst_ptr,
|
||||
int src_width) {
|
||||
asm volatile("vpxor %%ymm5,%%ymm5,%%ymm5 \n"
|
||||
asm volatile("vpxor %%ymm5,%%ymm5,%%ymm5 \n"
|
||||
|
||||
LABELALIGN
|
||||
"1: \n"
|
||||
"vmovdqu (%0),%%ymm3 \n"
|
||||
"lea 0x20(%0),%0 \n" // src_ptr += 32
|
||||
"vpermq $0xd8,%%ymm3,%%ymm3 \n"
|
||||
"vpunpcklbw %%ymm5,%%ymm3,%%ymm2 \n"
|
||||
"vpunpckhbw %%ymm5,%%ymm3,%%ymm3 \n"
|
||||
"vpaddusw (%1),%%ymm2,%%ymm0 \n"
|
||||
"vpaddusw 0x20(%1),%%ymm3,%%ymm1 \n"
|
||||
"vmovdqu %%ymm0,(%1) \n"
|
||||
"vmovdqu %%ymm1,0x20(%1) \n"
|
||||
"lea 0x40(%1),%1 \n"
|
||||
"sub $0x20,%2 \n"
|
||||
"jg 1b \n"
|
||||
"vzeroupper \n"
|
||||
"1: \n"
|
||||
"vmovdqu (%0),%%ymm3 \n"
|
||||
"lea 0x20(%0),%0 \n" // src_ptr += 32
|
||||
"vpermq $0xd8,%%ymm3,%%ymm3 \n"
|
||||
"vpunpcklbw %%ymm5,%%ymm3,%%ymm2 \n"
|
||||
"vpunpckhbw %%ymm5,%%ymm3,%%ymm3 \n"
|
||||
"vpaddusw (%1),%%ymm2,%%ymm0 \n"
|
||||
"vpaddusw 0x20(%1),%%ymm3,%%ymm1 \n"
|
||||
"vmovdqu %%ymm0,(%1) \n"
|
||||
"vmovdqu %%ymm1,0x20(%1) \n"
|
||||
"lea 0x40(%1),%1 \n"
|
||||
"sub $0x20,%2 \n"
|
||||
"jg 1b \n"
|
||||
"vzeroupper \n"
|
||||
: "+r"(src_ptr), // %0
|
||||
"+r"(dst_ptr), // %1
|
||||
"+r"(src_width) // %2
|
||||
|
||||
@ -104,7 +104,7 @@ __declspec(naked) void ScaleRowDown2_SSSE3(const uint8_t* src_ptr,
|
||||
movdqu xmm0, [eax]
|
||||
movdqu xmm1, [eax + 16]
|
||||
lea eax, [eax + 32]
|
||||
psrlw xmm0, 8 // isolate odd pixels.
|
||||
psrlw xmm0, 8 // isolate odd pixels.
|
||||
psrlw xmm1, 8
|
||||
packuswb xmm0, xmm1
|
||||
movdqu [edx], xmm0
|
||||
@ -138,7 +138,7 @@ __declspec(naked) void ScaleRowDown2Linear_SSSE3(const uint8_t* src_ptr,
|
||||
lea eax, [eax + 32]
|
||||
pmaddubsw xmm0, xmm4 // horizontal add
|
||||
pmaddubsw xmm1, xmm4
|
||||
pavgw xmm0, xmm5 // (x + 1) / 2
|
||||
pavgw xmm0, xmm5 // (x + 1) / 2
|
||||
pavgw xmm1, xmm5
|
||||
packuswb xmm0, xmm1
|
||||
movdqu [edx], xmm0
|
||||
@ -213,7 +213,7 @@ __declspec(naked) void ScaleRowDown2_AVX2(const uint8_t* src_ptr,
|
||||
vpsrlw ymm0, ymm0, 8 // isolate odd pixels.
|
||||
vpsrlw ymm1, ymm1, 8
|
||||
vpackuswb ymm0, ymm0, ymm1
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vmovdqu [edx], ymm0
|
||||
lea edx, [edx + 32]
|
||||
sub ecx, 32
|
||||
@ -249,7 +249,7 @@ __declspec(naked) void ScaleRowDown2Linear_AVX2(const uint8_t* src_ptr,
|
||||
vpavgw ymm0, ymm0, ymm5 // (x + 1) / 2
|
||||
vpavgw ymm1, ymm1, ymm5
|
||||
vpackuswb ymm0, ymm0, ymm1
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vmovdqu [edx], ymm0
|
||||
lea edx, [edx + 32]
|
||||
sub ecx, 32
|
||||
@ -319,7 +319,7 @@ __declspec(naked) void ScaleRowDown4_SSSE3(const uint8_t* src_ptr,
|
||||
// src_stride ignored
|
||||
mov edx, [esp + 12] // dst_ptr
|
||||
mov ecx, [esp + 16] // dst_width
|
||||
pcmpeqb xmm5, xmm5 // generate mask 0x00ff0000
|
||||
pcmpeqb xmm5, xmm5 // generate mask 0x00ff0000
|
||||
psrld xmm5, 24
|
||||
pslld xmm5, 16
|
||||
|
||||
@ -424,7 +424,7 @@ __declspec(naked) void ScaleRowDown4_AVX2(const uint8_t* src_ptr,
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vpsrlw ymm0, ymm0, 8
|
||||
vpackuswb ymm0, ymm0, ymm0
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vpermq ymm0, ymm0, 0xd8 // unmutate vpackuswb
|
||||
vmovdqu [edx], xmm0
|
||||
lea edx, [edx + 16]
|
||||
sub ecx, 16
|
||||
@ -687,7 +687,7 @@ __declspec(naked) void ScaleRowDown38_SSSE3(const uint8_t* src_ptr,
|
||||
pshufb xmm1, xmm5
|
||||
paddusb xmm0, xmm1
|
||||
|
||||
movq qword ptr [edx], xmm0 // write 12 pixels
|
||||
movq qword ptr [edx], xmm0 // write 12 pixels
|
||||
movhlps xmm1, xmm0
|
||||
movd [edx + 8], xmm1
|
||||
lea edx, [edx + 12]
|
||||
@ -1030,7 +1030,7 @@ __declspec(naked) void ScaleARGBRowDown2Linear_SSE2(const uint8_t* src_argb,
|
||||
lea eax, [eax + 32]
|
||||
movdqa xmm2, xmm0
|
||||
shufps xmm0, xmm1, 0x88 // even pixels
|
||||
shufps xmm2, xmm1, 0xdd // odd pixels
|
||||
shufps xmm2, xmm1, 0xdd // odd pixels
|
||||
pavgb xmm0, xmm2
|
||||
movdqu [edx], xmm0
|
||||
lea edx, [edx + 16]
|
||||
@ -1216,7 +1216,7 @@ __declspec(naked) void ScaleARGBCols_SSE2(uint8_t* dst_argb,
|
||||
test ecx, 2
|
||||
je xloop29
|
||||
|
||||
// 2 Pixels.
|
||||
// 2 Pixels.
|
||||
movd xmm0, [esi + eax * 4] // 1 source x0 pixels
|
||||
movd xmm1, [esi + edx * 4] // 1 source x1 pixels
|
||||
pextrw eax, xmm2, 5 // get x2 integer.
|
||||
@ -1229,7 +1229,7 @@ __declspec(naked) void ScaleARGBCols_SSE2(uint8_t* dst_argb,
|
||||
test ecx, 1
|
||||
je xloop99
|
||||
|
||||
// 1 Pixels.
|
||||
// 1 Pixels.
|
||||
movd xmm0, [esi + eax * 4] // 1 source x2 pixels
|
||||
movd dword ptr [edi], xmm0
|
||||
xloop99:
|
||||
|
||||
@ -464,8 +464,7 @@ static void YUVFToRGBReference(int y, int u, int v, int* r, int* g, int* b) {
|
||||
static void YUVUToRGBReference(int y, int u, int v, int* r, int* g, int* b) {
|
||||
double y1 = (y - 16) * 1.164384;
|
||||
*r = RoundToByte(y1 - (v - 128) * -1.67867);
|
||||
*g = RoundToByte(y1 - (u - 128) * 0.187326 -
|
||||
(v - 128) * 0.65042);
|
||||
*g = RoundToByte(y1 - (u - 128) * 0.187326 - (v - 128) * 0.65042);
|
||||
*b = RoundToByte(y1 - (u - 128) * -2.14177);
|
||||
}
|
||||
|
||||
|
||||
@ -53,9 +53,9 @@ namespace libyuv {
|
||||
#define ABGRToABGR ARGBCopy
|
||||
|
||||
// subsample amount uses a divide.
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a)-1)) / (a))
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a) - 1)) / (a))
|
||||
|
||||
#define ALIGNINT(V, ALIGN) (((V) + (ALIGN)-1) / (ALIGN) * (ALIGN))
|
||||
#define ALIGNINT(V, ALIGN) (((V) + (ALIGN) - 1) / (ALIGN) * (ALIGN))
|
||||
|
||||
#define TESTBPTOPI(SRC_FMT_PLANAR, SRC_T, SRC_BPC, SRC_SUBSAMP_X, \
|
||||
SRC_SUBSAMP_Y, FMT_PLANAR, DST_T, DST_BPC, DST_SUBSAMP_X, \
|
||||
@ -82,15 +82,19 @@ namespace libyuv {
|
||||
(kHeight + (TILE_HEIGHT - 1)) & ~(TILE_HEIGHT - 1); \
|
||||
const int kSrcHalfPaddedWidth = SUBSAMPLE(kPaddedWidth, SRC_SUBSAMP_X); \
|
||||
const int kSrcHalfPaddedHeight = SUBSAMPLE(kPaddedHeight, SRC_SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kPaddedWidth* kPaddedHeight* SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_y, \
|
||||
kPaddedWidth * kPaddedHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end( \
|
||||
src_uv, kSrcHalfPaddedWidth* kSrcHalfPaddedHeight* SRC_BPC * 2 + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_u_c, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_v_c, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_u_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_v_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
src_uv, \
|
||||
kSrcHalfPaddedWidth * kSrcHalfPaddedHeight * SRC_BPC * 2 + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_u_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_v_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_u_opt, \
|
||||
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_v_opt, \
|
||||
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
SRC_T* src_y_p = reinterpret_cast<SRC_T*>(src_y + OFF); \
|
||||
SRC_T* src_uv_p = reinterpret_cast<SRC_T*>(src_uv + OFF); \
|
||||
for (int i = 0; i < kPaddedWidth * kPaddedHeight; ++i) { \
|
||||
@ -101,12 +105,12 @@ namespace libyuv {
|
||||
src_uv_p[i] = \
|
||||
(fastrand() & (((SRC_T)(-1)) << ((8 * SRC_BPC) - SRC_DEPTH))); \
|
||||
} \
|
||||
memset(dst_y_c, 1, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_u_c, 2, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_v_c, 3, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_u_opt, 102, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_v_opt, 103, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_u_c, 2, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_v_c, 3, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_u_opt, 102, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_v_opt, 103, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
SRC_FMT_PLANAR##To##FMT_PLANAR( \
|
||||
src_y_p, kWidth, src_uv_p, kSrcHalfWidth * 2, \
|
||||
@ -223,11 +227,11 @@ TESTBPTOP(P012, uint16_t, 2, 2, 2, I012, uint16_t, 2, 2, 2, 12, 1, 1)
|
||||
const int kStrideB = ALIGNINT(kWidth * BPP_B, ALIGN); \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight + OFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
src_y[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
@ -381,58 +385,58 @@ TESTPLANARTOB(I444, 1, 1, ABGR, 4, 4, 1)
|
||||
TESTPLANARTOB(I444, 1, 1, ARGB, 4, 4, 1)
|
||||
#endif
|
||||
|
||||
#define TESTBPTOBI(FMT_PLANAR, SUBSAMP_X, SUBSAMP_Y, FMT_B, FMT_C, BPP_B, \
|
||||
W1280, N, NEG, OFF) \
|
||||
TEST_F(LibYUVConvertTest, FMT_PLANAR##To##FMT_B##N) { \
|
||||
const int kWidth = W1280; \
|
||||
const int kHeight = benchmark_height_; \
|
||||
const int kStrideB = kWidth * BPP_B; \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_uv, \
|
||||
kStrideUV* SUBSAMPLE(kHeight, SUBSAMP_Y) * 2 + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight); \
|
||||
for (int i = 0; i < kHeight; ++i) \
|
||||
for (int j = 0; j < kWidth; ++j) \
|
||||
src_y[i * kWidth + j + OFF] = (fastrand() & 0xff); \
|
||||
for (int i = 0; i < SUBSAMPLE(kHeight, SUBSAMP_Y); ++i) { \
|
||||
for (int j = 0; j < kStrideUV * 2; ++j) { \
|
||||
src_uv[i * kStrideUV * 2 + j + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
} \
|
||||
memset(dst_argb_c, 1, kStrideB* kHeight); \
|
||||
memset(dst_argb_opt, 101, kStrideB* kHeight); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_PLANAR##To##FMT_B(src_y + OFF, kWidth, src_uv + OFF, kStrideUV * 2, \
|
||||
dst_argb_c, kWidth * BPP_B, kWidth, NEG kHeight); \
|
||||
MaskCpuFlags(benchmark_cpu_info_); \
|
||||
for (int i = 0; i < benchmark_iterations_; ++i) { \
|
||||
FMT_PLANAR##To##FMT_B(src_y + OFF, kWidth, src_uv + OFF, kStrideUV * 2, \
|
||||
dst_argb_opt, kWidth * BPP_B, kWidth, \
|
||||
NEG kHeight); \
|
||||
} \
|
||||
/* Convert to ARGB so 565 is expanded to bytes that can be compared. */ \
|
||||
align_buffer_page_end(dst_argb32_c, kWidth * 4 * kHeight); \
|
||||
align_buffer_page_end(dst_argb32_opt, kWidth * 4 * kHeight); \
|
||||
memset(dst_argb32_c, 2, kWidth * 4 * kHeight); \
|
||||
memset(dst_argb32_opt, 102, kWidth * 4 * kHeight); \
|
||||
FMT_C##ToARGB(dst_argb_c, kStrideB, dst_argb32_c, kWidth * 4, kWidth, \
|
||||
kHeight); \
|
||||
FMT_C##ToARGB(dst_argb_opt, kStrideB, dst_argb32_opt, kWidth * 4, kWidth, \
|
||||
kHeight); \
|
||||
for (int i = 0; i < kHeight; ++i) { \
|
||||
for (int j = 0; j < kWidth * 4; ++j) { \
|
||||
ASSERT_EQ(dst_argb32_c[i * kWidth * 4 + j], \
|
||||
dst_argb32_opt[i * kWidth * 4 + j]); \
|
||||
} \
|
||||
} \
|
||||
free_aligned_buffer_page_end(src_y); \
|
||||
free_aligned_buffer_page_end(src_uv); \
|
||||
free_aligned_buffer_page_end(dst_argb_c); \
|
||||
free_aligned_buffer_page_end(dst_argb_opt); \
|
||||
free_aligned_buffer_page_end(dst_argb32_c); \
|
||||
free_aligned_buffer_page_end(dst_argb32_opt); \
|
||||
#define TESTBPTOBI(FMT_PLANAR, SUBSAMP_X, SUBSAMP_Y, FMT_B, FMT_C, BPP_B, \
|
||||
W1280, N, NEG, OFF) \
|
||||
TEST_F(LibYUVConvertTest, FMT_PLANAR##To##FMT_B##N) { \
|
||||
const int kWidth = W1280; \
|
||||
const int kHeight = benchmark_height_; \
|
||||
const int kStrideB = kWidth * BPP_B; \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end( \
|
||||
src_uv, kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y) * 2 + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight); \
|
||||
for (int i = 0; i < kHeight; ++i) \
|
||||
for (int j = 0; j < kWidth; ++j) \
|
||||
src_y[i * kWidth + j + OFF] = (fastrand() & 0xff); \
|
||||
for (int i = 0; i < SUBSAMPLE(kHeight, SUBSAMP_Y); ++i) { \
|
||||
for (int j = 0; j < kStrideUV * 2; ++j) { \
|
||||
src_uv[i * kStrideUV * 2 + j + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
} \
|
||||
memset(dst_argb_c, 1, kStrideB * kHeight); \
|
||||
memset(dst_argb_opt, 101, kStrideB * kHeight); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_PLANAR##To##FMT_B(src_y + OFF, kWidth, src_uv + OFF, kStrideUV * 2, \
|
||||
dst_argb_c, kWidth * BPP_B, kWidth, NEG kHeight); \
|
||||
MaskCpuFlags(benchmark_cpu_info_); \
|
||||
for (int i = 0; i < benchmark_iterations_; ++i) { \
|
||||
FMT_PLANAR##To##FMT_B(src_y + OFF, kWidth, src_uv + OFF, kStrideUV * 2, \
|
||||
dst_argb_opt, kWidth * BPP_B, kWidth, \
|
||||
NEG kHeight); \
|
||||
} \
|
||||
/* Convert to ARGB so 565 is expanded to bytes that can be compared. */ \
|
||||
align_buffer_page_end(dst_argb32_c, kWidth * 4 * kHeight); \
|
||||
align_buffer_page_end(dst_argb32_opt, kWidth * 4 * kHeight); \
|
||||
memset(dst_argb32_c, 2, kWidth * 4 * kHeight); \
|
||||
memset(dst_argb32_opt, 102, kWidth * 4 * kHeight); \
|
||||
FMT_C##ToARGB(dst_argb_c, kStrideB, dst_argb32_c, kWidth * 4, kWidth, \
|
||||
kHeight); \
|
||||
FMT_C##ToARGB(dst_argb_opt, kStrideB, dst_argb32_opt, kWidth * 4, kWidth, \
|
||||
kHeight); \
|
||||
for (int i = 0; i < kHeight; ++i) { \
|
||||
for (int j = 0; j < kWidth * 4; ++j) { \
|
||||
ASSERT_EQ(dst_argb32_c[i * kWidth * 4 + j], \
|
||||
dst_argb32_opt[i * kWidth * 4 + j]); \
|
||||
} \
|
||||
} \
|
||||
free_aligned_buffer_page_end(src_y); \
|
||||
free_aligned_buffer_page_end(src_uv); \
|
||||
free_aligned_buffer_page_end(dst_argb_c); \
|
||||
free_aligned_buffer_page_end(dst_argb_opt); \
|
||||
free_aligned_buffer_page_end(dst_argb32_c); \
|
||||
free_aligned_buffer_page_end(dst_argb32_opt); \
|
||||
}
|
||||
|
||||
#if defined(ENABLE_FULL_TESTS)
|
||||
@ -507,15 +511,16 @@ TESTBPTOB(NV12, 2, 2, RGB565, RGB565, 2)
|
||||
const int kStrideB = \
|
||||
(kWidth * EPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeightB*(int)sizeof(TYPE_B)); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, \
|
||||
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
|
||||
align_buffer_page_end(dst_argb_opt, \
|
||||
kStrideB* kHeightB*(int)sizeof(TYPE_B)); \
|
||||
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
|
||||
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
|
||||
src_argb[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
memset(dst_argb_c, 1, kStrideB* kHeightB); \
|
||||
memset(dst_argb_opt, 101, kStrideB* kHeightB); \
|
||||
memset(dst_argb_c, 1, kStrideB * kHeightB); \
|
||||
memset(dst_argb_opt, 101, kStrideB * kHeightB); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_B((TYPE_A*)(src_argb + OFF), kStrideA, (TYPE_B*)dst_argb_c, \
|
||||
kStrideB, kWidth, NEG kHeight); \
|
||||
@ -532,41 +537,42 @@ TESTBPTOB(NV12, 2, 2, RGB565, RGB565, 2)
|
||||
free_aligned_buffer_page_end(dst_argb_opt); \
|
||||
}
|
||||
|
||||
#define TESTATOBRANDOM(FMT_A, TYPE_A, EPP_A, STRIDE_A, HEIGHT_A, FMT_B, \
|
||||
TYPE_B, EPP_B, STRIDE_B, HEIGHT_B) \
|
||||
TEST_F(LibYUVConvertTest, FMT_A##To##FMT_B##_Random) { \
|
||||
for (int times = 0; times < benchmark_iterations_; ++times) { \
|
||||
const int kWidth = (fastrand() & 63) + 1; \
|
||||
const int kHeight = (fastrand() & 31) + 1; \
|
||||
const int kHeightA = (kHeight + HEIGHT_A - 1) / HEIGHT_A * HEIGHT_A; \
|
||||
const int kHeightB = (kHeight + HEIGHT_B - 1) / HEIGHT_B * HEIGHT_B; \
|
||||
const int kStrideA = \
|
||||
(kWidth * EPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
|
||||
const int kStrideB = \
|
||||
(kWidth * EPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb, kStrideA* kHeightA*(int)sizeof(TYPE_A)); \
|
||||
align_buffer_page_end(dst_argb_c, \
|
||||
kStrideB* kHeightB*(int)sizeof(TYPE_B)); \
|
||||
align_buffer_page_end(dst_argb_opt, \
|
||||
kStrideB* kHeightB*(int)sizeof(TYPE_B)); \
|
||||
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
|
||||
src_argb[i] = 0xfe; \
|
||||
} \
|
||||
memset(dst_argb_c, 123, kStrideB* kHeightB); \
|
||||
memset(dst_argb_opt, 123, kStrideB* kHeightB); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_B((TYPE_A*)src_argb, kStrideA, (TYPE_B*)dst_argb_c, \
|
||||
kStrideB, kWidth, kHeight); \
|
||||
MaskCpuFlags(benchmark_cpu_info_); \
|
||||
FMT_A##To##FMT_B((TYPE_A*)src_argb, kStrideA, (TYPE_B*)dst_argb_opt, \
|
||||
kStrideB, kWidth, kHeight); \
|
||||
for (int i = 0; i < kStrideB * kHeightB * (int)sizeof(TYPE_B); ++i) { \
|
||||
ASSERT_EQ(dst_argb_c[i], dst_argb_opt[i]); \
|
||||
} \
|
||||
free_aligned_buffer_page_end(src_argb); \
|
||||
free_aligned_buffer_page_end(dst_argb_c); \
|
||||
free_aligned_buffer_page_end(dst_argb_opt); \
|
||||
} \
|
||||
#define TESTATOBRANDOM(FMT_A, TYPE_A, EPP_A, STRIDE_A, HEIGHT_A, FMT_B, \
|
||||
TYPE_B, EPP_B, STRIDE_B, HEIGHT_B) \
|
||||
TEST_F(LibYUVConvertTest, FMT_A##To##FMT_B##_Random) { \
|
||||
for (int times = 0; times < benchmark_iterations_; ++times) { \
|
||||
const int kWidth = (fastrand() & 63) + 1; \
|
||||
const int kHeight = (fastrand() & 31) + 1; \
|
||||
const int kHeightA = (kHeight + HEIGHT_A - 1) / HEIGHT_A * HEIGHT_A; \
|
||||
const int kHeightB = (kHeight + HEIGHT_B - 1) / HEIGHT_B * HEIGHT_B; \
|
||||
const int kStrideA = \
|
||||
(kWidth * EPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
|
||||
const int kStrideB = \
|
||||
(kWidth * EPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb, \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A)); \
|
||||
align_buffer_page_end(dst_argb_c, \
|
||||
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
|
||||
align_buffer_page_end(dst_argb_opt, \
|
||||
kStrideB * kHeightB * (int)sizeof(TYPE_B)); \
|
||||
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
|
||||
src_argb[i] = 0xfe; \
|
||||
} \
|
||||
memset(dst_argb_c, 123, kStrideB * kHeightB); \
|
||||
memset(dst_argb_opt, 123, kStrideB * kHeightB); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_B((TYPE_A*)src_argb, kStrideA, (TYPE_B*)dst_argb_c, \
|
||||
kStrideB, kWidth, kHeight); \
|
||||
MaskCpuFlags(benchmark_cpu_info_); \
|
||||
FMT_A##To##FMT_B((TYPE_A*)src_argb, kStrideA, (TYPE_B*)dst_argb_opt, \
|
||||
kStrideB, kWidth, kHeight); \
|
||||
for (int i = 0; i < kStrideB * kHeightB * (int)sizeof(TYPE_B); ++i) { \
|
||||
ASSERT_EQ(dst_argb_c[i], dst_argb_opt[i]); \
|
||||
} \
|
||||
free_aligned_buffer_page_end(src_argb); \
|
||||
free_aligned_buffer_page_end(dst_argb_c); \
|
||||
free_aligned_buffer_page_end(dst_argb_opt); \
|
||||
} \
|
||||
}
|
||||
|
||||
#if defined(ENABLE_FULL_TESTS)
|
||||
@ -672,11 +678,11 @@ TESTATOB(AB64, uint16_t, 4, 4, 1, AR64, uint16_t, 4, 4, 1)
|
||||
const int kStrideB = \
|
||||
(kWidth * EPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A) + OFF); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A) + OFF); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A) + OFF); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
|
||||
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
|
||||
src_argb[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
@ -791,14 +797,14 @@ TESTATOA(AB64, uint16_t, 4, 4, 1, AR64, uint16_t, 4, 4, 1)
|
||||
(kWidth * BPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
|
||||
const int kStrideB = \
|
||||
(kWidth * BPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb, kStrideA* kHeightA + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeightB); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeightB); \
|
||||
align_buffer_page_end(src_argb, kStrideA * kHeightA + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeightB); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeightB); \
|
||||
for (int i = 0; i < kStrideA * kHeightA; ++i) { \
|
||||
src_argb[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
memset(dst_argb_c, 1, kStrideB* kHeightB); \
|
||||
memset(dst_argb_opt, 101, kStrideB* kHeightB); \
|
||||
memset(dst_argb_c, 1, kStrideB * kHeightB); \
|
||||
memset(dst_argb_opt, 101, kStrideB * kHeightB); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_B##Dither(src_argb + OFF, kStrideA, dst_argb_c, kStrideB, \
|
||||
NULL, kWidth, NEG kHeight); \
|
||||
@ -827,14 +833,14 @@ TESTATOA(AB64, uint16_t, 4, 4, 1, AR64, uint16_t, 4, 4, 1)
|
||||
(kWidth * BPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
|
||||
const int kStrideB = \
|
||||
(kWidth * BPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb, kStrideA* kHeightA); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeightB); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeightB); \
|
||||
align_buffer_page_end(src_argb, kStrideA * kHeightA); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeightB); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeightB); \
|
||||
for (int i = 0; i < kStrideA * kHeightA; ++i) { \
|
||||
src_argb[i] = (fastrand() & 0xff); \
|
||||
} \
|
||||
memset(dst_argb_c, 123, kStrideB* kHeightB); \
|
||||
memset(dst_argb_opt, 123, kStrideB* kHeightB); \
|
||||
memset(dst_argb_c, 123, kStrideB * kHeightB); \
|
||||
memset(dst_argb_opt, 123, kStrideB * kHeightB); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_B##Dither(src_argb, kStrideA, dst_argb_c, kStrideB, NULL, \
|
||||
kWidth, kHeight); \
|
||||
@ -885,15 +891,16 @@ TESTATOBD(ARGB, 4, 4, 1, RGB565, 2, 2, 1)
|
||||
const int kStrideA = \
|
||||
(kWidth * EPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
|
||||
align_buffer_page_end(src_argb, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideA* kHeightA*(int)sizeof(TYPE_A)); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A)); \
|
||||
align_buffer_page_end(dst_argb_opt, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A)); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A)); \
|
||||
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
|
||||
src_argb[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
memset(dst_argb_c, 1, kStrideA* kHeightA); \
|
||||
memset(dst_argb_opt, 101, kStrideA* kHeightA); \
|
||||
memset(dst_argb_c, 1, kStrideA * kHeightA); \
|
||||
memset(dst_argb_opt, 101, kStrideA * kHeightA); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_ATOB((TYPE_A*)(src_argb + OFF), kStrideA, (TYPE_A*)dst_argb_c, \
|
||||
kStrideA, kWidth, NEG kHeight); \
|
||||
@ -945,12 +952,12 @@ TESTEND(AB64ToAR64, uint16_t, 4, 4, 1)
|
||||
const int kStrideB = ALIGNINT(kWidth * BPP_B, ALIGN); \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_a, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(src_a, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight + OFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
src_y[i + OFF] = (fastrand() & 0xff); \
|
||||
src_a[i + OFF] = (fastrand() & 0xff); \
|
||||
@ -1240,11 +1247,11 @@ TEST_F(LibYUVConvertTest, TestDither) {
|
||||
const int kStrideB = ALIGNINT(kWidth * BPP_B, ALIGN); \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight + OFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
src_y[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
@ -1265,10 +1272,10 @@ TEST_F(LibYUVConvertTest, TestDither) {
|
||||
dst_argb_opt + OFF, kStrideB, NULL, kWidth, NEG kHeight); \
|
||||
} \
|
||||
/* Convert to ARGB so 565 is expanded to bytes that can be compared. */ \
|
||||
align_buffer_page_end(dst_argb32_c, kWidth* BPP_C* kHeight); \
|
||||
align_buffer_page_end(dst_argb32_opt, kWidth* BPP_C* kHeight); \
|
||||
memset(dst_argb32_c, 2, kWidth* BPP_C* kHeight); \
|
||||
memset(dst_argb32_opt, 102, kWidth* BPP_C* kHeight); \
|
||||
align_buffer_page_end(dst_argb32_c, kWidth * BPP_C * kHeight); \
|
||||
align_buffer_page_end(dst_argb32_opt, kWidth * BPP_C * kHeight); \
|
||||
memset(dst_argb32_c, 2, kWidth * BPP_C * kHeight); \
|
||||
memset(dst_argb32_opt, 102, kWidth * BPP_C * kHeight); \
|
||||
FMT_B##To##FMT_C(dst_argb_c + OFF, kStrideB, dst_argb32_c, kWidth * BPP_C, \
|
||||
kWidth, kHeight); \
|
||||
FMT_B##To##FMT_C(dst_argb_opt + OFF, kStrideB, dst_argb32_opt, \
|
||||
@ -1317,10 +1324,10 @@ TESTPLANARTOBD(I420, 2, 2, RGB565, 2, 2, 1, ARGB, 4)
|
||||
const int kStrideB = SUBSAMPLE(kWidth, SUB_B) * BPP_B; \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV + OFF); \
|
||||
align_buffer_page_end(dst_argb_b, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_b, kStrideB * kHeight + OFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
src_y[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
@ -1334,8 +1341,8 @@ TESTPLANARTOBD(I420, 2, 2, RGB565, 2, 2, 1, ARGB, 4)
|
||||
kWidth, NEG kHeight); \
|
||||
/* Convert to a 3rd format in 1 step and 2 steps and compare */ \
|
||||
const int kStrideC = kWidth * BPP_C; \
|
||||
align_buffer_page_end(dst_argb_c, kStrideC* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_bc, kStrideC* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideC * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_bc, kStrideC * kHeight + OFF); \
|
||||
memset(dst_argb_c + OFF, 2, kStrideC * kHeight); \
|
||||
memset(dst_argb_bc + OFF, 3, kStrideC * kHeight); \
|
||||
for (int i = 0; i < benchmark_iterations_; ++i) { \
|
||||
@ -1464,14 +1471,14 @@ TESTPLANARTOE(I444, 1, 1, ABGR, 1, 4, ARGB, 4)
|
||||
const int kStrideB = SUBSAMPLE(kWidth, SUB_B) * BPP_B; \
|
||||
const int kSizeUV = \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV + OFF); \
|
||||
align_buffer_page_end(src_a, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_b, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(src_a, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_b, kStrideB * kHeight + OFF); \
|
||||
const int kStrideC = kWidth * BPP_C; \
|
||||
align_buffer_page_end(dst_argb_c, kStrideC* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_bc, kStrideC* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideC * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_bc, kStrideC * kHeight + OFF); \
|
||||
memset(dst_argb_c + OFF, 2, kStrideC * kHeight); \
|
||||
memset(dst_argb_b + OFF, 1, kStrideB * kHeight); \
|
||||
memset(dst_argb_bc + OFF, 3, kStrideC * kHeight); \
|
||||
@ -1578,16 +1585,16 @@ TESTQPLANARTOE(I444Alpha, 1, 1, ABGR, 1, 4, ARGB, 4)
|
||||
const int kHeight = benchmark_height_; \
|
||||
const int kStrideA = SUBSAMPLE(kWidth, SUB_A) * BPP_A; \
|
||||
const int kStrideB = SUBSAMPLE(kWidth, SUB_B) * BPP_B; \
|
||||
align_buffer_page_end(src_argb_a, kStrideA* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_b, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(src_argb_a, kStrideA * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_b, kStrideB * kHeight + OFF); \
|
||||
MemRandomize(src_argb_a + OFF, kStrideA * kHeight); \
|
||||
memset(dst_argb_b + OFF, 1, kStrideB * kHeight); \
|
||||
FMT_A##To##FMT_B(src_argb_a + OFF, kStrideA, dst_argb_b + OFF, kStrideB, \
|
||||
kWidth, NEG kHeight); \
|
||||
/* Convert to a 3rd format in 1 step and 2 steps and compare */ \
|
||||
const int kStrideC = kWidth * BPP_C; \
|
||||
align_buffer_page_end(dst_argb_c, kStrideC* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_bc, kStrideC* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideC * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_bc, kStrideC * kHeight + OFF); \
|
||||
memset(dst_argb_c + OFF, 2, kStrideC * kHeight); \
|
||||
memset(dst_argb_bc + OFF, 3, kStrideC * kHeight); \
|
||||
for (int i = 0; i < benchmark_iterations_; ++i) { \
|
||||
@ -1798,11 +1805,11 @@ TEST_F(LibYUVConvertTest, ABGRToAR30Row_Opt) {
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
const int kBpc = 2; \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight* kBpc + SOFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV* kBpc + SOFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV* kBpc + SOFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight + DOFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight + DOFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight * kBpc + SOFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV * kBpc + SOFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV * kBpc + SOFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight + DOFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight + DOFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
reinterpret_cast<uint16_t*>(src_y + SOFF)[i] = (fastrand() & FMT_MASK); \
|
||||
} \
|
||||
@ -1913,12 +1920,12 @@ TESTPLANAR16TOB(I210, 2, 1, 0x3ff, AR30Filter, 4, 4, 1)
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
const int kBpc = 2; \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight* kBpc + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV* kBpc + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV* kBpc + OFF); \
|
||||
align_buffer_page_end(src_a, kWidth* kHeight* kBpc + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight * kBpc + OFF); \
|
||||
align_buffer_page_end(src_u, kSizeUV * kBpc + OFF); \
|
||||
align_buffer_page_end(src_v, kSizeUV * kBpc + OFF); \
|
||||
align_buffer_page_end(src_a, kWidth * kHeight * kBpc + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight + OFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
reinterpret_cast<uint16_t*>(src_y + OFF)[i] = \
|
||||
(fastrand() & ((1 << S_DEPTH) - 1)); \
|
||||
@ -2146,10 +2153,10 @@ TESTQPLANAR16TOB(I210Alpha, 2, 1, ARGBFilter, 4, 4, 1, 10)
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X) * 2; \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y) * 2; \
|
||||
const int kBpc = 2; \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight* kBpc + SOFF); \
|
||||
align_buffer_page_end(src_uv, kSizeUV* kBpc + SOFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight + DOFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight + DOFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight * kBpc + SOFF); \
|
||||
align_buffer_page_end(src_uv, kSizeUV * kBpc + SOFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight + DOFF); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight + DOFF); \
|
||||
for (int i = 0; i < kWidth * kHeight; ++i) { \
|
||||
reinterpret_cast<uint16_t*>(src_y + SOFF)[i] = \
|
||||
(fastrand() & (((uint16_t)(-1)) << (16 - S_DEPTH))); \
|
||||
@ -2834,13 +2841,20 @@ TEST_F(LibYUVConvertTest, TestARGBToUVMatrixRow_Opt) {
|
||||
|
||||
int src_stride = (height == 1) ? 0 : kMaxWidth * 4;
|
||||
|
||||
ARGBToUVMatrixRow_C(&orig_argb_pixels[0], src_stride, &dest_u_c[0], &dest_v_c[0], width, &kArgbI601Constants);
|
||||
ARGBToUVMatrixRow_Any_NEON(&orig_argb_pixels[0], src_stride, &dest_u_opt[0], &dest_v_opt[0], width, &kArgbI601Constants);
|
||||
ARGBToUVMatrixRow_C(&orig_argb_pixels[0], src_stride, &dest_u_c[0],
|
||||
&dest_v_c[0], width, &kArgbI601Constants);
|
||||
ARGBToUVMatrixRow_Any_NEON(&orig_argb_pixels[0], src_stride,
|
||||
&dest_u_opt[0], &dest_v_opt[0], width,
|
||||
&kArgbI601Constants);
|
||||
|
||||
int half_width = (width + 1) / 2;
|
||||
for (int i = 0; i < half_width; ++i) {
|
||||
ASSERT_EQ(dest_u_c[i], dest_u_opt[i]) << "u mismatch at " << i << " width " << width << " height " << height;
|
||||
ASSERT_EQ(dest_v_c[i], dest_v_opt[i]) << "v mismatch at " << i << " width " << width << " height " << height;
|
||||
ASSERT_EQ(dest_u_c[i], dest_u_opt[i])
|
||||
<< "u mismatch at " << i << " width " << width << " height "
|
||||
<< height;
|
||||
ASSERT_EQ(dest_v_c[i], dest_v_opt[i])
|
||||
<< "v mismatch at " << i << " width " << width << " height "
|
||||
<< height;
|
||||
}
|
||||
}
|
||||
}
|
||||
@ -2903,13 +2917,12 @@ TEST_F(LibYUVConvertTest, TestI400LargeSize) {
|
||||
free_aligned_buffer_page_end(dest_argb);
|
||||
free_aligned_buffer_page_end(orig_i400);
|
||||
}
|
||||
#endif // DISABLE_SLOW_TESTS
|
||||
#endif // DISABLE_SLOW_TESTS
|
||||
#endif // !defined(DISABLE_SLOW_TESTS) && \
|
||||
// (defined(__x86_64__) || defined(_M_X64) || defined(__aarch64__))
|
||||
|
||||
#endif // !defined(LEAN_TESTS)
|
||||
|
||||
|
||||
#define TESTATOBPI(FMT_A, TYPE_A, BPP_A, STRIDE_A, HEIGHT_A, FMT_B, SUBSAMP_X, \
|
||||
SUBSAMP_Y, W1280, N, NEG, OFF) \
|
||||
TEST_F(LibYUVConvertTest, FMT_A##To##FMT_B##N) { \
|
||||
@ -2922,17 +2935,17 @@ TEST_F(LibYUVConvertTest, TestI400LargeSize) {
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X) * 2; \
|
||||
const int kSizeUV = kStrideUV * SUBSAMPLE(kHeight, SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_argb, \
|
||||
kStrideA* kHeightA*(int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kStrideY* kHeight); \
|
||||
kStrideA * kHeightA * (int)sizeof(TYPE_A) + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kStrideY * kHeight); \
|
||||
align_buffer_page_end(dst_uv_c, kSizeUV); \
|
||||
align_buffer_page_end(dst_y_opt, kStrideY* kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kStrideY * kHeight); \
|
||||
align_buffer_page_end(dst_uv_opt, kSizeUV); \
|
||||
for (int i = 0; i < kStrideA * kHeightA * (int)sizeof(TYPE_A); ++i) { \
|
||||
src_argb[i + OFF] = (fastrand() & 0xff); \
|
||||
} \
|
||||
memset(dst_y_c, 1, kStrideY* kHeight); \
|
||||
memset(dst_y_c, 1, kStrideY * kHeight); \
|
||||
memset(dst_uv_c, 2, kSizeUV); \
|
||||
memset(dst_y_opt, 101, kStrideY* kHeight); \
|
||||
memset(dst_y_opt, 101, kStrideY * kHeight); \
|
||||
memset(dst_uv_opt, 102, kSizeUV); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_B((TYPE_A*)(src_argb + OFF), kStrideA, dst_y_c, kStrideY, \
|
||||
|
||||
@ -51,9 +51,9 @@ namespace libyuv {
|
||||
#define ABGRToABGR ARGBCopy
|
||||
|
||||
// subsample amount uses a divide.
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a)-1)) / (a))
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a) - 1)) / (a))
|
||||
|
||||
#define ALIGNINT(V, ALIGN) (((V) + (ALIGN)-1) / (ALIGN) * (ALIGN))
|
||||
#define ALIGNINT(V, ALIGN) (((V) + (ALIGN) - 1) / (ALIGN) * (ALIGN))
|
||||
|
||||
// Planar test
|
||||
|
||||
@ -78,17 +78,19 @@ namespace libyuv {
|
||||
const int kSrcHalfHeight = SUBSAMPLE(kHeight, SRC_SUBSAMP_Y); \
|
||||
const int kDstHalfWidth = SUBSAMPLE(kWidth, DST_SUBSAMP_X); \
|
||||
const int kDstHalfHeight = SUBSAMPLE(kHeight, DST_SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight* SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_u, \
|
||||
kSrcHalfWidth* kSrcHalfHeight* SRC_BPC + OFF); \
|
||||
kSrcHalfWidth * kSrcHalfHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_v, \
|
||||
kSrcHalfWidth* kSrcHalfHeight* SRC_BPC + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_u_c, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_v_c, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_u_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_v_opt, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
kSrcHalfWidth * kSrcHalfHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_u_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_v_c, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_u_opt, \
|
||||
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_v_opt, \
|
||||
kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
MemRandomize(src_y + OFF, kWidth * kHeight * SRC_BPC); \
|
||||
MemRandomize(src_u + OFF, kSrcHalfWidth * kSrcHalfHeight * SRC_BPC); \
|
||||
MemRandomize(src_v + OFF, kSrcHalfWidth * kSrcHalfHeight * SRC_BPC); \
|
||||
@ -102,12 +104,12 @@ namespace libyuv {
|
||||
src_u_p[i] = src_u_p[i] & ((1 << SRC_DEPTH) - 1); \
|
||||
src_v_p[i] = src_v_p[i] & ((1 << SRC_DEPTH) - 1); \
|
||||
} \
|
||||
memset(dst_y_c, 1, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_u_c, 2, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_v_c, 3, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_u_opt, 102, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_v_opt, 103, kDstHalfWidth* kDstHalfHeight* DST_BPC); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_u_c, 2, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_v_c, 3, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_u_opt, 102, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_v_opt, 103, kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
SRC_FMT_PLANAR##To##FMT_PLANAR( \
|
||||
src_y_p, kWidth, src_u_p, kSrcHalfWidth, src_v_p, kSrcHalfWidth, \
|
||||
@ -212,15 +214,15 @@ TESTPLANARTOP(I412, uint16_t, 2, 1, 1, I444, uint8_t, 1, 1, 1, 12)
|
||||
const int kHeight = benchmark_height_; \
|
||||
const int kSizeUV = \
|
||||
SUBSAMPLE(kWidth, SRC_SUBSAMP_X) * SUBSAMPLE(kHeight, SRC_SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_uv, \
|
||||
kSizeUV*((PIXEL_STRIDE == 3) ? 3 : 2) + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight); \
|
||||
kSizeUV * ((PIXEL_STRIDE == 3) ? 3 : 2) + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_u_c, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_v_c, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_u_opt, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_v_opt, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
@ -239,12 +241,12 @@ TESTPLANARTOP(I412, uint16_t, 2, 1, 1, I444, uint8_t, 1, 1, 1, 12)
|
||||
(fastrand() & 0xff); \
|
||||
} \
|
||||
} \
|
||||
memset(dst_y_c, 1, kWidth* kHeight); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight); \
|
||||
memset(dst_u_c, 2, \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_v_c, 3, \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight); \
|
||||
memset(dst_u_opt, 102, \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_v_opt, 103, \
|
||||
@ -359,17 +361,17 @@ static int I400ToNV21(const uint8_t* src_y,
|
||||
const int kSrcHalfHeight = SUBSAMPLE(kHeight, SRC_SUBSAMP_Y); \
|
||||
const int kDstHalfWidth = SUBSAMPLE(kWidth, DST_SUBSAMP_X); \
|
||||
const int kDstHalfHeight = SUBSAMPLE(kHeight, DST_SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight* SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_u, \
|
||||
kSrcHalfWidth* kSrcHalfHeight* SRC_BPC + OFF); \
|
||||
kSrcHalfWidth * kSrcHalfHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_v, \
|
||||
kSrcHalfWidth* kSrcHalfHeight* SRC_BPC + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight* DST_BPC); \
|
||||
kSrcHalfWidth * kSrcHalfHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_uv_c, \
|
||||
kDstHalfWidth* kDstHalfHeight* DST_BPC * 2); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight* DST_BPC); \
|
||||
kDstHalfWidth * kDstHalfHeight * DST_BPC * 2); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_uv_opt, \
|
||||
kDstHalfWidth* kDstHalfHeight* DST_BPC * 2); \
|
||||
kDstHalfWidth * kDstHalfHeight * DST_BPC * 2); \
|
||||
MemRandomize(src_y + OFF, kWidth * kHeight * SRC_BPC); \
|
||||
MemRandomize(src_u + OFF, kSrcHalfWidth * kSrcHalfHeight * SRC_BPC); \
|
||||
MemRandomize(src_v + OFF, kSrcHalfWidth * kSrcHalfHeight * SRC_BPC); \
|
||||
@ -383,10 +385,10 @@ static int I400ToNV21(const uint8_t* src_y,
|
||||
src_u_p[i] = src_u_p[i] & ((1 << SRC_DEPTH) - 1); \
|
||||
src_v_p[i] = src_v_p[i] & ((1 << SRC_DEPTH) - 1); \
|
||||
} \
|
||||
memset(dst_y_c, 1, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_uv_c, 2, kDstHalfWidth* kDstHalfHeight* DST_BPC * 2); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_uv_opt, 102, kDstHalfWidth* kDstHalfHeight* DST_BPC * 2); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_uv_c, 2, kDstHalfWidth * kDstHalfHeight * DST_BPC * 2); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_uv_opt, 102, kDstHalfWidth * kDstHalfHeight * DST_BPC * 2); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
SRC_FMT_PLANAR##To##FMT_PLANAR(src_y_p, kWidth, src_u_p, kSrcHalfWidth, \
|
||||
src_v_p, kSrcHalfWidth, \
|
||||
@ -478,14 +480,15 @@ TESTPLANARTOBP(I212, uint16_t, 2, 2, 1, P212, uint16_t, 2, 2, 1, 12)
|
||||
(kHeight + (TILE_HEIGHT - 1)) & ~(TILE_HEIGHT - 1); \
|
||||
const int kSrcHalfPaddedWidth = SUBSAMPLE(kPaddedWidth, SRC_SUBSAMP_X); \
|
||||
const int kSrcHalfPaddedHeight = SUBSAMPLE(kPaddedHeight, SRC_SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kPaddedWidth* kPaddedHeight* SRC_BPC + OFF); \
|
||||
align_buffer_page_end(src_y, \
|
||||
kPaddedWidth * kPaddedHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end( \
|
||||
src_uv, \
|
||||
2 * kSrcHalfPaddedWidth * kSrcHalfPaddedHeight * SRC_BPC + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_uv_c, \
|
||||
2 * kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight* DST_BPC); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight * DST_BPC); \
|
||||
align_buffer_page_end(dst_uv_opt, \
|
||||
2 * kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
SRC_T* src_y_p = reinterpret_cast<SRC_T*>(src_y + OFF); \
|
||||
@ -502,13 +505,13 @@ TESTPLANARTOBP(I212, uint16_t, 2, 2, 1, P212, uint16_t, 2, 2, 1, 12)
|
||||
src_uv_p[i] = \
|
||||
(fastrand() & (((SRC_T)(-1)) << ((8 * SRC_BPC) - SRC_DEPTH))); \
|
||||
} \
|
||||
memset(dst_y_c, 1, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_uv_c, 2, 2 * kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight* DST_BPC); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight * DST_BPC); \
|
||||
memset(dst_uv_opt, 102, 2 * kDstHalfWidth * kDstHalfHeight * DST_BPC); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
SRC_FMT_PLANAR##To##FMT_PLANAR( \
|
||||
src_y_p, kWidth* SRC_BPC / (int)sizeof(SRC_T), src_uv_p, \
|
||||
src_y_p, kWidth * SRC_BPC / (int)sizeof(SRC_T), src_uv_p, \
|
||||
2 * kSrcHalfWidth * SRC_BPC / (int)sizeof(SRC_T), \
|
||||
DOY ? reinterpret_cast<DST_T*>(dst_y_c) : NULL, kWidth, \
|
||||
reinterpret_cast<DST_T*>(dst_uv_c), 2 * kDstHalfWidth, kWidth, \
|
||||
@ -516,7 +519,7 @@ TESTPLANARTOBP(I212, uint16_t, 2, 2, 1, P212, uint16_t, 2, 2, 1, 12)
|
||||
MaskCpuFlags(benchmark_cpu_info_); \
|
||||
for (int i = 0; i < benchmark_iterations_; ++i) { \
|
||||
SRC_FMT_PLANAR##To##FMT_PLANAR( \
|
||||
src_y_p, kWidth* SRC_BPC / (int)sizeof(SRC_T), src_uv_p, \
|
||||
src_y_p, kWidth * SRC_BPC / (int)sizeof(SRC_T), src_uv_p, \
|
||||
2 * kSrcHalfWidth * SRC_BPC / (int)sizeof(SRC_T), \
|
||||
DOY ? reinterpret_cast<DST_T*>(dst_y_opt) : NULL, kWidth, \
|
||||
reinterpret_cast<DST_T*>(dst_uv_opt), 2 * kDstHalfWidth, kWidth, \
|
||||
@ -598,16 +601,16 @@ TESTBPTOBP(P010, uint16_t, 2, 2, 2, NV12, uint8_t, 1, 2, 2, 8, 1, 1)
|
||||
const int kHeight = ALIGNINT(benchmark_height_, YALIGN); \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kStride = (kStrideUV * SUBSAMP_X * 8 * BPP_A + 7) / 8; \
|
||||
align_buffer_page_end(src_argb, kStride* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight); \
|
||||
align_buffer_page_end(src_argb, kStride * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_c, \
|
||||
kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_opt, \
|
||||
kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_y_c, 1, kWidth* kHeight); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight); \
|
||||
memset(dst_uv_c, 2, kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight); \
|
||||
memset(dst_uv_opt, 102, kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
for (int i = 0; i < kHeight; ++i) \
|
||||
for (int j = 0; j < kStride; ++j) \
|
||||
@ -691,20 +694,20 @@ TESTATOPLANAR(YUY2, 2, 1, I422, 2, 1)
|
||||
const int kHeight = ALIGNINT(benchmark_height_, YALIGN); \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
const int kStride = (kStrideUV * SUBSAMP_X * 8 * BPP_A + 7) / 8; \
|
||||
align_buffer_page_end(src_argb, kStride* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_a_c, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight); \
|
||||
align_buffer_page_end(src_argb, kStride * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_a_c, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_c, \
|
||||
kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_a_opt, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_a_opt, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_opt, \
|
||||
kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_a_c, 1, kWidth* kHeight); \
|
||||
memset(dst_y_c, 2, kWidth* kHeight); \
|
||||
memset(dst_a_c, 1, kWidth * kHeight); \
|
||||
memset(dst_y_c, 2, kWidth * kHeight); \
|
||||
memset(dst_uv_c, 3, kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_a_opt, 101, kWidth* kHeight); \
|
||||
memset(dst_y_opt, 102, kWidth* kHeight); \
|
||||
memset(dst_a_opt, 101, kWidth * kHeight); \
|
||||
memset(dst_y_opt, 102, kWidth * kHeight); \
|
||||
memset(dst_uv_opt, 103, kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
for (int i = 0; i < kHeight; ++i) \
|
||||
for (int j = 0; j < kStride; ++j) \
|
||||
@ -765,19 +768,19 @@ TESTATOPLANARA(ARGB, 4, 1, I420Alpha, 2, 2)
|
||||
const int kHeight = benchmark_height_; \
|
||||
const int kStride = SUBSAMPLE(kWidth, SUB_A) * BPP_A; \
|
||||
const int kStrideUV = SUBSAMPLE(kWidth, SUBSAMP_X); \
|
||||
align_buffer_page_end(src_argb, kStride* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight); \
|
||||
align_buffer_page_end(src_argb, kStride * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_c, \
|
||||
kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_opt, \
|
||||
kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
for (int i = 0; i < kHeight; ++i) \
|
||||
for (int j = 0; j < kStride; ++j) \
|
||||
src_argb[(i * kStride) + j + OFF] = (fastrand() & 0xff); \
|
||||
memset(dst_y_c, 1, kWidth* kHeight); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight); \
|
||||
memset(dst_uv_c, 2, kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight); \
|
||||
memset(dst_uv_opt, 102, kStrideUV * 2 * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
MaskCpuFlags(disable_cpu_flags_); \
|
||||
FMT_A##To##FMT_PLANAR(src_argb + OFF, kStride, dst_y_c, kWidth, dst_uv_c, \
|
||||
@ -1950,17 +1953,17 @@ TEST_F(LibYUVConvertTest, I420CropOddY) {
|
||||
const int kHeight = benchmark_height_; \
|
||||
\
|
||||
align_buffer_page_end(orig_uyvy, 4 * SUBSAMPLE(kWidth, 2) * kHeight); \
|
||||
align_buffer_page_end(orig_y, kWidth* kHeight); \
|
||||
align_buffer_page_end(orig_y, kWidth * kHeight); \
|
||||
align_buffer_page_end(orig_u, \
|
||||
SUBSAMPLE(kWidth, 2) * SUBSAMPLE(kHeight, 2)); \
|
||||
align_buffer_page_end(orig_v, \
|
||||
SUBSAMPLE(kWidth, 2) * SUBSAMPLE(kHeight, 2)); \
|
||||
\
|
||||
align_buffer_page_end(dst_y_orig, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_orig, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv_orig, \
|
||||
2 * SUBSAMPLE(kWidth, 2) * SUBSAMPLE(kHeight, 2)); \
|
||||
\
|
||||
align_buffer_page_end(dst_y, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_uv, \
|
||||
2 * SUBSAMPLE(kWidth, 2) * SUBSAMPLE(kHeight, 2)); \
|
||||
\
|
||||
@ -2287,12 +2290,13 @@ TEST_F(LibYUVConvertTest, TestARGBToI420Matrix) {
|
||||
dst_v, kWidth / 2, &kArgbU2020Constants, kWidth, kHeight);
|
||||
|
||||
// Reference BT.709 (limited range)
|
||||
// Y = round(0.2126 * 219 / 255 * R + 0.7152 * 219 / 255 * G + 0.0722 * 219 / 255 * B + 16)
|
||||
// Y = round(0.1826 * R + 0.6142 * G + 0.0620 * B + 16)
|
||||
// 47 * 255 + 157 * 255 + 16 * 255 + 4224 = 11985 + 40035 + 4080 + 4224 = 60324
|
||||
// 60324 / 256 = 235.64 -> 235. Correct.
|
||||
// Y = round(0.2126 * 219 / 255 * R + 0.7152 * 219 / 255 * G + 0.0722 * 219 /
|
||||
// 255 * B + 16) Y = round(0.1826 * R + 0.6142 * G + 0.0620 * B + 16) 47 * 255
|
||||
// + 157 * 255 + 16 * 255 + 4224 = 11985 + 40035 + 4080 + 4224 = 60324 60324 /
|
||||
// 256 = 235.64 -> 235. Correct.
|
||||
|
||||
for (int i = 0; i < kWidth * kHeight * 4; ++i) src_argb[i] = 255;
|
||||
for (int i = 0; i < kWidth * kHeight * 4; ++i)
|
||||
src_argb[i] = 255;
|
||||
ARGBToI420Matrix(src_argb, kWidth * 4, dst_y, kWidth, dst_u, kWidth / 2,
|
||||
dst_v, kWidth / 2, &kArgbH709Constants, kWidth, kHeight);
|
||||
ASSERT_EQ(dst_y[0], 235);
|
||||
@ -2423,6 +2427,132 @@ TEST_F(LibYUVConvertTest, TestARGBToI444Matrix) {
|
||||
free_aligned_buffer_page_end(ref_v);
|
||||
}
|
||||
|
||||
template <typename ConvertToYUV, typename ConvertToARGB>
|
||||
static void TestRGBToI420(ConvertToYUV convert_to_yuv,
|
||||
ConvertToARGB convert_to_argb,
|
||||
int width,
|
||||
int height,
|
||||
int disable_cpu_flags,
|
||||
int benchmark_cpu_info) {
|
||||
align_buffer_page_end(src_rgb, width * height * 4);
|
||||
align_buffer_page_end(dst_y, width * height);
|
||||
align_buffer_page_end(dst_u, (width + 1) / 2 * (height + 1) / 2);
|
||||
align_buffer_page_end(dst_v, (width + 1) / 2 * (height + 1) / 2);
|
||||
|
||||
align_buffer_page_end(tmp_argb, width * height * 4);
|
||||
align_buffer_page_end(ref_y, width * height);
|
||||
align_buffer_page_end(ref_u, (width + 1) / 2 * (height + 1) / 2);
|
||||
align_buffer_page_end(ref_v, (width + 1) / 2 * (height + 1) / 2);
|
||||
|
||||
MemRandomize(src_rgb, width * height * 4);
|
||||
|
||||
{
|
||||
SCOPED_TRACE("C_Version");
|
||||
MaskCpuFlags(disable_cpu_flags);
|
||||
|
||||
// Clear buffers
|
||||
memset(dst_y, 0, width * height);
|
||||
memset(dst_u, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(dst_v, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(ref_y, 0, width * height);
|
||||
memset(ref_u, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(ref_v, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(tmp_argb, 0, width * height * 4);
|
||||
|
||||
int r1 =
|
||||
convert_to_yuv(src_rgb, width * 4, dst_y, width, dst_u, (width + 1) / 2,
|
||||
dst_v, (width + 1) / 2, width, height);
|
||||
ASSERT_EQ(r1, 0);
|
||||
|
||||
int r2 =
|
||||
convert_to_argb(src_rgb, width * 4, tmp_argb, width * 4, width, height);
|
||||
ASSERT_EQ(r2, 0);
|
||||
|
||||
int r3 = ARGBToI420(tmp_argb, width * 4, ref_y, width, ref_u,
|
||||
(width + 1) / 2, ref_v, (width + 1) / 2, width, height);
|
||||
ASSERT_EQ(r3, 0);
|
||||
|
||||
for (int i = 0; i < width * height; ++i) {
|
||||
ASSERT_EQ(dst_y[i], ref_y[i]);
|
||||
}
|
||||
for (int i = 0; i < (width + 1) / 2 * (height + 1) / 2; ++i) {
|
||||
ASSERT_EQ(dst_u[i], ref_u[i]);
|
||||
ASSERT_EQ(dst_v[i], ref_v[i]);
|
||||
}
|
||||
}
|
||||
|
||||
{
|
||||
SCOPED_TRACE("SIMD_Version");
|
||||
MaskCpuFlags(benchmark_cpu_info);
|
||||
|
||||
// Clear buffers
|
||||
memset(dst_y, 0, width * height);
|
||||
memset(dst_u, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(dst_v, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(ref_y, 0, width * height);
|
||||
memset(ref_u, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(ref_v, 0, (width + 1) / 2 * (height + 1) / 2);
|
||||
memset(tmp_argb, 0, width * height * 4);
|
||||
|
||||
int r1 =
|
||||
convert_to_yuv(src_rgb, width * 4, dst_y, width, dst_u, (width + 1) / 2,
|
||||
dst_v, (width + 1) / 2, width, height);
|
||||
ASSERT_EQ(r1, 0);
|
||||
|
||||
int r2 =
|
||||
convert_to_argb(src_rgb, width * 4, tmp_argb, width * 4, width, height);
|
||||
ASSERT_EQ(r2, 0);
|
||||
|
||||
int r3 = ARGBToI420(tmp_argb, width * 4, ref_y, width, ref_u,
|
||||
(width + 1) / 2, ref_v, (width + 1) / 2, width, height);
|
||||
ASSERT_EQ(r3, 0);
|
||||
|
||||
for (int i = 0; i < width * height; ++i) {
|
||||
ASSERT_EQ(dst_y[i], ref_y[i]);
|
||||
}
|
||||
for (int i = 0; i < (width + 1) / 2 * (height + 1) / 2; ++i) {
|
||||
ASSERT_EQ(dst_u[i], ref_u[i]);
|
||||
ASSERT_EQ(dst_v[i], ref_v[i]);
|
||||
}
|
||||
}
|
||||
|
||||
free_aligned_buffer_page_end(src_rgb);
|
||||
free_aligned_buffer_page_end(dst_y);
|
||||
free_aligned_buffer_page_end(dst_u);
|
||||
free_aligned_buffer_page_end(dst_v);
|
||||
free_aligned_buffer_page_end(tmp_argb);
|
||||
free_aligned_buffer_page_end(ref_y);
|
||||
free_aligned_buffer_page_end(ref_u);
|
||||
free_aligned_buffer_page_end(ref_v);
|
||||
}
|
||||
|
||||
TEST_F(LibYUVConvertTest, BGRAToI420_Check) {
|
||||
TestRGBToI420(BGRAToI420, BGRAToARGB, 16, 16, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
TestRGBToI420(BGRAToI420, BGRAToARGB, 17, 17, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
TestRGBToI420(BGRAToI420, BGRAToARGB, 1280, 720, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
}
|
||||
|
||||
TEST_F(LibYUVConvertTest, RGBAToI420_Check) {
|
||||
TestRGBToI420(RGBAToI420, RGBAToARGB, 16, 16, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
TestRGBToI420(RGBAToI420, RGBAToARGB, 17, 17, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
TestRGBToI420(RGBAToI420, RGBAToARGB, 1280, 720, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
}
|
||||
|
||||
TEST_F(LibYUVConvertTest, ABGRToI420_Check) {
|
||||
TestRGBToI420(ABGRToI420, ABGRToARGB, 16, 16, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
TestRGBToI420(ABGRToI420, ABGRToARGB, 17, 17, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
TestRGBToI420(ABGRToI420, ABGRToARGB, 1280, 720, disable_cpu_flags_,
|
||||
benchmark_cpu_info_);
|
||||
}
|
||||
|
||||
#endif // !defined(LEAN_TESTS)
|
||||
|
||||
} // namespace libyuv
|
||||
|
||||
@ -1212,10 +1212,10 @@ TEST_F(LibYUVPlanarTest, TestInterpolatePlane_16) {
|
||||
(kWidth * BPP_A + STRIDE_A - 1) / STRIDE_A * STRIDE_A; \
|
||||
const int kStrideB = \
|
||||
(kWidth * BPP_B + STRIDE_B - 1) / STRIDE_B * STRIDE_B; \
|
||||
align_buffer_page_end(src_argb_a, kStrideA* kHeight + OFF); \
|
||||
align_buffer_page_end(src_argb_b, kStrideA* kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB* kHeight); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB* kHeight); \
|
||||
align_buffer_page_end(src_argb_a, kStrideA * kHeight + OFF); \
|
||||
align_buffer_page_end(src_argb_b, kStrideA * kHeight + OFF); \
|
||||
align_buffer_page_end(dst_argb_c, kStrideB * kHeight); \
|
||||
align_buffer_page_end(dst_argb_opt, kStrideB * kHeight); \
|
||||
for (int i = 0; i < kStrideA * kHeight; ++i) { \
|
||||
src_argb_a[i + OFF] = (fastrand() & 0xff); \
|
||||
src_argb_b[i + OFF] = (fastrand() & 0xff); \
|
||||
@ -1418,7 +1418,7 @@ TEST_F(LibYUVPlanarTest, BlendPlane_Invert) {
|
||||
disable_cpu_flags_, benchmark_cpu_info_, -1, 1);
|
||||
}
|
||||
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a)-1)) / (a))
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a) - 1)) / (a))
|
||||
|
||||
static void TestI420Blend(int width,
|
||||
int height,
|
||||
|
||||
@ -20,7 +20,7 @@
|
||||
|
||||
namespace libyuv {
|
||||
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a)-1)) / (a))
|
||||
#define SUBSAMPLE(v, a) ((((v) + (a) - 1)) / (a))
|
||||
|
||||
static void I420TestRotate(int src_width,
|
||||
int src_height,
|
||||
@ -495,15 +495,15 @@ TEST_F(LibYUVRotateTest, NV12Rotate270_Invert) {
|
||||
const int kHeight = benchmark_height_; \
|
||||
const int kSizeUV = \
|
||||
SUBSAMPLE(kWidth, SRC_SUBSAMP_X) * SUBSAMPLE(kHeight, SRC_SUBSAMP_Y); \
|
||||
align_buffer_page_end(src_y, kWidth* kHeight + OFF); \
|
||||
align_buffer_page_end(src_y, kWidth * kHeight + OFF); \
|
||||
align_buffer_page_end(src_uv, \
|
||||
kSizeUV*((PIXEL_STRIDE == 3) ? 3 : 2) + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth* kHeight); \
|
||||
kSizeUV * ((PIXEL_STRIDE == 3) ? 3 : 2) + OFF); \
|
||||
align_buffer_page_end(dst_y_c, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_u_c, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_v_c, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth* kHeight); \
|
||||
align_buffer_page_end(dst_y_opt, kWidth * kHeight); \
|
||||
align_buffer_page_end(dst_u_opt, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
align_buffer_page_end(dst_v_opt, SUBSAMPLE(kWidth, SUBSAMP_X) * \
|
||||
@ -522,12 +522,12 @@ TEST_F(LibYUVRotateTest, NV12Rotate270_Invert) {
|
||||
(fastrand() & 0xff); \
|
||||
} \
|
||||
} \
|
||||
memset(dst_y_c, 1, kWidth* kHeight); \
|
||||
memset(dst_y_c, 1, kWidth * kHeight); \
|
||||
memset(dst_u_c, 2, \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_v_c, 3, \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_y_opt, 101, kWidth* kHeight); \
|
||||
memset(dst_y_opt, 101, kWidth * kHeight); \
|
||||
memset(dst_u_opt, 102, \
|
||||
SUBSAMPLE(kWidth, SUBSAMP_X) * SUBSAMPLE(kHeight, SUBSAMP_Y)); \
|
||||
memset(dst_v_opt, 103, \
|
||||
|
||||
@ -431,13 +431,13 @@ static void FillRamp(uint8_t* buf,
|
||||
|
||||
// Test scaling with C vs Opt and return maximum pixel difference. 0 = exact.
|
||||
static void YUVToARGBTestFilter(int src_width,
|
||||
int src_height,
|
||||
int dst_width,
|
||||
int dst_height,
|
||||
FilterMode f,
|
||||
int benchmark_iterations,
|
||||
int error_threshold,
|
||||
int* max_diff_out) {
|
||||
int src_height,
|
||||
int dst_width,
|
||||
int dst_height,
|
||||
FilterMode f,
|
||||
int benchmark_iterations,
|
||||
int error_threshold,
|
||||
int* max_diff_out) {
|
||||
int64_t src_y_plane_size = Abs(src_width) * Abs(src_height);
|
||||
int64_t src_uv_plane_size =
|
||||
((Abs(src_width) + 1) / 2) * ((Abs(src_height) + 1) / 2);
|
||||
@ -448,8 +448,8 @@ static void YUVToARGBTestFilter(int src_width,
|
||||
align_buffer_page_end(src_u, src_uv_plane_size);
|
||||
align_buffer_page_end(src_v, src_uv_plane_size);
|
||||
|
||||
int64_t dst_argb_plane_size = (dst_width) * (dst_height)*4LL;
|
||||
int dst_stride_argb = (dst_width)*4;
|
||||
int64_t dst_argb_plane_size = (dst_width) * (dst_height) * 4LL;
|
||||
int dst_stride_argb = (dst_width) * 4;
|
||||
align_buffer_page_end(dst_argb_c, dst_argb_plane_size);
|
||||
align_buffer_page_end(dst_argb_opt, dst_argb_plane_size);
|
||||
if (!dst_argb_c || !dst_argb_opt || !src_y || !src_u || !src_v) {
|
||||
@ -516,10 +516,10 @@ TEST_F(LibYUVScaleTest, YUVToRGBScaleUp) {
|
||||
|
||||
TEST_F(LibYUVScaleTest, YUVToRGBScaleDown) {
|
||||
int diff = 0;
|
||||
YUVToARGBTestFilter(
|
||||
benchmark_width_ * 3 / 2, benchmark_height_ * 3 / 2, benchmark_width_,
|
||||
benchmark_height_, libyuv::kFilterBilinear, benchmark_iterations_, 10,
|
||||
&diff);
|
||||
YUVToARGBTestFilter(benchmark_width_ * 3 / 2, benchmark_height_ * 3 / 2,
|
||||
benchmark_width_, benchmark_height_,
|
||||
libyuv::kFilterBilinear, benchmark_iterations_, 10,
|
||||
&diff);
|
||||
ASSERT_LE(diff, 10);
|
||||
}
|
||||
|
||||
|
||||
@ -757,7 +757,7 @@ static int NV12TestFilter(int src_width,
|
||||
int src_height_uv = (Abs(src_height) + 1) >> 1;
|
||||
|
||||
int64_t src_y_plane_size = (Abs(src_width)) * (Abs(src_height));
|
||||
int64_t src_uv_plane_size = (src_width_uv) * (src_height_uv)*2;
|
||||
int64_t src_uv_plane_size = (src_width_uv) * (src_height_uv) * 2;
|
||||
|
||||
int src_stride_y = Abs(src_width);
|
||||
int src_stride_uv = src_width_uv * 2;
|
||||
@ -775,7 +775,7 @@ static int NV12TestFilter(int src_width,
|
||||
int dst_height_uv = (dst_height + 1) >> 1;
|
||||
|
||||
int64_t dst_y_plane_size = (dst_width) * (dst_height);
|
||||
int64_t dst_uv_plane_size = (dst_width_uv) * (dst_height_uv)*2;
|
||||
int64_t dst_uv_plane_size = (dst_width_uv) * (dst_height_uv) * 2;
|
||||
|
||||
int dst_stride_y = dst_width;
|
||||
int dst_stride_uv = dst_width_uv * 2;
|
||||
|
||||
@ -85,10 +85,11 @@ static inline bool SizeValid(int src_width,
|
||||
#define align_buffer_page_end_16(var, size) \
|
||||
uint16_t* var = NULL; \
|
||||
uint8_t* var##_mem = \
|
||||
reinterpret_cast<uint8_t*>(malloc(((size)*2 + 4095 + 63) & ~4095)); \
|
||||
reinterpret_cast<uint8_t*>(malloc(((size) * 2 + 4095 + 63) & ~4095)); \
|
||||
if (var##_mem) \
|
||||
var = reinterpret_cast<uint16_t*>( \
|
||||
(intptr_t)(var##_mem + (((size)*2 + 4095 + 63) & ~4095) - (size)*2) & \
|
||||
(intptr_t)(var##_mem + (((size) * 2 + 4095 + 63) & ~4095) - \
|
||||
(size) * 2) & \
|
||||
~63)
|
||||
|
||||
#define free_aligned_buffer_page_end_16(var) \
|
||||
|
||||
34
util/ssim.cc
34
util/ssim.cc
@ -244,23 +244,23 @@ double GetSSIMFullKernel(const uint8_t* org,
|
||||
|
||||
// Read 8 pixels at line #L, and convert to 16bit, perform weighting
|
||||
// and acccumulate.
|
||||
#define LOAD_LINE_PAIR(L, WEIGHT) \
|
||||
do { \
|
||||
const __m128i v0 = \
|
||||
_mm_loadl_epi64(reinterpret_cast<const __m128i*>(org + (L)*stride)); \
|
||||
const __m128i v1 = \
|
||||
_mm_loadl_epi64(reinterpret_cast<const __m128i*>(rec + (L)*stride)); \
|
||||
const __m128i w0 = _mm_unpacklo_epi8(v0, zero); \
|
||||
const __m128i w1 = _mm_unpacklo_epi8(v1, zero); \
|
||||
const __m128i ww0 = _mm_mullo_epi16(w0, (WEIGHT).values_.m_); \
|
||||
const __m128i ww1 = _mm_mullo_epi16(w1, (WEIGHT).values_.m_); \
|
||||
x = _mm_add_epi32(x, _mm_unpacklo_epi16(ww0, zero)); \
|
||||
y = _mm_add_epi32(y, _mm_unpacklo_epi16(ww1, zero)); \
|
||||
x = _mm_add_epi32(x, _mm_unpackhi_epi16(ww0, zero)); \
|
||||
y = _mm_add_epi32(y, _mm_unpackhi_epi16(ww1, zero)); \
|
||||
xx = _mm_add_epi32(xx, _mm_madd_epi16(ww0, w0)); \
|
||||
xy = _mm_add_epi32(xy, _mm_madd_epi16(ww0, w1)); \
|
||||
yy = _mm_add_epi32(yy, _mm_madd_epi16(ww1, w1)); \
|
||||
#define LOAD_LINE_PAIR(L, WEIGHT) \
|
||||
do { \
|
||||
const __m128i v0 = \
|
||||
_mm_loadl_epi64(reinterpret_cast<const __m128i*>(org + (L) * stride)); \
|
||||
const __m128i v1 = \
|
||||
_mm_loadl_epi64(reinterpret_cast<const __m128i*>(rec + (L) * stride)); \
|
||||
const __m128i w0 = _mm_unpacklo_epi8(v0, zero); \
|
||||
const __m128i w1 = _mm_unpacklo_epi8(v1, zero); \
|
||||
const __m128i ww0 = _mm_mullo_epi16(w0, (WEIGHT).values_.m_); \
|
||||
const __m128i ww1 = _mm_mullo_epi16(w1, (WEIGHT).values_.m_); \
|
||||
x = _mm_add_epi32(x, _mm_unpacklo_epi16(ww0, zero)); \
|
||||
y = _mm_add_epi32(y, _mm_unpacklo_epi16(ww1, zero)); \
|
||||
x = _mm_add_epi32(x, _mm_unpackhi_epi16(ww0, zero)); \
|
||||
y = _mm_add_epi32(y, _mm_unpackhi_epi16(ww1, zero)); \
|
||||
xx = _mm_add_epi32(xx, _mm_madd_epi16(ww0, w0)); \
|
||||
xy = _mm_add_epi32(xy, _mm_madd_epi16(ww0, w1)); \
|
||||
yy = _mm_add_epi32(yy, _mm_madd_epi16(ww1, w1)); \
|
||||
} while (0)
|
||||
|
||||
#define ADD_AND_STORE_FOUR_EPI32(M, OUT) \
|
||||
|
||||
Loading…
x
Reference in New Issue
Block a user