Alternatives to RGB24 and AYUV for working with GPU.
BUG=libyuv:832
TESTED=out/Release/libyuv_unittest --gtest_filter=*NV21To???24* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=1000 --libyuv_flags=-1 --libyuv_cpu_info=-1
R=rrwinterton@gmail.com
Change-Id: I5559c63f4bd4c847492fcb1571f7b03c58146689
Reviewed-on: https://chromium-review.googlesource.com/c/libyuv/libyuv/+/1501735
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
This was changed in 21be9122aadf7824efe3fc19b2a09ff253a688e1.
Change-Id: I6c04dc92f673557e10c231bd090ec8aa88b6bee4
Reviewed-on: https://chromium-review.googlesource.com/1146183
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Currently, libyuv supports MIPS SIMD Arch(MSA),
but libyuv does not supports MultiMedia Instruction(MMI)(such as loongson3a platform).
In order to improve performance of libyuv on loongson3a platform,
this provides optimize 98 functions with mmi.
BUG=libyuv:804
Change-Id: I8947626009efad769b3103a867363ece25d79629
Reviewed-on: https://chromium-review.googlesource.com/1122064
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Mask was set to 32, but should have been 31.
BUG=libyuv:798
TESTED=try bots tested
Change-Id: I6120928873a4a2f1efef907d8e8296ca8c20bb03
Reviewed-on: https://chromium-review.googlesource.com/1054830
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Use VMBI instructions but on AVX2 registers to avoid clockrate change.
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: Id4f8ad1e0e142a380c8a46c5eab90ce145a10edd
Reviewed-on: https://chromium-review.googlesource.com/956609
Reviewed-by: richard winterton <rrwinterton@gmail.com>
AVX2 port of SSSE3 conversion to output 24 bit RGB
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I14f7815522d1b790ecd2bb39d9a3441e803b694a
Reviewed-on: https://chromium-review.googlesource.com/953303
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Use 2 step conversion for NV21ToRGB24 to leverage AVX2
low levels instead of C.
Was C
NV21ToRGB24_Opt (882 ms)
Now SSSE3
NV21ToRGB24_Opt (218 ms)
Bug: libyuv:778
Test: LibYUVConvertTest.NV21ToRGB24_Opt
Change-Id: I58faf766bbec4cc595aab2e217f6c874dd4b4363
Reviewed-on: https://chromium-review.googlesource.com/951629
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Each byte is converted to float (0.0 to 255.0) and then multiplied
by a scale parameter.
Bug: None
Test: arm 64 build passes.
Change-Id: I04736798540b8d985f60abdf0388e24a209d075b
Reviewed-on: https://chromium-review.googlesource.com/930226
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Ian Field <ianfield@google.com>
vpshufb is used to reverse R and B channels;
Code is otherwise the same as ARGBToAR30.
Bug: libyuv:751
Test: ABGRToAR30 unittest
Change-Id: I30e02925f5c729e4496c5963ba4ba4af16633b3b
Reviewed-on: https://chromium-review.googlesource.com/891807
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Switch YUV conversion macro to output 16 bits per channel.
STOREAR30 macro to output AR30.
[ RUN ] LibYUVConvertTest.TestH420ToARGB
uniques: B 220, G, 220, R 220
[ OK ] LibYUVConvertTest.TestH420ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToARGB
uniques: B 256, G, 256, R 256
[ OK ] LibYUVConvertTest.TestH010ToARGB (0 ms)
[ RUN ] LibYUVConvertTest.TestH010ToAR30
uniques: B 883, G, 883, R 883
[ OK ] LibYUVConvertTest.TestH010ToAR30 (0 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: I902b718e2c8b68ede69625ccafebc6519d5af70d
Reviewed-on: https://chromium-review.googlesource.com/869511
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Previously H010ToAR30 was done in a 3 step conversion:
H010ToH420, H420ToARGB, ARGBToAR30.
This CL merges the first 2 steps into H010ToARGB, to
improve performance.
Caveat - only 10 bit YUV is supported at this time.
Previously the low level code supported different numbers
of bits - 9, 10, 12 or 16.
Was 3 step conversion:
LibYUVConvertTest.H010ToAR30_Any (1263 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (951 ms)
LibYUVConvertTest.H010ToAR30_Invert (913 ms)
LibYUVConvertTest.H010ToAR30_Opt (901 ms)
Now 2 step conversion:
LibYUVConvertTest.H010ToAR30_Any (853 ms)
LibYUVConvertTest.H010ToAR30_Unaligned (811 ms)
LibYUVConvertTest.H010ToAR30_Invert (781 ms)
LibYUVConvertTest.H010ToAR30_Opt (755 ms)
Bug: libyuv:751
Test: LibYUVConvertTest.H010ToAR30_Opt
Change-Id: Ica7574040401cd57145a4827acdf3c0e58346a2a
Reviewed-on: https://chromium-review.googlesource.com/853288
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Miguel Casas <mcasas@chromium.org>
SSSE3 optimized 10 bit YUV conversion to ARGB in single step.
Bug: libyuv:751
Test: I010ToARGB
Change-Id: I234b2850e35992113ee6bd638732bafc7010a60d
Reviewed-on: https://chromium-review.googlesource.com/848238
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
LIBYUV_SSSE3_ONLY was for functions that have SSE2 and SSSE3 but are compiling for SSSE3, so SSE2 will never be used.
Remove the SSE2 implementation of ARGBSHUFFLEROW_SSE2 and rely on SSSE3.
Bug: libyuv: 769
Test: ~/intelsde/sde -p4 -- out/Release/libyuv_unittest --gtest_filter=LibYUVConvertTest.ARGBToABGR_Opt
Change-Id: I7443f4d8ee3c6f47edd2cf1d5a1eb0f8d7a1eeeb
Reviewed-on: https://chromium-review.googlesource.com/846541
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
Convert planar 8 bit formats to planar 16 bit formats.
Accepts a parameter that determines the number of bits.
Bug: libyuv:751
Test: Convert8To16 unittest
Change-Id: I8f6ffe64428ddf5769b87e0c069093a50a2541e9
Reviewed-on: https://chromium-review.googlesource.com/835410
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Bug: libyuv:765
Test: build for mips still passes
Change-Id: I99105ad3951d2210c0793e3b9241c178442fdc37
Reviewed-on: https://chromium-review.googlesource.com/826404
Reviewed-by: Weiyong Yao <braveyao@chromium.org>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
AR30 is optimized with 3 techniques
1. pmulhuw is used to replicate 8 bits to 10 bits.
2. Two channels are processed at a time. R and B, and A and G.
3. pshufb is used to shift and mask 2 channels of R and B
Bug: libyuv:751
Test: ARGBToAR30_Opt
Change-Id: I4e62d6caa4df7d0ae80395fa911d3c922b6b897b
Reviewed-on: https://chromium-review.googlesource.com/822520
Reviewed-by: richard winterton <rrwinterton@gmail.com>
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Port ARGBToAR30Row_AVX2 to ARGBToAR30Row_SSE2 using same instructions
but xmm registers and doing half as many pixels per loop.
Bug: libyuv:751
Test: LibYUVConvertTest.ARGBToAR30_Opt
Change-Id: Id644e54639133d1caf28ea3cd11ff6ab6891a673
Reviewed-on: https://chromium-review.googlesource.com/817918
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: richard winterton <rrwinterton@gmail.com>
H010ToAR30 uses Convert16To8Row_SSSE3 to convert 10 bit YUV to 8 bit.
Then standard YUV conversion can be used. This improves performance
on low end CPUs.
Future CL will by pass this conversion allowing for 10 bit YUV source,
but the function will be useful as a utility for YUV conversions.
Bug: libyuv:559, libyuv:751
Test: out/Release/libyuv_unittest --gtest_filter=*H010ToAR30* --libyuv_width=1280 --libyuv_height=720 --libyuv_repeat=999 --libyuv_flags=-1 --libyuv_cpu_info=-1
Change-Id: I9b3ef22d88a5fd861de4cf1900b4c6e8fd24d0af
Reviewed-on: https://chromium-review.googlesource.com/792334
Commit-Queue: Frank Barchard <fbarchard@chromium.org>
Reviewed-by: Frank Barchard <fbarchard@chromium.org>
I422ToRGB565Row_Any_AVX2 uses 2 step row conversion that calls
I422ToARGBRow_AVX2 and then ARGBToRGB565.
I422ToARGBRow_AVX2 expects multiple of 16 pixels.
Adjust the I422ToRGB565Row_Any_AVX2 to do multiple of 16 with AVX2
and then remainder in a buffer.
Bug: libyuv: 657
Test: out/Release/libyuv_unittest --gtest_filter=*Convert*I*To* --libyuv_width=1280 --libyuv_height=720
Change-Id: Ice1cb6c7ff6b2295513e8b4a9f77522e1c659810
Reviewed-on: https://chromium-review.googlesource.com/474232
Commit-Queue: Frank Barchard <fbarchard@google.com>
Reviewed-by: Henrik Kjellander <kjellander@chromium.org>
R=fbarchard@google.com
BUG=libyuv:634
Performance Gains :- (vs C vectorized)
I422ToARGBRow_MSA : ~1.6x
I422ToRGBARow_MSA : ~1.6x
I422ToARGBRow_Any_MSA : ~1.58x
I422ToRGBARow_Any_MSA : ~1.6x
Performance Gains :- (vs C non-vectorized)
I422ToARGBRow_MSA : ~7x
I422ToRGBARow_MSA : ~7x
I422ToARGBRow_Any_MSA : ~6.9x
I422ToRGBARow_Any_MSA : ~6.8x
Regarding performance measurement, We have created standalone tests which pass in row's data from a 1920x1080 filled buffer to both the C and MSA functions. And such N iterations are executed to get more accurate timings of C vs MSA.
Review URL: https://codereview.chromium.org/2430313005 .