Expand description
A low precision raster pipeline implementation.
A lowp pipeline uses u16 instead of f32 for math. Because of that, it doesn’t implement stages that require high precision. The pipeline compiler will automatically decide which one to use.
Skia uses u16x8 (128bit) types for a generic CPU and u16x16 (256bit) for modern x86 CPUs. But instead of explicit SIMD instructions, it mainly relies on clang’s vector extensions. And since they are unavailable in Rust, we have to do everything manually.
According to our benchmarks, a SIMD-accelerated u16x8 in Rust is almost 2x slower than in Skia. Not sure why. For example, there are no div instruction for u16x8, so we have to use a basic scalar version. Which means unnecessary load/store. No idea what clang does in this case. Surprisingly, a SIMD-accelerated u16x8 is even slower than a scalar one. Again, not sure why.
Therefore we are using scalar u16x16 by default and relying on rustc/llvm auto vectorization instead.
When targeting a generic CPU, we’re just 5-10% slower than Skia. While u16x8 is 30-40% slower.
And while -C target-cpu=haswell
boosts our performance by around 25%,
we are still 40-60% behind Skia built for Haswell.
On ARM AArch64 the story is different and explicit SIMD make our code up to 2-3x faster.
Macros§
- blend_
fn 🔒
Structs§
Constants§
Functions§
- clear 🔒
- darken 🔒
- div255 🔒
- gradient 🔒
- inv 🔒
- join 🔒
- lerp 🔒
- lerp_u8 🔒
- lighten 🔒
- load_8 🔒
- mad 🔒
- mask_u8 🔒
- modulate 🔒
- multiply 🔒
- overlay 🔒
- pad_x1 🔒
- plus 🔒
- scale_
u8 🔒 - screen 🔒
- split 🔒
- xor 🔒