Module lowp

Help

Expand description

A low precision raster pipeline implementation.

A lowp pipeline uses u16 instead of f32 for math. Because of that, it doesn’t implement stages that require high precision. The pipeline compiler will automatically decide which one to use.

Skia uses u16x8 (128bit) types for a generic CPU and u16x16 (256bit) for modern x86 CPUs. But instead of explicit SIMD instructions, it mainly relies on clang’s vector extensions. And since they are unavailable in Rust, we have to do everything manually.

According to our benchmarks, a SIMD-accelerated u16x8 in Rust is almost 2x slower than in Skia. Not sure why. For example, there are no div instruction for u16x8, so we have to use a basic scalar version. Which means unnecessary load/store. No idea what clang does in this case. Surprisingly, a SIMD-accelerated u16x8 is even slower than a scalar one. Again, not sure why.

Therefore we are using scalar u16x16 by default and relying on rustc/llvm auto vectorization instead. When targeting a generic CPU, we’re just 5-10% slower than Skia. While u16x8 is 30-40% slower. And while -C target-cpu=haswell boosts our performance by around 25%, we are still 40-60% behind Skia built for Haswell.

On ARM AArch64 the story is different and explicit SIMD make our code up to 2-3x faster.

Module lowp

Macros§

Structs§

Constants§

Functions§

Type Aliases§

Module lowpCopy item path

Macros§

Structs§

Constants§

Functions§

Type Aliases§

Module lowp