pub(crate) const fn div_255(val: u16) -> u16
Perform an approximate division by 255.
There are three reasons for having this method.
- Divisions are slower than shifting + adding, and the compiler does not seem to replace divisions by 255 with an equivalent (this was verified by benchmarking; doing / 255 was significantly slower).
- Integer divisions are usually not available in SIMD, so this provides a good baseline implementation.
- There are two options for performing the division. One is to perform it in a way that completely preserves the rounding semantics of an integer division by 255, which could be achieved using the implementation `(val + 1 + (val >> 8)) >> 8`. The second approach (used here) has slightly different rounding behavior to a normal division by 255, but is much faster (see https://github.com/linebender/vello/issues/904) and is therefore preferable for the high-performance pipeline.
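The two options from the last point can be sketched side by side. This is only an illustration: the helper names `div_255_exact` and `div_255_approx` are hypothetical, and the body of the approximate variant is an assumption derived from the "ceiling of `val / 256`" behavior documented for this function, not a copy of the actual implementation.

```rust
/// Exact variant quoted in the text (hypothetical helper name).
/// The intermediate sum is widened to `u32` so it cannot overflow.
const fn div_255_exact(val: u16) -> u16 {
    let v = val as u32;
    ((v + 1 + (v >> 8)) >> 8) as u16
}

/// Assumed form of the faster variant: the ceiling of `val / 256`,
/// computed with a single add and a single shift.
const fn div_255_approx(val: u16) -> u16 {
    // The documented properties only hold up to and including 65279.
    debug_assert!(val <= 65279);
    (val + 255) >> 8
}
```

For example, `div_255_exact(509)` gives `1` (the same as `509 / 255`), while `div_255_approx(509)` gives `2`. For a multiple of 255 such as `510`, both return `2`.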
Four properties worth mentioning:
- This actually calculates the ceiling of `val / 256`.
- Within the allowed range for `val`, rounding errors do not appear for values divisible by 255, i.e. any call `div_255(val * 255)` will always yield `val`.
- If there is a discrepancy, this division will always yield a value 1 higher than the exact integer division `val / 255`.
- This holds for values of `val` up to and including `65279`. You should not call this function with higher values.
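The properties above can be checked exhaustively, since the allowed input range is small. The following is a minimal sketch, assuming the function body is `(val + 255) >> 8` (an assumption consistent with the ceiling behavior described, not taken from the source); `check_properties` is a hypothetical helper.

```rust
/// Assumed body: ceiling of `val / 256`.
const fn div_255(val: u16) -> u16 {
    (val + 255) >> 8
}

/// Checks the listed properties for every allowed input and returns the
/// number of inputs whose result differs from the exact `val / 255`.
fn check_properties() -> u32 {
    let mut mismatches = 0;
    for val in 0..=65279u16 {
        let approx = div_255(val);
        let exact = val / 255;

        // Property 1: the result is the ceiling of `val / 256`.
        let ceil_256 = val / 256 + (val % 256 != 0) as u16;
        assert_eq!(approx, ceil_256);

        // Property 3: any discrepancy is exactly +1.
        if approx != exact {
            assert_eq!(approx, exact + 1);
            mismatches += 1;
        }
    }

    // Property 2: multiples of 255 divide without rounding error.
    for k in 0..=255u16 {
        assert_eq!(div_255(k * 255), k);
    }

    mismatches
}
```

Under the same assumption, the upper bound is easy to see: the ceiling of `65280 / 256` is `255`, whereas `65280 / 255` is `256`, so from that input on the result would be lower than the exact quotient rather than at most 1 higher.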