AVX optimised IDCT.
Okay, not that optimised.
§The implementation
The implementation is neatly broken down into two operations.
- Test for zeroes
There is a shortcut for the IDCT: when all AC values are zero, we can get the answer really quickly by filling the whole block with 1/8th of the block's DC coefficient and level shifting.
- If the above fails, we carry out the IDCT as a two-pass, one-dimensional algorithm. It does two whole scans, each carrying out a one-dimensional IDCT on all items. After each scan the data is transposed in registers (thank you, x86 SIMD powers), and then the second pass is carried out.
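The two operations above can be sketched in scalar Rust. This is only an illustration under stated assumptions, not the module's actual AVX2 code: the names dc_only_shortcut, transpose8 and two_pass are hypothetical, the rounding term in the shortcut is an assumption, and the one-dimensional IDCT kernel is left as a caller-supplied closure.

```rust
/// Hypothetical helper: returns the filled block when every AC coefficient
/// (indices 1..64) is zero, and `None` otherwise.
fn dc_only_shortcut(coeffs: &[i32; 64]) -> Option<[i16; 64]> {
    if coeffs[1..].iter().any(|&c| c != 0) {
        return None;
    }
    // 1/8th of the DC term (rounded, by assumption), level shifted by
    // 128 (1024 >> 3) and clamped to the 8-bit sample range.
    let value = ((coeffs[0] + 4 + 1024) >> 3).clamp(0, 255) as i16;
    Some([value; 64])
}

/// In-place transpose of an 8x8 block stored in row-major order.
/// The SIMD code does this in registers; a plain swap loop stands in here.
fn transpose8(block: &mut [i32; 64]) {
    for r in 0..8 {
        for c in (r + 1)..8 {
            block.swap(r * 8 + c, c * 8 + r);
        }
    }
}

/// Two scans over the block: run the 1-D kernel on every row, then
/// transpose. The second scan therefore processes the original columns,
/// and the final transpose restores the original orientation.
/// `pass_1d` is a placeholder for a real one-dimensional IDCT.
fn two_pass(block: &mut [i32; 64], pass_1d: impl Fn(&mut [i32])) {
    for _ in 0..2 {
        for row in block.chunks_exact_mut(8) {
            pass_1d(row);
        }
        transpose8(block);
    }
}
```

A caller would try dc_only_shortcut first and fall back to two_pass only when it returns None, which mirrors the order of the two steps in the list.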
The code is not super optimized; it produces bit-identical results with the scalar code, hence its use of mm256_add_epi16, and it also has the advantage of making this implementation easy to maintain.
Constants§
Functions§
- clamp_avx 🔒 ⚠ avx2
- idct_avx2 - SAFETY
- idct_int_avx2_inner ⚠ avx2
- shuffle 🔒 - A copy of _MM_SHUFFLE() that doesn’t require a nightly compiler
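For context, the C macro _MM_SHUFFLE(z, y, x, w) packs four 2-bit lane selectors into one immediate byte. A stand-in for it might look like the sketch below. The signature is an assumption for illustration and may not match the private shuffle function in this module.

```rust
/// Hypothetical stand-in for the C `_MM_SHUFFLE(z, y, x, w)` macro.
/// Each argument is a 2-bit lane index (0..=3); `z` lands in the top
/// two bits of the immediate and `w` in the bottom two.
const fn shuffle(z: i32, y: i32, x: i32, w: i32) -> i32 {
    (z << 6) | (y << 4) | (x << 2) | w
}

/// Being a `const fn`, the result can be used wherever a compile-time
/// constant is required, such as the immediate operand of a shuffle
/// intrinsic.
const IDENTITY_ORDER: i32 = shuffle(3, 2, 1, 0);
```

Here IDENTITY_ORDER selects lanes 3, 2, 1, 0 in that order, which leaves the element order unchanged when used as a shuffle immediate.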