Module avx2


AVX optimised IDCT.

Okay, not that optimised.

The implementation

The implementation is neatly broken down into two operations.

Test for zeroes

There is a shortcut for the IDCT when all AC coefficients are zero. In that case the answer can be computed very quickly by taking 1/8 of the block's DC coefficient, spreading it over the whole block, and level-shifting.
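A minimal scalar sketch of the zero test and the shortcut. It assumes 64 dequantised i32 coefficients with the DC term at index 0 and uses round-to-nearest division by 8. The name dc_only_idct is hypothetical, and the module's AVX2 code may round differently.

```rust
/// Hypothetical helper (not this module's API): returns Some(block) when every
/// AC coefficient is zero, using the DC-only shortcut; None otherwise.
fn dc_only_idct(coeffs: &[i32; 64]) -> Option<[u8; 64]> {
    // coeffs[0] is the DC term; coeffs[1..] are the 63 AC terms.
    if coeffs[1..].iter().any(|&c| c != 0) {
        return None;
    }
    // With only DC present, every output sample equals DC / 8.
    // Assumption: (dc + 4) >> 3 rounds to nearest (arithmetic shift).
    // Then level-shift by +128 and clamp to the 8-bit range.
    let value = ((coeffs[0] + 4) >> 3) + 128;
    let pixel = value.clamp(0, 255) as u8;
    Some([pixel; 64])
}
```

For example, a block whose only nonzero coefficient is a DC of 80 becomes a flat block of 138 (80 / 8 + 128).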

If the test above fails, we carry out the IDCT as a two-pass, one-dimensional algorithm. It makes two whole scans, each applying a one-dimensional IDCT to all items. After each scan, the data is transposed in registers (thank you, x86 SIMD powers), and the second pass is carried out on the transposed data.
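The row/transpose/row structure can be sketched in plain floating point. This is a reference model only, not the module's fixed-point code: it uses the textbook JPEG 1-D IDCT with C(0) = 1/sqrt(2) and C(u) = 1 otherwise. An explicit scalar transpose stands in for the in-register one, and the names idct_1d, transpose and idct_2d are hypothetical.

```rust
use std::f64::consts::PI;

/// Textbook 8-point 1-D IDCT (JPEG normalisation), applied in place to one row.
fn idct_1d(row: &mut [f64; 8]) {
    let input = *row;
    for x in 0..8 {
        let mut sum = 0.0;
        for u in 0..8 {
            let cu = if u == 0 { 1.0 / 2f64.sqrt() } else { 1.0 };
            let angle = (2 * x + 1) as f64 * u as f64 * PI / 16.0;
            sum += cu / 2.0 * input[u] * angle.cos();
        }
        row[x] = sum;
    }
}

/// Transpose an 8x8 row-major block (the AVX2 code does this in registers).
fn transpose(block: &mut [[f64; 8]; 8]) {
    for i in 0..8 {
        for j in (i + 1)..8 {
            let tmp = block[i][j];
            block[i][j] = block[j][i];
            block[j][i] = tmp;
        }
    }
}

/// Two-pass IDCT: 1-D IDCT on every row, transpose, 1-D IDCT on every row
/// again (the original columns), then transpose back to row-major order.
fn idct_2d(block: &mut [[f64; 8]; 8]) {
    for row in block.iter_mut() {
        idct_1d(row);
    }
    transpose(block);
    for row in block.iter_mut() {
        idct_1d(row);
    }
    transpose(block);
}
```

Because the 2-D IDCT is separable, the two 1-D passes give the same result as the direct 64-term double sum, at a fraction of the cost. The transposes let both passes operate along rows, which is what makes them convenient to vectorise.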

The code is not super optimised. It produces bit-identical results to the scalar code (hence mm256_add_epi16), which also has the advantage of making this implementation easy to maintain.
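One way to see what "bit-identical to scalar" means for 16-bit lanes: _mm256_add_epi16 performs a lane-wise wrapping add, the same as i16::wrapping_add. The sketch below is an illustration, not the module's code, and add_scalar, add_avx2 and add are hypothetical names. It selects the AVX2 path at runtime and falls back to scalar otherwise, so both paths must agree.

```rust
/// Scalar reference: lane-wise wrapping 16-bit addition.
fn add_scalar(a: &[i16; 16], b: &[i16; 16]) -> [i16; 16] {
    let mut out = [0i16; 16];
    for i in 0..16 {
        out[i] = a[i].wrapping_add(b[i]);
    }
    out
}

/// AVX2 version: one 256-bit register holds sixteen i16 lanes.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn add_avx2(a: &[i16; 16], b: &[i16; 16]) -> [i16; 16] {
    use std::arch::x86_64::{
        __m256i, _mm256_add_epi16, _mm256_loadu_si256, _mm256_storeu_si256,
    };
    let mut out = [0i16; 16];
    unsafe {
        // Unaligned loads/stores, so the i16 arrays need no special alignment.
        let va = _mm256_loadu_si256(a.as_ptr() as *const __m256i);
        let vb = _mm256_loadu_si256(b.as_ptr() as *const __m256i);
        let sum = _mm256_add_epi16(va, vb); // wraps on overflow, like wrapping_add
        _mm256_storeu_si256(out.as_mut_ptr() as *mut __m256i, sum);
    }
    out
}

/// Use AVX2 when the CPU supports it, otherwise fall back to the scalar path.
fn add(a: &[i16; 16], b: &[i16; 16]) -> [i16; 16] {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { add_avx2(a, b) };
        }
    }
    add_scalar(a, b)
}
```

A test that feeds the same inputs, including overflowing ones, to both paths and compares the outputs exactly is a cheap way to keep the SIMD and scalar code in lockstep.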

Constants

SCALE_BITS (private)

Functions

clamp_avx (private, unsafe, requires avx2)
idct_avx2: SAFETY
idct_int_avx2_inner (unsafe, requires avx2)
shuffle (private): A copy of _MM_SHUFFLE() that doesn't require a nightly compiler
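The C macro _MM_SHUFFLE(z, y, x, w) packs four 2-bit lane selectors into one immediate byte, and a const fn can do the same on stable Rust. The signature below is an assumption for illustration, not necessarily that of the private shuffle helper.

```rust
/// Hypothetical stand-in for the private shuffle helper: mirrors _MM_SHUFFLE(z, y, x, w).
/// Each argument selects a source lane (0..=3); z lands in the top two bits.
const fn shuffle(z: i32, y: i32, x: i32, w: i32) -> i32 {
    (z << 6) | (y << 4) | (x << 2) | w
}

// Usable in const contexts, e.g. as the immediate for shuffle intrinsics.
const IDENTITY: i32 = shuffle(3, 2, 1, 0); // 0xE4: every lane stays in place
const REVERSE: i32 = shuffle(0, 1, 2, 3); // 0x1B: reverses four 32-bit lanes
```

Because it is a const fn, the result can be supplied where Rust intrinsics such as _mm256_shuffle_epi32 expect a const generic immediate.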
