Function tracing_core::stdlib::arch::x86_64::_tile_cmmrlfp16ps

pub unsafe fn _tile_cmmrlfp16ps<const DST: i32, const A: i32, const B: i32>()
🔬This is a nightly-only experimental API. (x86_amx_intrinsics #126622)
Available on x86-64 only.

Perform matrix multiplication of two tiles containing complex elements and accumulate the results into a packed single-precision tile. Each dword element in input tiles a and b is interpreted as a complex number with an FP16 real part and an FP16 imaginary part. This intrinsic calculates the real part of the result. For each possible combination of (row of a, column of b), it performs a set of multiplications and accumulations on all corresponding complex numbers (one from a and one from b). The real part of the a element is multiplied with the real part of the corresponding b element, and the negated imaginary part of the a element is multiplied with the imaginary part of the corresponding b element. The two accumulated results are added, and the sum is then accumulated into the corresponding row and column of dst.
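
The arithmetic described above can be sketched as a scalar reference model. This is an illustration under stated assumptions, not the intrinsic's implementation: the name `cmmrl_reference` and the `Complex` alias are hypothetical, `f32` stands in for the FP16 inputs, tiles are modeled as nested `Vec`s rather than AMX tile registers, and hardware rounding behavior is not modeled.

```rust
/// Hypothetical stand-in for one dword tile element: (real, imaginary).
/// `f32` is used here in place of FP16 purely for illustration.
type Complex = (f32, f32);

/// Scalar model of the real-part complex multiply-accumulate:
/// dst[m][n] += sum over k of (a.re * b.re - a.im * b.im).
/// Assumes `a` is M x K, `b` is K x N, and `dst` is M x N.
fn cmmrl_reference(dst: &mut [Vec<f32>], a: &[Vec<Complex>], b: &[Vec<Complex>]) {
    for (m, dst_row) in dst.iter_mut().enumerate() {
        for (n, acc) in dst_row.iter_mut().enumerate() {
            let mut sum = 0.0f32;
            for k in 0..a[m].len() {
                let (a_re, a_im) = a[m][k];
                let (b_re, b_im) = b[k][n];
                // Real part times real part.
                sum += a_re * b_re;
                // Negated imaginary part of `a` times imaginary part of `b`.
                sum += -a_im * b_im;
            }
            // Accumulate into the existing destination element.
            *acc += sum;
        }
    }
}
```

For example, with a 1 x 2 tile `a = [(1, 2), (3, 4)]`, a 2 x 1 tile `b = [(5, 6), (7, 8)]`, and `dst` initially holding 10, the model adds (1·5 − 2·6) + (3·7 − 4·8) = −18 to the destination element.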

Intel’s documentation