pub(crate) fn f_fmla(a: f64, b: f64, c: f64) -> f64
Optional FMA: uses the hardware FMA instruction when it is available, and otherwise falls back to the plain scalar expression c + a * b.
c + a * b
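
The selection described above could be sketched as follows. This is a minimal illustration, not the crate's actual implementation: the name `f_fmla_sketch` is hypothetical, and the exact `cfg!` condition (x86/x86_64 with the `fma` target feature, or aarch64 with `neon`) is an assumption about which targets count as having hardware FMA.

```rust
/// Hypothetical sketch of an "optional FMA" helper.
/// Computes a * b + c, fused when the build target has hardware FMA.
#[inline]
pub fn f_fmla_sketch(a: f64, b: f64, c: f64) -> f64 {
    // Assumption: these are the targets treated as having hardware FMA.
    // `cfg!` is evaluated at compile time, so the unused branch is
    // removed by the optimizer.
    if cfg!(any(
        all(
            any(target_arch = "x86", target_arch = "x86_64"),
            target_feature = "fma"
        ),
        all(target_arch = "aarch64", target_feature = "neon")
    )) {
        // Fused multiply-add: a * b + c with a single rounding.
        a.mul_add(b, c)
    } else {
        // Plain scalar fallback: the product is rounded, then the sum.
        c + a * b
    }
}
```

Calling `f64::mul_add` unconditionally would also be correct, but on targets without an FMA instruction it can fall back to a slow software routine, which is presumably why the fallback path uses the unfused expression. The two paths can differ in the last bit because the fused form rounds once and the unfused form rounds twice, so callers should not rely on bit-identical results across targets.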