This is a "sensible" movemask implementation where each bit represents
whether the most significant bit is set in each corresponding lane of a
vector. This is used on x86-64 and wasm, but such a mask is more expensive
to get on aarch64 so we use something a little different.
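The sketch below is a scalar model of this "sensible" movemask for a 16-lane vector of bytes. It assumes 8-bit lanes and 16 lanes per vector, and the name `movemask_16` is hypothetical. Bit `i` of the result is set exactly when the most significant bit of lane `i` is set. On x86-64 with SSE2, `_mm_movemask_epi8` computes this in one instruction, and on wasm `i8x16_bitmask` plays the same role. The loop here only illustrates the semantics and is not meant as a fast implementation.

```rust
/// Hypothetical scalar model of a "sensible" movemask over 16 byte lanes:
/// bit `i` of the result is set iff the most significant bit of lane `i` is set.
fn movemask_16(lanes: [u8; 16]) -> u32 {
    let mut mask = 0u32;
    for (i, &byte) in lanes.iter().enumerate() {
        // Shift the lane's top bit down to bit 0, then move it to position `i`.
        mask |= ((byte >> 7) as u32) << i;
    }
    mask
}

fn main() {
    // Simulate a byte-wise equality comparison against b'o':
    // matching lanes become 0xFF, non-matching lanes become 0x00.
    let haystack = *b"hello, world!!!!";
    let mut cmp = [0u8; 16];
    for (i, &b) in haystack.iter().enumerate() {
        cmp[i] = if b == b'o' { 0xFF } else { 0x00 };
    }
    let mask = movemask_16(cmp);
    // Because bit `i` corresponds to lane `i`, the lowest set bit gives
    // the offset of the first matching byte.
    println!("mask = {:#018b}, first match at {}", mask, mask.trailing_zeros());
}
```

Using `trailing_zeros` on the mask is one common way to turn such a bitmask into a match offset. It works only because each lane contributes exactly one bit, which is what makes this layout convenient on targets with a native movemask instruction.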