const fn montgomery_reduce(r: &[u64; 8]) -> [u64; 4]Expand description
Montgomery Reduction
The general algorithm is:
A <- input (2n b-limbs)
for i in 0..n {
    k <- A[i] p' mod b
    A <- A + k p b^i
}
A <- A / b^n
if A >= p {
    A <- A - p
}For secp256r1, we have the following simplifications:
- 
p'is 1, so our multiplicand is simply the first limb of the intermediate A. - 
The first limb of p is 2^64 - 1; multiplications by this limb can be simplified to a shift and subtraction:
a_i * (2^64 - 1) = a_i * 2^64 - a_i = (a_i << 64) - a_iHowever, because
p' = 1, the first limb of p is multiplied by limb i of the intermediate A and then immediately added to that same limb, so we simply initialize the carry to limb i of the intermediate. - 
The third limb of p is zero, so we can ignore any multiplications by it and just add the carry.
 
References:
- 
Handbook of Applied Cryptography, Chapter 14 Algorithm 14.32 http://cacr.uwaterloo.ca/hac/about/chap14.pdf
 - 
Efficient and Secure Elliptic Curve Cryptography Implementation of Curve P-256 Algorithm 7) Montgomery Word-by-Word Reduction https://csrc.nist.gov/csrc/media/events/workshop-on-elliptic-curve-cryptography-standards/documents/papers/session6-adalier-mehmet.pdf