pub fn compress256(state: &mut [u32; 8], block: &[u8; 64])

Process a block with the SHA-256 algorithm. (See more…)

Internally, this uses functions that resemble the new Intel SHA instruction set extensions, so its data-locality properties may improve performance. To benefit fully from this implementation, however, replace these functions with x86 intrinsics for a possible speed boost.

Implementation

The SHA-256 algorithm is implemented with functions that resemble the new Intel SHA instruction set extensions. These instructions fall into two categories: message schedule calculation, and the 64-round digest calculation over the message block. The schedule-related instructions allow 4 rounds to be calculated as:

use std::simd::u32x4;
use self::crypto::sha2::{
    sha256msg1,
    sha256msg2,
    sha256load
};

fn schedule4_data(work: &mut [u32x4], w: &[u32]) {

    // this is to illustrate the data order
    work[0] = u32x4(w[3], w[2], w[1], w[0]);
    work[1] = u32x4(w[7], w[6], w[5], w[4]);
    work[2] = u32x4(w[11], w[10], w[9], w[8]);
    work[3] = u32x4(w[15], w[14], w[13], w[12]);
}

fn schedule4_work(work: &mut [u32x4], t: usize) {

    // this is the core expression
    work[t] = sha256msg2(sha256msg1(work[t - 4], work[t - 3]) +
                         sha256load(work[t - 2], work[t - 1]),
                         work[t - 1])
}

instead of 4 rounds of:

fn schedule_work(w: &mut [u32], t: usize) {
    // `+` denotes addition modulo 2^32 (`wrapping_add` in real code)
    w[t] = sigma1!(w[t - 2]) + w[t - 7] + sigma0!(w[t - 15]) + w[t - 16];
}
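
The equivalence between the two schedule forms above can be checked with a small emulation. The following is a minimal sketch, not this library's code: `msg1`, `load`, `msg2`, `add4`, `schedule4`, `schedule_scalar` and `expand_abc` are hypothetical stand-ins, plain `[u32; 4]` arrays replace `u32x4`, and lanes are stored in ascending word order (lane `i` of `work[v]` holds `w[4 * v + i]`), which is an assumption that differs from the reversed order illustrated above.

```rust
// Small sigma functions of the SHA-256 message schedule (FIPS 180-4).
fn sigma0(x: u32) -> u32 { x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3) }
fn sigma1(x: u32) -> u32 { x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10) }

// Scalar reference: computes one schedule word per call.
fn schedule_scalar(w: &mut [u32; 64], t: usize) {
    w[t] = sigma1(w[t - 2])
        .wrapping_add(w[t - 7])
        .wrapping_add(sigma0(w[t - 15]))
        .wrapping_add(w[t - 16]);
}

// Hypothetical 4-lane type; lane i of work[v] holds w[4 * v + i] (assumed order).
type Lanes = [u32; 4];

// Lane i: w[t - 16 + i] + sigma0(w[t - 15 + i]), where t = 4 * v.
fn msg1(a: Lanes, b: Lanes) -> Lanes {
    [
        a[0].wrapping_add(sigma0(a[1])),
        a[1].wrapping_add(sigma0(a[2])),
        a[2].wrapping_add(sigma0(a[3])),
        a[3].wrapping_add(sigma0(b[0])),
    ]
}

// Lane i: w[t - 7 + i], gathered from two neighbouring vectors.
fn load(c: Lanes, d: Lanes) -> Lanes { [c[1], c[2], c[3], d[0]] }

// Adds sigma1(w[t - 2 + i]); lanes 2 and 3 depend on the new lanes 0 and 1.
fn msg2(x: Lanes, d: Lanes) -> Lanes {
    let w0 = x[0].wrapping_add(sigma1(d[2]));
    let w1 = x[1].wrapping_add(sigma1(d[3]));
    let w2 = x[2].wrapping_add(sigma1(w0));
    let w3 = x[3].wrapping_add(sigma1(w1));
    [w0, w1, w2, w3]
}

// Lane-wise addition modulo 2^32.
fn add4(x: Lanes, y: Lanes) -> Lanes {
    [
        x[0].wrapping_add(y[0]),
        x[1].wrapping_add(y[1]),
        x[2].wrapping_add(y[2]),
        x[3].wrapping_add(y[3]),
    ]
}

// Four schedule words at once, mirroring the core expression above.
fn schedule4(work: &mut [Lanes; 16], v: usize) {
    let sum = add4(msg1(work[v - 4], work[v - 3]), load(work[v - 2], work[v - 1]));
    work[v] = msg2(sum, work[v - 1]);
}

// Expands the padded single-block message "abc" with both methods.
fn expand_abc() -> ([u32; 64], [Lanes; 16]) {
    let mut w = [0u32; 64];
    w[0] = 0x6162_6380; // "abc" followed by the 0x80 padding byte
    w[15] = 24; // message length in bits
    let mut work = [[0u32; 4]; 16];
    for v in 0..4 {
        work[v] = [w[4 * v], w[4 * v + 1], w[4 * v + 2], w[4 * v + 3]];
    }
    for t in 16..64 {
        schedule_scalar(&mut w, t);
    }
    for v in 4..16 {
        schedule4(&mut work, v);
    }
    (w, work)
}
```

Calling `expand_abc()` returns the scalar words and the 4-lane vectors side by side, so `work[v]` can be compared against `w[4 * v..4 * v + 4]`. The last two lanes of `msg2` depend on the first two, which is why the dependency chain is folded into a single instruction rather than four independent additions.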

The digest-related instructions allow 4 rounds to be calculated as:

use std::simd::u32x4;
use self::crypto::sha2::{
    K32X4,
    sha256rnds2,
    sha256swap
};

fn rounds4(state: &mut [u32; 8], work: &mut [u32x4], t: usize) {
    let [a, b, c, d, e, f, g, h]: [u32; 8] = *state;

    // this is to illustrate the data order
    let mut abef = u32x4(a, b, e, f);
    let mut cdgh = u32x4(c, d, g, h);
    let temp = K32X4[t] + work[t];

    // this is the core expression
    cdgh = sha256rnds2(cdgh, abef, temp);
    abef = sha256rnds2(abef, cdgh, sha256swap(temp));

    *state = [abef.0, abef.1, cdgh.0, cdgh.1,
              abef.2, abef.3, cdgh.2, cdgh.3];
}

instead of 4 rounds of:

fn round(state: &mut [u32; 8], w: &mut [u32], t: usize) {
    let [a, b, c, mut d, e, f, g, mut h]: [u32; 8] = *state;

    // `+` and `+=` denote addition modulo 2^32 (`wrapping_add` in real code)
    h += big_sigma1!(e) + choose!(e, f, g) + K32[t] + w[t];
    d += h;
    h += big_sigma0!(a) + majority!(a, b, c);

    *state = [h, a, b, c, d, e, f, g];
}

NOTE: At the time of this writing, no CPU implements these instructions, so this library emulates them until they become more common and gain support in LLVM (and GCC, etc.).
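
For reference, the scalar schedule and round shown above can be assembled into a complete single-block compression. This is a minimal sketch under stated assumptions, not this library's implementation: `compress256_scalar`, `digest_short` and the constant name `H256` are hypothetical, the macros are written as plain functions, and the final feed-forward addition into `state` follows FIPS 180-4. `digest_short` only pads messages of up to 55 bytes, so that everything fits in one block.

```rust
// Round constants and initial hash value from FIPS 180-4.
const K32: [u32; 64] = [
    0x428a2f98, 0x71374491, 0xb5c0fbcf, 0xe9b5dba5, 0x3956c25b, 0x59f111f1, 0x923f82a4, 0xab1c5ed5,
    0xd807aa98, 0x12835b01, 0x243185be, 0x550c7dc3, 0x72be5d74, 0x80deb1fe, 0x9bdc06a7, 0xc19bf174,
    0xe49b69c1, 0xefbe4786, 0x0fc19dc6, 0x240ca1cc, 0x2de92c6f, 0x4a7484aa, 0x5cb0a9dc, 0x76f988da,
    0x983e5152, 0xa831c66d, 0xb00327c8, 0xbf597fc7, 0xc6e00bf3, 0xd5a79147, 0x06ca6351, 0x14292967,
    0x27b70a85, 0x2e1b2138, 0x4d2c6dfc, 0x53380d13, 0x650a7354, 0x766a0abb, 0x81c2c92e, 0x92722c85,
    0xa2bfe8a1, 0xa81a664b, 0xc24b8b70, 0xc76c51a3, 0xd192e819, 0xd6990624, 0xf40e3585, 0x106aa070,
    0x19a4c116, 0x1e376c08, 0x2748774c, 0x34b0bcb5, 0x391c0cb3, 0x4ed8aa4a, 0x5b9cca4f, 0x682e6ff3,
    0x748f82ee, 0x78a5636f, 0x84c87814, 0x8cc70208, 0x90befffa, 0xa4506ceb, 0xbef9a3f7, 0xc67178f2,
];
const H256: [u32; 8] = [
    0x6a09e667, 0xbb67ae85, 0x3c6ef372, 0xa54ff53a, 0x510e527f, 0x9b05688c, 0x1f83d9ab, 0x5be0cd19,
];

fn sigma0(x: u32) -> u32 { x.rotate_right(7) ^ x.rotate_right(18) ^ (x >> 3) }
fn sigma1(x: u32) -> u32 { x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10) }
fn big_sigma0(x: u32) -> u32 { x.rotate_right(2) ^ x.rotate_right(13) ^ x.rotate_right(22) }
fn big_sigma1(x: u32) -> u32 { x.rotate_right(6) ^ x.rotate_right(11) ^ x.rotate_right(25) }
fn choose(e: u32, f: u32, g: u32) -> u32 { (e & f) ^ (!e & g) }
fn majority(a: u32, b: u32, c: u32) -> u32 { (a & b) ^ (a & c) ^ (b & c) }

// Hypothetical scalar counterpart of `compress256`: one block, 64 rounds.
fn compress256_scalar(state: &mut [u32; 8], block: &[u8; 64]) {
    // Message schedule: 16 big-endian input words, then 48 derived words.
    let mut w = [0u32; 64];
    for (i, chunk) in block.chunks_exact(4).enumerate() {
        w[i] = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    for t in 16..64 {
        w[t] = sigma1(w[t - 2])
            .wrapping_add(w[t - 7])
            .wrapping_add(sigma0(w[t - 15]))
            .wrapping_add(w[t - 16]);
    }

    // 64 rounds on a working copy of the state.
    let mut s = *state;
    for t in 0..64 {
        let [a, b, c, d, e, f, g, h] = s;
        let t1 = h
            .wrapping_add(big_sigma1(e))
            .wrapping_add(choose(e, f, g))
            .wrapping_add(K32[t])
            .wrapping_add(w[t]);
        let t2 = big_sigma0(a).wrapping_add(majority(a, b, c));
        s = [t1.wrapping_add(t2), a, b, c, d.wrapping_add(t1), e, f, g];
    }

    // Feed-forward: add the working variables back into the chaining state.
    for (out, v) in state.iter_mut().zip(s.iter()) {
        *out = out.wrapping_add(*v);
    }
}

// Pads a message of at most 55 bytes into one block and compresses it.
fn digest_short(msg: &[u8]) -> [u32; 8] {
    assert!(msg.len() <= 55, "this sketch handles single-block messages only");
    let mut block = [0u8; 64];
    block[..msg.len()].copy_from_slice(msg);
    block[msg.len()] = 0x80;
    block[56..].copy_from_slice(&((msg.len() as u64) * 8).to_be_bytes());
    let mut state = H256;
    compress256_scalar(&mut state, &block);
    state
}
```

`digest_short(b"abc")` can be compared against the published SHA-256 test vector for `"abc"`. A 4-lane variant would have to produce the same state words, which makes a scalar routine like this a convenient cross-check when swapping in emulated or hardware instructions.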