Trait aho_corasick::packed::vector::FatVector
pub(crate) trait FatVector: Vector {
type Half: Vector;
// Required methods
unsafe fn load_half_unaligned(data: *const u8) -> Self;
unsafe fn half_shift_in_one_byte(self, vector2: Self) -> Self;
unsafe fn half_shift_in_two_bytes(self, vector2: Self) -> Self;
unsafe fn half_shift_in_three_bytes(self, vector2: Self) -> Self;
unsafe fn swap_halves(self) -> Self;
unsafe fn interleave_low_8bit_lanes(self, vector2: Self) -> Self;
unsafe fn interleave_high_8bit_lanes(self, vector2: Self) -> Self;
unsafe fn for_each_low_64bit_lane<T>(
self,
vector2: Self,
f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T>;
}
This trait extends the Vector trait with additional operations to support Fat Teddy.
Fat Teddy uses 16 buckets instead of 8, but reads half as many bytes (as the vector size) instead of the full size of a vector per iteration. For example, when using a 256-bit vector, Slim Teddy reads 32 bytes at a time but Fat Teddy reads 16 bytes at a time.
Fat Teddy is useful when searching for a large number of literals. The extra number of buckets spreads the literals out more and reduces verification time.
Currently we only implement this for AVX on x86_64. It would be nice to implement this for SSE on x86_64 and NEON on aarch64, with the latter two only reading 8 bytes at a time. It's not clear how well it would work, and there are some tricky things to figure out in terms of implementation. The half_shift_in_{one,two,three}_bytes methods in particular are probably the trickiest of the bunch. For AVX2, these are implemented by taking advantage of the fact that _mm256_alignr_epi8 operates on each 128-bit half instead of the full 256-bit vector. (Whereas _mm_alignr_epi8 operates on the full 128-bit vector and not on each 64-bit half.) I didn't do a careful survey of NEON to see if it could easily support these operations.
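To make the per-half behavior of _mm256_alignr_epi8 concrete, here is a scalar sketch that models a 256-bit vector as `[u8; 32]` and applies the concatenate-and-extract step to each 128-bit half independently. The function name and byte-array model are illustrative assumptions, not the crate's actual intrinsic-based implementation:

```rust
/// A 256-bit vector modeled as 32 bytes (index 0 is the least significant).
type V256 = [u8; 32];

/// Scalar model of `_mm256_alignr_epi8(a, b, IMM)`: within each 128-bit
/// half independently, form the 32-byte window (b_half || a_half) with
/// b_half in the low bytes, then extract 16 bytes starting at offset IMM.
fn alignr_per_half(a: V256, b: V256, imm: usize) -> V256 {
    assert!(imm <= 16);
    let mut out = [0u8; 32];
    for half in 0..2 {
        let base = half * 16;
        for i in 0..16 {
            let pos = imm + i;
            out[base + i] = if pos < 16 {
                // Low part of the window comes from b's half.
                b[base + pos]
            } else {
                // High part of the window comes from a's half.
                a[base + pos - 16]
            };
        }
    }
    out
}

fn main() {
    let a: V256 = core::array::from_fn(|i| i as u8); // 0..=31
    let b: V256 = core::array::from_fn(|i| 100 + i as u8); // 100..=131
    // With IMM = 15, each half becomes a's half shifted left by one byte
    // with the most significant byte of b's half shifted into the bottom:
    // exactly the shape of `half_shift_in_one_byte`.
    let r = alignr_per_half(a, b, 15);
    assert_eq!(r[0], 115); // top byte of b's low half
    assert_eq!(r[1], 0); // a[0]
    assert_eq!(r[16], 131); // top byte of b's high half
    assert_eq!(r[17], 16); // a[16]
}
```

With IMM = 14 and 13, the same window extraction yields the two- and three-byte variants, which is why all three half_shift methods fall out of one instruction on AVX2.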
Required Associated Types§
type Half: Vector
Required Methods§
unsafe fn load_half_unaligned(data: *const u8) -> Self
Read a half-vector-size number of bytes from the given pointer, and broadcast it across both halves of a full vector. The pointer does not need to be aligned.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
Callers must guarantee that at least Self::Half::BYTES bytes are readable from data.
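As a rough illustration of the documented semantics (not the real pointer-and-intrinsic implementation), the broadcast can be modeled with plain byte arrays and a safe slice in place of the raw pointer:

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `load_half_unaligned`: read 16 bytes (half a vector)
/// from `data` and broadcast them across both 128-bit halves. The slice
/// stands in for the raw pointer; the length check models the safety
/// requirement that at least a half-vector of bytes be readable.
fn load_half_unaligned(data: &[u8]) -> V256 {
    assert!(data.len() >= 16, "need at least a half-vector of readable bytes");
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&data[..16]);
    out[16..].copy_from_slice(&data[..16]);
    out
}

fn main() {
    let haystack: Vec<u8> = (0u8..64).collect();
    // An unaligned offset is fine, just as with the real unaligned load.
    let v = load_half_unaligned(&haystack[3..]);
    assert_eq!(&v[..16], &v[16..]); // both halves hold the same 16 bytes
    assert_eq!(v[0], 3);
}
```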
unsafe fn half_shift_in_one_byte(self, vector2: Self) -> Self
Like Vector::shift_in_one_byte, except this is done for each half of the vector instead.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn half_shift_in_two_bytes(self, vector2: Self) -> Self
Like Vector::shift_in_two_bytes, except this is done for each half of the vector instead.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn half_shift_in_three_bytes(self, vector2: Self) -> Self
Like Vector::shift_in_three_bytes, except this is done for each half of the vector instead.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn swap_halves(self) -> Self
Swap the 128-bit lanes in this vector.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
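The semantics are simple to state in a scalar model (on AVX2 one might expect something like _mm256_permute2x128_si256 here, though that is an assumption about the implementation, not something this page documents):

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `swap_halves`: exchange the low and high 128-bit lanes.
fn swap_halves(v: V256) -> V256 {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&v[16..]);
    out[16..].copy_from_slice(&v[..16]);
    out
}

fn main() {
    let v: V256 = core::array::from_fn(|i| i as u8);
    let s = swap_halves(v);
    assert_eq!(s[0], 16); // first byte of the old high half
    assert_eq!(s[16], 0); // first byte of the old low half
    assert_eq!(swap_halves(s), v); // swapping twice is the identity
}
```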
unsafe fn interleave_low_8bit_lanes(self, vector2: Self) -> Self
Unpack and interleave the 8-bit lanes from the low 128 bits of each vector and return the result.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn interleave_high_8bit_lanes(self, vector2: Self) -> Self
Unpack and interleave the 8-bit lanes from the high 128 bits of each vector and return the result.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
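A scalar sketch of the documented semantics for both interleave methods, again modeling a 256-bit vector as `[u8; 32]`. Note this follows the "low/high 128 bits of each vector" description literally; the AVX2 unpack instructions operate per 128-bit lane, so the real implementation's lane ordering may differ from this model:

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `interleave_low_8bit_lanes`: interleave the bytes of the
/// low 128-bit halves of `a` and `b` into a full 256-bit result:
/// a0, b0, a1, b1, ..., a15, b15.
fn interleave_low_8bit_lanes(a: V256, b: V256) -> V256 {
    let mut out = [0u8; 32];
    for i in 0..16 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    out
}

/// Scalar model of `interleave_high_8bit_lanes`: the same, but drawing
/// from the high 128-bit halves: a16, b16, a17, b17, ..., a31, b31.
fn interleave_high_8bit_lanes(a: V256, b: V256) -> V256 {
    let mut out = [0u8; 32];
    for i in 0..16 {
        out[2 * i] = a[16 + i];
        out[2 * i + 1] = b[16 + i];
    }
    out
}

fn main() {
    let a: V256 = core::array::from_fn(|i| i as u8); // 0..=31
    let b: V256 = core::array::from_fn(|i| 100 + i as u8); // 100..=131
    let lo = interleave_low_8bit_lanes(a, b);
    assert_eq!(&lo[..4], &[0, 100, 1, 101]);
    let hi = interleave_high_8bit_lanes(a, b);
    assert_eq!(&hi[..4], &[16, 116, 17, 117]);
}
```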
unsafe fn for_each_low_64bit_lane<T>(
self,
vector2: Self,
f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T>
Call the provided function for each 64-bit lane in the lower half of this vector and then in the other vector. The given function is provided the lane index and lane value as a u64. (The high 128 bits of each vector are ignored.)
If f returns Some, then iteration over the lanes is stopped and the value is returned. Otherwise, this returns None.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
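The iteration contract can be sketched in safe scalar code: two 64-bit lanes from the low half of the first vector, then two from the low half of the second, with a running lane index and early exit. The little-endian lane extraction here is a modeling assumption, not a statement about the real implementation:

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `for_each_low_64bit_lane`: visit the two 64-bit lanes
/// in the low 128 bits of `a`, then the two in the low 128 bits of `b`,
/// passing a running lane index 0..=3. Stops as soon as `f` returns `Some`.
fn for_each_low_64bit_lane<T>(
    a: V256,
    b: V256,
    mut f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T> {
    // Extract the i-th 64-bit lane of a vector (little-endian byte order).
    let lane = |v: V256, i: usize| {
        u64::from_le_bytes(v[i * 8..][..8].try_into().unwrap())
    };
    let lanes = [lane(a, 0), lane(a, 1), lane(b, 0), lane(b, 1)];
    for (index, value) in lanes.into_iter().enumerate() {
        if let Some(t) = f(index, value) {
            return Some(t);
        }
    }
    None
}

fn main() {
    let a: V256 = core::array::from_fn(|i| i as u8);
    let b: V256 = [0xFF; 32];
    // Find the first lane whose value is all ones. Lanes 0 and 1 (from `a`)
    // do not match, so iteration stops at lane 2, the first lane of `b`.
    let hit = for_each_low_64bit_lane(a, b, |i, v| {
        if v == u64::MAX { Some(i) } else { None }
    });
    assert_eq!(hit, Some(2));
}
```

In Teddy this early-exit shape is what lets verification bail out of a candidate window as soon as a match is confirmed.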