Trait aho_corasick::packed::vector::FatVector
pub(crate) trait FatVector: Vector {
type Half: Vector;
// Required methods
unsafe fn load_half_unaligned(data: *const u8) -> Self;
unsafe fn half_shift_in_one_byte(self, vector2: Self) -> Self;
unsafe fn half_shift_in_two_bytes(self, vector2: Self) -> Self;
unsafe fn half_shift_in_three_bytes(self, vector2: Self) -> Self;
unsafe fn swap_halves(self) -> Self;
unsafe fn interleave_low_8bit_lanes(self, vector2: Self) -> Self;
unsafe fn interleave_high_8bit_lanes(self, vector2: Self) -> Self;
unsafe fn for_each_low_64bit_lane<T>(
self,
vector2: Self,
f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T>;
}
This trait extends the Vector trait with additional operations to support Fat Teddy.
Fat Teddy uses 16 buckets instead of 8, but reads half as many bytes (as the vector size) instead of the full size of a vector per iteration. For example, when using a 256-bit vector, Slim Teddy reads 32 bytes at a time but Fat Teddy reads 16 bytes at a time.
Fat Teddy is useful when searching for a large number of literals. The extra number of buckets spreads the literals out more and reduces verification time.
Currently we only implement this for AVX on x86_64. It would be nice to implement this for SSE on x86_64 and NEON on aarch64, with the latter two only reading 8 bytes at a time. It's not clear how well it would work, and there are some tricky things to figure out in terms of implementation. The half_shift_in_{one,two,three}_bytes methods in particular are probably the trickiest of the bunch. For AVX2, these are implemented by taking advantage of the fact that _mm256_alignr_epi8 operates on each 128-bit half instead of the full 256-bit vector. (Whereas _mm_alignr_epi8 operates on the full 128-bit vector and not on each 64-bit half.) I didn't do a careful survey of NEON to see if it could easily support these operations.
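To make the per-half behavior of _mm256_alignr_epi8 concrete, here is a scalar sketch that models a 256-bit vector as `[u8; 32]` and applies the concatenate-and-extract step to each 128-bit half independently. The function name and byte-array model are illustrative assumptions, not the crate's actual intrinsic-based implementation:

```rust
/// A 256-bit vector modeled as 32 bytes (index 0 is the least significant).
type V256 = [u8; 32];

/// Scalar model of `_mm256_alignr_epi8(a, b, IMM)`: within each 128-bit
/// half independently, form the 32-byte window (b_half || a_half) with
/// b_half in the low bytes, then extract 16 bytes starting at offset IMM.
fn alignr_per_half(a: V256, b: V256, imm: usize) -> V256 {
    assert!(imm <= 16);
    let mut out = [0u8; 32];
    for half in 0..2 {
        let base = half * 16;
        for i in 0..16 {
            let pos = imm + i;
            out[base + i] = if pos < 16 {
                // Low part of the window comes from b's half.
                b[base + pos]
            } else {
                // High part of the window comes from a's half.
                a[base + pos - 16]
            };
        }
    }
    out
}

fn main() {
    let a: V256 = core::array::from_fn(|i| i as u8); // 0..=31
    let b: V256 = core::array::from_fn(|i| 100 + i as u8); // 100..=131
    // With IMM = 15, each half becomes a's half shifted left by one byte
    // with the most significant byte of b's half shifted into the bottom:
    // exactly the shape of `half_shift_in_one_byte`.
    let r = alignr_per_half(a, b, 15);
    assert_eq!(r[0], 115); // top byte of b's low half
    assert_eq!(r[1], 0); // a[0]
    assert_eq!(r[16], 131); // top byte of b's high half
    assert_eq!(r[17], 16); // a[16]
}
```

With IMM = 14 and 13, the same window extraction yields the two- and three-byte variants, which is why all three half_shift methods fall out of one instruction on AVX2.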
Required Associated Types§
type Half: Vector
Required Methods§
unsafe fn load_half_unaligned(data: *const u8) -> Self
Read a half-vector-size number of bytes from the given pointer, and broadcast it across both halves of a full vector. The pointer does not need to be aligned.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
Callers must guarantee that at least Self::Half::BYTES bytes are readable from data.
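As a rough illustration of the documented semantics (not the real pointer-and-intrinsic implementation), the broadcast can be modeled with plain byte arrays and a safe slice in place of the raw pointer:

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `load_half_unaligned`: read 16 bytes (half a vector)
/// from `data` and broadcast them across both 128-bit halves. The slice
/// stands in for the raw pointer; the length check models the safety
/// requirement that at least a half-vector of bytes be readable.
fn load_half_unaligned(data: &[u8]) -> V256 {
    assert!(data.len() >= 16, "need at least a half-vector of readable bytes");
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&data[..16]);
    out[16..].copy_from_slice(&data[..16]);
    out
}

fn main() {
    let haystack: Vec<u8> = (0u8..64).collect();
    // An unaligned offset is fine, just as with the real unaligned load.
    let v = load_half_unaligned(&haystack[3..]);
    assert_eq!(&v[..16], &v[16..]); // both halves hold the same 16 bytes
    assert_eq!(v[0], 3);
}
```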
unsafe fn half_shift_in_one_byte(self, vector2: Self) -> Self
Like Vector::shift_in_one_byte, except this is done for each half of the vector instead.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn half_shift_in_two_bytes(self, vector2: Self) -> Self
Like Vector::shift_in_two_bytes, except this is done for each half of the vector instead.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn half_shift_in_three_bytes(self, vector2: Self) -> Self
Like Vector::shift_in_three_bytes, except this is done for each half of the vector instead.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn swap_halves(self) -> Self
Swap the 128-bit lanes in this vector.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
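The semantics are simple to state in a scalar model (on AVX2 one might expect something like _mm256_permute2x128_si256 here, though that is an assumption about the implementation, not something this page documents):

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `swap_halves`: exchange the low and high 128-bit lanes.
fn swap_halves(v: V256) -> V256 {
    let mut out = [0u8; 32];
    out[..16].copy_from_slice(&v[16..]);
    out[16..].copy_from_slice(&v[..16]);
    out
}

fn main() {
    let v: V256 = core::array::from_fn(|i| i as u8);
    let s = swap_halves(v);
    assert_eq!(s[0], 16); // first byte of the old high half
    assert_eq!(s[16], 0); // first byte of the old low half
    assert_eq!(swap_halves(s), v); // swapping twice is the identity
}
```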
unsafe fn interleave_low_8bit_lanes(self, vector2: Self) -> Self
Unpack and interleave the 8-bit lanes from the low 128 bits of each vector and return the result.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
unsafe fn interleave_high_8bit_lanes(self, vector2: Self) -> Self
Unpack and interleave the 8-bit lanes from the high 128 bits of each vector and return the result.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
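A scalar sketch of the documented semantics for both interleave methods, again modeling a 256-bit vector as `[u8; 32]`. Note this follows the "low/high 128 bits of each vector" description literally; the AVX2 unpack instructions operate per 128-bit lane, so the real implementation's lane ordering may differ from this model:

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `interleave_low_8bit_lanes`: interleave the bytes of the
/// low 128-bit halves of `a` and `b` into a full 256-bit result:
/// a0, b0, a1, b1, ..., a15, b15.
fn interleave_low_8bit_lanes(a: V256, b: V256) -> V256 {
    let mut out = [0u8; 32];
    for i in 0..16 {
        out[2 * i] = a[i];
        out[2 * i + 1] = b[i];
    }
    out
}

/// Scalar model of `interleave_high_8bit_lanes`: the same, but drawing
/// from the high 128-bit halves: a16, b16, a17, b17, ..., a31, b31.
fn interleave_high_8bit_lanes(a: V256, b: V256) -> V256 {
    let mut out = [0u8; 32];
    for i in 0..16 {
        out[2 * i] = a[16 + i];
        out[2 * i + 1] = b[16 + i];
    }
    out
}

fn main() {
    let a: V256 = core::array::from_fn(|i| i as u8); // 0..=31
    let b: V256 = core::array::from_fn(|i| 100 + i as u8); // 100..=131
    let lo = interleave_low_8bit_lanes(a, b);
    assert_eq!(&lo[..4], &[0, 100, 1, 101]);
    let hi = interleave_high_8bit_lanes(a, b);
    assert_eq!(&hi[..4], &[16, 116, 17, 117]);
}
```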
unsafe fn for_each_low_64bit_lane<T>(
self,
vector2: Self,
f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T>
Call the provided function for each 64-bit lane in the lower half of this vector and then in the other vector. The given function is provided the lane index and lane value as a u64. (The high 128 bits of each vector are ignored.)
If f returns Some, then iteration over the lanes is stopped and the value is returned. Otherwise, this returns None.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
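The iteration contract can be sketched in safe scalar code: two 64-bit lanes from the low half of the first vector, then two from the low half of the second, with a running lane index and early exit. The little-endian lane extraction here is a modeling assumption, not a statement about the real implementation:

```rust
/// A 256-bit vector modeled as 32 bytes.
type V256 = [u8; 32];

/// Scalar model of `for_each_low_64bit_lane`: visit the two 64-bit lanes
/// in the low 128 bits of `a`, then the two in the low 128 bits of `b`,
/// passing a running lane index 0..=3. Stops as soon as `f` returns `Some`.
fn for_each_low_64bit_lane<T>(
    a: V256,
    b: V256,
    mut f: impl FnMut(usize, u64) -> Option<T>,
) -> Option<T> {
    // Extract the i-th 64-bit lane of a vector (little-endian byte order).
    let lane = |v: V256, i: usize| {
        u64::from_le_bytes(v[i * 8..][..8].try_into().unwrap())
    };
    let lanes = [lane(a, 0), lane(a, 1), lane(b, 0), lane(b, 1)];
    for (index, value) in lanes.into_iter().enumerate() {
        if let Some(t) = f(index, value) {
            return Some(t);
        }
    }
    None
}

fn main() {
    let a: V256 = core::array::from_fn(|i| i as u8);
    let b: V256 = [0xFF; 32];
    // Find the first lane whose value is all ones. Lanes 0 and 1 (from `a`)
    // do not match, so iteration stops at lane 2, the first lane of `b`.
    let hit = for_each_low_64bit_lane(a, b, |i, v| {
        if v == u64::MAX { Some(i) } else { None }
    });
    assert_eq!(hit, Some(2));
}
```

In Teddy this early-exit shape is what lets verification bail out of a candidate window as soon as a match is confirmed.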