struct Teddy<const BUCKETS: usize> {
patterns: Arc<Patterns>,
buckets: [Vec<PatternID>; BUCKETS],
}
Expand description
The common elements of all “slim” and “fat” Teddy search implementations.
Essentially, this contains the patterns and the buckets. Namely, it contains enough to implement the verification step after candidates are identified via the shuffle masks.
It is generic over the number of buckets used. In general, the number of
buckets is either 8 (for “slim” Teddy) or 16 (for “fat” Teddy). The generic
parameter isn’t really meant to be instantiated for any value other than
8 or 16, although it is technically possible. The main hiccup is that there
is some bit-shifting done in the critical part of verification that could
be quite expensive if N
is not a multiple of 2.
Fields§
§patterns: Arc<Patterns>
The patterns we are searching for.
A pattern string can be found by its PatternID
.
buckets: [Vec<PatternID>; BUCKETS]
The allocation of patterns in buckets. This only contains the IDs of patterns. In order to do full verification, callers must provide the actual patterns when using Teddy.
Implementations§
source§impl<const BUCKETS: usize> Teddy<BUCKETS>
impl<const BUCKETS: usize> Teddy<BUCKETS>
sourcefn new(patterns: Arc<Patterns>) -> Teddy<BUCKETS>
fn new(patterns: Arc<Patterns>) -> Teddy<BUCKETS>
Create a new generic data structure for Teddy verification.
sourceunsafe fn verify64(
&self,
cur: *const u8,
end: *const u8,
candidate_chunk: u64,
) -> Option<Match>
unsafe fn verify64( &self, cur: *const u8, end: *const u8, candidate_chunk: u64, ) -> Option<Match>
Verify whether there are any matches starting at or after cur
in the
haystack. The candidate chunk given should correspond to 8-bit bitsets
for N buckets.
§Safety
The given pointers representing the haystack must be valid to read from.
sourceunsafe fn verify_bucket(
&self,
cur: *const u8,
end: *const u8,
bucket: usize,
) -> Option<Match>
unsafe fn verify_bucket( &self, cur: *const u8, end: *const u8, bucket: usize, ) -> Option<Match>
Verify whether there are any matches starting at at
in the given
haystack
corresponding only to patterns in the given bucket.
§Safety
The given pointers representing the haystack must be valid to read from.
The bucket index must be less than or equal to self.buckets.len()
.
sourcefn mask_len(&self) -> usize
fn mask_len(&self) -> usize
Returns the total number of masks required by the patterns in this Teddy searcher.
Basically, the mask length corresponds to the type of Teddy searcher to use: a 1-byte, 2-byte, 3-byte or 4-byte searcher. The bigger the better, typically, since searching for longer substrings usually decreases the rate of false positives. Therefore, the number of masks needed is the length of the shortest pattern in this searcher. If the length of the shortest pattern (in bytes) is bigger than 4, then the mask length is 4 since there are no Teddy searchers for more than 4 bytes.
sourcefn memory_usage(&self) -> usize
fn memory_usage(&self) -> usize
Returns the approximate total amount of heap used by this type, in units of bytes.
source§impl Teddy<8>
impl Teddy<8>
sourceunsafe fn verify<V: Vector>(
&self,
cur: *const u8,
end: *const u8,
candidate: V,
) -> Option<Match>
unsafe fn verify<V: Vector>( &self, cur: *const u8, end: *const u8, candidate: V, ) -> Option<Match>
Runs the verification routine for “slim” Teddy.
The candidate given should be a collection of 8-bit bitsets (one bitset
per lane), where the ith bit is set in the jth lane if and only if the
byte occurring at at + j
in cur
is in the bucket i
.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
The given pointers must be valid to read from.
source§impl Teddy<16>
impl Teddy<16>
sourceunsafe fn verify<V: FatVector>(
&self,
cur: *const u8,
end: *const u8,
candidate: V,
) -> Option<Match>
unsafe fn verify<V: FatVector>( &self, cur: *const u8, end: *const u8, candidate: V, ) -> Option<Match>
Runs the verification routine for “fat” Teddy.
The candidate given should be a collection of 8-bit bitsets (one bitset
per lane), where the ith bit is set in the jth lane if and only if the
byte occurring at at + (j < 16 ? j : j - 16)
in cur
is in the
bucket j < 16 ? i : i + 8
.
§Safety
Callers must ensure that this is okay to call in the current target for the current CPU.
The given pointers must be valid to read from.