Struct aho_corasick::packed::teddy::generic::Teddy

source ·
struct Teddy<const BUCKETS: usize> {
    patterns: Arc<Patterns>,
    buckets: [Vec<PatternID>; BUCKETS],
}
Expand description

The common elements of all “slim” and “fat” Teddy search implementations.

Essentially, this contains the patterns and the buckets. Namely, it contains enough to implement the verification step after candidates are identified via the shuffle masks.

It is generic over the number of buckets used. In general, the number of buckets is either 8 (for “slim” Teddy) or 16 (for “fat” Teddy). The generic parameter isn’t really meant to be instantiated for any value other than 8 or 16, although it is technically possible. The main hiccup is that there is some bit-shifting done in the critical part of verification that could be quite expensive if N is not a multiple of 2.

Fields§

§patterns: Arc<Patterns>

The patterns we are searching for.

A pattern string can be found by its PatternID.

§buckets: [Vec<PatternID>; BUCKETS]

The allocation of patterns in buckets. This only contains the IDs of patterns. In order to do full verification, callers must provide the actual patterns when using Teddy.

Implementations§

source§

impl<const BUCKETS: usize> Teddy<BUCKETS>

source

fn new(patterns: Arc<Patterns>) -> Teddy<BUCKETS>

Create a new generic data structure for Teddy verification.

source

unsafe fn verify64( &self, cur: *const u8, end: *const u8, candidate_chunk: u64, ) -> Option<Match>

Verify whether there are any matches starting at or after cur in the haystack. The candidate chunk given should correspond to 8-bit bitsets for N buckets.

§Safety

The given pointers representing the haystack must be valid to read from.

source

unsafe fn verify_bucket( &self, cur: *const u8, end: *const u8, bucket: usize, ) -> Option<Match>

Verify whether there are any matches starting at at in the given haystack corresponding only to patterns in the given bucket.

§Safety

The given pointers representing the haystack must be valid to read from.

The bucket index must be less than or equal to self.buckets.len().

source

fn mask_len(&self) -> usize

Returns the total number of masks required by the patterns in this Teddy searcher.

Basically, the mask length corresponds to the type of Teddy searcher to use: a 1-byte, 2-byte, 3-byte or 4-byte searcher. The bigger the better, typically, since searching for longer substrings usually decreases the rate of false positives. Therefore, the number of masks needed is the length of the shortest pattern in this searcher. If the length of the shortest pattern (in bytes) is bigger than 4, then the mask length is 4 since there are no Teddy searchers for more than 4 bytes.

source

fn memory_usage(&self) -> usize

Returns the approximate total amount of heap used by this type, in units of bytes.

source§

impl Teddy<8>

source

unsafe fn verify<V: Vector>( &self, cur: *const u8, end: *const u8, candidate: V, ) -> Option<Match>

Runs the verification routine for “slim” Teddy.

The candidate given should be a collection of 8-bit bitsets (one bitset per lane), where the ith bit is set in the jth lane if and only if the byte occurring at at + j in cur is in the bucket i.

§Safety

Callers must ensure that this is okay to call in the current target for the current CPU.

The given pointers must be valid to read from.

source§

impl Teddy<16>

source

unsafe fn verify<V: FatVector>( &self, cur: *const u8, end: *const u8, candidate: V, ) -> Option<Match>

Runs the verification routine for “fat” Teddy.

The candidate given should be a collection of 8-bit bitsets (one bitset per lane), where the ith bit is set in the jth lane if and only if the byte occurring at at + (j < 16 ? j : j - 16) in cur is in the bucket j < 16 ? i : i + 8.

§Safety

Callers must ensure that this is okay to call in the current target for the current CPU.

The given pointers must be valid to read from.

Trait Implementations§

source§

impl<const BUCKETS: usize> Clone for Teddy<BUCKETS>

source§

fn clone(&self) -> Teddy<BUCKETS>

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl<const BUCKETS: usize> Debug for Teddy<BUCKETS>

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

impl<const BUCKETS: usize> Freeze for Teddy<BUCKETS>

§

impl<const BUCKETS: usize> RefUnwindSafe for Teddy<BUCKETS>

§

impl<const BUCKETS: usize> Send for Teddy<BUCKETS>

§

impl<const BUCKETS: usize> Sync for Teddy<BUCKETS>

§

impl<const BUCKETS: usize> Unpin for Teddy<BUCKETS>

§

impl<const BUCKETS: usize> UnwindSafe for Teddy<BUCKETS>

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.