Struct aho_corasick::packed::rabinkarp::RabinKarp

source ·
pub(crate) struct RabinKarp {
    patterns: Arc<Patterns>,
    buckets: Vec<Vec<(usize, PatternID)>>,
    hash_len: usize,
    hash_2pow: usize,
}
Expand description

An implementation of the Rabin-Karp algorithm. The main idea of this algorithm is to maintain a rolling hash as it moves through the input, and then check whether that hash corresponds to the same hash for any of the patterns we’re looking for.

A draw back of naively scaling Rabin-Karp to multiple patterns is that it requires all of the patterns to be the same length, which in turn corresponds to the number of bytes to hash. We adapt this to work for multiple patterns of varying size by fixing the number of bytes to hash to be the length of the smallest pattern. We also split the patterns into several buckets to hopefully make the confirmation step faster.

Wikipedia has a decent explanation, if a bit heavy on the theory: https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm

But ESMAJ provides something a bit more concrete: https://www-igm.univ-mlv.fr/~lecroq/string/node5.html

Fields§

§patterns: Arc<Patterns>

The patterns we’re searching for.

§buckets: Vec<Vec<(usize, PatternID)>>

The order of patterns in each bucket is significant. Namely, they are arranged such that the first one to match is the correct match. This may not necessarily correspond to the order provided by the caller. For example, if leftmost-longest semantics are used, then the patterns are sorted by their length in descending order. If leftmost-first semantics are used, then the patterns are sorted by their pattern ID in ascending order (which corresponds to the caller’s order).

§hash_len: usize

The length of the hashing window. Generally, this corresponds to the length of the smallest pattern.

§hash_2pow: usize

The factor to subtract out of a hash before updating it with a new byte.

Implementations§

source§

impl RabinKarp

source

pub(crate) fn new(patterns: &Arc<Patterns>) -> RabinKarp

Compile a new Rabin-Karp matcher from the patterns given.

This panics if any of the patterns in the collection are empty, or if the collection is itself empty.

source

pub(crate) fn find_at(&self, haystack: &[u8], at: usize) -> Option<Match>

Return the first matching pattern in the given haystack, begining the search at at.

source

pub(crate) fn memory_usage(&self) -> usize

Returns the approximate total amount of heap used by this searcher, in units of bytes.

source

fn verify(&self, id: PatternID, haystack: &[u8], at: usize) -> Option<Match>

Verify whether the pattern with the given id matches at haystack[at..].

We tag this function as cold because it helps improve codegen. Intuitively, it would seem like inlining it would be better. However, the only time this is called and a match is not found is when there there is a hash collision, or when a prefix of a pattern matches but the entire pattern doesn’t match. This is hopefully fairly rare, and if it does occur a lot, it’s going to be slow no matter what we do.

source

fn hash(&self, bytes: &[u8]) -> usize

Hash the given bytes.

source

fn update_hash(&self, prev: usize, old_byte: u8, new_byte: u8) -> usize

Update the hash given based on removing old_byte at the beginning of some byte string, and appending new_byte to the end of that same byte string.

Trait Implementations§

source§

impl Clone for RabinKarp

source§

fn clone(&self) -> RabinKarp

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for RabinKarp

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.