Struct memchr::memmem::searcher::Prefilter

source ·
struct Prefilter {
    call: unsafe fn(strat: &Prefilter, haystack: &[u8]) -> Option<usize>,
    kind: PrefilterKind,
    rarest_byte: u8,
    rarest_offset: u8,
}
Expand description

The implementation of a prefilter.

This type encapsulates dispatch to one of several possible choices for a prefilter. Generally speaking, all prefilters have the same approximate algorithm: they choose a couple of bytes from the needle that are believed to be rare, use a fast vector algorithm to look for those bytes and return positions as candidates for some substring search algorithm (currently only Two-Way) to confirm as a match or not.

The differences between the algorithms are actually at the vector implementation level. Namely, we need different routines based on both which target architecture we’re on and what CPU features are supported.

The straight-forwardly obvious approach here is to use an enum, and make Prefilter::find do case analysis to determine which algorithm was selected and invoke it. However, I’ve observed that this leads to poor codegen in some cases, especially in latency sensitive benchmarks. That is, this approach comes with overhead that I wasn’t able to eliminate.

The second obvious approach is to use dynamic dispatch with traits. Doing that in this context where Prefilter owns the selection generally requires heap allocation, and this code is designed to run in core-only environments.

So we settle on using a union (that’s PrefilterKind) and a function pointer (that’s PrefilterKindFn). We select the right function pointer based on which field in the union we set, and that function in turn knows which field of the union to access. The downside of this approach is that it forces us to think about safety, but the upside is that there are some nice latency improvements to benchmarks. (Especially the memmem/sliceslice/short benchmark.)

In cases where we’ve selected a vector algorithm and the haystack given is too short, we fallback to the scalar version of memchr on the rarest_byte. (The scalar version of memchr is still better than a naive byte-at-a-time loop because it will read in usize-sized chunks at a time.)

Fields§

§call: unsafe fn(strat: &Prefilter, haystack: &[u8]) -> Option<usize>§kind: PrefilterKind§rarest_byte: u8§rarest_offset: u8

Implementations§

source§

impl Prefilter

source

fn fallback<R: HeuristicFrequencyRank>( ranker: R, pair: Pair, needle: &[u8], ) -> Option<Prefilter>

Return a “fallback” prefilter, but only if it is believed to be effective.

source

fn sse2(finder: Finder, needle: &[u8]) -> Prefilter

Return a prefilter using a x86_64 SSE2 vector algorithm.

source

fn avx2(finder: Finder, needle: &[u8]) -> Prefilter

Return a prefilter using a x86_64 AVX2 vector algorithm.

source

fn find(&self, haystack: &[u8]) -> Option<usize>

Return a candidate position for a match.

When this returns an offset, it implies that a match could begin at that offset, but it may not. That is, it is possible for a false positive to be returned.

When None is returned, then it is guaranteed that there are no matches for the needle in the given haystack. That is, it is impossible for a false negative to be returned.

The purpose of this routine is to look for candidate matching positions as quickly as possible before running a (likely) slower confirmation step.

source

fn find_simple(&self, haystack: &[u8]) -> Option<usize>

A “simple” prefilter that just looks for the occurrence of the rarest byte from the needle. This is generally only used for very small haystacks.

Trait Implementations§

source§

impl Clone for Prefilter

source§

fn clone(&self) -> Prefilter

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Prefilter

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Copy for Prefilter

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.