Struct memchr::arch::all::twoway::TwoWay

source ·
struct TwoWay {
    byteset: ApproximateByteSet,
    critical_pos: usize,
    shift: Shift,
}
Expand description

An implementation of the TwoWay substring search algorithm.

This searcher supports forward and reverse search, although not simultaneously. It runs in O(n + m) time and O(1) space, where n ~ len(needle) and m ~ len(haystack).

The implementation here roughly matches that which was developed by Crochemore and Perrin in their 1991 paper “Two-way string-matching.” The changes in this implementation are 1) the use of zero-based indices, 2) a heuristic skip table based on the last byte (borrowed from Rust’s standard library) and 3) the addition of heuristics for a fast skip loop. For (3), callers can pass any kind of prefilter they want, but usually it’s one based on a heuristic that uses an approximate background frequency of bytes to choose rare bytes to quickly look for candidate match positions. Note though that currently, this prefilter functionality is not exposed directly in the public API. (File an issue if you want it and provide a use case please.)

The heuristic for fast skipping is automatically shut off if it’s detected to be ineffective at search time. Generally, this only occurs in pathological cases. But this is generally necessary in order to preserve a O(n + m) time bound.

The code below is fairly complex and not obviously correct at all. It’s likely necessary to read the Two-Way paper cited above in order to fully grok this code. The essence of it is:

  1. Do something to detect a “critical” position in the needle.
  2. For the current position in the haystack, look if needle[critical..] matches at that position.
  3. If so, look if needle[..critical] matches.
  4. If a mismatch occurs, shift the search by some amount based on the critical position and a pre-computed shift.

This type is wrapped in the forward and reverse finders that expose consistent forward or reverse APIs.

Fields§

§byteset: ApproximateByteSet

A small bitset used as a quick prefilter (in addition to any prefilter given by the caller). Namely, a bit i is set if and only if b%64==i for any b == needle[i].

When used as a prefilter, if the last byte at the current candidate position is NOT in this set, then we can skip that entire candidate position (the length of the needle). This is essentially the shift trick found in Boyer-Moore, but only applied to bytes that don’t appear in the needle.

N.B. This trick was inspired by something similar in std’s implementation of Two-Way.

§critical_pos: usize

A critical position in needle. Specifically, this position corresponds to beginning of either the minimal or maximal suffix in needle. (N.B. See SuffixType below for why “minimal” isn’t quite the correct word here.)

This is the position at which every search begins. Namely, search starts by scanning text to the right of this position, and only if there’s a match does the text to the left of this position get scanned.

§shift: Shift

The amount we shift by in the Two-Way search algorithm. This corresponds to the “small period” and “large period” cases.

Trait Implementations§

source§

impl Clone for TwoWay

source§

fn clone(&self) -> TwoWay

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for TwoWay

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Copy for TwoWay

Auto Trait Implementations§

§

impl Freeze for TwoWay

§

impl RefUnwindSafe for TwoWay

§

impl Send for TwoWay

§

impl Sync for TwoWay

§

impl Unpin for TwoWay

§

impl UnwindSafe for TwoWay

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.