Struct aho_corasick::nfa::contiguous::State

source ·
struct State<'a> {
    fail: StateID,
    match_len: usize,
    trans: StateTrans<'a>,
}
Expand description

The “in memory” representation a single dense or sparse state.

A State’s in memory representation is not ever actually materialized during a search with a contiguous NFA. Doing so would be too slow. (Indeed, the only time a State is actually constructed is in Debug impls.) Instead, a State exposes a number of static methods for reading certain things from the raw binary encoding of the state.

Fields§

§fail: StateID

The state to transition to when ‘class_to_next’ yields a transition to the FAIL state.

§match_len: usize

The number of pattern IDs in this state. For a non-match state, this is always zero. Otherwise it is always bigger than zero.

§trans: StateTrans<'a>

The sparse or dense representation of the transitions for this state.

Implementations§

source§

impl<'a> State<'a>

source

const KIND: usize = 0usize

The offset of where the “kind” of a state is stored. If it isn’t one of the sentinel values below, then it’s a sparse state and the kind corresponds to the number of transitions in the state.

source

const KIND_DENSE: u32 = 255u32

A sentinel value indicating that the state uses a dense representation.

source

const KIND_ONE: u32 = 254u32

A sentinel value indicating that the state uses a special “one transition” encoding. In practice, non-match states with one transition make up the overwhelming majority of all states in any given Aho-Corasick automaton, so we can specialize them using a very compact representation.

source

const MAX_SPARSE_TRANSITIONS: usize = 127usize

The maximum number of transitions to encode as a sparse state. Usually states with a lot of transitions are either very rare, or occur near the start state. In the latter case, they are probably dense already anyway. In the former case, making them dense is fine because they’re rare.

This needs to be small enough to permit each of the sentinel values for ‘KIND’ above. Namely, a sparse state embeds the number of transitions into the ‘KIND’. Basically, “sparse” is a state kind too, but it’s the “else” branch.

N.B. There isn’t anything particularly magical about 127 here. I just picked it because I figured any sparse state with this many transitions is going to be exceptionally rare, and if it did have this many transitions, then it would be quite slow to do a linear scan on the transitions during a search anyway.

source

fn remap( alphabet_len: usize, old_to_new: &[StateID], state: &mut [u32], ) -> Result<(), BuildError>

Remap state IDs in-place.

state should be the the raw binary encoding of a state. (The start of the slice must correspond to the start of the state, but the slice may extend past the end of the encoding of the state.)

source

fn len(alphabet_len: usize, is_match: bool, state: &[u32]) -> usize

Returns the length, in number of u32s, of this state.

This is useful for reading states consecutively, e.g., in the Debug impl without needing to store a separate map from state index to state identifier.

state should be the the raw binary encoding of a state. (The start of the slice must correspond to the start of the state, but the slice may extend past the end of the encoding of the state.)

source

fn kind(state: &[u32]) -> u32

Returns the kind of this state.

This only includes the low byte.

source

fn sparse_trans_len(state: &[u32]) -> usize

Get the number of sparse transitions in this state. This can never be more than State::MAX_SPARSE_TRANSITIONS, as all states with more transitions are encoded as dense states.

state should be the the raw binary encoding of a sparse state. (The start of the slice must correspond to the start of the state, but the slice may extend past the end of the encoding of the state.) If this isn’t a sparse state, then the return value is unspecified.

Do note that this is only legal to call on a sparse state. So for example, “one transition” state is not a sparse state, so it would not be legal to call this method on such a state.

source

fn match_len(alphabet_len: usize, state: &[u32]) -> usize

Returns the total number of matching pattern IDs in this state. Calling this on a state that isn’t a match results in unspecified behavior. Thus, the returned number is never 0 for all correct calls.

state should be the the raw binary encoding of a state. (The start of the slice must correspond to the start of the state, but the slice may extend past the end of the encoding of the state.)

source

fn match_pattern(alphabet_len: usize, state: &[u32], index: usize) -> PatternID

Returns the pattern ID corresponding to the given index for the state given. The index provided must be less than the number of pattern IDs in this state.

state should be the the raw binary encoding of a state. (The start of the slice must correspond to the start of the state, but the slice may extend past the end of the encoding of the state.)

If the given state is not a match state or if the index is out of bounds, then this has unspecified behavior.

source

fn read(alphabet_len: usize, is_match: bool, state: &'a [u32]) -> State<'a>

Read a state’s binary encoding to its in-memory representation.

alphabet_len should be the total number of transitions defined for dense states.

is_match should be true if this state is a match state and false otherwise.

state should be the the raw binary encoding of a state. (The start of the slice must correspond to the start of the state, but the slice may extend past the end of the encoding of the state.)

source

fn write( nnfa: &NFA, oldsid: StateID, old: &State, classes: &ByteClasses, dst: &mut Vec<u32>, force_dense: bool, ) -> Result<StateID, BuildError>

Encode the “old” state from a noncontiguous NFA to its binary representation to the given dst slice. classes should be the byte classes computed for the noncontiguous NFA that the given state came from.

This returns an error if dst became so big that StateIDs can no longer be created for new states. Otherwise, it returns the state ID of the new state created.

When force_dense is true, then the encoded state will always use a dense format. Otherwise, the choice between dense and sparse will be automatically chosen based on the old state.

source

fn write_sparse_trans( nnfa: &NFA, oldsid: StateID, classes: &ByteClasses, dst: &mut Vec<u32>, ) -> Result<(), BuildError>

Encode the “old” state transitions from a noncontiguous NFA to its binary sparse representation to the given dst slice. classes should be the byte classes computed for the noncontiguous NFA that the given state came from.

This returns an error if dst became so big that StateIDs can no longer be created for new states.

source

fn write_dense_trans( nnfa: &NFA, oldsid: StateID, classes: &ByteClasses, dst: &mut Vec<u32>, ) -> Result<(), BuildError>

Encode the “old” state transitions from a noncontiguous NFA to its binary dense representation to the given dst slice. classes should be the byte classes computed for the noncontiguous NFA that the given state came from.

This returns an error if dst became so big that StateIDs can no longer be created for new states.

source

fn transitions<'b>(&'b self) -> impl Iterator<Item = (u8, StateID)> + 'b

Return an iterator over every explicitly defined transition in this state.

Trait Implementations§

source§

impl<'a> Clone for State<'a>

source§

fn clone(&self) -> State<'a>

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl<'a> Debug for State<'a>

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

impl<'a> Freeze for State<'a>

§

impl<'a> RefUnwindSafe for State<'a>

§

impl<'a> Send for State<'a>

§

impl<'a> Sync for State<'a>

§

impl<'a> Unpin for State<'a>

§

impl<'a> UnwindSafe for State<'a>

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.