Struct aho_corasick::nfa::contiguous::State
source · struct State<'a> {
fail: StateID,
match_len: usize,
trans: StateTrans<'a>,
}
Expand description
The “in memory” representation a single dense or sparse state.
A State
’s in memory representation is not ever actually materialized
during a search with a contiguous NFA. Doing so would be too slow. (Indeed,
the only time a State
is actually constructed is in Debug
impls.)
Instead, a State
exposes a number of static methods for reading certain
things from the raw binary encoding of the state.
Fields§
§fail: StateID
The state to transition to when ‘class_to_next’ yields a transition to the FAIL state.
match_len: usize
The number of pattern IDs in this state. For a non-match state, this is always zero. Otherwise it is always bigger than zero.
trans: StateTrans<'a>
The sparse or dense representation of the transitions for this state.
Implementations§
source§impl<'a> State<'a>
impl<'a> State<'a>
sourceconst KIND: usize = 0usize
const KIND: usize = 0usize
The offset of where the “kind” of a state is stored. If it isn’t one of the sentinel values below, then it’s a sparse state and the kind corresponds to the number of transitions in the state.
sourceconst KIND_DENSE: u32 = 255u32
const KIND_DENSE: u32 = 255u32
A sentinel value indicating that the state uses a dense representation.
sourceconst KIND_ONE: u32 = 254u32
const KIND_ONE: u32 = 254u32
A sentinel value indicating that the state uses a special “one transition” encoding. In practice, non-match states with one transition make up the overwhelming majority of all states in any given Aho-Corasick automaton, so we can specialize them using a very compact representation.
sourceconst MAX_SPARSE_TRANSITIONS: usize = 127usize
const MAX_SPARSE_TRANSITIONS: usize = 127usize
The maximum number of transitions to encode as a sparse state. Usually states with a lot of transitions are either very rare, or occur near the start state. In the latter case, they are probably dense already anyway. In the former case, making them dense is fine because they’re rare.
This needs to be small enough to permit each of the sentinel values for ‘KIND’ above. Namely, a sparse state embeds the number of transitions into the ‘KIND’. Basically, “sparse” is a state kind too, but it’s the “else” branch.
N.B. There isn’t anything particularly magical about 127 here. I just picked it because I figured any sparse state with this many transitions is going to be exceptionally rare, and if it did have this many transitions, then it would be quite slow to do a linear scan on the transitions during a search anyway.
sourcefn remap(
alphabet_len: usize,
old_to_new: &[StateID],
state: &mut [u32],
) -> Result<(), BuildError>
fn remap( alphabet_len: usize, old_to_new: &[StateID], state: &mut [u32], ) -> Result<(), BuildError>
Remap state IDs in-place.
state
should be the the raw binary encoding of a state. (The start
of the slice must correspond to the start of the state, but the slice
may extend past the end of the encoding of the state.)
sourcefn len(alphabet_len: usize, is_match: bool, state: &[u32]) -> usize
fn len(alphabet_len: usize, is_match: bool, state: &[u32]) -> usize
Returns the length, in number of u32s, of this state.
This is useful for reading states consecutively, e.g., in the Debug impl without needing to store a separate map from state index to state identifier.
state
should be the the raw binary encoding of a state. (The start
of the slice must correspond to the start of the state, but the slice
may extend past the end of the encoding of the state.)
sourcefn sparse_trans_len(state: &[u32]) -> usize
fn sparse_trans_len(state: &[u32]) -> usize
Get the number of sparse transitions in this state. This can never be more than State::MAX_SPARSE_TRANSITIONS, as all states with more transitions are encoded as dense states.
state
should be the the raw binary encoding of a sparse state. (The
start of the slice must correspond to the start of the state, but the
slice may extend past the end of the encoding of the state.) If this
isn’t a sparse state, then the return value is unspecified.
Do note that this is only legal to call on a sparse state. So for example, “one transition” state is not a sparse state, so it would not be legal to call this method on such a state.
sourcefn match_len(alphabet_len: usize, state: &[u32]) -> usize
fn match_len(alphabet_len: usize, state: &[u32]) -> usize
Returns the total number of matching pattern IDs in this state. Calling this on a state that isn’t a match results in unspecified behavior. Thus, the returned number is never 0 for all correct calls.
state
should be the the raw binary encoding of a state. (The start
of the slice must correspond to the start of the state, but the slice
may extend past the end of the encoding of the state.)
sourcefn match_pattern(alphabet_len: usize, state: &[u32], index: usize) -> PatternID
fn match_pattern(alphabet_len: usize, state: &[u32], index: usize) -> PatternID
Returns the pattern ID corresponding to the given index for the state
given. The index
provided must be less than the number of pattern IDs
in this state.
state
should be the the raw binary encoding of a state. (The start of
the slice must correspond to the start of the state, but the slice may
extend past the end of the encoding of the state.)
If the given state is not a match state or if the index is out of bounds, then this has unspecified behavior.
sourcefn read(alphabet_len: usize, is_match: bool, state: &'a [u32]) -> State<'a>
fn read(alphabet_len: usize, is_match: bool, state: &'a [u32]) -> State<'a>
Read a state’s binary encoding to its in-memory representation.
alphabet_len
should be the total number of transitions defined for
dense states.
is_match
should be true if this state is a match state and false
otherwise.
state
should be the the raw binary encoding of a state. (The start
of the slice must correspond to the start of the state, but the slice
may extend past the end of the encoding of the state.)
sourcefn write(
nnfa: &NFA,
oldsid: StateID,
old: &State,
classes: &ByteClasses,
dst: &mut Vec<u32>,
force_dense: bool,
) -> Result<StateID, BuildError>
fn write( nnfa: &NFA, oldsid: StateID, old: &State, classes: &ByteClasses, dst: &mut Vec<u32>, force_dense: bool, ) -> Result<StateID, BuildError>
Encode the “old” state from a noncontiguous NFA to its binary
representation to the given dst
slice. classes
should be the byte
classes computed for the noncontiguous NFA that the given state came
from.
This returns an error if dst
became so big that StateID
s can no
longer be created for new states. Otherwise, it returns the state ID of
the new state created.
When force_dense
is true, then the encoded state will always use a
dense format. Otherwise, the choice between dense and sparse will be
automatically chosen based on the old state.
sourcefn write_sparse_trans(
nnfa: &NFA,
oldsid: StateID,
classes: &ByteClasses,
dst: &mut Vec<u32>,
) -> Result<(), BuildError>
fn write_sparse_trans( nnfa: &NFA, oldsid: StateID, classes: &ByteClasses, dst: &mut Vec<u32>, ) -> Result<(), BuildError>
Encode the “old” state transitions from a noncontiguous NFA to its
binary sparse representation to the given dst
slice. classes
should
be the byte classes computed for the noncontiguous NFA that the given
state came from.
This returns an error if dst
became so big that StateID
s can no
longer be created for new states.
sourcefn write_dense_trans(
nnfa: &NFA,
oldsid: StateID,
classes: &ByteClasses,
dst: &mut Vec<u32>,
) -> Result<(), BuildError>
fn write_dense_trans( nnfa: &NFA, oldsid: StateID, classes: &ByteClasses, dst: &mut Vec<u32>, ) -> Result<(), BuildError>
Encode the “old” state transitions from a noncontiguous NFA to its
binary dense representation to the given dst
slice. classes
should
be the byte classes computed for the noncontiguous NFA that the given
state came from.
This returns an error if dst
became so big that StateID
s can no
longer be created for new states.
sourcefn transitions<'b>(&'b self) -> impl Iterator<Item = (u8, StateID)> + 'b
fn transitions<'b>(&'b self) -> impl Iterator<Item = (u8, StateID)> + 'b
Return an iterator over every explicitly defined transition in this state.