Struct NFA

pub struct NFA {
    match_kind: MatchKind,
    states: Vec<State>,
    sparse: Vec<Transition>,
    dense: Vec<StateID>,
    matches: Vec<Match>,
    pattern_lens: Vec<SmallIndex>,
    prefilter: Option<Prefilter>,
    byte_classes: ByteClasses,
    min_pattern_len: usize,
    max_pattern_len: usize,
    special: Special,
}

A noncontiguous NFA implementation of Aho-Corasick.

When possible, prefer using AhoCorasick instead of this type directly. Using an NFA directly is typically only necessary when one needs access to the Automaton trait implementation.

This NFA represents the “core” implementation of Aho-Corasick in this crate. Namely, constructing this NFA involves building a trie and then filling in the failure transitions between states, similar to what is described in any standard textbook description of Aho-Corasick.
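As a rough illustration of that textbook construction, the sketch below builds a trie and then fills in failure transitions breadth-first. It is not this crate's implementation: `TrieNode`, `build` and `find_all` are hypothetical names, and a `HashMap` per node is used purely for brevity.

```rust
use std::collections::{HashMap, VecDeque};

/// One trie node. Hypothetical type, for illustration only.
#[derive(Default)]
struct TrieNode {
    next: HashMap<u8, usize>, // explicit trie transitions
    fail: usize,              // failure transition (0 is the root)
    out: Vec<usize>,          // indices of patterns that end in this node
}

/// Build the trie, then fill in the failure transitions breadth-first.
fn build(patterns: &[&str]) -> Vec<TrieNode> {
    let mut nodes = vec![TrieNode::default()]; // node 0 is the root
    for (pid, pat) in patterns.iter().enumerate() {
        let mut cur = 0;
        for &b in pat.as_bytes() {
            cur = match nodes[cur].next.get(&b).copied() {
                Some(n) => n,
                None => {
                    nodes.push(TrieNode::default());
                    let n = nodes.len() - 1;
                    nodes[cur].next.insert(b, n);
                    n
                }
            };
        }
        nodes[cur].out.push(pid);
    }
    // Children of the root keep the default failure transition (the root).
    let mut queue: VecDeque<usize> = nodes[0].next.values().copied().collect();
    while let Some(s) = queue.pop_front() {
        let edges: Vec<(u8, usize)> =
            nodes[s].next.iter().map(|(&b, &t)| (b, t)).collect();
        for (b, t) in edges {
            // Walk failure links from `s` until some node has a transition on `b`.
            let mut f = nodes[s].fail;
            while f != 0 && !nodes[f].next.contains_key(&b) {
                f = nodes[f].fail;
            }
            let fail = nodes[f].next.get(&b).copied().unwrap_or(0);
            nodes[t].fail = fail;
            // Inherit the matches of the failure target.
            let inherited = nodes[fail].out.clone();
            nodes[t].out.extend(inherited);
            queue.push_back(t);
        }
    }
    nodes
}

/// Report (pattern index, end offset) for every occurrence, overlaps included.
fn find_all(nodes: &[TrieNode], haystack: &str) -> Vec<(usize, usize)> {
    let mut found = Vec::new();
    let mut s = 0;
    for (i, &b) in haystack.as_bytes().iter().enumerate() {
        while s != 0 && !nodes[s].next.contains_key(&b) {
            s = nodes[s].fail;
        }
        s = nodes[s].next.get(&b).copied().unwrap_or(0);
        for &pid in &nodes[s].out {
            found.push((pid, i + 1));
        }
    }
    found
}
```

Under these assumptions, `find_all(&build(&["b", "abc", "abcd"]), "abcd")` reports pattern 0 ending at offset 2 first, which lines up with the `1..2` match in the example further down.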

In order to minimize heap usage and to avoid additional construction costs, this implementation represents the transitions of all states as distinct sparse memory allocations. This is where it gets its name from. That is, this NFA has no contiguous memory allocation for its transition table. Each state gets its own allocation.

While the sparse representation keeps memory usage at somewhat reasonable levels, it is still quite large and also results in somewhat mediocre search performance. For this reason, it is almost always a good idea to use a contiguous::NFA instead. A contiguous NFA is marginally slower to build, but has higher throughput and can sometimes use an order of magnitude less memory. The main reasons to use a noncontiguous NFA are that you need the fastest possible construction time, or that a contiguous NFA does not have the desired capacity. (A contiguous NFA supports fewer total states than a noncontiguous NFA.)

§Example

This example shows how to build an NFA directly and use it to execute Automaton::try_find:

use aho_corasick::{
    automaton::Automaton,
    nfa::noncontiguous::NFA,
    Input, Match,
};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let nfa = NFA::new(patterns).unwrap();
assert_eq!(
    Some(Match::must(0, 1..2)),
    nfa.try_find(&Input::new(haystack)).unwrap(),
);

It is also possible to implement your own version of try_find. See the Automaton documentation for an example.

Fields§

§match_kind: MatchKind

The match semantics built into this NFA.

§states: Vec<State>

A set of states. Each state defines its own transitions, a fail transition and a set of indices corresponding to matches.

The first state (index 0) is always the dead state. Dead states are in every automaton, but are only used when leftmost-{first,longest} match semantics are enabled. Specifically, they instruct search to stop at specific points in order to report the correct match location. In the standard Aho-Corasick construction, there are no transitions to the dead state.

The second state (index 1) is always the fail state, which is used only as a sentinel. Namely, in the final NFA, no transition into the fail state is ever followed. (Such transitions do exist, but when one is encountered, the state’s failure transition is followed instead.)

The third state (index 2) is generally intended to be the starting or “root” state.

§sparse: Vec<Transition>

Transitions stored in a sparse representation via a linked list.

Each transition contains three pieces of information: the byte it is defined for, the state it transitions to and a link to the next transition in the same state (or StateID::ZERO if it is the last transition).

The first transition for each state is determined by State::sparse.

Note that this contains a complete set of all transitions in this NFA, including those of states that also have a dense representation. (Adding dense transitions for a state doesn’t remove its sparse transitions, since deleting transitions from this particular sparse representation would be fairly expensive.)
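The linked-list layout can be sketched roughly as follows. The types are simplified, hypothetical stand-ins (plain `u32` indices instead of `StateID`), the dummy entry at index 0 is an assumption made so that a zero link can terminate a list, and new transitions are simply prepended; the real builder may order them differently.

```rust
/// Hypothetical stand-in for a sparse transition.
#[derive(Clone, Copy, Default)]
struct Transition {
    byte: u8,  // the byte this transition is defined for
    next: u32, // the state it transitions to
    link: u32, // index of the next transition in the same state, 0 = last
}

/// Hypothetical stand-in for a state: only the head of its list is kept.
#[derive(Clone, Copy, Default)]
struct State {
    sparse: u32, // index of the first transition, 0 = no transitions
}

struct Sparse {
    states: Vec<State>,
    sparse: Vec<Transition>,
}

impl Sparse {
    fn new(num_states: usize) -> Sparse {
        Sparse {
            states: vec![State::default(); num_states],
            // Assumption: index 0 is a dummy, so a link of 0 means "end of list".
            sparse: vec![Transition::default()],
        }
    }

    /// Prepend a transition to the list owned by state `sid`.
    fn add(&mut self, sid: usize, byte: u8, next: u32) {
        let link = self.states[sid].sparse;
        self.sparse.push(Transition { byte, next, link });
        self.states[sid].sparse = (self.sparse.len() - 1) as u32;
    }

    /// Walk the list of `sid`. `None` stands in for "no explicit transition",
    /// where a real search would fall back to the failure transition.
    fn follow(&self, sid: usize, byte: u8) -> Option<u32> {
        let mut link = self.states[sid].sparse;
        while link != 0 {
            let t = self.sparse[link as usize];
            if t.byte == byte {
                return Some(t.next);
            }
            link = t.link;
        }
        None
    }
}
```

All states share the single `sparse` vector; only the per-state head index distinguishes one list from another.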

§dense: Vec<StateID>

Transitions stored in a dense representation.

A state has a row in this table if and only if State::dense is not equal to StateID::ZERO. When not zero, there are precisely NFA::byte_classes().alphabet_len() entries beginning at State::dense in this table.

Generally a very small minority of states have a dense representation since it uses so much memory.
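A minimal sketch of how such a dense row might be addressed, assuming a byte-to-class map and `u32` state IDs. `Dense`, `alloc_row`, `set` and `get` are hypothetical names, and reserving slot 0 as a dummy (so that offset zero can mean "no dense row") is an assumption of this sketch.

```rust
/// Hypothetical simplified dense table.
struct Dense {
    classes: [u8; 256],  // maps each byte to its equivalence class
    alphabet_len: usize, // number of distinct classes
    dense: Vec<u32>,     // all rows, stored back to back
}

impl Dense {
    fn new(classes: [u8; 256], alphabet_len: usize) -> Dense {
        // Assumption: slot 0 is a dummy so that a row offset of 0 is never valid.
        Dense { classes, alphabet_len, dense: vec![0] }
    }

    /// Allocate one row of `alphabet_len` entries, all set to `fill`,
    /// and return the offset at which the row begins.
    fn alloc_row(&mut self, fill: u32) -> usize {
        let start = self.dense.len();
        self.dense.extend(std::iter::repeat(fill).take(self.alphabet_len));
        start
    }

    /// Set the transition for `byte` in the row beginning at `row`.
    fn set(&mut self, row: usize, byte: u8, next: u32) {
        let class = usize::from(self.classes[usize::from(byte)]);
        self.dense[row + class] = next;
    }

    /// Look up the transition for `byte`, or `None` if the state has no row.
    fn get(&self, row: usize, byte: u8) -> Option<u32> {
        if row == 0 {
            return None;
        }
        let class = usize::from(self.classes[usize::from(byte)]);
        Some(self.dense[row + class])
    }
}
```

A lookup is a single index computation, but every row costs `alphabet_len` entries whether or not the state uses them, which is why only a few states would be given one.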

§matches: Vec<Match>

Matches stored in a linked list for each state.

Like sparse transitions, each match has a link to the next match in the state.

The first match for each state is determined by State::matches.
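The per-state match list could be walked along these lines. This is a sketch under assumptions: `Matches`, its `first` vector (standing in for State::matches) and the dummy entry at index 0 are hypothetical, pattern IDs are plain `u32` values, and appending at the tail is just one possible policy.

```rust
/// Hypothetical stand-in for one entry in a state's match list.
#[derive(Clone, Copy, Default)]
struct Match {
    pid: u32,  // pattern that matches in this state
    link: u32, // index of the next match in the same state, 0 = last
}

struct Matches {
    first: Vec<u32>,     // head of each state's list, 0 = no matches
    matches: Vec<Match>, // assumption: index 0 is a dummy entry
}

impl Matches {
    fn new(num_states: usize) -> Matches {
        Matches { first: vec![0; num_states], matches: vec![Match::default()] }
    }

    /// Append `pid` to the end of the list owned by state `sid`.
    fn add_match(&mut self, sid: usize, pid: u32) {
        let new = self.matches.len() as u32;
        self.matches.push(Match { pid, link: 0 });
        let mut link = self.first[sid];
        if link == 0 {
            self.first[sid] = new;
            return;
        }
        // Find the current tail and hook the new entry onto it.
        while self.matches[link as usize].link != 0 {
            link = self.matches[link as usize].link;
        }
        self.matches[link as usize].link = new;
    }

    /// Iterate over the pattern IDs of state `sid` in insertion order.
    fn iter_matches(&self, sid: usize) -> impl Iterator<Item = u32> + '_ {
        let mut link = self.first[sid];
        std::iter::from_fn(move || {
            if link == 0 {
                return None;
            }
            let m = self.matches[link as usize];
            link = m.link;
            Some(m.pid)
        })
    }
}
```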

§pattern_lens: Vec<SmallIndex>

The length, in bytes, of each pattern in this NFA. This vector is indexed by PatternID.

The number of entries in this vector corresponds to the total number of patterns in this automaton.

§prefilter: Option<Prefilter>

A prefilter for quickly skipping to candidate matches, if pertinent.

§byte_classes: ByteClasses

A set of equivalence classes in terms of bytes. We compute this while building the NFA, but don’t use it in the NFA’s states. Instead, we use this for building the DFA. We store it on the NFA since it’s easy to compute while visiting the patterns.
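One way such equivalence classes could be computed while visiting the patterns is sketched below. This is an assumption-laden illustration rather than the crate's ByteClasses code: the hypothetical `byte_classes` function isolates every byte that occurs in a pattern and groups each run of unused bytes between them into one class.

```rust
/// Returns a byte-to-class map and the number of classes (the alphabet length).
/// Hypothetical helper, for illustration only.
fn byte_classes(patterns: &[&str]) -> ([u8; 256], usize) {
    // boundary[b] == true means that a class ends at byte `b`.
    let mut boundary = [false; 256];
    for pat in patterns {
        for &b in pat.as_bytes() {
            // Isolate `b`: end one class just before it and another at it.
            if b > 0 {
                boundary[usize::from(b) - 1] = true;
            }
            boundary[usize::from(b)] = true;
        }
    }
    let mut classes = [0u8; 256];
    let mut class = 0u8;
    for b in 0..256usize {
        classes[b] = class;
        // At most 255 increments can happen, so `class` never overflows.
        if boundary[b] && b < 255 {
            class += 1;
        }
    }
    (classes, usize::from(class) + 1)
}
```

For the single pattern `"ab"` this yields four classes: everything below `a`, `a` itself, `b` itself, and everything above `b`. A dense transition row then needs 4 entries instead of 256.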

§min_pattern_len: usize

The length, in bytes, of the shortest pattern in this automaton. This information is useful for detecting whether an automaton matches the empty string or not.

§max_pattern_len: usize

The length, in bytes, of the longest pattern in this automaton. This information is useful for keeping correct buffer sizes when searching on streams.

§special: Special

The information required to deduce which states are “special” in this NFA.

Since the DEAD and FAIL states are always the first two states and there are only ever two start states (which follow all of the match states), it follows that we can determine whether a state is a fail, dead, match or start with just a few comparisons on the ID itself:

is_dead(sid): sid == NFA::DEAD
is_fail(sid): sid == NFA::FAIL
is_match(sid): NFA::FAIL < sid && sid <= max_match_id
is_start(sid): sid == start_unanchored_id || sid == start_anchored_id

Note that this only applies to the NFA after it has been constructed. During construction, the start states are the first ones added and the match states are interleaved with non-match states. Once all of the states have been added, the states are shuffled such that the above predicates hold.
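These predicates translate almost directly into code. The sketch below uses plain `u32` state IDs and assumes DEAD = 0 and FAIL = 1, which is what the `NFA::FAIL < sid` comparison implies; the `Special` struct here is a hypothetical mirror holding only the three IDs named above.

```rust
// Assumed concrete values for the two sentinel states.
const DEAD: u32 = 0;
const FAIL: u32 = 1;

/// Hypothetical bookkeeping for the shuffled state layout:
/// DEAD, FAIL, match states, then the two start states.
struct Special {
    max_match_id: u32,
    start_unanchored_id: u32,
    start_anchored_id: u32,
}

impl Special {
    fn is_dead(&self, sid: u32) -> bool {
        sid == DEAD
    }

    fn is_fail(&self, sid: u32) -> bool {
        sid == FAIL
    }

    /// Match states occupy the contiguous ID range (FAIL, max_match_id].
    fn is_match(&self, sid: u32) -> bool {
        FAIL < sid && sid <= self.max_match_id
    }

    fn is_start(&self, sid: u32) -> bool {
        sid == self.start_unanchored_id || sid == self.start_anchored_id
    }
}
```

Because the match states form one contiguous range after shuffling, no per-state flag has to be stored or loaded to classify a state.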

Implementations§

impl NFA

pub fn new<I, P>(patterns: I) -> Result<NFA, BuildError>
where I: IntoIterator<Item = P>, P: AsRef<[u8]>,

pub fn builder() -> Builder

impl NFA

pub(crate) const DEAD: StateID

pub(crate) const FAIL: StateID

pub(crate) fn byte_classes(&self) -> &ByteClasses

pub(crate) fn pattern_lens_raw(&self) -> &[SmallIndex]

pub(crate) fn states(&self) -> &[State]

pub(crate) fn special(&self) -> &Special

pub(crate) fn swap_states(&mut self, id1: StateID, id2: StateID)

pub(crate) fn remap(&mut self, map: impl Fn(StateID) -> StateID)

pub(crate) fn iter_trans(&self, sid: StateID) -> impl Iterator<Item = Transition> + '_

pub(crate) fn iter_matches(&self, sid: StateID) -> impl Iterator<Item = PatternID> + '_

fn next_link(&self, sid: StateID, prev: Option<StateID>) -> Option<StateID>

fn follow_transition(&self, sid: StateID, byte: u8) -> StateID

fn follow_transition_sparse(&self, sid: StateID, byte: u8) -> StateID

fn add_transition(&mut self, prev: StateID, byte: u8, next: StateID) -> Result<(), BuildError>

fn init_full_state(&mut self, prev: StateID, next: StateID) -> Result<(), BuildError>

§Panics

fn add_match(&mut self, sid: StateID, pid: PatternID) -> Result<(), BuildError>

fn copy_matches(&mut self, src: StateID, dst: StateID) -> Result<(), BuildError>

fn alloc_transition(&mut self) -> Result<StateID, BuildError>

fn alloc_match(&mut self) -> Result<StateID, BuildError>

fn alloc_dense_state(&mut self) -> Result<StateID, BuildError>

fn alloc_state(&mut self, depth: usize) -> Result<StateID, BuildError>

Trait Implementations§

impl Automaton for NFA

fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>

fn next_state(&self, anchored: Anchored, sid: StateID, byte: u8) -> StateID

fn is_special(&self, sid: StateID) -> bool

fn is_dead(&self, sid: StateID) -> bool

fn is_match(&self, sid: StateID) -> bool

fn is_start(&self, sid: StateID) -> bool

fn match_kind(&self) -> MatchKind

fn patterns_len(&self) -> usize

fn pattern_len(&self, pid: PatternID) -> usize

fn min_pattern_len(&self) -> usize

fn max_pattern_len(&self) -> usize

fn match_len(&self, sid: StateID) -> usize

fn match_pattern(&self, sid: StateID, index: usize) -> PatternID

fn memory_usage(&self) -> usize

fn prefilter(&self) -> Option<&Prefilter>

fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>

fn try_find_overlapping(&self, input: &Input<'_>, state: &mut OverlappingState) -> Result<(), MatchError>

fn try_find_iter<'a, 'h>(&'a self, input: Input<'h>) -> Result<FindIter<'a, 'h, Self>, MatchError>
where Self: Sized,

fn try_find_overlapping_iter<'a, 'h>(&'a self, input: Input<'h>) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>
where Self: Sized,

fn try_replace_all<B>(&self, haystack: &str, replace_with: &[B]) -> Result<String, MatchError>
where Self: Sized, B: AsRef<str>,

fn try_replace_all_bytes<B>(&self, haystack: &[u8], replace_with: &[B]) -> Result<Vec<u8>, MatchError>
where Self: Sized, B: AsRef<[u8]>,

fn try_replace_all_with<F>(&self, haystack: &str, dst: &mut String, replace_with: F) -> Result<(), MatchError>
where Self: Sized, F: FnMut(&Match, &str, &mut String) -> bool,

fn try_replace_all_with_bytes<F>(&self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F) -> Result<(), MatchError>
where Self: Sized, F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool,

fn try_stream_find_iter<'a, R: Read>(&'a self, rdr: R) -> Result<StreamFindIter<'a, Self, R>, MatchError>
where Self: Sized,

fn try_stream_replace_all<R, W, B>(&self, rdr: R, wtr: W, replace_with: &[B]) -> Result<()>
where Self: Sized, R: Read, W: Write, B: AsRef<[u8]>,

fn try_stream_replace_all_with<R, W, F>(&self, rdr: R, wtr: W, replace_with: F) -> Result<()>
where Self: Sized, R: Read, W: Write, F: FnMut(&Match, &[u8], &mut W) -> Result<()>,

impl Clone for NFA

fn clone(&self) -> NFA

fn clone_from(&mut self, source: &Self)

impl Debug for NFA

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Remappable for NFA

fn state_len(&self) -> usize

fn swap_states(&mut self, id1: StateID, id2: StateID)

fn remap(&mut self, map: impl Fn(StateID) -> StateID)

impl Sealed for NFA

Auto Trait Implementations§

impl Freeze for NFA

impl RefUnwindSafe for NFA

impl Send for NFA

impl Sync for NFA

impl Unpin for NFA

impl UnwindSafe for NFA

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

impl<T, U> Into<U> for T
where U: From<T>,

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,