Struct aho_corasick::nfa::contiguous::NFA

source ·
pub struct NFA {
    repr: Vec<u32>,
    pattern_lens: Vec<SmallIndex>,
    state_len: usize,
    prefilter: Option<Prefilter>,
    match_kind: MatchKind,
    alphabet_len: usize,
    byte_classes: ByteClasses,
    min_pattern_len: usize,
    max_pattern_len: usize,
    special: Special,
}
Expand description

A contiguous NFA implementation of Aho-Corasick.

When possible, prefer using AhoCorasick instead of this type directly. Using an NFA directly is typically only necessary when one needs access to the Automaton trait implementation.

This NFA can only be built by first constructing a noncontiguous::NFA. Both NFA::new and Builder::build do this for you automatically, but Builder::build_from_noncontiguous permits doing it explicitly.

The main difference between a noncontiguous NFA and a contiguous NFA is that the latter represents all of its states and transitions in a single allocation, where as the former uses a separate allocation for each state. Doing this at construction time while keeping a low memory footprint isn’t feasible, which is primarily why there are two different NFA types: one that does the least amount of work possible to build itself, and another that does a little extra work to compact itself and make state transitions faster by making some states use a dense representation.

Because a contiguous NFA uses a single allocation, there is a lot more opportunity for compression tricks to reduce the heap memory used. Indeed, it is not uncommon for a contiguous NFA to use an order of magnitude less heap memory than a noncontiguous NFA. Since building a contiguous NFA usually only takes a fraction of the time it takes to build a noncontiguous NFA, the overall build time is not much slower. Thus, in most cases, a contiguous NFA is the best choice.

Since a contiguous NFA uses various tricks for compression and to achieve faster state transitions, currently, its limit on the number of states is somewhat smaller than what a noncontiguous NFA can achieve. Generally speaking, you shouldn’t expect to run into this limit if the number of patterns is under 1 million. It is plausible that this limit will be increased in the future. If the limit is reached, building a contiguous NFA will return an error. Often, since building a contiguous NFA is relatively cheap, it can make sense to always try it even if you aren’t sure if it will fail or not. If it does, you can always fall back to a noncontiguous NFA. (Indeed, the main AhoCorasick type employs a strategy similar to this at construction time.)

§Example

This example shows how to build an NFA directly and use it to execute Automaton::try_find:

use aho_corasick::{
    automaton::Automaton,
    nfa::contiguous::NFA,
    Input, Match,
};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let nfa = NFA::new(patterns).unwrap();
assert_eq!(
    Some(Match::must(0, 1..2)),
    nfa.try_find(&Input::new(haystack))?,
);

It is also possible to implement your own version of try_find. See the Automaton documentation for an example.

Fields§

§repr: Vec<u32>

The raw NFA representation. Each state is packed with a header (containing the format of the state, the failure transition and, for a sparse state, the number of transitions), its transitions and any matching pattern IDs for match states.

§pattern_lens: Vec<SmallIndex>

The length of each pattern. This is used to compute the start offset of a match.

§state_len: usize

The total number of states in this NFA.

§prefilter: Option<Prefilter>

A prefilter for accelerating searches, if one exists.

§match_kind: MatchKind

The match semantics built into this NFA.

§alphabet_len: usize

The alphabet size, or total number of equivalence classes, for this NFA. Dense states always have this many transitions.

§byte_classes: ByteClasses

The equivalence classes for this NFA. All transitions, dense and sparse, are defined on equivalence classes and not on the 256 distinct byte values.

§min_pattern_len: usize

The length of the shortest pattern in this automaton.

§max_pattern_len: usize

The length of the longest pattern in this automaton.

§special: Special

The information required to deduce which states are “special” in this NFA.

Implementations§

source§

impl NFA

source

pub fn new<I, P>(patterns: I) -> Result<NFA, BuildError>
where I: IntoIterator<Item = P>, P: AsRef<[u8]>,

Create a new Aho-Corasick contiguous NFA using the default configuration.

Use a Builder if you want to change the configuration.

source

pub fn builder() -> Builder

A convenience method for returning a new Aho-Corasick contiguous NFA builder.

This usually permits one to just import the NFA type.

source§

impl NFA

source

const DEAD: StateID = _

A sentinel state ID indicating that a search should stop once it has entered this state. When a search stops, it returns a match if one has been found, otherwise no match. A contiguous NFA always has an actual dead state at this ID.

source

const FAIL: StateID = _

Another sentinel state ID indicating that a search should move through current state’s failure transition.

Note that unlike DEAD, this does not actually point to a valid state in a contiguous NFA. (noncontiguous::NFA::FAIL does point to a valid state.) Instead, this points to the position that is guaranteed to never be a valid state ID (by making sure it points to a place in the middle of the encoding of the DEAD state). Since we never need to actually look at the FAIL state itself, this works out.

By why do it this way? So that FAIL is a constant. I don’t have any concrete evidence that this materially helps matters, but it’s easy to do. The alternative would be making the FAIL ID point to the second state, which could be made a constant but is a little trickier to do. The easiest path is to just make the FAIL state a runtime value, but since comparisons with FAIL occur in perf critical parts of the search, we want it to be as tight as possible and not waste any registers.

Very hand wavy… But the code complexity that results from this is very mild.

Trait Implementations§

source§

impl Automaton for NFA

source§

fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>

Returns the starting state for the given anchor mode. Read more
source§

fn next_state(&self, anchored: Anchored, sid: StateID, byte: u8) -> StateID

Performs a state transition from sid for byte and returns the next state. Read more
source§

fn is_special(&self, sid: StateID) -> bool

Returns true if the given ID represents a “special” state. A special state is a dead, match or start state. Read more
source§

fn is_dead(&self, sid: StateID) -> bool

Returns true if the given ID represents a dead state. Read more
source§

fn is_match(&self, sid: StateID) -> bool

Returns true if the given ID represents a match state. Read more
source§

fn is_start(&self, sid: StateID) -> bool

Returns true if the given ID represents a start state. Read more
source§

fn match_kind(&self) -> MatchKind

Returns the match semantics that this automaton was built with.
source§

fn patterns_len(&self) -> usize

Returns the total number of patterns compiled into this automaton.
source§

fn pattern_len(&self, pid: PatternID) -> usize

Returns the length of the pattern for the given ID. Read more
source§

fn min_pattern_len(&self) -> usize

Returns the length, in bytes, of the shortest pattern in this automaton.
source§

fn max_pattern_len(&self) -> usize

Returns the length, in bytes, of the longest pattern in this automaton.
source§

fn match_len(&self, sid: StateID) -> usize

Returns the total number of matches for the given state ID. Read more
source§

fn match_pattern(&self, sid: StateID, index: usize) -> PatternID

Returns the pattern ID for the match state given by sid at the index given. Read more
source§

fn memory_usage(&self) -> usize

Returns the heap memory usage, in bytes, used by this automaton.
source§

fn prefilter(&self) -> Option<&Prefilter>

Returns a prefilter, if available, that can be used to accelerate searches for this automaton. Read more
source§

fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>

Executes a non-overlapping search with this automaton using the given configuration. Read more
source§

fn try_find_overlapping( &self, input: &Input<'_>, state: &mut OverlappingState, ) -> Result<(), MatchError>

Executes a overlapping search with this automaton using the given configuration. Read more
source§

fn try_find_iter<'a, 'h>( &'a self, input: Input<'h>, ) -> Result<FindIter<'a, 'h, Self>, MatchError>
where Self: Sized,

Returns an iterator of non-overlapping matches with this automaton using the given configuration. Read more
source§

fn try_find_overlapping_iter<'a, 'h>( &'a self, input: Input<'h>, ) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>
where Self: Sized,

Returns an iterator of overlapping matches with this automaton using the given configuration. Read more
source§

fn try_replace_all<B>( &self, haystack: &str, replace_with: &[B], ) -> Result<String, MatchError>
where Self: Sized, B: AsRef<str>,

Replaces all non-overlapping matches in haystack with strings from replace_with depending on the pattern that matched. The replace_with slice must have length equal to Automaton::patterns_len. Read more
source§

fn try_replace_all_bytes<B>( &self, haystack: &[u8], replace_with: &[B], ) -> Result<Vec<u8>, MatchError>
where Self: Sized, B: AsRef<[u8]>,

Replaces all non-overlapping matches in haystack with strings from replace_with depending on the pattern that matched. The replace_with slice must have length equal to Automaton::patterns_len. Read more
source§

fn try_replace_all_with<F>( &self, haystack: &str, dst: &mut String, replace_with: F, ) -> Result<(), MatchError>
where Self: Sized, F: FnMut(&Match, &str, &mut String) -> bool,

Replaces all non-overlapping matches in haystack by calling the replace_with closure given. Read more
source§

fn try_replace_all_with_bytes<F>( &self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F, ) -> Result<(), MatchError>
where Self: Sized, F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool,

Replaces all non-overlapping matches in haystack by calling the replace_with closure given. Read more
source§

fn try_stream_find_iter<'a, R: Read>( &'a self, rdr: R, ) -> Result<StreamFindIter<'a, Self, R>, MatchError>
where Self: Sized,

Returns an iterator of non-overlapping matches with this automaton from the stream given. Read more
source§

fn try_stream_replace_all<R, W, B>( &self, rdr: R, wtr: W, replace_with: &[B], ) -> Result<()>
where Self: Sized, R: Read, W: Write, B: AsRef<[u8]>,

Replaces all non-overlapping matches in rdr with strings from replace_with depending on the pattern that matched, and writes the result to wtr. The replace_with slice must have length equal to Automaton::patterns_len. Read more
source§

fn try_stream_replace_all_with<R, W, F>( &self, rdr: R, wtr: W, replace_with: F, ) -> Result<()>
where Self: Sized, R: Read, W: Write, F: FnMut(&Match, &[u8], &mut W) -> Result<()>,

Replaces all non-overlapping matches in rdr by calling the replace_with closure given and writing the result to wtr. Read more
source§

impl Clone for NFA

source§

fn clone(&self) -> NFA

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for NFA

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Sealed for NFA

Auto Trait Implementations§

§

impl Freeze for NFA

§

impl RefUnwindSafe for NFA

§

impl Send for NFA

§

impl Sync for NFA

§

impl Unpin for NFA

§

impl UnwindSafe for NFA

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.