pub struct DFA {Show 13 fields
    trans: Vec<StateID>,
    matches: Vec<Vec<PatternID>>,
    matches_memory_usage: usize,
    pattern_lens: Vec<SmallIndex>,
    prefilter: Option<Prefilter>,
    match_kind: MatchKind,
    state_len: usize,
    alphabet_len: usize,
    stride2: usize,
    byte_classes: ByteClasses,
    min_pattern_len: usize,
    max_pattern_len: usize,
    special: Special,
}Expand description
A DFA implementation of Aho-Corasick.
When possible, prefer using AhoCorasick instead of
this type directly. Using a DFA directly is typically only necessary when
one needs access to the Automaton trait implementation.
This DFA can only be built by first constructing a noncontiguous::NFA.
Both DFA::new and Builder::build do this for you automatically, but
Builder::build_from_noncontiguous permits doing it explicitly.
A DFA provides the best possible search performance (in this crate) via two mechanisms:
- All states use a dense representation for their transitions.
 - All failure transitions are pre-computed such that they are never explicitly handled at search time.
 
These two facts combined mean that every state transition is performed
using a constant number of instructions. However, this comes at
great cost. The memory usage of a DFA can be quite exorbitant.
It is potentially multiple orders of magnitude greater than a
contiguous::NFA for example. In exchange,
a DFA will typically have better search speed than a contiguous::NFA, but
not by orders of magnitude.
Unless you have a small number of patterns or memory usage is not a concern and search performance is critical, a DFA is usually not the best choice.
Moreover, unlike the NFAs in this crate, it is costly for a DFA to
support for anchored and unanchored search configurations. Namely,
since failure transitions are pre-computed, supporting both anchored
and unanchored searches requires a duplication of the transition table,
making the memory usage of such a DFA ever bigger. (The NFAs in this crate
unconditionally support both anchored and unanchored searches because there
is essentially no added cost for doing so.) It is for this reason that
a DFA’s support for anchored and unanchored searches can be configured
via Builder::start_kind. By default, a DFA only supports unanchored
searches.
§Example
This example shows how to build an DFA directly and use it to execute
Automaton::try_find:
use aho_corasick::{
    automaton::Automaton,
    dfa::DFA,
    Input, Match,
};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let nfa = DFA::new(patterns).unwrap();
assert_eq!(
    Some(Match::must(0, 1..2)),
    nfa.try_find(&Input::new(haystack))?,
);It is also possible to implement your own version of try_find. See the
Automaton documentation for an example.
Fields§
§trans: Vec<StateID>The DFA transition table. IDs in this table are pre-multiplied. So instead of the IDs being 0, 1, 2, 3, …, they are 0stride, 1stride, 2stride, 3stride, …
matches: Vec<Vec<PatternID>>The matches for every match state in this DFA. This is first indexed by
state index (so that’s sid >> stride2) and then by order in which the
matches are meant to occur.
matches_memory_usage: usizeThe amount of heap memory used, in bytes, by the inner Vecs of ‘matches’.
pattern_lens: Vec<SmallIndex>The length of each pattern. This is used to compute the start offset of a match.
prefilter: Option<Prefilter>A prefilter for accelerating searches, if one exists.
match_kind: MatchKindThe match semantics built into this DFA.
state_len: usizeThe total number of states in this DFA.
alphabet_len: usizeThe alphabet size, or total number of equivalence classes, for this DFA. Note that the actual number of transitions in each state is stride=2^stride2, where stride is the smallest power of 2 greater than or equal to alphabet_len. We do things this way so that we can use bitshifting to go from a state ID to an index into ‘matches’.
stride2: usizeThe exponent with a base 2, such that stride=2^stride2. Given a state index ‘i’, its state identifier is ‘i << stride2’. Given a state identifier ‘sid’, its state index is ‘sid >> stride2’.
byte_classes: ByteClassesThe equivalence classes for this DFA. All transitions are defined on equivalence classes and not on the 256 distinct byte values.
min_pattern_len: usizeThe length of the shortest pattern in this automaton.
max_pattern_len: usizeThe length of the longest pattern in this automaton.
special: SpecialThe information required to deduce which states are “special” in this DFA.
Implementations§
Source§impl DFA
 
impl DFA
Source§impl DFA
 
impl DFA
Sourceconst DEAD: StateID
 
const DEAD: StateID
A sentinel state ID indicating that a search should stop once it has entered this state. When a search stops, it returns a match if one has been found, otherwise no match. A DFA always has an actual dead state at this ID.
N.B. DFAs, unlike NFAs, do not have any notion of a FAIL state. Namely, the whole point of a DFA is that the FAIL state is completely compiled away. That is, DFA construction involves pre-computing the failure transitions everywhere, such that failure transitions are no longer used at search time. This, combined with its uniformly dense representation, are the two most important factors in why it’s faster than the NFAs in this crate.
Sourcefn set_matches(&mut self, sid: StateID, pids: impl Iterator<Item = PatternID>)
 
fn set_matches(&mut self, sid: StateID, pids: impl Iterator<Item = PatternID>)
Adds the given pattern IDs as matches to the given state and also records the added memory usage.
Trait Implementations§
Source§impl Automaton for DFA
 
impl Automaton for DFA
Source§fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>
 
fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>
Source§fn is_special(&self, sid: StateID) -> bool
 
fn is_special(&self, sid: StateID) -> bool
Source§fn is_dead(&self, sid: StateID) -> bool
 
fn is_dead(&self, sid: StateID) -> bool
Source§fn is_match(&self, sid: StateID) -> bool
 
fn is_match(&self, sid: StateID) -> bool
Source§fn is_start(&self, sid: StateID) -> bool
 
fn is_start(&self, sid: StateID) -> bool
Source§fn match_kind(&self) -> MatchKind
 
fn match_kind(&self) -> MatchKind
Source§fn patterns_len(&self) -> usize
 
fn patterns_len(&self) -> usize
Source§fn pattern_len(&self, pid: PatternID) -> usize
 
fn pattern_len(&self, pid: PatternID) -> usize
Source§fn min_pattern_len(&self) -> usize
 
fn min_pattern_len(&self) -> usize
Source§fn max_pattern_len(&self) -> usize
 
fn max_pattern_len(&self) -> usize
Source§fn match_len(&self, sid: StateID) -> usize
 
fn match_len(&self, sid: StateID) -> usize
Source§fn memory_usage(&self) -> usize
 
fn memory_usage(&self) -> usize
Source§fn prefilter(&self) -> Option<&Prefilter>
 
fn prefilter(&self) -> Option<&Prefilter>
Source§fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>
 
fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>
Source§fn try_find_overlapping(
    &self,
    input: &Input<'_>,
    state: &mut OverlappingState,
) -> Result<(), MatchError>
 
fn try_find_overlapping( &self, input: &Input<'_>, state: &mut OverlappingState, ) -> Result<(), MatchError>
Source§fn try_find_iter<'a, 'h>(
    &'a self,
    input: Input<'h>,
) -> Result<FindIter<'a, 'h, Self>, MatchError>where
    Self: Sized,
 
fn try_find_iter<'a, 'h>(
    &'a self,
    input: Input<'h>,
) -> Result<FindIter<'a, 'h, Self>, MatchError>where
    Self: Sized,
Source§fn try_find_overlapping_iter<'a, 'h>(
    &'a self,
    input: Input<'h>,
) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>where
    Self: Sized,
 
fn try_find_overlapping_iter<'a, 'h>(
    &'a self,
    input: Input<'h>,
) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>where
    Self: Sized,
Source§fn try_replace_all<B>(
    &self,
    haystack: &str,
    replace_with: &[B],
) -> Result<String, MatchError>
 
fn try_replace_all<B>( &self, haystack: &str, replace_with: &[B], ) -> Result<String, MatchError>
haystack with
strings from replace_with depending on the pattern that
matched. The replace_with slice must have length equal to
Automaton::patterns_len. Read moreSource§fn try_replace_all_bytes<B>(
    &self,
    haystack: &[u8],
    replace_with: &[B],
) -> Result<Vec<u8>, MatchError>
 
fn try_replace_all_bytes<B>( &self, haystack: &[u8], replace_with: &[B], ) -> Result<Vec<u8>, MatchError>
haystack with
strings from replace_with depending on the pattern that
matched. The replace_with slice must have length equal to
Automaton::patterns_len. Read moreSource§fn try_replace_all_with<F>(
    &self,
    haystack: &str,
    dst: &mut String,
    replace_with: F,
) -> Result<(), MatchError>
 
fn try_replace_all_with<F>( &self, haystack: &str, dst: &mut String, replace_with: F, ) -> Result<(), MatchError>
haystack by calling the
replace_with closure given. Read moreSource§fn try_replace_all_with_bytes<F>(
    &self,
    haystack: &[u8],
    dst: &mut Vec<u8>,
    replace_with: F,
) -> Result<(), MatchError>
 
fn try_replace_all_with_bytes<F>( &self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F, ) -> Result<(), MatchError>
haystack by calling the
replace_with closure given. Read moreSource§fn try_stream_find_iter<'a, R: Read>(
    &'a self,
    rdr: R,
) -> Result<StreamFindIter<'a, Self, R>, MatchError>where
    Self: Sized,
 
fn try_stream_find_iter<'a, R: Read>(
    &'a self,
    rdr: R,
) -> Result<StreamFindIter<'a, Self, R>, MatchError>where
    Self: Sized,
Source§fn try_stream_replace_all<R, W, B>(
    &self,
    rdr: R,
    wtr: W,
    replace_with: &[B],
) -> Result<()>
 
fn try_stream_replace_all<R, W, B>( &self, rdr: R, wtr: W, replace_with: &[B], ) -> Result<()>
rdr with strings from
replace_with depending on the pattern that matched, and writes the
result to wtr. The replace_with slice must have length equal to
Automaton::patterns_len. Read more