Struct aho_corasick::dfa::DFA
source · pub struct DFA {Show 13 fields
trans: Vec<StateID>,
matches: Vec<Vec<PatternID>>,
matches_memory_usage: usize,
pattern_lens: Vec<SmallIndex>,
prefilter: Option<Prefilter>,
match_kind: MatchKind,
state_len: usize,
alphabet_len: usize,
stride2: usize,
byte_classes: ByteClasses,
min_pattern_len: usize,
max_pattern_len: usize,
special: Special,
}
Expand description
A DFA implementation of Aho-Corasick.
When possible, prefer using AhoCorasick
instead of
this type directly. Using a DFA
directly is typically only necessary when
one needs access to the Automaton
trait implementation.
This DFA can only be built by first constructing a noncontiguous::NFA
.
Both DFA::new
and Builder::build
do this for you automatically, but
Builder::build_from_noncontiguous
permits doing it explicitly.
A DFA provides the best possible search performance (in this crate) via two mechanisms:
- All states use a dense representation for their transitions.
- All failure transitions are pre-computed such that they are never explicitly handled at search time.
These two facts combined mean that every state transition is performed
using a constant number of instructions. However, this comes at
great cost. The memory usage of a DFA can be quite exorbitant.
It is potentially multiple orders of magnitude greater than a
contiguous::NFA
for example. In exchange,
a DFA will typically have better search speed than a contiguous::NFA
, but
not by orders of magnitude.
Unless you have a small number of patterns or memory usage is not a concern and search performance is critical, a DFA is usually not the best choice.
Moreover, unlike the NFAs in this crate, it is costly for a DFA to
support for anchored and unanchored search configurations. Namely,
since failure transitions are pre-computed, supporting both anchored
and unanchored searches requires a duplication of the transition table,
making the memory usage of such a DFA ever bigger. (The NFAs in this crate
unconditionally support both anchored and unanchored searches because there
is essentially no added cost for doing so.) It is for this reason that
a DFA’s support for anchored and unanchored searches can be configured
via Builder::start_kind
. By default, a DFA only supports unanchored
searches.
§Example
This example shows how to build an DFA
directly and use it to execute
Automaton::try_find
:
use aho_corasick::{
automaton::Automaton,
dfa::DFA,
Input, Match,
};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let nfa = DFA::new(patterns).unwrap();
assert_eq!(
Some(Match::must(0, 1..2)),
nfa.try_find(&Input::new(haystack))?,
);
It is also possible to implement your own version of try_find
. See the
Automaton
documentation for an example.
Fields§
§trans: Vec<StateID>
The DFA transition table. IDs in this table are pre-multiplied. So instead of the IDs being 0, 1, 2, 3, …, they are 0stride, 1stride, 2stride, 3stride, …
matches: Vec<Vec<PatternID>>
The matches for every match state in this DFA. This is first indexed by
state index (so that’s sid >> stride2
) and then by order in which the
matches are meant to occur.
matches_memory_usage: usize
The amount of heap memory used, in bytes, by the inner Vecs of ‘matches’.
pattern_lens: Vec<SmallIndex>
The length of each pattern. This is used to compute the start offset of a match.
prefilter: Option<Prefilter>
A prefilter for accelerating searches, if one exists.
match_kind: MatchKind
The match semantics built into this DFA.
state_len: usize
The total number of states in this DFA.
alphabet_len: usize
The alphabet size, or total number of equivalence classes, for this DFA. Note that the actual number of transitions in each state is stride=2^stride2, where stride is the smallest power of 2 greater than or equal to alphabet_len. We do things this way so that we can use bitshifting to go from a state ID to an index into ‘matches’.
stride2: usize
The exponent with a base 2, such that stride=2^stride2. Given a state index ‘i’, its state identifier is ‘i << stride2’. Given a state identifier ‘sid’, its state index is ‘sid >> stride2’.
byte_classes: ByteClasses
The equivalence classes for this DFA. All transitions are defined on equivalence classes and not on the 256 distinct byte values.
min_pattern_len: usize
The length of the shortest pattern in this automaton.
max_pattern_len: usize
The length of the longest pattern in this automaton.
special: Special
The information required to deduce which states are “special” in this DFA.
Implementations§
source§impl DFA
impl DFA
source§impl DFA
impl DFA
sourceconst DEAD: StateID = _
const DEAD: StateID = _
A sentinel state ID indicating that a search should stop once it has entered this state. When a search stops, it returns a match if one has been found, otherwise no match. A DFA always has an actual dead state at this ID.
N.B. DFAs, unlike NFAs, do not have any notion of a FAIL state. Namely, the whole point of a DFA is that the FAIL state is completely compiled away. That is, DFA construction involves pre-computing the failure transitions everywhere, such that failure transitions are no longer used at search time. This, combined with its uniformly dense representation, are the two most important factors in why it’s faster than the NFAs in this crate.
sourcefn set_matches(&mut self, sid: StateID, pids: impl Iterator<Item = PatternID>)
fn set_matches(&mut self, sid: StateID, pids: impl Iterator<Item = PatternID>)
Adds the given pattern IDs as matches to the given state and also records the added memory usage.
Trait Implementations§
source§impl Automaton for DFA
impl Automaton for DFA
source§fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>
fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>
source§fn is_special(&self, sid: StateID) -> bool
fn is_special(&self, sid: StateID) -> bool
source§fn is_dead(&self, sid: StateID) -> bool
fn is_dead(&self, sid: StateID) -> bool
source§fn is_match(&self, sid: StateID) -> bool
fn is_match(&self, sid: StateID) -> bool
source§fn is_start(&self, sid: StateID) -> bool
fn is_start(&self, sid: StateID) -> bool
source§fn match_kind(&self) -> MatchKind
fn match_kind(&self) -> MatchKind
source§fn patterns_len(&self) -> usize
fn patterns_len(&self) -> usize
source§fn pattern_len(&self, pid: PatternID) -> usize
fn pattern_len(&self, pid: PatternID) -> usize
source§fn min_pattern_len(&self) -> usize
fn min_pattern_len(&self) -> usize
source§fn max_pattern_len(&self) -> usize
fn max_pattern_len(&self) -> usize
source§fn match_len(&self, sid: StateID) -> usize
fn match_len(&self, sid: StateID) -> usize
source§fn memory_usage(&self) -> usize
fn memory_usage(&self) -> usize
source§fn prefilter(&self) -> Option<&Prefilter>
fn prefilter(&self) -> Option<&Prefilter>
source§fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>
fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>
source§fn try_find_overlapping(
&self,
input: &Input<'_>,
state: &mut OverlappingState,
) -> Result<(), MatchError>
fn try_find_overlapping( &self, input: &Input<'_>, state: &mut OverlappingState, ) -> Result<(), MatchError>
source§fn try_find_iter<'a, 'h>(
&'a self,
input: Input<'h>,
) -> Result<FindIter<'a, 'h, Self>, MatchError>where
Self: Sized,
fn try_find_iter<'a, 'h>(
&'a self,
input: Input<'h>,
) -> Result<FindIter<'a, 'h, Self>, MatchError>where
Self: Sized,
source§fn try_find_overlapping_iter<'a, 'h>(
&'a self,
input: Input<'h>,
) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>where
Self: Sized,
fn try_find_overlapping_iter<'a, 'h>(
&'a self,
input: Input<'h>,
) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>where
Self: Sized,
source§fn try_replace_all<B>(
&self,
haystack: &str,
replace_with: &[B],
) -> Result<String, MatchError>
fn try_replace_all<B>( &self, haystack: &str, replace_with: &[B], ) -> Result<String, MatchError>
haystack
with
strings from replace_with
depending on the pattern that
matched. The replace_with
slice must have length equal to
Automaton::patterns_len
. Read moresource§fn try_replace_all_bytes<B>(
&self,
haystack: &[u8],
replace_with: &[B],
) -> Result<Vec<u8>, MatchError>
fn try_replace_all_bytes<B>( &self, haystack: &[u8], replace_with: &[B], ) -> Result<Vec<u8>, MatchError>
haystack
with
strings from replace_with
depending on the pattern that
matched. The replace_with
slice must have length equal to
Automaton::patterns_len
. Read moresource§fn try_replace_all_with<F>(
&self,
haystack: &str,
dst: &mut String,
replace_with: F,
) -> Result<(), MatchError>
fn try_replace_all_with<F>( &self, haystack: &str, dst: &mut String, replace_with: F, ) -> Result<(), MatchError>
haystack
by calling the
replace_with
closure given. Read moresource§fn try_replace_all_with_bytes<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
replace_with: F,
) -> Result<(), MatchError>
fn try_replace_all_with_bytes<F>( &self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F, ) -> Result<(), MatchError>
haystack
by calling the
replace_with
closure given. Read moresource§fn try_stream_find_iter<'a, R: Read>(
&'a self,
rdr: R,
) -> Result<StreamFindIter<'a, Self, R>, MatchError>where
Self: Sized,
fn try_stream_find_iter<'a, R: Read>(
&'a self,
rdr: R,
) -> Result<StreamFindIter<'a, Self, R>, MatchError>where
Self: Sized,
source§fn try_stream_replace_all<R, W, B>(
&self,
rdr: R,
wtr: W,
replace_with: &[B],
) -> Result<()>
fn try_stream_replace_all<R, W, B>( &self, rdr: R, wtr: W, replace_with: &[B], ) -> Result<()>
rdr
with strings from
replace_with
depending on the pattern that matched, and writes the
result to wtr
. The replace_with
slice must have length equal to
Automaton::patterns_len
. Read more