Struct aho_corasick::dfa::DFA

source ·
pub struct DFA {
Show 13 fields trans: Vec<StateID>, matches: Vec<Vec<PatternID>>, matches_memory_usage: usize, pattern_lens: Vec<SmallIndex>, prefilter: Option<Prefilter>, match_kind: MatchKind, state_len: usize, alphabet_len: usize, stride2: usize, byte_classes: ByteClasses, min_pattern_len: usize, max_pattern_len: usize, special: Special,
}
Expand description

A DFA implementation of Aho-Corasick.

When possible, prefer using AhoCorasick instead of this type directly. Using a DFA directly is typically only necessary when one needs access to the Automaton trait implementation.

This DFA can only be built by first constructing a noncontiguous::NFA. Both DFA::new and Builder::build do this for you automatically, but Builder::build_from_noncontiguous permits doing it explicitly.

A DFA provides the best possible search performance (in this crate) via two mechanisms:

  • All states use a dense representation for their transitions.
  • All failure transitions are pre-computed such that they are never explicitly handled at search time.

These two facts combined mean that every state transition is performed using a constant number of instructions. However, this comes at great cost. The memory usage of a DFA can be quite exorbitant. It is potentially multiple orders of magnitude greater than a contiguous::NFA for example. In exchange, a DFA will typically have better search speed than a contiguous::NFA, but not by orders of magnitude.

Unless you have a small number of patterns or memory usage is not a concern and search performance is critical, a DFA is usually not the best choice.

Moreover, unlike the NFAs in this crate, it is costly for a DFA to support for anchored and unanchored search configurations. Namely, since failure transitions are pre-computed, supporting both anchored and unanchored searches requires a duplication of the transition table, making the memory usage of such a DFA ever bigger. (The NFAs in this crate unconditionally support both anchored and unanchored searches because there is essentially no added cost for doing so.) It is for this reason that a DFA’s support for anchored and unanchored searches can be configured via Builder::start_kind. By default, a DFA only supports unanchored searches.

§Example

This example shows how to build an DFA directly and use it to execute Automaton::try_find:

use aho_corasick::{
    automaton::Automaton,
    dfa::DFA,
    Input, Match,
};

let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";

let nfa = DFA::new(patterns).unwrap();
assert_eq!(
    Some(Match::must(0, 1..2)),
    nfa.try_find(&Input::new(haystack))?,
);

It is also possible to implement your own version of try_find. See the Automaton documentation for an example.

Fields§

§trans: Vec<StateID>

The DFA transition table. IDs in this table are pre-multiplied. So instead of the IDs being 0, 1, 2, 3, …, they are 0stride, 1stride, 2stride, 3stride, …

§matches: Vec<Vec<PatternID>>

The matches for every match state in this DFA. This is first indexed by state index (so that’s sid >> stride2) and then by order in which the matches are meant to occur.

§matches_memory_usage: usize

The amount of heap memory used, in bytes, by the inner Vecs of ‘matches’.

§pattern_lens: Vec<SmallIndex>

The length of each pattern. This is used to compute the start offset of a match.

§prefilter: Option<Prefilter>

A prefilter for accelerating searches, if one exists.

§match_kind: MatchKind

The match semantics built into this DFA.

§state_len: usize

The total number of states in this DFA.

§alphabet_len: usize

The alphabet size, or total number of equivalence classes, for this DFA. Note that the actual number of transitions in each state is stride=2^stride2, where stride is the smallest power of 2 greater than or equal to alphabet_len. We do things this way so that we can use bitshifting to go from a state ID to an index into ‘matches’.

§stride2: usize

The exponent with a base 2, such that stride=2^stride2. Given a state index ‘i’, its state identifier is ‘i << stride2’. Given a state identifier ‘sid’, its state index is ‘sid >> stride2’.

§byte_classes: ByteClasses

The equivalence classes for this DFA. All transitions are defined on equivalence classes and not on the 256 distinct byte values.

§min_pattern_len: usize

The length of the shortest pattern in this automaton.

§max_pattern_len: usize

The length of the longest pattern in this automaton.

§special: Special

The information required to deduce which states are “special” in this DFA.

Implementations§

source§

impl DFA

source

pub fn new<I, P>(patterns: I) -> Result<DFA, BuildError>
where I: IntoIterator<Item = P>, P: AsRef<[u8]>,

Create a new Aho-Corasick DFA using the default configuration.

Use a Builder if you want to change the configuration.

source

pub fn builder() -> Builder

A convenience method for returning a new Aho-Corasick DFA builder.

This usually permits one to just import the DFA type.

source§

impl DFA

source

const DEAD: StateID = _

A sentinel state ID indicating that a search should stop once it has entered this state. When a search stops, it returns a match if one has been found, otherwise no match. A DFA always has an actual dead state at this ID.

N.B. DFAs, unlike NFAs, do not have any notion of a FAIL state. Namely, the whole point of a DFA is that the FAIL state is completely compiled away. That is, DFA construction involves pre-computing the failure transitions everywhere, such that failure transitions are no longer used at search time. This, combined with its uniformly dense representation, are the two most important factors in why it’s faster than the NFAs in this crate.

source

fn set_matches(&mut self, sid: StateID, pids: impl Iterator<Item = PatternID>)

Adds the given pattern IDs as matches to the given state and also records the added memory usage.

Trait Implementations§

source§

impl Automaton for DFA

source§

fn start_state(&self, anchored: Anchored) -> Result<StateID, MatchError>

Returns the starting state for the given anchor mode. Read more
source§

fn next_state(&self, _anchored: Anchored, sid: StateID, byte: u8) -> StateID

Performs a state transition from sid for byte and returns the next state. Read more
source§

fn is_special(&self, sid: StateID) -> bool

Returns true if the given ID represents a “special” state. A special state is a dead, match or start state. Read more
source§

fn is_dead(&self, sid: StateID) -> bool

Returns true if the given ID represents a dead state. Read more
source§

fn is_match(&self, sid: StateID) -> bool

Returns true if the given ID represents a match state. Read more
source§

fn is_start(&self, sid: StateID) -> bool

Returns true if the given ID represents a start state. Read more
source§

fn match_kind(&self) -> MatchKind

Returns the match semantics that this automaton was built with.
source§

fn patterns_len(&self) -> usize

Returns the total number of patterns compiled into this automaton.
source§

fn pattern_len(&self, pid: PatternID) -> usize

Returns the length of the pattern for the given ID. Read more
source§

fn min_pattern_len(&self) -> usize

Returns the length, in bytes, of the shortest pattern in this automaton.
source§

fn max_pattern_len(&self) -> usize

Returns the length, in bytes, of the longest pattern in this automaton.
source§

fn match_len(&self, sid: StateID) -> usize

Returns the total number of matches for the given state ID. Read more
source§

fn match_pattern(&self, sid: StateID, index: usize) -> PatternID

Returns the pattern ID for the match state given by sid at the index given. Read more
source§

fn memory_usage(&self) -> usize

Returns the heap memory usage, in bytes, used by this automaton.
source§

fn prefilter(&self) -> Option<&Prefilter>

Returns a prefilter, if available, that can be used to accelerate searches for this automaton. Read more
source§

fn try_find(&self, input: &Input<'_>) -> Result<Option<Match>, MatchError>

Executes a non-overlapping search with this automaton using the given configuration. Read more
source§

fn try_find_overlapping( &self, input: &Input<'_>, state: &mut OverlappingState, ) -> Result<(), MatchError>

Executes a overlapping search with this automaton using the given configuration. Read more
source§

fn try_find_iter<'a, 'h>( &'a self, input: Input<'h>, ) -> Result<FindIter<'a, 'h, Self>, MatchError>
where Self: Sized,

Returns an iterator of non-overlapping matches with this automaton using the given configuration. Read more
source§

fn try_find_overlapping_iter<'a, 'h>( &'a self, input: Input<'h>, ) -> Result<FindOverlappingIter<'a, 'h, Self>, MatchError>
where Self: Sized,

Returns an iterator of overlapping matches with this automaton using the given configuration. Read more
source§

fn try_replace_all<B>( &self, haystack: &str, replace_with: &[B], ) -> Result<String, MatchError>
where Self: Sized, B: AsRef<str>,

Replaces all non-overlapping matches in haystack with strings from replace_with depending on the pattern that matched. The replace_with slice must have length equal to Automaton::patterns_len. Read more
source§

fn try_replace_all_bytes<B>( &self, haystack: &[u8], replace_with: &[B], ) -> Result<Vec<u8>, MatchError>
where Self: Sized, B: AsRef<[u8]>,

Replaces all non-overlapping matches in haystack with strings from replace_with depending on the pattern that matched. The replace_with slice must have length equal to Automaton::patterns_len. Read more
source§

fn try_replace_all_with<F>( &self, haystack: &str, dst: &mut String, replace_with: F, ) -> Result<(), MatchError>
where Self: Sized, F: FnMut(&Match, &str, &mut String) -> bool,

Replaces all non-overlapping matches in haystack by calling the replace_with closure given. Read more
source§

fn try_replace_all_with_bytes<F>( &self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F, ) -> Result<(), MatchError>
where Self: Sized, F: FnMut(&Match, &[u8], &mut Vec<u8>) -> bool,

Replaces all non-overlapping matches in haystack by calling the replace_with closure given. Read more
source§

fn try_stream_find_iter<'a, R: Read>( &'a self, rdr: R, ) -> Result<StreamFindIter<'a, Self, R>, MatchError>
where Self: Sized,

Returns an iterator of non-overlapping matches with this automaton from the stream given. Read more
source§

fn try_stream_replace_all<R, W, B>( &self, rdr: R, wtr: W, replace_with: &[B], ) -> Result<()>
where Self: Sized, R: Read, W: Write, B: AsRef<[u8]>,

Replaces all non-overlapping matches in rdr with strings from replace_with depending on the pattern that matched, and writes the result to wtr. The replace_with slice must have length equal to Automaton::patterns_len. Read more
source§

fn try_stream_replace_all_with<R, W, F>( &self, rdr: R, wtr: W, replace_with: F, ) -> Result<()>
where Self: Sized, R: Read, W: Write, F: FnMut(&Match, &[u8], &mut W) -> Result<()>,

Replaces all non-overlapping matches in rdr by calling the replace_with closure given and writing the result to wtr. Read more
source§

impl Clone for DFA

source§

fn clone(&self) -> DFA

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for DFA

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl Sealed for DFA

Auto Trait Implementations§

§

impl Freeze for DFA

§

impl RefUnwindSafe for DFA

§

impl Send for DFA

§

impl Sync for DFA

§

impl Unpin for DFA

§

impl UnwindSafe for DFA

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.