Struct regex_automata::dfa::onepass::InternalBuilder

source ·
struct InternalBuilder<'a> {
    dfa: DFA,
    uncompiled_nfa_ids: Vec<StateID>,
    nfa_to_dfa_id: Vec<StateID>,
    stack: Vec<(StateID, Epsilons)>,
    seen: SparseSet,
    matched: bool,
    config: Config,
    nfa: &'a NFA,
    classes: ByteClasses,
}
Expand description

An internal builder for encapsulating the state necessary to build a one-pass DFA. Typical use is just InternalBuilder::new(..).build().

There is no separate pass for determining whether the NFA is one-pass or not. We just try to build the DFA. If during construction we discover that it is not one-pass, we bail out. This is likely to lead to some undesirable expense in some cases, so it might make sense to try an identify common patterns in the NFA that make it definitively not one-pass. That way, we can avoid ever trying to build a one-pass DFA in the first place. For example, ‘\w*\s’ is not one-pass, and since ‘\w’ is Unicode-aware by default, it’s probably not a trivial cost to try and build a one-pass DFA for it and then fail.

Note that some (immutable) fields are duplicated here. For example, the ‘nfa’ and ‘classes’ fields are both in the ‘DFA’. They are the same thing, but we duplicate them because it makes composition easier below. Otherwise, since the borrow checker can’t see through method calls, the mutable borrow we use to mutate the DFA winds up preventing borrowing from any other part of the DFA, even though we aren’t mutating those parts. We only do this because the duplication is cheap.

Fields§

§dfa: DFA

The DFA we’re building.

§uncompiled_nfa_ids: Vec<StateID>

An unordered collection of NFA state IDs that we haven’t yet tried to build into a DFA state yet.

This collection does not ultimately wind up including every NFA state ID. Instead, each ID represents a “start” state for a sub-graph of the NFA. The set of NFA states we then use to build a DFA state consists of that “start” state and all states reachable from it via epsilon transitions.

§nfa_to_dfa_id: Vec<StateID>

A map from NFA state ID to DFA state ID. This is useful for easily determining whether an NFA state has been used as a “starting” point to build a DFA state yet. If it hasn’t, then it is mapped to DEAD, and since DEAD is specially added and never corresponds to any NFA state, it follows that a mapping to DEAD implies the NFA state has no corresponding DFA state yet.

§stack: Vec<(StateID, Epsilons)>

A stack used to traverse the NFA states that make up a single DFA state. Traversal occurs until the stack is empty, and we only push to the stack when the state ID isn’t in ‘seen’. Actually, even more than that, if we try to push something on to this stack that is already in ‘seen’, then we bail out on construction completely, since it implies that the NFA is not one-pass.

§seen: SparseSet

The set of NFA states that we’ve visited via ‘stack’.

§matched: bool

Whether a match NFA state has been observed while constructing a one-pass DFA state. Once a match state is seen, assuming we are using leftmost-first match semantics, then we don’t add any more transitions to the DFA state we’re building.

§config: Config

The config passed to the builder.

This is duplicated in dfa.config.

§nfa: &'a NFA

The NFA we’re building a one-pass DFA from.

This is duplicated in dfa.nfa.

§classes: ByteClasses

The equivalence classes that make up the alphabet for this DFA>

This is duplicated in dfa.classes.

Implementations§

source§

impl<'a> InternalBuilder<'a>

source

fn new(config: Config, nfa: &'a NFA) -> InternalBuilder<'a>

Create a new builder with an initial empty DFA.

source

fn build(self) -> Result<DFA, BuildError>

Build the DFA from the NFA given to this builder. If the NFA is not one-pass, then return an error. An error may also be returned if a particular limit is exceeded. (Some limits, like the total heap memory used, are configurable. Others, like the total patterns or slots, are hard-coded based on representational limitations.)

source

fn shuffle_states(&mut self)

Shuffle all match states to the end of the transition table and set ‘min_match_id’ to the ID of the first such match state.

The point of this is to make it extremely cheap to determine whether a state is a match state or not. We need to check on this on every transition during a search, so it being cheap is important. This permits us to check it by simply comparing two state identifiers, as opposed to looking for the pattern ID in the state’s PatternEpsilons. (Which requires a memory load and some light arithmetic.)

source

fn compile_transition( &mut self, dfa_id: StateID, trans: &Transition, epsilons: Epsilons, ) -> Result<(), BuildError>

Compile the given NFA transition into the DFA state given.

‘Epsilons’ corresponds to any conditional epsilon transitions that need to be satisfied to follow this transition, and any slots that need to be saved if the transition is followed.

If this transition indicates that the NFA is not one-pass, then this returns an error. (This occurs, for example, if the DFA state already has a transition defined for the same input symbols as the given transition, and the result of the old and new transitions is different.)

source

fn add_start_state( &mut self, pid: Option<PatternID>, nfa_id: StateID, ) -> Result<StateID, BuildError>

Add a start state to the DFA corresponding to the given NFA starting state ID.

If adding a state would blow any limits (configured or hard-coded), then an error is returned.

If the starting state is an anchored state for a particular pattern, then callers must provide the pattern ID for that starting state. Callers must also ensure that the first starting state added is the start state for all patterns, and then each anchored starting state for each pattern (if necessary) added in order. Otherwise, this panics.

source

fn add_dfa_state_for_nfa_state( &mut self, nfa_id: StateID, ) -> Result<StateID, BuildError>

Add a new DFA state corresponding to the given NFA state. If adding a state would blow any limits (configured or hard-coded), then an error is returned. If a DFA state already exists for the given NFA state, then that DFA state’s ID is returned and no new states are added.

It is not expected that this routine is called for every NFA state. Instead, an NFA state ID will usually correspond to the “start” state for a sub-graph of the NFA, where all states in the sub-graph are reachable via epsilon transitions (conditional or unconditional). That sub-graph of NFA states is ultimately what produces a single DFA state.

source

fn add_empty_state(&mut self) -> Result<StateID, BuildError>

Unconditionally add a new empty DFA state. If adding it would exceed any limits (configured or hard-coded), then an error is returned. The ID of the new state is returned on success.

The added state is not a match state.

source

fn stack_push( &mut self, nfa_id: StateID, epsilons: Epsilons, ) -> Result<(), BuildError>

Push the given NFA state ID and its corresponding epsilons (slots and conditional epsilon transitions) on to a stack for use in a depth first traversal of a sub-graph of the NFA.

If the given NFA state ID has already been pushed on to the stack, then it indicates the regex is not one-pass and this correspondingly returns an error.

Trait Implementations§

source§

impl<'a> Debug for InternalBuilder<'a>

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

§

impl<'a> Freeze for InternalBuilder<'a>

§

impl<'a> RefUnwindSafe for InternalBuilder<'a>

§

impl<'a> Send for InternalBuilder<'a>

§

impl<'a> Sync for InternalBuilder<'a>

§

impl<'a> Unpin for InternalBuilder<'a>

§

impl<'a> UnwindSafe for InternalBuilder<'a>

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.