Enum regex_automata::nfa::thompson::nfa::State

source ·

pub enum State {
    ByteRange {
        trans: Transition,
    },
    Sparse(SparseTransitions),
    Dense(DenseTransitions),
    Look {
        look: Look,
        next: StateID,
    },
    Union {
        alternates: Box<[StateID]>,
    },
    BinaryUnion {
        alt1: StateID,
        alt2: StateID,
    },
    Capture {
        next: StateID,
        pattern_id: PatternID,
        group_index: SmallIndex,
        slot: SmallIndex,
    },
    Fail,
    Match {
        pattern_id: PatternID,
    },
}

Expand description

A state in an NFA.

In theory, it can help to conceptualize an NFA as a graph consisting of States. Each State contains its complete set of outgoing transitions.

In practice, it can help to conceptualize an NFA as a sequence of instructions for a virtual machine. Each State says what to do and where to go next.

Strictly speaking, the practical interpretation is the most correct one, because of the Capture state. Namely, a Capture state always forwards execution to another state unconditionally. Its only purpose is to cause a side effect: the recording of the current input position at a particular location in memory. In this sense, an NFA has more power than a theoretical non-deterministic finite automaton.

For most uses of this crate, it is likely that one may never even need to be aware of this type at all. The main use cases for looking at States directly are if you need to write your own search implementation or if you need to do some kind of analysis on the NFA.

Variants§

§

ByteRange

Fields

§trans: Transition

The transition from this state to the next.

A state with a single transition that can only be taken if the current input symbol is in a particular range of bytes.

§

Sparse(SparseTransitions)

A state with possibly many transitions represented in a sparse fashion. Transitions are non-overlapping and ordered lexicographically by input range.

In practice, this is used for encoding UTF-8 automata. Its presence is primarily an optimization that avoids many additional unconditional epsilon transitions (via Union states), and thus decreases the overhead of traversing the NFA. This can improve both matching time and DFA construction time.

§

Dense(DenseTransitions)

A dense representation of a state with multiple transitions.

§

Look

Fields

§look: Look

The look-around assertion that must be satisfied before moving to next.

§next: StateID

The state to transition to if the look-around assertion is satisfied.

A conditional epsilon transition satisfied via some sort of look-around. Look-around is limited to anchor and word boundary assertions.

Look-around states are meant to be evaluated while performing epsilon closure (computing the set of states reachable from a particular state via only epsilon transitions). If the current position in the haystack satisfies the look-around assertion, then you’re permitted to follow that epsilon transition.

§

Union

Fields

§alternates: Box<[StateID]>

An ordered sequence of unconditional epsilon transitions to other states. Transitions earlier in the sequence are preferred over transitions later in the sequence.

An alternation such that there exists an epsilon transition to all states in alternates, where matches found via earlier transitions are preferred over later transitions.

§

BinaryUnion

Fields

§alt1: StateID

An unconditional epsilon transition to another NFA state. This is preferred over alt2.

§alt2: StateID

An unconditional epsilon transition to another NFA state. Matches reported via this transition should only be reported if no matches were found by following alt1.

An alternation such that there exists precisely two unconditional epsilon transitions, where matches found via alt1 are preferred over matches found via alt2.

This state exists as a common special case of Union where there are only two alternates. In this case, we don’t need any allocations to represent the state. This saves a bit of memory and also saves an additional memory access when traversing the NFA.

§

Capture

Fields

§next: StateID

The state to transition to, unconditionally.

§pattern_id: PatternID

The pattern ID that this capture belongs to.

§group_index: SmallIndex

The capture group index that this capture belongs to. Capture group indices are local to each pattern. For example, when capturing groups are enabled, every pattern has a capture group at index 0.

§slot: SmallIndex

The slot index for this capture. Every capturing group has two slots: one for the start haystack offset and one for the end haystack offset. Unlike capture group indices, slot indices are global across all patterns in this NFA. That is, each slot belongs to a single pattern, but there is only one slot at index i.

An empty state that records a capture location.

From the perspective of finite automata, this is precisely equivalent to an unconditional epsilon transition, but serves the purpose of instructing NFA simulations to record additional state when the finite state machine passes through this epsilon transition.

slot in this context refers to the specific capture group slot offset that is being recorded. Each capturing group has two slots corresponding to the start and end of the matching portion of that group.

The pattern ID and capture group index are also included in this state in case they are useful. But mostly, all you’ll need is next and slot.

§

Fail

A state that cannot be transitioned out of. This is useful for cases where you want to prevent matching from occurring. For example, if your regex parser permits empty character classes, then one could choose a Fail state to represent them. (An empty character class can be thought of as an empty set. Since nothing is in an empty set, they can never match anything.)

§

Match

Fields

§pattern_id: PatternID

The matching pattern ID.

A match state. There is at least one such occurrence of this state for each regex that can match that is in this NFA.

Enum regex_automata::nfa::thompson::nfa::State

Variants§

ByteRange

Fields

Sparse(SparseTransitions)

Dense(DenseTransitions)

Look

Fields

Union

Fields

BinaryUnion

Fields

Capture

Fields

Fail

Match

Fields

Implementations§

impl State

pub fn is_epsilon(&self) -> bool

fn memory_usage(&self) -> usize

fn remap(&mut self, remap: &[StateID])

Trait Implementations§

impl Clone for State

fn clone(&self) -> State

fn clone_from(&mut self, source: &Self)

impl Debug for State

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl PartialEq<State> for State

fn eq(&self, other: &State) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Eq for State

impl StructuralEq for State

impl StructuralPartialEq for State

Auto Trait Implementations§

impl RefUnwindSafe for State

impl Send for State

impl Sync for State

impl Unpin for State

impl UnwindSafe for State

Blanket Implementations§

impl<T> Any for Twhere T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for Twhere T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for Twhere T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for Twhere U: From<T>,

fn into(self) -> U

impl<T> ToOwned for Twhere T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>