pub enum State {
    ByteRange {
        trans: Transition,
    },
    Sparse(SparseTransitions),
    Dense(DenseTransitions),
    Look {
        look: Look,
        next: StateID,
    },
    Union {
        alternates: Box<[StateID]>,
    },
    BinaryUnion {
        alt1: StateID,
        alt2: StateID,
    },
    Capture {
        next: StateID,
        pattern_id: PatternID,
        group_index: SmallIndex,
        slot: SmallIndex,
    },
    Fail,
    Match {
        pattern_id: PatternID,
    },
}
Expand description

A state in an NFA.

In theory, it can help to conceptualize an NFA as a graph consisting of States. Each State contains its complete set of outgoing transitions.

In practice, it can help to conceptualize an NFA as a sequence of instructions for a virtual machine. Each State says what to do and where to go next.

Strictly speaking, the practical interpretation is the most correct one, because of the Capture state. Namely, a Capture state always forwards execution to another state unconditionally. Its only purpose is to cause a side effect: the recording of the current input position at a particular location in memory. In this sense, an NFA has more power than a theoretical non-deterministic finite automaton.

For most uses of this crate, it is likely that one may never even need to be aware of this type at all. The main use cases for looking at States directly are if you need to write your own search implementation or if you need to do some kind of analysis on the NFA.

Variants§

§

ByteRange

Fields

§trans: Transition

The transition from this state to the next.

A state with a single transition that can only be taken if the current input symbol is in a particular range of bytes.

§

Sparse(SparseTransitions)

A state with possibly many transitions represented in a sparse fashion. Transitions are non-overlapping and ordered lexicographically by input range.

In practice, this is used for encoding UTF-8 automata. Its presence is primarily an optimization that avoids many additional unconditional epsilon transitions (via Union states), and thus decreases the overhead of traversing the NFA. This can improve both matching time and DFA construction time.

§

Dense(DenseTransitions)

A dense representation of a state with multiple transitions.

§

Look

Fields

§look: Look

The look-around assertion that must be satisfied before moving to next.

§next: StateID

The state to transition to if the look-around assertion is satisfied.

A conditional epsilon transition satisfied via some sort of look-around. Look-around is limited to anchor and word boundary assertions.

Look-around states are meant to be evaluated while performing epsilon closure (computing the set of states reachable from a particular state via only epsilon transitions). If the current position in the haystack satisfies the look-around assertion, then you’re permitted to follow that epsilon transition.

§

Union

Fields

§alternates: Box<[StateID]>

An ordered sequence of unconditional epsilon transitions to other states. Transitions earlier in the sequence are preferred over transitions later in the sequence.

An alternation such that there exists an epsilon transition to all states in alternates, where matches found via earlier transitions are preferred over later transitions.

§

BinaryUnion

Fields

§alt1: StateID

An unconditional epsilon transition to another NFA state. This is preferred over alt2.

§alt2: StateID

An unconditional epsilon transition to another NFA state. Matches reported via this transition should only be reported if no matches were found by following alt1.

An alternation such that there exists precisely two unconditional epsilon transitions, where matches found via alt1 are preferred over matches found via alt2.

This state exists as a common special case of Union where there are only two alternates. In this case, we don’t need any allocations to represent the state. This saves a bit of memory and also saves an additional memory access when traversing the NFA.

§

Capture

Fields

§next: StateID

The state to transition to, unconditionally.

§pattern_id: PatternID

The pattern ID that this capture belongs to.

§group_index: SmallIndex

The capture group index that this capture belongs to. Capture group indices are local to each pattern. For example, when capturing groups are enabled, every pattern has a capture group at index 0.

§slot: SmallIndex

The slot index for this capture. Every capturing group has two slots: one for the start haystack offset and one for the end haystack offset. Unlike capture group indices, slot indices are global across all patterns in this NFA. That is, each slot belongs to a single pattern, but there is only one slot at index i.

An empty state that records a capture location.

From the perspective of finite automata, this is precisely equivalent to an unconditional epsilon transition, but serves the purpose of instructing NFA simulations to record additional state when the finite state machine passes through this epsilon transition.

slot in this context refers to the specific capture group slot offset that is being recorded. Each capturing group has two slots corresponding to the start and end of the matching portion of that group.

The pattern ID and capture group index are also included in this state in case they are useful. But mostly, all you’ll need is next and slot.

§

Fail

A state that cannot be transitioned out of. This is useful for cases where you want to prevent matching from occurring. For example, if your regex parser permits empty character classes, then one could choose a Fail state to represent them. (An empty character class can be thought of as an empty set. Since nothing is in an empty set, they can never match anything.)

§

Match

Fields

§pattern_id: PatternID

The matching pattern ID.

A match state. There is at least one such occurrence of this state for each regex that can match that is in this NFA.

Implementations§

source§

impl State

source

pub fn is_epsilon(&self) -> bool

Returns true if and only if this state contains one or more epsilon transitions.

In practice, a state has no outgoing transitions (like Match), has only non-epsilon transitions (like ByteRange) or has only epsilon transitions (like Union).

Example
use regex_automata::{
    nfa::thompson::{State, Transition},
    util::primitives::{PatternID, StateID, SmallIndex},
};

// Capture states are epsilon transitions.
let state = State::Capture {
    next: StateID::ZERO,
    pattern_id: PatternID::ZERO,
    group_index: SmallIndex::ZERO,
    slot: SmallIndex::ZERO,
};
assert!(state.is_epsilon());

// ByteRange states are not.
let state = State::ByteRange {
    trans: Transition { start: b'a', end: b'z', next: StateID::ZERO },
};
assert!(!state.is_epsilon());
source

fn memory_usage(&self) -> usize

Returns the heap memory usage of this NFA state in bytes.

source

fn remap(&mut self, remap: &[StateID])

Remap the transitions in this state using the given map. Namely, the given map should be indexed according to the transitions currently in this state.

This is used during the final phase of the NFA compiler, which turns its intermediate NFA into the final NFA.

Trait Implementations§

source§

impl Clone for State

source§

fn clone(&self) -> State

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for State

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl PartialEq<State> for State

source§

fn eq(&self, other: &State) -> bool

This method tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

This method tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Eq for State

source§

impl StructuralEq for State

source§

impl StructuralPartialEq for State

Auto Trait Implementations§

§

impl RefUnwindSafe for State

§

impl Send for State

§

impl Sync for State

§

impl Unpin for State

§

impl UnwindSafe for State

Blanket Implementations§

source§

impl<T> Any for Twhere T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for Twhere T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for Twhere T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for Twhere U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for Twhere T: Clone,

§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.