Enum regex_automata::nfa::thompson::builder::State

source ·
enum State {
    Empty {
        next: StateID,
    },
    ByteRange {
        trans: Transition,
    },
    Sparse {
        transitions: Vec<Transition>,
    },
    Look {
        look: Look,
        next: StateID,
    },
    CaptureStart {
        pattern_id: PatternID,
        group_index: SmallIndex,
        next: StateID,
    },
    CaptureEnd {
        pattern_id: PatternID,
        group_index: SmallIndex,
        next: StateID,
    },
    Union {
        alternates: Vec<StateID>,
    },
    UnionReverse {
        alternates: Vec<StateID>,
    },
    Fail,
    Match {
        pattern_id: PatternID,
    },
}
Expand description

An intermediate NFA state used during construction.

During construction of an NFA, it is often convenient to work with states that are amenable to mutation and other carry more information than we otherwise need once an NFA has been built. This type represents those needs.

Once construction is finished, the builder will convert these states to a nfa::thompson::State. This conversion not only results in a simpler representation, but in some cases, entire classes of states are completely removed (such as State::Empty).

Variants§

§

Empty

An empty state whose only purpose is to forward the automaton to another state via an unconditional epsilon transition.

Unconditional epsilon transitions are quite useful during the construction of an NFA, as they permit the insertion of no-op placeholders that make it easier to compose NFA sub-graphs. When the Thompson NFA builder produces a final NFA, all unconditional epsilon transitions are removed, and state identifiers are remapped accordingly.

Fields

§next: StateID

The next state that this state should transition to.

§

ByteRange

A state that only transitions to another state if the current input byte is in a particular range of bytes.

Fields

§

Sparse

A state with possibly many transitions, represented in a sparse fashion. Transitions must be ordered lexicographically by input range and be non-overlapping. As such, this may only be used when every transition has equal priority. (In practice, this is only used for encoding large UTF-8 automata.) In contrast, a Union state has each alternate in order of priority. Priority is used to implement greedy matching and also alternations themselves, e.g., abc|a where abc has priority over a.

To clarify, it is possible to remove Sparse and represent all things that Sparse is used for via Union. But this creates a more bloated NFA with more epsilon transitions than is necessary in the special case of character classes.

Fields

§transitions: Vec<Transition>
§

Look

A conditional epsilon transition satisfied via some sort of look-around.

Fields

§look: Look
§next: StateID
§

CaptureStart

An empty state that records the start of a capture location. This is an unconditional epsilon transition like Empty, except it can be used to record position information for a capture group when using the NFA for search.

Fields

§pattern_id: PatternID

The ID of the pattern that this capture was defined.

§group_index: SmallIndex

The capture group index that this capture state corresponds to. The capture group index is always relative to its corresponding pattern. Therefore, in the presence of multiple patterns, both the pattern ID and the capture group index are required to uniquely identify a capturing group.

§next: StateID

The next state that this state should transition to.

§

CaptureEnd

An empty state that records the end of a capture location. This is an unconditional epsilon transition like Empty, except it can be used to record position information for a capture group when using the NFA for search.

Fields

§pattern_id: PatternID

The ID of the pattern that this capture was defined.

§group_index: SmallIndex

The capture group index that this capture state corresponds to. The capture group index is always relative to its corresponding pattern. Therefore, in the presence of multiple patterns, both the pattern ID and the capture group index are required to uniquely identify a capturing group.

§next: StateID

The next state that this state should transition to.

§

Union

An alternation such that there exists an epsilon transition to all states in alternates, where matches found via earlier transitions are preferred over later transitions.

Fields

§alternates: Vec<StateID>
§

UnionReverse

An alternation such that there exists an epsilon transition to all states in alternates, where matches found via later transitions are preferred over earlier transitions.

This “reverse” state exists for convenience during compilation that permits easy construction of non-greedy combinations of NFA states. At the end of compilation, Union and UnionReverse states are merged into one Union type of state, where the latter has its epsilon transitions reversed to reflect the priority inversion.

The “convenience” here arises from the fact that as new states are added to the list of alternates, we would like that add operation to be amortized constant time. But if we used a Union, we’d need to prepend the state, which takes O(n) time. There are other approaches we could use to solve this, but this seems simple enough.

Fields

§alternates: Vec<StateID>
§

Fail

A state that cannot be transitioned out of. This is useful for cases where you want to prevent matching from occurring. For example, if your regex parser permits empty character classes, then one could choose a Fail state to represent it.

§

Match

A match state. There is at most one such occurrence of this state in an NFA for each pattern compiled into the NFA. At time of writing, a match state is always produced for every pattern given, but in theory, if a pattern can never lead to a match, then the match state could be omitted.

pattern_id refers to the ID of the pattern itself, which corresponds to the pattern’s index (starting at 0).

Fields

§pattern_id: PatternID

Implementations§

source§

impl State

source

fn goto(&self) -> Option<StateID>

If this state is an unconditional epsilon transition, then this returns the target of the transition.

source

fn memory_usage(&self) -> usize

Returns the heap memory usage, in bytes, of this state.

Trait Implementations§

source§

impl Clone for State

source§

fn clone(&self) -> State

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for State

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl PartialEq for State

source§

fn eq(&self, other: &State) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Eq for State

source§

impl StructuralPartialEq for State

Auto Trait Implementations§

§

impl Freeze for State

§

impl RefUnwindSafe for State

§

impl Send for State

§

impl Sync for State

§

impl Unpin for State

§

impl UnwindSafe for State

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.