regex_automata::util::determinize::state

Struct Repr

struct Repr<'a>(&'a [u8]);

Repr is a read-only view into the representation of a DFA state.

Primarily, a Repr is how we achieve DRY: we implement decoding the format in one place, and then use a Repr to implement the various methods on the public state types.

The format is as follows:

The first nine bytes correspond to three bitsets.

Byte 0 is a bitset corresponding to miscellaneous flags associated with the state.

Bit 0 is set to 1 if the state is a match state.

Bit 1 is set to 1 if the state has pattern IDs explicitly written to it. (This is a flag that is not meant to be set by determinization, but rather, is used as part of an internal space-saving optimization.)

Bit 2 is set to 1 if the state was generated by a transition over a “word” byte. (Callers may not always set this. For example, if the NFA has no word boundary assertion, then needing to track whether a state came from a word byte or not is superfluous and wasteful.)

Bit 3 is set to 1 if the state was generated by a transition from a \r (forward search) or a \n (reverse search) when CRLF mode is enabled.
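As a minimal sketch of the flag byte described above, the four bits can be tested with simple masks. The `Flags` struct and `decode_flags` function are hypothetical names used only for illustration; they are not part of the crate.

```rust
/// Hypothetical container for the four flags stored in byte 0.
#[derive(Debug, PartialEq, Eq)]
struct Flags {
    is_match: bool,
    has_pattern_ids: bool,
    is_from_word: bool,
    is_half_crlf: bool,
}

/// Decode the flag bits from the first byte of an encoded state.
/// Panics if `repr` is empty.
fn decode_flags(repr: &[u8]) -> Flags {
    let b = repr[0];
    Flags {
        is_match: b & (1 << 0) != 0,        // bit 0: match state
        has_pattern_ids: b & (1 << 1) != 0, // bit 1: pattern IDs explicitly written
        is_from_word: b & (1 << 2) != 0,    // bit 2: came from a "word" byte
        is_half_crlf: b & (1 << 3) != 0,    // bit 3: came from \r or \n in CRLF mode
    }
}
```

For example, a first byte of 0b0000_0101 would decode as a match state that was reached over a word byte, with the other two flags unset.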

Bytes 1..5 correspond to the look-behind assertions that were satisfied by the transition that created this state. (Look-ahead assertions are not tracked as part of states. Instead, these are applied by re-computing the epsilon closure of a state when computing the transition function. See the next function in the parent module.)

Bytes 5..9 correspond to the set of look-around assertions (including both look-behind and look-ahead) that appear somewhere in this state’s set of NFA state IDs. This is used to determine whether this state’s epsilon closure should be re-computed when computing the transition function. Namely, look-around assertions are “just” conditional epsilon transitions, so if there are new assertions available when computing the transition function, we should only re-compute the epsilon closure if those new assertions are relevant to this particular state.
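The two look sets and the re-computation check can be sketched as follows. This assumes each set is stored as a native-endian 32-bit bitset (the text only states the byte ranges), and it uses a raw `u32` as a stand-in for `LookSet`. The function names are hypothetical.

```rust
/// Read a native-endian u32 starting at byte offset `at`.
/// Panics if fewer than four bytes are available.
fn read_u32_ne(repr: &[u8], at: usize) -> u32 {
    u32::from_ne_bytes([repr[at], repr[at + 1], repr[at + 2], repr[at + 3]])
}

/// Bytes 1..5: look-behind assertions satisfied when this state was created.
fn look_have_bits(repr: &[u8]) -> u32 {
    read_u32_ne(repr, 1)
}

/// Bytes 5..9: look-around assertions appearing in this state's NFA states.
fn look_need_bits(repr: &[u8]) -> u32 {
    read_u32_ne(repr, 5)
}

/// Given the assertions available now, report whether the epsilon closure
/// is worth re-computing: only when some assertion is both new (not already
/// in "have") and relevant (present in "need").
fn should_recompute_closure(repr: &[u8], available: u32) -> bool {
    let new = available & !look_have_bits(repr);
    new & look_need_bits(repr) != 0
}
```

The point of keeping the "need" set is the early exit: a state whose NFA states contain no look-around assertions never triggers a re-computation, whatever assertions become available.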

Bytes 9..13 correspond to a 32-bit native-endian encoded integer corresponding to the number of patterns encoded in this state. If the state is not a match state (byte 0 bit 0 is 0) or if its only pattern ID is PatternID::ZERO, then no integer is encoded at this position. Instead, byte offset 9 is the position at which the first NFA state ID is encoded.

For a match state with at least one non-ZERO pattern ID, the next bytes correspond to a sequence of 32-bit native-endian encoded integers that represent each pattern ID, in order, that this match state represents.
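The pattern ID section can be sketched as below. It assumes the fixed header occupies bytes 0..9 (the flag byte plus two 4-byte look sets), so that the optional pattern count sits at bytes 9..13, and it assumes bit 1 of byte 0 signals that pattern IDs were written explicitly. A raw `u32` stands in for `PatternID`, and `decode_pattern_ids` is a hypothetical name.

```rust
/// Read a native-endian u32 starting at byte offset `at`.
fn read_u32_ne(repr: &[u8], at: usize) -> u32 {
    u32::from_ne_bytes([repr[at], repr[at + 1], repr[at + 2], repr[at + 3]])
}

/// Return the pattern IDs of an encoded state together with the byte
/// offset at which the NFA state IDs begin.
fn decode_pattern_ids(repr: &[u8]) -> (Vec<u32>, usize) {
    const HEADER_LEN: usize = 9; // assumed: 1 flag byte + 2 * 4 look-set bytes
    let is_match = repr[0] & (1 << 0) != 0;
    let has_pattern_ids = repr[0] & (1 << 1) != 0;
    if !is_match {
        // Not a match state: nothing is encoded here.
        return (Vec::new(), HEADER_LEN);
    }
    if !has_pattern_ids {
        // Space-saving case: the only pattern ID is zero and is implicit.
        return (vec![0], HEADER_LEN);
    }
    let count = read_u32_ne(repr, HEADER_LEN) as usize;
    let first = HEADER_LEN + 4;
    let ids = (0..count).map(|i| read_u32_ne(repr, first + 4 * i)).collect();
    (ids, first + 4 * count)
}
```

Under these assumptions, a match state for the single pattern zero costs no extra bytes, which is the common case for single-pattern regexes.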

After the pattern IDs (if any), NFA state IDs are delta encoded as varints.[1] The first NFA state ID is encoded as itself, and each subsequent NFA state ID is encoded as the difference between itself and the previous NFA state ID.
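Delta encoding with varints can be sketched as below. The text does not say whether deltas are signed. This sketch assumes they may be negative (NFA state IDs need not be stored in ascending order) and maps them to unsigned values with zig-zag encoding before writing a varint. All function names are hypothetical.

```rust
/// Append `n` as a little-endian base-128 varint.
fn write_varu32(out: &mut Vec<u8>, mut n: u32) {
    while n >= 0x80 {
        out.push((n as u8 & 0x7F) | 0x80); // low 7 bits plus continuation bit
        n >>= 7;
    }
    out.push(n as u8);
}

/// Read one varint; returns the value and the number of bytes consumed,
/// or None if the input ends early or runs past five bytes.
fn read_varu32(data: &[u8]) -> Option<(u32, usize)> {
    let mut n = 0u32;
    for (i, &b) in data.iter().enumerate().take(5) {
        n |= u32::from(b & 0x7F) << (7 * i as u32);
        if b & 0x80 == 0 {
            return Some((n, i + 1));
        }
    }
    None
}

/// Delta-encode IDs: the first is relative to zero, so it encodes as itself.
fn encode_ids(ids: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0i32;
    for &id in ids {
        let delta = (id as i32).wrapping_sub(prev);
        // Zig-zag: small negative and positive deltas both become small.
        write_varu32(&mut out, ((delta << 1) ^ (delta >> 31)) as u32);
        prev = id as i32;
    }
    out
}

/// Invert `encode_ids`, yielding None on malformed input.
fn decode_ids(data: &[u8]) -> Option<Vec<u32>> {
    let (mut ids, mut prev, mut pos) = (Vec::new(), 0i32, 0usize);
    while pos < data.len() {
        let (n, len) = read_varu32(&data[pos..])?;
        let delta = ((n >> 1) as i32) ^ -((n & 1) as i32);
        prev = prev.wrapping_add(delta);
        ids.push(prev as u32);
        pos += len;
    }
    Some(ids)
}
```

Because neighbouring NFA state IDs in a DFA state tend to be close together, most deltas fit in a single byte, which is the motivation for combining delta encoding with varints.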

[1] - https://developers.google.com/protocol-buffers/docs/encoding#varints

Tuple Fields

0: &'a [u8]

Implementations

impl<'a> Repr<'a>

fn is_match(&self) -> bool

fn has_pattern_ids(&self) -> bool

fn is_from_word(&self) -> bool

fn is_half_crlf(&self) -> bool

fn look_have(&self) -> LookSet

fn look_need(&self) -> LookSet

fn match_len(&self) -> usize

fn match_pattern(&self, index: usize) -> PatternID

fn match_pattern_ids(&self) -> Option<Vec<PatternID>>

fn iter_match_pattern_ids<F: FnMut(PatternID)>(&self, f: F)

fn iter_nfa_state_ids<F: FnMut(StateID)>(&self, f: F)

fn pattern_offset_end(&self) -> usize

fn encoded_pattern_len(&self) -> usize

Trait Implementations

impl<'a> Debug for Repr<'a>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl<'a> Freeze for Repr<'a>

impl<'a> RefUnwindSafe for Repr<'a>

impl<'a> Send for Repr<'a>

impl<'a> Sync for Repr<'a>

impl<'a> Unpin for Repr<'a>

impl<'a> UnwindSafe for Repr<'a>

Blanket Implementations

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
