Struct regex_automata::nfa::thompson::range_trie::RangeTrie
source · pub struct RangeTrie {
states: Vec<State>,
free: Vec<State>,
iter_stack: RefCell<Vec<NextIter>>,
iter_ranges: RefCell<Vec<Utf8Range>>,
dupe_stack: Vec<NextDupe>,
insert_stack: Vec<NextInsert>,
}
Expand description
A range trie represents an ordered set of sequences of bytes.
A range trie accepts as input a sequence of byte ranges and merges them into the existing set such that the trie can produce a sorted non-overlapping sequence of byte ranges. The sequence emitted corresponds precisely to the sequence of bytes matched by the given keys, although the byte ranges themselves may be split at different boundaries.
The order complexity of this data structure seems difficult to analyze. If the size of a byte is held as a constant, then insertion is clearly O(n) where n is the number of byte ranges in the input key. However, if k=256 is our alphabet size, then insertion could be O(k^2 * n). In particular it seems possible for pathological inputs to cause insertion to do a lot of work. However, for what we use this data structure for, there should be no pathological inputs since the ultimate source is always a sorted set of Unicode scalar value ranges.
Internally, this trie is setup like a finite state machine. Note though that it is acyclic.
Fields§
§states: Vec<State>
The states in this trie. The first is always the shared final state. The second is always the root state. Otherwise, there is no particular order.
free: Vec<State>
A free-list of states. When a range trie is cleared, all of its states are added to this list. Creating a new state reuses states from this list before allocating a new one.
iter_stack: RefCell<Vec<NextIter>>
A stack for traversing this trie to yield sequences of byte ranges in lexicographic order.
iter_ranges: RefCell<Vec<Utf8Range>>
A buffer that stores the current sequence during iteration.
dupe_stack: Vec<NextDupe>
A stack used for traversing the trie in order to (deeply) duplicate a state. States are recursively duplicated when ranges are split.
insert_stack: Vec<NextInsert>
A stack used for traversing the trie during insertion of a new sequence of byte ranges.
Implementations§
source§impl RangeTrie
impl RangeTrie
sourcepub fn clear(&mut self)
pub fn clear(&mut self)
Clear this range trie such that it is empty. Clearing a range trie and reusing it can beneficial because this may reuse allocations.
sourcepub fn iter<E, F: FnMut(&[Utf8Range]) -> Result<(), E>>(
&self,
f: F,
) -> Result<(), E>
pub fn iter<E, F: FnMut(&[Utf8Range]) -> Result<(), E>>( &self, f: F, ) -> Result<(), E>
Iterate over all of the sequences of byte ranges in this trie, and call the provided function for each sequence. Iteration occurs in lexicographic order.
sourcepub fn insert(&mut self, ranges: &[Utf8Range])
pub fn insert(&mut self, ranges: &[Utf8Range])
Inserts a new sequence of ranges into this trie.
The sequence given must be non-empty and must not have a length exceeding 4.
pub fn add_empty(&mut self) -> StateID
sourcefn duplicate(&mut self, old_id: StateID) -> StateID
fn duplicate(&mut self, old_id: StateID) -> StateID
Performs a deep clone of the given state and returns the duplicate’s state ID.
A “deep clone” in this context means that the state given along with recursively all states that it points to are copied. Once complete, the given state ID and the returned state ID share nothing.
This is useful during range trie insertion when a new range overlaps with an existing range that is bigger than the new one. The part of the existing range that does not overlap with the new one is duplicated so that adding the new range to the overlap doesn’t disturb the non-overlapping portion.
There’s one exception: if old_id is the final state, then it is not duplicated and the same final state is returned. This is because all final states in this trie are equivalent.
sourcefn add_transition(
&mut self,
from_id: StateID,
range: Utf8Range,
next_id: StateID,
)
fn add_transition( &mut self, from_id: StateID, range: Utf8Range, next_id: StateID, )
Adds the given transition to the given state.
Callers must ensure that all previous transitions in this state are lexicographically smaller than the given range.
sourcefn add_transition_at(
&mut self,
i: usize,
from_id: StateID,
range: Utf8Range,
next_id: StateID,
)
fn add_transition_at( &mut self, i: usize, from_id: StateID, range: Utf8Range, next_id: StateID, )
Like add_transition
, except this inserts the transition just before
the ith transition.
sourcefn set_transition_at(
&mut self,
i: usize,
from_id: StateID,
range: Utf8Range,
next_id: StateID,
)
fn set_transition_at( &mut self, i: usize, from_id: StateID, range: Utf8Range, next_id: StateID, )
Overwrites the transition at position i with the given transition.