Struct regex_syntax::hir::literal::PreferenceTrie
source · struct PreferenceTrie {
states: Vec<State>,
matches: Vec<Option<NonZeroUsize>>,
next_literal_index: usize,
}
Expand description
A “preference” trie that rejects literals that will never match when executing a leftmost first or “preference” search.
For example, if ‘sam’ is inserted, then trying to insert ‘samwise’ will be rejected because ‘samwise’ can never match since ‘sam’ will always take priority. However, if ‘samwise’ is inserted first, then inserting ‘sam’ after it is accepted. In this case, either ‘samwise’ or ‘sam’ can match in a “preference” search.
Note that we only use this trie as a “set.” That is, given a sequence of
literals, we insert each one in order. An insert
will reject a literal
if a prefix of that literal already exists in the trie. Thus, to rebuild
the “minimal” sequence, we simply only keep literals that were successfully
inserted. (Since we don’t need traversal, one wonders whether we can make
some simplifications here, but I haven’t given it a ton of thought and I’ve
never seen this show up on a profile. Because of the heuristic limits
imposed on literal extractions, the size of the inputs here is usually
very small.)
Fields§
§states: Vec<State>
The states in this trie. The index of a state in this vector is its ID.
matches: Vec<Option<NonZeroUsize>>
This vec indicates which states are match states. It always has
the same length as states
and is indexed by the same state ID.
A state with identifier sid
is a match state if and only if
matches[sid].is_some()
. The option contains the index of the literal
corresponding to the match. The index is offset by 1 so that it fits in
a NonZeroUsize.
next_literal_index: usize
The index to allocate to the next literal added to this trie. Starts at 1 and increments by 1 for every literal successfully added to the trie.
Implementations§
source§impl PreferenceTrie
impl PreferenceTrie
sourcefn minimize(literals: &mut Vec<Literal>, keep_exact: bool)
fn minimize(literals: &mut Vec<Literal>, keep_exact: bool)
Minimizes the given sequence of literals while preserving preference order semantics.
When keep_exact
is true, the exactness of every literal retained is
kept. This is useful when dealing with a fully extracted Seq
that
only contains exact literals. In that case, we can keep all retained
literals as exact because we know we’ll never need to match anything
after them and because any removed literals are guaranteed to never
match.
sourcefn insert(&mut self, bytes: &[u8]) -> Result<usize, usize>
fn insert(&mut self, bytes: &[u8]) -> Result<usize, usize>
Returns Ok
if the given byte string is accepted into this trie and
Err
otherwise. The index for the success case corresponds to the
index of the literal added. The index for the error case corresponds to
the index of the literal already in the trie that prevented the given
byte string from being added. (Which implies it is a prefix of the one
given.)
In short, the byte string given is accepted into the trie if and only if it is possible for it to match when executing a preference order search.
sourcefn create_state(&mut self) -> usize
fn create_state(&mut self) -> usize
Creates a new empty state and returns its ID.