Module regex_automata::util::determinize
source · Expand description
This module contains types and routines for implementing determinization.
In this crate, there are at least two places where we implement
determinization: fully ahead-of-time compiled DFAs in the dfa
module and
lazily compiled DFAs in the hybrid
module. The stuff in this module
corresponds to the things that are in common between these implementations.
There are three broad things that our implementations of determinization have in common, as defined by this module:
- The classification of start states. That is, whether we’re dealing with word boundaries, line boundaries, etc., is all the same. This also includes the look-behind assertions that are satisfied by each starting state classification.
- The representation of DFA states as sets of NFA states, including convenience types for building these DFA states that are amenable to reusing allocations.
- Routines for the “classical” parts of determinization: computing the epsilon closure, tracking match states (with corresponding pattern IDs, since we support multi-pattern finite automata) and, of course, computing the transition function between states for units of input.
I did consider a couple of alternatives to this particular form of code reuse:
-
Don’t do any code reuse. The problem here is that we really want both forms of determinization to do exactly identical things when it comes to their handling of NFA states. While our tests generally ensure this, the code is tricky and large enough where not reusing code is a pretty big bummer.
-
Implement all of determinization once and make it generic over fully compiled DFAs and lazily compiled DFAs. While I didn’t actually try this approach, my instinct is that it would be more complex than is needed here. And the interface required would be pretty hairy. Instead, I think splitting it into logical sub-components works better.
Modules§
- state 🔒This module defines a DFA state representation and builders for constructing DFA states.
Functions§
- Add the NFA state IDs in the given
set
to the given DFA builder state. The order in which states are added corresponds to the order in which they were added toset
. - Compute the epsilon closure for the given NFA state. The epsilon closure consists of all NFA state IDs, including
start_nfa_id
, that can be reached fromstart_nfa_id
without consuming any input. These state IDs are written toset
in the order they are visited, but only if they are not already inset
.start_nfa_id
must be a valid state ID for the NFA given. - next 🔒Compute the set of all reachable NFA states, including the full epsilon closure, from a DFA state for a single unit of input. The set of reachable states is returned as a
StateBuilderNFA
. TheStateBuilderNFA
returned also includes any look-behind assertions satisfied byunit
, in addition to whether it is a match state. For multi-pattern DFAs, the builder will also include the pattern IDs that match (in the order seen). - Sets the appropriate look-behind assertions on the given state based on this starting configuration.