Struct regex_automata::dfa::onepass::Builder
pub struct Builder {
    config: Config,
    thompson: Compiler,
}
A builder for a one-pass DFA.
This builder permits configuring options for the syntax of a pattern, the NFA construction and the DFA construction. It differs from a general-purpose regex builder in that it permits fine-grained configuration of the construction process. The trade-off is added complexity and the possibility of setting a configuration that does not make sense. For example, there are two different UTF-8 modes:
- syntax::Config::utf8 controls whether the pattern itself can contain sub-expressions that match invalid UTF-8.
- thompson::Config::utf8 controls whether empty matches that split a Unicode codepoint are reported or not.
Generally speaking, callers will want to either enable both of these or disable both of these.
§Example
This example shows how to disable UTF-8 mode in the syntax and the NFA. This is generally what you want for matching on arbitrary bytes.
```rust
use regex_automata::{
    dfa::onepass::DFA,
    nfa::thompson,
    util::syntax,
    Match,
};

let re = DFA::builder()
    .syntax(syntax::Config::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

let haystack = b"foo\xFFarzz\xE2\x98\xFF\n";
re.captures(&mut cache, haystack, &mut caps);
// Notice that `(?-u:[^b])` matches invalid UTF-8,
// but the subsequent `.*` does not! Disabling UTF-8
// on the syntax permits this.
//
// N.B. This example does not show the impact of
// disabling UTF-8 mode on the Thompson NFA config,
// since that only impacts regexes that can
// produce matches of length 0.
assert_eq!(Some(Match::must(0, 0..8)), caps.get_match());

Ok::<(), Box<dyn std::error::Error>>(())
```
Fields
§config: Config
§thompson: Compiler
Implementations
impl Builder
pub fn build(&self, pattern: &str) -> Result<DFA, BuildError>
Build a one-pass DFA from the given pattern.
If there was a problem parsing or compiling the pattern, or if the pattern is not one-pass, then an error is returned.
pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<DFA, BuildError>
Build a one-pass DFA from the given patterns.
When matches are returned, the pattern ID corresponds to the index of the pattern in the slice given.
pub fn build_from_nfa(&self, nfa: NFA) -> Result<DFA, BuildError>
Build a one-pass DFA from the given NFA. If the NFA is not one-pass, then an error is returned.
§Example
This example shows how to build a DFA if you already have an NFA in hand.
```rust
use regex_automata::{dfa::onepass::DFA, nfa::thompson::NFA, Match};

// This shows how to set non-default options for building an NFA.
let nfa = NFA::compiler()
    .configure(NFA::config().shrink(true))
    .build(r"[a-z0-9]+")?;
let re = DFA::builder().build_from_nfa(nfa)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "foo123bar", &mut caps);
assert_eq!(Some(Match::must(0, 0..9)), caps.get_match());

Ok::<(), Box<dyn std::error::Error>>(())
```
pub fn configure(&mut self, config: Config) -> &mut Builder
Apply the given one-pass DFA configuration options to this builder.
pub fn syntax(&mut self, config: syntax::Config) -> &mut Builder
Set the syntax configuration for this builder using syntax::Config.
This permits setting things like case insensitivity, Unicode mode and multi-line mode.
These settings only apply when constructing a one-pass DFA directly from a pattern.
pub fn thompson(&mut self, config: thompson::Config) -> &mut Builder
Set the Thompson NFA configuration for this builder using nfa::thompson::Config.
This permits setting things like whether additional time should be spent shrinking the size of the NFA.
These settings only apply when constructing a one-pass DFA directly from a pattern.