Struct Compiler

pub struct Compiler {
    parser: ParserBuilder,
    config: Config,
    builder: RefCell<Builder>,
    utf8_state: RefCell<Utf8State>,
    trie_state: RefCell<RangeTrie>,
    utf8_suffix: RefCell<Utf8SuffixMap>,
}

A builder for compiling an NFA from a regex’s high-level intermediate representation (HIR).

This compiler provides a way to translate a parsed regex pattern into an NFA state graph. The NFA state graph can either be used directly to execute a search (e.g., with a Pike VM), or it can be further used to build a DFA.

This compiler provides APIs for compiling regex patterns either directly from their concrete syntax or via a regex_syntax::hir::Hir.

This compiler has various options that may be configured via thompson::Config.

Note that a compiler is not the same as a thompson::Builder. A Builder provides a lower-level API that is decoupled from a regex pattern’s concrete syntax and even from its HIR. Instead, it permits stitching together an NFA by hand. See its docs for examples.

§Example: compilation from concrete syntax

This shows how to compile an NFA from a pattern string while setting a size limit on how big the NFA is allowed to be (in terms of bytes of heap used).

use regex_automata::{
    nfa::thompson::{NFA, pikevm::PikeVM},
    Match,
};

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build(r"(?-u)\w")?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 3..4));
re.captures(&mut cache, "!@#A#@!", &mut caps);
assert_eq!(expected, caps.get_match());

Ok::<(), Box<dyn std::error::Error>>(())

§Example: compilation from HIR

This shows how to hand-assemble a regular expression via its HIR, and then compile an NFA directly from it.

use regex_automata::{nfa::thompson::{NFA, pikevm::PikeVM}, Match};
use regex_syntax::hir::{Hir, Class, ClassBytes, ClassBytesRange};

let hir = Hir::class(Class::Bytes(ClassBytes::new(vec![
    ClassBytesRange::new(b'0', b'9'),
    ClassBytesRange::new(b'A', b'Z'),
    ClassBytesRange::new(b'_', b'_'),
    ClassBytesRange::new(b'a', b'z'),
])));

let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_from_hir(&hir)?;

let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 3..4));
re.captures(&mut cache, "!@#A#@!", &mut caps);
assert_eq!(expected, caps.get_match());

Ok::<(), Box<dyn std::error::Error>>(())

Fields§

§parser: ParserBuilder

A regex parser, used when compiling an NFA directly from a pattern string.

§config: Config

The compiler configuration.

§builder: RefCell<Builder>

The builder for actually constructing an NFA. This provides a convenient abstraction for writing a compiler.
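The private compile methods listed further down all take `&self`, which is why the builder (and the scratch state in the fields that follow) sits behind a `RefCell`. The sketch below is a hypothetical, heavily simplified model of that design; `ToyCompiler`, `ToyBuilder`, `add_state` and `compile_literal` are invented names, not regex-automata APIs. Each call borrows the builder mutably only for as long as it needs to, so repeated or nested calls never overlap borrows.

```rust
use std::cell::RefCell;

// Hypothetical stand-in for the real NFA builder: it only records state labels.
#[derive(Default)]
struct ToyBuilder {
    states: Vec<String>,
}

// Hypothetical compiler that, like `Compiler`, keeps its builder behind a
// `RefCell` so that compile methods can take `&self` and still add states.
#[derive(Default)]
struct ToyCompiler {
    builder: RefCell<ToyBuilder>,
}

impl ToyCompiler {
    // Adds one state and returns its index. The mutable borrow ends when this
    // method returns, so callers may invoke it repeatedly (or recursively).
    fn add_state(&self, label: &str) -> usize {
        let mut builder = self.builder.borrow_mut();
        builder.states.push(label.to_string());
        builder.states.len() - 1
    }

    // Compiles a literal as one state per character and returns the
    // (first, last) state indices. An empty literal becomes a single state.
    fn compile_literal(&self, lit: &str) -> (usize, usize) {
        if lit.is_empty() {
            let id = self.add_state("empty");
            return (id, id);
        }
        let ids: Vec<usize> = lit
            .chars()
            .map(|ch| self.add_state(&ch.to_string()))
            .collect();
        (ids[0], ids[ids.len() - 1])
    }

    fn state_count(&self) -> usize {
        self.builder.borrow().states.len()
    }
}
```

Holding a `borrow_mut()` across a recursive call would panic at runtime, so keeping each borrow short is the part of the pattern that matters.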

§utf8_state: RefCell<Utf8State>

State used for compiling character classes to UTF-8 byte automata. State is not retained between character class compilations. This just serves to amortize allocation to the extent possible.
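Compiling a Unicode class into a UTF-8 byte automaton starts by splitting each range of scalar values wherever the encoded length changes (after U+007F, U+07FF and U+FFFF). The function below is a hypothetical sketch of only that first step. The real compiler goes further and turns each sub-range into sequences of byte ranges (regex-syntax's `Utf8Sequences` performs that split), and it also has to skip the surrogate gap U+D800..=U+DFFF, which this sketch does not do.

```rust
// Splits an inclusive range of scalar values into sub-ranges whose members
// all encode to the same number of UTF-8 bytes. Returns (lo, hi, byte_len).
// Note: the 3-byte bucket spans the surrogate gap; this sketch ignores that.
fn split_by_utf8_len(start: char, end: char) -> Vec<(char, char, usize)> {
    // Inclusive upper bounds of the 1-, 2-, 3- and 4-byte encodings.
    const BOUNDS: [u32; 4] = [0x7F, 0x7FF, 0xFFFF, 0x10FFFF];
    let mut lo = start as u32;
    let hi = end as u32;
    let mut out = Vec::new();
    for (i, &bound) in BOUNDS.iter().enumerate() {
        if lo > hi {
            break;
        }
        if lo > bound {
            continue;
        }
        let seg_hi = hi.min(bound);
        // Both endpoints are valid scalar values: `lo` is `start` or one past
        // a bound (0x80, 0x800, 0x10000), and `seg_hi` is `end` or a bound.
        let a = char::from_u32(lo).unwrap();
        let b = char::from_u32(seg_hi).unwrap();
        out.push((a, b, i + 1));
        lo = seg_hi + 1;
    }
    out
}
```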

§trie_state: RefCell<RangeTrie>

State used for arranging character classes in reverse into a trie.

§utf8_suffix: RefCell<Utf8SuffixMap>

State used for caching common suffixes when compiling reverse UTF-8 automata (for Unicode character classes).
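The suffix map amounts to memoization: when two byte sequences need an identical (byte range, next state) pair, the second one reuses the state already built for the first. The sketch below models that idea with a plain `HashMap`; `SuffixCache`, `RangeState` and `add_sequence` are hypothetical names, and as far as we know the real `Utf8SuffixMap` is a bounded cache rather than a map that grows without limit.

```rust
use std::collections::HashMap;

// Hypothetical byte-range state: matches one byte in `start..=end`, then
// continues at state `next`.
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
struct RangeState {
    start: u8,
    end: u8,
    next: usize,
}

// Hypothetical memo table in the spirit of `Utf8SuffixMap`: an identical
// (range, next) pair is allocated once and then shared.
#[derive(Default)]
struct SuffixCache {
    states: Vec<RangeState>,
    map: HashMap<RangeState, usize>,
}

impl SuffixCache {
    fn get_or_add(&mut self, start: u8, end: u8, next: usize) -> usize {
        let key = RangeState { start, end, next };
        if let Some(&id) = self.map.get(&key) {
            return id;
        }
        let id = self.states.len();
        self.states.push(key.clone());
        self.map.insert(key, id);
        id
    }

    // Adds one UTF-8 byte-range sequence to a reverse automaton that is built
    // backwards from `end_id`. The first forward byte sits next to `end_id`,
    // so sequences with the same leading bytes share states. Returns the state
    // a reverse search enters first (the one for the last forward byte).
    fn add_sequence(&mut self, seq: &[(u8, u8)], end_id: usize) -> usize {
        let mut next = end_id;
        for &(start, end) in seq {
            next = self.get_or_add(start, end, next);
        }
        next
    }
}
```

Two 3-byte sequences that differ only in their final byte range share their first two states, so only four states are allocated instead of six.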


Implementations§

impl Compiler

pub fn new() -> Compiler

pub fn build(&self, pattern: &str) -> Result<NFA, BuildError>

§Example

pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<NFA, BuildError>

§Example

pub fn build_from_hir(&self, expr: &Hir) -> Result<NFA, BuildError>

§Example

pub fn build_many_from_hir<H: Borrow<Hir>>(&self, exprs: &[H]) -> Result<NFA, BuildError>

§Example

pub fn configure(&mut self, config: Config) -> &mut Compiler

§Example

pub fn syntax(&mut self, config: syntax::Config) -> &mut Compiler

§Example

impl Compiler

fn compile<H: Borrow<Hir>>(&self, exprs: &[H]) -> Result<NFA, BuildError>

fn c(&self, expr: &Hir) -> Result<ThompsonRef, BuildError>

fn c_concat<I>(&self, it: I) -> Result<ThompsonRef, BuildError>
where I: DoubleEndedIterator<Item = Result<ThompsonRef, BuildError>>,
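Concatenation in a Thompson construction patches the exit of each piece to the entry of the next. The `DoubleEndedIterator` bound lets a reverse compile (see `is_reverse`) consume the pieces back to front with the same code. The toy below is hypothetical: its `Toy`, `Frag` and `State` types are invented, and its fragments are built up front instead of being produced lazily as `Result`s. It only illustrates the patching and the reversal.

```rust
// Hypothetical toy states, each with one patchable outgoing edge.
enum State {
    Byte { byte: u8, next: Option<usize> },
    Empty { next: Option<usize> },
}

// A fragment with one entry and one exit, playing the role of `ThompsonRef`.
#[derive(Clone, Copy)]
struct Frag {
    start: usize,
    end: usize,
}

struct Toy {
    states: Vec<State>,
    reverse: bool,
}

impl Toy {
    fn new(reverse: bool) -> Toy {
        Toy { states: Vec::new(), reverse }
    }

    fn add(&mut self, state: State) -> usize {
        self.states.push(state);
        self.states.len() - 1
    }

    fn byte(&mut self, byte: u8) -> Frag {
        let id = self.add(State::Byte { byte, next: None });
        Frag { start: id, end: id }
    }

    // Points the outgoing edge of `from` at `to`.
    fn patch(&mut self, from: usize, to: usize) {
        match &mut self.states[from] {
            State::Byte { next, .. } | State::Empty { next } => *next = Some(to),
        }
    }

    // Concatenation: a reverse compile consumes the pieces back to front.
    fn concat<I>(&mut self, it: I) -> Frag
    where
        I: DoubleEndedIterator<Item = Frag>,
    {
        if self.reverse {
            self.link(it.rev())
        } else {
            self.link(it)
        }
    }

    // Patches each fragment's exit to the following fragment's entry.
    fn link<J: Iterator<Item = Frag>>(&mut self, mut it: J) -> Frag {
        let first = match it.next() {
            Some(frag) => frag,
            // An empty concatenation matches the empty string.
            None => {
                let id = self.add(State::Empty { next: None });
                return Frag { start: id, end: id };
            }
        };
        let mut end = first.end;
        for frag in it {
            self.patch(end, frag.start);
            end = frag.end;
        }
        Frag { start: first.start, end }
    }

    // Follows edges from `start` and collects the bytes along the path.
    fn spell(&self, start: usize) -> String {
        let mut out = String::new();
        let mut cur = Some(start);
        while let Some(id) = cur {
            cur = match &self.states[id] {
                State::Byte { byte, next } => {
                    out.push(*byte as char);
                    *next
                }
                State::Empty { next } => *next,
            };
        }
        out
    }
}
```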

fn c_alt_slice(&self, exprs: &[Hir]) -> Result<ThompsonRef, BuildError>

fn c_alt_iter<I>(&self, it: I) -> Result<ThompsonRef, BuildError>
where I: Iterator<Item = Result<ThompsonRef, BuildError>>,
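Alternation adds one union state that fans out to each branch and one shared empty state into which every branch exits. The sketch below is a hypothetical toy (all types and method names are invented) that builds `a|bc` this way and checks it with a small set-based simulation in the spirit of a Pike VM, without capture tracking or match priorities.

```rust
use std::collections::HashSet;

// Hypothetical toy states. `Union` lists its alternatives in priority order.
enum State {
    Byte { byte: u8, next: Option<usize> },
    Empty { next: Option<usize> },
    Union { alts: Vec<usize> },
    Match,
}

#[derive(Clone, Copy)]
struct Frag {
    start: usize,
    end: usize,
}

#[derive(Default)]
struct Toy {
    states: Vec<State>,
}

impl Toy {
    fn add(&mut self, state: State) -> usize {
        self.states.push(state);
        self.states.len() - 1
    }

    // Adds an edge from `from` to `to`; a union simply gains an alternative.
    fn patch(&mut self, from: usize, to: usize) {
        match &mut self.states[from] {
            State::Byte { next, .. } | State::Empty { next } => *next = Some(to),
            State::Union { alts } => alts.push(to),
            State::Match => {}
        }
    }

    // A chain of byte states; assumes `s` is non-empty.
    fn literal(&mut self, s: &str) -> Frag {
        let bytes = s.as_bytes();
        let start = self.add(State::Byte { byte: bytes[0], next: None });
        let mut end = start;
        for &b in &bytes[1..] {
            let id = self.add(State::Byte { byte: b, next: None });
            self.patch(end, id);
            end = id;
        }
        Frag { start, end }
    }

    // Alternation: union -> each branch -> one shared empty end state.
    // With no branches, nothing reaches `end`, so the fragment never matches.
    fn alt<I: Iterator<Item = Frag>>(&mut self, it: I) -> Frag {
        let union = self.add(State::Union { alts: Vec::new() });
        let end = self.add(State::Empty { next: None });
        for frag in it {
            self.patch(union, frag.start);
            self.patch(frag.end, end);
        }
        Frag { start: union, end }
    }

    // Follows Empty and Union edges (the epsilon closure).
    fn closure(&self, seeds: Vec<usize>) -> Vec<usize> {
        let mut seen = HashSet::new();
        let mut stack = seeds;
        let mut out = Vec::new();
        while let Some(id) = stack.pop() {
            if !seen.insert(id) {
                continue;
            }
            out.push(id);
            match &self.states[id] {
                State::Empty { next: Some(n) } => stack.push(*n),
                State::Union { alts } => stack.extend(alts.iter().copied()),
                _ => {}
            }
        }
        out
    }

    // Steps a set of states over `input`; true if all of it is matched.
    fn is_full_match(&self, start: usize, input: &[u8]) -> bool {
        let mut current = self.closure(vec![start]);
        for &b in input {
            let mut stepped = Vec::new();
            for &id in &current {
                if let State::Byte { byte, next: Some(n) } = &self.states[id] {
                    if *byte == b {
                        stepped.push(*n);
                    }
                }
            }
            current = self.closure(stepped);
        }
        current.iter().any(|&id| matches!(self.states[id], State::Match))
    }
}
```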

fn c_cap(&self, index: u32, name: Option<&str>, expr: &Hir) -> Result<ThompsonRef, BuildError>

fn c_repetition(&self, rep: &Repetition) -> Result<ThompsonRef, BuildError>

fn c_bounded(&self, expr: &Hir, greedy: bool, min: u32, max: u32) -> Result<ThompsonRef, BuildError>
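As we understand it, `c_bounded` compiles `min` mandatory copies of the expression and then `max - min` optional copies that are nested rather than chained: `a{2,4}` is shaped like `aa(?:a(?:a)?)?` instead of `aaa?a?`. In the automaton, declining any optional copy then jumps straight to one shared end state. The function below is a hypothetical string-level illustration of that shape, not the compiler's code; `atom` is assumed to be a single atom such as `a` or `(?:ab)`.

```rust
// Rewrites `atom{min,max}` using only concatenation and `?`, with the
// optional copies nested. `greedy == false` uses `??`, which prefers to skip.
fn expand_bounded(atom: &str, greedy: bool, min: u32, max: u32) -> String {
    assert!(min <= max, "min must not exceed max");
    let q = if greedy { "?" } else { "??" };
    // Build the nested optional tail from the innermost copy outward.
    let mut tail = String::new();
    for _ in min..max {
        tail = format!("(?:{}{}){}", atom, tail, q);
    }
    format!("{}{}", atom.repeat(min as usize), tail)
}
```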

fn c_at_least(&self, expr: &Hir, greedy: bool, n: u32) -> Result<ThompsonRef, BuildError>
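One way to read `c_at_least`: `e{0,}` is a loop (`e*`), and for `n >= 1`, `e{n,}` is `n - 1` mandatory copies followed by one looping copy (`e+`), so only the last copy carries the back edge. The function below is a hypothetical string-level illustration of that decomposition under the same single-atom assumption as `expand_bounded`, not the compiler's code.

```rust
// Rewrites `atom{n,}` as `atom{n-1}` followed by `(?:atom)+`, or as
// `(?:atom)*` when n == 0. `greedy == false` appends the lazy `?` suffix.
fn expand_at_least(atom: &str, greedy: bool, n: u32) -> String {
    let lazy = if greedy { "" } else { "?" };
    if n == 0 {
        format!("(?:{}){}{}", atom, "*", lazy)
    } else {
        format!("{}(?:{})+{}", atom.repeat((n - 1) as usize), atom, lazy)
    }
}
```

One caveat we are aware of: when `e` can match the empty string, a plain `e*` loop can produce the wrong preference order under leftmost-first semantics, and the compiler is believed to use `(e+)?` in that case. The sketch ignores this.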

fn c_zero_or_one(&self, expr: &Hir, greedy: bool) -> Result<ThompsonRef, BuildError>

fn c_exactly(&self, expr: &Hir, n: u32) -> Result<ThompsonRef, BuildError>

fn c_byte_class(&self, cls: &ClassBytes) -> Result<ThompsonRef, BuildError>

fn c_unicode_class(&self, cls: &ClassUnicode) -> Result<ThompsonRef, BuildError>

fn c_unicode_class_reverse_with_suffix(&self, cls: &ClassUnicode) -> Result<ThompsonRef, BuildError>

fn c_look(&self, anchor: &Look) -> Result<ThompsonRef, BuildError>

fn c_literal(&self, bytes: &[u8]) -> Result<ThompsonRef, BuildError>

fn c_range(&self, start: u8, end: u8) -> Result<ThompsonRef, BuildError>

fn c_empty(&self) -> Result<ThompsonRef, BuildError>

fn c_fail(&self) -> Result<ThompsonRef, BuildError>

fn patch(&self, from: StateID, to: StateID) -> Result<(), BuildError>

fn start_pattern(&self) -> Result<PatternID, BuildError>

fn finish_pattern(&self, start_id: StateID) -> Result<PatternID, BuildError>

fn add_empty(&self) -> Result<StateID, BuildError>

fn add_range(&self, start: u8, end: u8) -> Result<StateID, BuildError>

fn add_sparse(&self, ranges: Vec<Transition>) -> Result<StateID, BuildError>

fn add_look(&self, look: Look) -> Result<StateID, BuildError>

fn add_union(&self) -> Result<StateID, BuildError>

fn add_union_reverse(&self) -> Result<StateID, BuildError>
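Greedy and lazy repetition differ only in which alternative of a union is preferred. As far as we can tell, `add_union_reverse` creates a union whose alternatives are consulted in the reverse of the order in which they were patched in, so a lazy repetition can reuse the same patching sequence as a greedy one. The hypothetical toy below makes the priority explicit: it builds `a?` and `a??` by listing the union's alternatives in opposite orders, and uses a simple backtracking search (not how a Pike VM executes) to show which match end each one prefers.

```rust
// Hypothetical toy NFA; a union's alternatives are tried in listed order.
enum State {
    Byte { byte: u8, next: usize },
    Union { alts: Vec<usize> },
    Match,
}

// Backtracking search from state `id` at offset `at`. Returns the end offset
// of the first match found when alternatives are tried in priority order.
// The toy graphs below have no cycles, so the recursion terminates.
fn first_match_end(states: &[State], id: usize, input: &[u8], at: usize) -> Option<usize> {
    match &states[id] {
        State::Match => Some(at),
        State::Byte { byte, next } => {
            if input.get(at) == Some(byte) {
                first_match_end(states, *next, input, at + 1)
            } else {
                None
            }
        }
        State::Union { alts } => alts
            .iter()
            .find_map(|&alt| first_match_end(states, alt, input, at)),
    }
}

// Builds `a?` (greedy) or `a??` (lazy): state 0 is the union, state 1 the
// byte 'a', state 2 the match. Only the union's alternative order differs.
fn zero_or_one_a(greedy: bool) -> Vec<State> {
    let alts = if greedy { vec![1, 2] } else { vec![2, 1] };
    vec![
        State::Union { alts },
        State::Byte { byte: b'a', next: 2 },
        State::Match,
    ]
}
```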

fn add_capture_start(&self, capture_index: u32, name: Option<&str>) -> Result<StateID, BuildError>

fn add_capture_end(&self, capture_index: u32) -> Result<StateID, BuildError>

fn add_fail(&self) -> Result<StateID, BuildError>

fn add_match(&self) -> Result<StateID, BuildError>

fn is_reverse(&self) -> bool

Trait Implementations§

impl Clone for Compiler

fn clone(&self) -> Compiler

fn clone_from(&mut self, source: &Self)

impl Debug for Compiler

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations§

impl !Freeze for Compiler

impl !RefUnwindSafe for Compiler

impl Send for Compiler

impl !Sync for Compiler

impl Unpin for Compiler

impl UnwindSafe for Compiler
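Because of its `RefCell` fields, `Compiler` is `Send` but not `Sync`: it can be moved to another thread but not shared by reference across threads. Since it also implements `Clone`, one workable pattern is to give each thread its own clone. The sketch below shows that pattern on a hypothetical `ToyCompiler` with the same kind of `RefCell` scratch state; it does not use regex-automata.

```rust
use std::cell::RefCell;
use std::thread;

// Hypothetical type with interior mutability, like `Compiler`: `Send` because
// `RefCell<Vec<u8>>` is `Send`, but not `Sync`.
#[derive(Clone, Default)]
struct ToyCompiler {
    scratch: RefCell<Vec<u8>>,
}

impl ToyCompiler {
    // Mutates scratch space through `&self`; this is what rules out `Sync`.
    fn compile_len(&self, pattern: &str) -> usize {
        let mut scratch = self.scratch.borrow_mut();
        scratch.clear();
        scratch.extend_from_slice(pattern.as_bytes());
        scratch.len()
    }
}

// Compiles each pattern on its own thread, moving one clone into each.
fn compile_in_threads(base: &ToyCompiler, patterns: &[&str]) -> Vec<usize> {
    let handles: Vec<_> = patterns
        .iter()
        .map(|p| {
            let compiler = base.clone();
            let pattern = p.to_string();
            thread::spawn(move || compiler.compile_len(&pattern))
        })
        .collect();
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}

// Compile-time check that a type is `Send`.
fn assert_send<T: Send>() {}
```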

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,