Struct Builder

Source

pub struct Builder {
    pattern_id: Option<PatternID>,
    states: Vec<State>,
    start_pattern: Vec<StateID>,
    captures: Vec<Vec<Option<Arc<str>>>>,
    memory_states: usize,
    utf8: bool,
    reverse: bool,
    look_matcher: LookMatcher,
    size_limit: Option<usize>,
}

Expand description

An abstraction for building Thompson NFAs by hand.

A builder is what a thompson::Compiler uses internally to translate a regex’s high-level intermediate representation into an NFA.

The primary function of this builder is to abstract away the internal representation of an NFA and make it difficult to produce NFAs are that internally invalid or inconsistent. This builder also provides a way to add “empty” states (which can be thought of as unconditional epsilon transitions), despite the fact that thompson::State does not have any “empty” representation. The advantage of “empty” states is that they make the code for constructing a Thompson NFA logically simpler.

Many of the routines on this builder may panic or return errors. Generally speaking, panics occur when an invalid sequence of method calls were made, where as an error occurs if things get too big. (Where “too big” might mean exhausting identifier space or using up too much heap memory in accordance with the configured size_limit.)

§Overview

§Adding multiple patterns

Each pattern you add to an NFA should correspond to a pair of Builder::start_pattern and Builder::finish_pattern calls, with calls inbetween that add NFA states for that pattern. NFA states may be added without first calling start_pattern, with the exception of adding capturing states.

§Adding NFA states

Here is a very brief overview of each of the methods that add NFA states. Every method adds a single state.

add_empty: Add a state with a single unconditional epsilon transition to another state.
add_union: Adds a state with unconditional epsilon transitions to two or more states, with earlier transitions preferred over later ones.
add_union_reverse: Adds a state with unconditional epsilon transitions to two or more states, with later transitions preferred over earlier ones.
add_range: Adds a state with a single transition to another state that can only be followed if the current input byte is within the range given.
add_sparse: Adds a state with two or more range transitions to other states, where a transition is only followed if the current input byte is within one of the ranges. All transitions in this state have equal priority, and the corresponding ranges must be non-overlapping.
add_look: Adds a state with a single conditional epsilon transition to another state, where the condition depends on a limited look-around property.
add_capture_start: Adds a state with a single unconditional epsilon transition that also instructs an NFA simulation to record the current input position to a specific location in memory. This is intended to represent the starting location of a capturing group.
add_capture_end: Adds a state with a single unconditional epsilon transition that also instructs an NFA simulation to record the current input position to a specific location in memory. This is intended to represent the ending location of a capturing group.
add_fail: Adds a state that never transitions to another state.
add_match: Add a state that indicates a match has been found for a particular pattern. A match state is a final state with no outgoing transitions.

§Setting transitions between NFA states

The Builder::patch method creates a transition from one state to the next. If the from state corresponds to a state that supports multiple outgoing transitions (such as “union”), then this adds the corresponding transition. Otherwise, it sets the single transition. (This routine panics if from corresponds to a state added by add_sparse, since sparse states need more specialized handling.)

§Example

This annotated example shows how to hand construct the regex [a-z]+ (without an unanchored prefix).

use regex_automata::{
    nfa::thompson::{pikevm::PikeVM, Builder, Transition},
    util::primitives::StateID,
    Match,
};

let mut builder = Builder::new();
// Before adding NFA states for our pattern, we need to tell the builder
// that we are starting the pattern.
builder.start_pattern()?;
// Since we use the Pike VM below for searching, we need to add capturing
// states. If you're just going to build a DFA from the NFA, then capturing
// states do not need to be added.
let start = builder.add_capture_start(StateID::ZERO, 0, None)?;
let range = builder.add_range(Transition {
    // We don't know the state ID of the 'next' state yet, so we just fill
    // in a dummy 'ZERO' value.
    start: b'a', end: b'z', next: StateID::ZERO,
})?;
// This state will point back to 'range', but also enable us to move ahead.
// That is, this implements the '+' repetition operator. We add 'range' and
// then 'end' below to this alternation.
let alt = builder.add_union(vec![])?;
// The final state before the match state, which serves to capture the
// end location of the match.
let end = builder.add_capture_end(StateID::ZERO, 0)?;
// The match state for our pattern.
let mat = builder.add_match()?;
// Now we fill in the transitions between states.
builder.patch(start, range)?;
builder.patch(range, alt)?;
// If we added 'end' before 'range', then we'd implement non-greedy
// matching, i.e., '+?'.
builder.patch(alt, range)?;
builder.patch(alt, end)?;
builder.patch(end, mat)?;
// We must explicitly finish pattern and provide the starting state ID for
// this particular pattern.
builder.finish_pattern(start)?;
// Finally, when we build the NFA, we provide the anchored and unanchored
// starting state IDs. Since we didn't bother with an unanchored prefix
// here, we only support anchored searching. Thus, both starting states are
// the same.
let nfa = builder.build(start, start)?;

// Now build a Pike VM from our NFA, and use it for searching. This shows
// how we can use a regex engine without ever worrying about syntax!
let re = PikeVM::new_from_nfa(nfa)?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let expected = Some(Match::must(0, 0..3));
re.captures(&mut cache, "foo0", &mut caps);
assert_eq!(expected, caps.get_match());

Fields§

§pattern_id: Option<PatternID>

The ID of the pattern that we’re currently building.

Callers are required to set (and unset) this by calling {start,finish}_pattern. Otherwise, most methods will panic.

§states: Vec<State>

A sequence of intermediate NFA states. Once a state is added to this sequence, it is assigned a state ID equivalent to its index. Once a state is added, it is still expected to be mutated, e.g., to set its transition to a state that didn’t exist at the time it was added.

§start_pattern: Vec<StateID>

The starting states for each individual pattern. Starting at any of these states will result in only an anchored search for the corresponding pattern. The vec is indexed by pattern ID. When the NFA contains a single regex, then start_pattern[0] and start_anchored are always equivalent.

§captures: Vec<Vec<Option<Arc<str>>>>

A map from pattern ID to capture group index to name. (If no name exists, then a None entry is present. Thus, all capturing groups are present in this mapping.)

The outer vec is indexed by pattern ID, while the inner vec is indexed by capture index offset for the corresponding pattern.

The first capture group for each pattern is always unnamed and is thus always None.

§memory_states: usize

The combined memory used by each of the ’State’s in ‘states’. This only includes heap usage by each state, and not the size of the state itself. In other words, this tracks heap memory used that isn’t captured via size_of::<State>() * states.len().

§utf8: bool

Whether this NFA only matches UTF-8 and whether regex engines using this NFA for searching should report empty matches that split a codepoint.

§reverse: bool

Whether this NFA should be matched in reverse or not.

§look_matcher: LookMatcher

The matcher to use for look-around assertions.

§size_limit: Option<usize>

A size limit to respect when building an NFA. If the total heap memory of the intermediate NFA states exceeds (or would exceed) this amount, then an error is returned.

Implementations§

Source §

impl Builder

Source

pub fn new() -> Builder

Create a new builder for hand-assembling NFAs.

Source

pub fn clear(&mut self)

Clear this builder.

Clearing removes all state associated with building an NFA, but does not reset configuration (such as size limits and whether the NFA should only match UTF-8). After clearing, the builder can be reused to assemble an entirely new NFA.

Source

pub fn build( &self, start_anchored: StateID, start_unanchored: StateID, ) -> Result<NFA, BuildError>

Assemble a NFA from the states added so far.

After building an NFA, more states may be added and build may be called again. To reuse a builder to produce an entirely new NFA from scratch, call the clear method first.

start_anchored refers to the ID of the starting state that anchored searches should use. That is, searches who matches are limited to the starting position of the search.

start_unanchored refers to the ID of the starting state that unanchored searches should use. This permits searches to report matches that start after the beginning of the search. In cases where unanchored searches are not supported, the unanchored starting state ID must be the same as the anchored starting state ID.

§Errors

This returns an error if there was a problem producing the final NFA. In particular, this might include an error if the capturing groups added to this builder violate any of the invariants documented on GroupInfo.

§Panics

If start_pattern was called, then finish_pattern must be called before build, otherwise this panics.

This may panic for other invalid uses of a builder. For example, if a “start capture” state was added without a corresponding “end capture” state.

Source

pub fn start_pattern(&mut self) -> Result<PatternID, BuildError>

Start the assembly of a pattern in this NFA.

Upon success, this returns the identifier for the new pattern. Identifiers start at 0 and are incremented by 1 for each new pattern.

It is necessary to call this routine before adding capturing states. Otherwise, any other NFA state may be added before starting a pattern.

§Errors

If the pattern identifier space is exhausted, then this returns an error.

§Panics

If this is called while assembling another pattern (i.e., before finish_pattern is called), then this panics.

Source

pub fn finish_pattern( &mut self, start_id: StateID, ) -> Result<PatternID, BuildError>

Finish the assembly of a pattern in this NFA.

Upon success, this returns the identifier for the new pattern. Identifiers start at 0 and are incremented by 1 for each new pattern. This is the same identifier returned by the corresponding start_pattern call.

Note that start_pattern and finish_pattern pairs cannot be interleaved or nested. A correct finish_pattern call always corresponds to the most recently called start_pattern routine.

§Errors

This currently never returns an error, but this is subject to change.

§Panics

If this is called without a corresponding start_pattern call, then this panics.

Source

pub fn current_pattern_id(&self) -> PatternID

Returns the pattern identifier of the current pattern.

§Panics

If this doesn’t occur after a start_pattern call and before the corresponding finish_pattern call, then this panics.

Source

pub fn pattern_len(&self) -> usize

Returns the number of patterns added to this builder so far.

This only includes patterns that have had finish_pattern called for them.

Source

pub fn add_empty(&mut self) -> Result<StateID, BuildError>

Add an “empty” NFA state.

An “empty” NFA state is a state with a single unconditional epsilon transition to another NFA state. Such empty states are removed before building the final NFA (which has no such “empty” states), but they can be quite useful in the construction process of an NFA.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

Source

pub fn add_union( &mut self, alternates: Vec<StateID>, ) -> Result<StateID, BuildError>

Add a “union” NFA state.

A “union” NFA state that contains zero or more unconditional epsilon transitions to other NFA states. The order of these transitions reflects a priority order where earlier transitions are preferred over later transitions.

Callers may provide an empty set of alternates to this method call, and then later add transitions via patch. At final build time, a “union” state with no alternates is converted to a “fail” state, and a “union” state with exactly one alternate is treated as if it were an “empty” state.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

Source

pub fn add_union_reverse( &mut self, alternates: Vec<StateID>, ) -> Result<StateID, BuildError>

Add a “reverse union” NFA state.

A “reverse union” NFA state contains zero or more unconditional epsilon transitions to other NFA states. The order of these transitions reflects a priority order where later transitions are preferred over earlier transitions. This is an inverted priority order when compared to add_union. This is useful, for example, for implementing non-greedy repetition operators.

Callers may provide an empty set of alternates to this method call, and then later add transitions via patch. At final build time, a “reverse union” state with no alternates is converted to a “fail” state, and a “reverse union” state with exactly one alternate is treated as if it were an “empty” state.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

Source

pub fn add_range(&mut self, trans: Transition) -> Result<StateID, BuildError>

Add a “range” NFA state.

A “range” NFA state is a state with one outgoing transition to another state, where that transition may only be followed if the current input byte falls between a range of bytes given.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

Source

pub fn add_sparse( &mut self, transitions: Vec<Transition>, ) -> Result<StateID, BuildError>

Add a “sparse” NFA state.

A “sparse” NFA state contains zero or more outgoing transitions, where the transition to be followed (if any) is chosen based on whether the current input byte falls in the range of one such transition. The transitions given must be non-overlapping and in ascending order. (A “sparse” state with no transitions is equivalent to a “fail” state.)

A “sparse” state is like adding a “union” state and pointing it at a bunch of “range” states, except that the different alternates have equal priority.

Note that a “sparse” state is the only state that cannot be patched. This is because a “sparse” state has many transitions, each of which may point to a different NFA state. Moreover, adding more such transitions requires more than just an NFA state ID to point to. It also requires a byte range. The patch routine does not support the additional information required. Therefore, callers must ensure that all outgoing transitions for this state are included when add_sparse is called. There is no way to add more later.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

§Panics

This routine may panic if the transitions given overlap or are not in ascending order.

Source

pub fn add_look( &mut self, next: StateID, look: Look, ) -> Result<StateID, BuildError>

Add a “look” NFA state.

A “look” NFA state corresponds to a state with exactly one conditional epsilon transition to another NFA state. Namely, it represents one of a small set of simplistic look-around operators.

Callers may provide a “dummy” state ID (typically StateID::ZERO), and then change it later with patch.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

Source

pub fn add_capture_start( &mut self, next: StateID, group_index: u32, name: Option<Arc<str>>, ) -> Result<StateID, BuildError>

Add a “start capture” NFA state.

A “start capture” NFA state corresponds to a state with exactly one outgoing unconditional epsilon transition to another state. Unlike “empty” states, a “start capture” state also carries with it an instruction for saving the current position of input to a particular location in memory. NFA simulations, like the Pike VM, may use this information to report the match locations of capturing groups in a regex pattern.

If the corresponding capturing group has a name, then callers should include it here.

Callers may provide a “dummy” state ID (typically StateID::ZERO), and then change it later with patch.

Note that unlike start_pattern/finish_pattern, capturing start and end states may be interleaved. Indeed, it is typical for many “start capture” NFA states to appear before the first “end capture” state.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded or if the given capture index overflows usize.

While the above are the only conditions in which this routine can currently return an error, it is possible to call this method with an inputs that results in the final build() step failing to produce an NFA. For example, if one adds two distinct capturing groups with the same name, then that will result in build() failing with an error.

See the GroupInfo type for more information on what qualifies as valid capturing groups.

§Example

This example shows that an error occurs when one tries to add multiple capturing groups with the same name to the same pattern.

use regex_automata::{
    nfa::thompson::Builder,
    util::primitives::StateID,
};

let name = Some(std::sync::Arc::from("foo"));
let mut builder = Builder::new();
builder.start_pattern()?;
// 0th capture group should always be unnamed.
let start = builder.add_capture_start(StateID::ZERO, 0, None)?;
// OK
builder.add_capture_start(StateID::ZERO, 1, name.clone())?;
// This is not OK, but 'add_capture_start' still succeeds. We don't
// get an error until we call 'build' below. Without this call, the
// call to 'build' below would succeed.
builder.add_capture_start(StateID::ZERO, 2, name.clone())?;
// Finish our pattern so we can try to build the NFA.
builder.finish_pattern(start)?;
let result = builder.build(start, start);
assert!(result.is_err());

However, adding multiple capturing groups with the same name to distinct patterns is okay:

use std::sync::Arc;

use regex_automata::{
    nfa::thompson::{pikevm::PikeVM, Builder, Transition},
    util::{
        captures::Captures,
        primitives::{PatternID, StateID},
    },
    Span,
};

// Hand-compile the patterns '(?P<foo>[a-z])' and '(?P<foo>[A-Z])'.
let mut builder = Builder::new();
// We compile them to support an unanchored search, which requires
// adding an implicit '(?s-u:.)*?' prefix before adding either pattern.
let unanchored_prefix = builder.add_union_reverse(vec![])?;
let any = builder.add_range(Transition {
    start: b'\x00', end: b'\xFF', next: StateID::ZERO,
})?;
builder.patch(unanchored_prefix, any)?;
builder.patch(any, unanchored_prefix)?;

// Compile an alternation that permits matching multiple patterns.
let alt = builder.add_union(vec![])?;
builder.patch(unanchored_prefix, alt)?;

// Compile '(?P<foo>[a-z]+)'.
builder.start_pattern()?;
let start0 = builder.add_capture_start(StateID::ZERO, 0, None)?;
// N.B. 0th capture group must always be unnamed.
let foo_start0 = builder.add_capture_start(
    StateID::ZERO, 1, Some(Arc::from("foo")),
)?;
let lowercase = builder.add_range(Transition {
    start: b'a', end: b'z', next: StateID::ZERO,
})?;
let foo_end0 = builder.add_capture_end(StateID::ZERO, 1)?;
let end0 = builder.add_capture_end(StateID::ZERO, 0)?;
let match0 = builder.add_match()?;
builder.patch(start0, foo_start0)?;
builder.patch(foo_start0, lowercase)?;
builder.patch(lowercase, foo_end0)?;
builder.patch(foo_end0, end0)?;
builder.patch(end0, match0)?;
builder.finish_pattern(start0)?;

// Compile '(?P<foo>[A-Z]+)'.
builder.start_pattern()?;
let start1 = builder.add_capture_start(StateID::ZERO, 0, None)?;
// N.B. 0th capture group must always be unnamed.
let foo_start1 = builder.add_capture_start(
    StateID::ZERO, 1, Some(Arc::from("foo")),
)?;
let uppercase = builder.add_range(Transition {
    start: b'A', end: b'Z', next: StateID::ZERO,
})?;
let foo_end1 = builder.add_capture_end(StateID::ZERO, 1)?;
let end1 = builder.add_capture_end(StateID::ZERO, 0)?;
let match1 = builder.add_match()?;
builder.patch(start1, foo_start1)?;
builder.patch(foo_start1, uppercase)?;
builder.patch(uppercase, foo_end1)?;
builder.patch(foo_end1, end1)?;
builder.patch(end1, match1)?;
builder.finish_pattern(start1)?;

// Now add the patterns to our alternation that we started above.
builder.patch(alt, start0)?;
builder.patch(alt, start1)?;

// Finally build the NFA. The first argument is the anchored starting
// state (the pattern alternation) where as the second is the
// unanchored starting state (the unanchored prefix).
let nfa = builder.build(alt, unanchored_prefix)?;

// Now build a Pike VM from our NFA and access the 'foo' capture
// group regardless of which pattern matched, since it is defined
// for both patterns.
let vm = PikeVM::new_from_nfa(nfa)?;
let mut cache = vm.create_cache();
let caps: Vec<Captures> =
    vm.captures_iter(&mut cache, "0123aAaAA").collect();
assert_eq!(5, caps.len());

assert_eq!(Some(PatternID::must(0)), caps[0].pattern());
assert_eq!(Some(Span::from(4..5)), caps[0].get_group_by_name("foo"));

assert_eq!(Some(PatternID::must(1)), caps[1].pattern());
assert_eq!(Some(Span::from(5..6)), caps[1].get_group_by_name("foo"));

assert_eq!(Some(PatternID::must(0)), caps[2].pattern());
assert_eq!(Some(Span::from(6..7)), caps[2].get_group_by_name("foo"));

assert_eq!(Some(PatternID::must(1)), caps[3].pattern());
assert_eq!(Some(Span::from(7..8)), caps[3].get_group_by_name("foo"));

assert_eq!(Some(PatternID::must(1)), caps[4].pattern());
assert_eq!(Some(Span::from(8..9)), caps[4].get_group_by_name("foo"));

Source

pub fn add_capture_end( &mut self, next: StateID, group_index: u32, ) -> Result<StateID, BuildError>

Add a “end capture” NFA state.

A “end capture” NFA state corresponds to a state with exactly one outgoing unconditional epsilon transition to another state. Unlike “empty” states, a “end capture” state also carries with it an instruction for saving the current position of input to a particular location in memory. NFA simulations, like the Pike VM, may use this information to report the match locations of capturing groups in a

Callers may provide a “dummy” state ID (typically StateID::ZERO), and then change it later with patch.

Note that unlike start_pattern/finish_pattern, capturing start and end states may be interleaved. Indeed, it is typical for many “start capture” NFA states to appear before the first “end capture” state.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded or if the given capture index overflows usize.

While the above are the only conditions in which this routine can currently return an error, it is possible to call this method with an inputs that results in the final build() step failing to produce an NFA. For example, if one adds two distinct capturing groups with the same name, then that will result in build() failing with an error.

See the GroupInfo type for more information on what qualifies as valid capturing groups.

Source

pub fn add_fail(&mut self) -> Result<StateID, BuildError>

Adds a “fail” NFA state.

A “fail” state is simply a state that has no outgoing transitions. It acts as a way to cause a search to stop without reporting a match. For example, one way to represent an NFA with zero patterns is with a single “fail” state.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

Source

pub fn add_match(&mut self) -> Result<StateID, BuildError>

Adds a “match” NFA state.

A “match” state has no outgoing transitions (just like a “fail” state), but it has special significance in that if a search enters this state, then a match has been found. The match state that is added automatically has the current pattern ID associated with it. This is used to report the matching pattern ID at search time.

§Errors

This returns an error if the state identifier space is exhausted, or if the configured heap size limit has been exceeded.

§Panics

This must be called after a start_pattern call but before the corresponding finish_pattern call. Otherwise, it panics.

Source

fn add(&mut self, state: State) -> Result<StateID, BuildError>

The common implementation of “add a state.” It handles the common error cases of state ID exhausting (by owning state ID allocation) and whether the size limit has been exceeded.

Source

pub fn patch(&mut self, from: StateID, to: StateID) -> Result<(), BuildError>

Add a transition from one state to another.

This routine is called “patch” since it is very common to add the states you want, typically with “dummy” state ID transitions, and then “patch” in the real state IDs later. This is because you don’t always know all of the necessary state IDs to add because they might not exist yet.

§Errors

This may error if patching leads to an increase in heap usage beyond the configured size limit. Heap usage only grows when patching adds a new transition (as in the case of a “union” state).

§Panics

This panics if from corresponds to a “sparse” state. When “sparse” states are added, there is no way to patch them after-the-fact. (If you have a use case where this would be helpful, please file an issue. It will likely require a new API.)

Source

pub fn set_utf8(&mut self, yes: bool)

Set whether the NFA produced by this builder should only match UTF-8.

This should be set when both of the following are true:

The caller guarantees that the NFA created by this build will only report non-empty matches with spans that are valid UTF-8.
The caller desires regex engines using this NFA to avoid reporting empty matches with a span that splits a valid UTF-8 encoded codepoint.

Property (1) is not checked. Instead, this requires the caller to promise that it is true. Property (2) corresponds to the behavior of regex engines using the NFA created by this builder. Namely, there is no way in the NFA’s graph itself to say that empty matches found by, for example, the regex a* will fall on valid UTF-8 boundaries. Instead, this option is used to communicate the UTF-8 semantic to regex engines that will typically implement it as a post-processing step by filtering out empty matches that don’t fall on UTF-8 boundaries.

If you’re building an NFA from an HIR (and not using a thompson::Compiler), then you can use the syntax::Config::utf8 option to guarantee that if the HIR detects a non-empty match, then it is guaranteed to be valid UTF-8.

Note that property (2) does not specify the behavior of executing a search on a haystack that is not valid UTF-8. Therefore, if you’re not running this NFA on strings that are guaranteed to be valid UTF-8, you almost certainly do not want to enable this option. Similarly, if you are running the NFA on strings that are guaranteed to be valid UTF-8, then you almost certainly want to enable this option unless you can guarantee that your NFA will never produce a zero-width match.

It is disabled by default.

Source

pub fn get_utf8(&self) -> bool

Returns whether UTF-8 mode is enabled for this builder.

See Builder::set_utf8 for more details about what “UTF-8 mode” is.

Source

pub fn set_reverse(&mut self, yes: bool)

Sets whether the NFA produced by this builder should be matched in reverse or not. Generally speaking, when enabled, the NFA produced should be matched by moving backwards through a haystack, from a higher memory address to a lower memory address.

See also NFA::is_reverse for more details.

This is disabled by default, which means NFAs are by default matched in the forward direction.

Source

pub fn get_reverse(&self) -> bool

Returns whether reverse mode is enabled for this builder.

See Builder::set_reverse for more details about what “reverse mode” is.

Source

pub fn set_look_matcher(&mut self, m: LookMatcher)

Sets the look-around matcher that should be used for the resulting NFA.

A look-around matcher can be used to configure how look-around assertions are matched. For example, a matcher might carry configuration that changes the line terminator used for (?m:^) and (?m:$) assertions.

Source

pub fn get_look_matcher(&self) -> &LookMatcher

Returns the look-around matcher used for this builder.

If a matcher was not explicitly set, then LookMatcher::default() is returned.

Source

pub fn set_size_limit(&mut self, limit: Option<usize>) -> Result<(), BuildError>

Set the size limit on this builder.

Setting the size limit will also check whether the NFA built so far fits within the given size limit. If it doesn’t, then an error is returned.

By default, there is no configured size limit.

Source

pub fn get_size_limit(&self) -> Option<usize>

Return the currently configured size limit.

By default, this returns None, which corresponds to no configured size limit.

Source

pub fn memory_usage(&self) -> usize

Returns the heap memory usage, in bytes, used by the NFA states added so far.

Note that this is an approximation of how big the final NFA will be. In practice, the final NFA will likely be a bit smaller because of its simpler state representation. (For example, using things like Box<[StateID]> instead of Vec<StateID>.)

Source