Struct regex_automata::nfa::thompson::backtrack::BoundedBacktracker
source · pub struct BoundedBacktracker {
config: Config,
nfa: NFA,
}
Expand description
A backtracking regex engine that bounds its execution to avoid exponential blow-up.
This regex engine only implements leftmost-first match semantics and
only supports leftmost searches. It effectively does the same thing as a
PikeVM
, but typically does it faster because
it doesn’t have to worry about copying capturing group spans for most NFA
states. Instead, the backtracker can maintain one set of captures (provided
by the caller) and never needs to copy them. In exchange, the backtracker
bounds itself to ensure it doesn’t exhibit worst case exponential time.
This results in the backtracker only being able to handle short haystacks
given reasonable memory usage.
§Searches may return an error!
By design, this backtracking regex engine is bounded. This bound is
implemented by not visiting any combination of NFA state ID and position
in a haystack more than once. Thus, the total memory required to bound
backtracking is proportional to haystack.len() * nfa.states().len()
.
This can obviously get quite large, since large haystacks aren’t terribly
uncommon. To avoid using exorbitant memory, the capacity is bounded by
a fixed limit set via Config::visited_capacity
. Thus, if the total
capacity required for a particular regex and a haystack exceeds this
capacity, then the search routine will return an error.
Unlike other regex engines that may return an error at search time (like the DFA or the hybrid NFA/DFA), there is no way to guarantee that a bounded backtracker will work for every haystack. Therefore, this regex engine only exposes fallible search routines to avoid the footgun of panicking when running a search on a haystack that is too big.
If one wants to use the fallible search APIs without handling the
error, the only way to guarantee an error won’t occur from the
haystack length is to ensure the haystack length does not exceed
BoundedBacktracker::max_haystack_len
.
§Example: Unicode word boundaries
This example shows that the bounded backtracker implements Unicode word boundaries correctly by default.
use regex_automata::{nfa::thompson::backtrack::BoundedBacktracker, Match};
let re = BoundedBacktracker::new(r"\b\w+\b")?;
let mut cache = re.create_cache();
let mut it = re.try_find_iter(&mut cache, "Шерлок Холмс");
assert_eq!(Some(Ok(Match::must(0, 0..12))), it.next());
assert_eq!(Some(Ok(Match::must(0, 13..23))), it.next());
assert_eq!(None, it.next());
§Example: multiple regex patterns
The bounded backtracker supports searching for multiple patterns simultaneously, just like other regex engines. Note though that because it uses a backtracking strategy, this regex engine is unlikely to scale well as more patterns are added. But then again, as more patterns are added, the maximum haystack length allowed will also shorten (assuming the visited capacity remains invariant).
use regex_automata::{nfa::thompson::backtrack::BoundedBacktracker, Match};
let re = BoundedBacktracker::new_many(&["[a-z]+", "[0-9]+"])?;
let mut cache = re.create_cache();
let mut it = re.try_find_iter(&mut cache, "abc 1 foo 4567 0 quux");
assert_eq!(Some(Ok(Match::must(0, 0..3))), it.next());
assert_eq!(Some(Ok(Match::must(1, 4..5))), it.next());
assert_eq!(Some(Ok(Match::must(0, 6..9))), it.next());
assert_eq!(Some(Ok(Match::must(1, 10..14))), it.next());
assert_eq!(Some(Ok(Match::must(1, 15..16))), it.next());
assert_eq!(Some(Ok(Match::must(0, 17..21))), it.next());
assert_eq!(None, it.next());
Fields§
§config: Config
§nfa: NFA
Implementations§
source§impl BoundedBacktracker
impl BoundedBacktracker
sourcepub fn new(pattern: &str) -> Result<BoundedBacktracker, BuildError>
pub fn new(pattern: &str) -> Result<BoundedBacktracker, BuildError>
Parse the given regular expression using the default configuration and
return the corresponding BoundedBacktracker
.
If you want a non-default configuration, then use the Builder
to
set your own configuration.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re = BoundedBacktracker::new("foo[0-9]+bar")?;
let mut cache = re.create_cache();
assert_eq!(
Some(Ok(Match::must(0, 3..14))),
re.try_find_iter(&mut cache, "zzzfoo12345barzzz").next(),
);
sourcepub fn new_many<P: AsRef<str>>(
patterns: &[P],
) -> Result<BoundedBacktracker, BuildError>
pub fn new_many<P: AsRef<str>>( patterns: &[P], ) -> Result<BoundedBacktracker, BuildError>
Like new
, but parses multiple patterns into a single “multi regex.”
This similarly uses the default regex configuration.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re = BoundedBacktracker::new_many(&["[a-z]+", "[0-9]+"])?;
let mut cache = re.create_cache();
let mut it = re.try_find_iter(&mut cache, "abc 1 foo 4567 0 quux");
assert_eq!(Some(Ok(Match::must(0, 0..3))), it.next());
assert_eq!(Some(Ok(Match::must(1, 4..5))), it.next());
assert_eq!(Some(Ok(Match::must(0, 6..9))), it.next());
assert_eq!(Some(Ok(Match::must(1, 10..14))), it.next());
assert_eq!(Some(Ok(Match::must(1, 15..16))), it.next());
assert_eq!(Some(Ok(Match::must(0, 17..21))), it.next());
assert_eq!(None, it.next());
sourcepub fn new_from_nfa(nfa: NFA) -> Result<BoundedBacktracker, BuildError>
pub fn new_from_nfa(nfa: NFA) -> Result<BoundedBacktracker, BuildError>
§Example
This shows how to hand assemble a regular expression via its HIR, compile an NFA from it and build a BoundedBacktracker from the NFA.
use regex_automata::{
nfa::thompson::{NFA, backtrack::BoundedBacktracker},
Match,
};
use regex_syntax::hir::{Hir, Class, ClassBytes, ClassBytesRange};
let hir = Hir::class(Class::Bytes(ClassBytes::new(vec![
ClassBytesRange::new(b'0', b'9'),
ClassBytesRange::new(b'A', b'Z'),
ClassBytesRange::new(b'_', b'_'),
ClassBytesRange::new(b'a', b'z'),
])));
let config = NFA::config().nfa_size_limit(Some(1_000));
let nfa = NFA::compiler().configure(config).build_from_hir(&hir)?;
let re = BoundedBacktracker::new_from_nfa(nfa)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let expected = Some(Match::must(0, 3..4));
re.try_captures(&mut cache, "!@#A#@!", &mut caps)?;
assert_eq!(expected, caps.get_match());
sourcepub fn always_match() -> Result<BoundedBacktracker, BuildError>
pub fn always_match() -> Result<BoundedBacktracker, BuildError>
Create a new BoundedBacktracker
that matches every input.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re = BoundedBacktracker::always_match()?;
let mut cache = re.create_cache();
let expected = Some(Ok(Match::must(0, 0..0)));
assert_eq!(expected, re.try_find_iter(&mut cache, "").next());
assert_eq!(expected, re.try_find_iter(&mut cache, "foo").next());
sourcepub fn never_match() -> Result<BoundedBacktracker, BuildError>
pub fn never_match() -> Result<BoundedBacktracker, BuildError>
Create a new BoundedBacktracker
that never matches any input.
§Example
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::never_match()?;
let mut cache = re.create_cache();
assert_eq!(None, re.try_find_iter(&mut cache, "").next());
assert_eq!(None, re.try_find_iter(&mut cache, "foo").next());
sourcepub fn config() -> Config
pub fn config() -> Config
Return a default configuration for a BoundedBacktracker
.
This is a convenience routine to avoid needing to import the Config
type when customizing the construction of a BoundedBacktracker
.
§Example
This example shows how to disable UTF-8 mode. When UTF-8 mode is disabled, zero-width matches that split a codepoint are allowed. Otherwise they are never reported.
In the code below, notice that ""
is permitted to match positions
that split the encoding of a codepoint.
use regex_automata::{
nfa::thompson::{self, backtrack::BoundedBacktracker},
Match,
};
let re = BoundedBacktracker::builder()
.thompson(thompson::Config::new().utf8(false))
.build(r"")?;
let mut cache = re.create_cache();
let haystack = "a☃z";
let mut it = re.try_find_iter(&mut cache, haystack);
assert_eq!(Some(Ok(Match::must(0, 0..0))), it.next());
assert_eq!(Some(Ok(Match::must(0, 1..1))), it.next());
assert_eq!(Some(Ok(Match::must(0, 2..2))), it.next());
assert_eq!(Some(Ok(Match::must(0, 3..3))), it.next());
assert_eq!(Some(Ok(Match::must(0, 4..4))), it.next());
assert_eq!(Some(Ok(Match::must(0, 5..5))), it.next());
assert_eq!(None, it.next());
sourcepub fn builder() -> Builder
pub fn builder() -> Builder
Return a builder for configuring the construction of a
BoundedBacktracker
.
This is a convenience routine to avoid needing to import the
Builder
type in common cases.
§Example
This example shows how to use the builder to disable UTF-8 mode everywhere.
use regex_automata::{
nfa::thompson::{self, backtrack::BoundedBacktracker},
util::syntax,
Match,
};
let re = BoundedBacktracker::builder()
.syntax(syntax::Config::new().utf8(false))
.thompson(thompson::Config::new().utf8(false))
.build(r"foo(?-u:[^b])ar.*")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 1..9));
re.try_captures(&mut cache, haystack, &mut caps)?;
assert_eq!(expected, caps.get_match());
sourcepub fn create_cache(&self) -> Cache
pub fn create_cache(&self) -> Cache
Create a new cache for this regex.
The cache returned should only be used for searches for this
regex. If you want to reuse the cache for another regex, then you
must call Cache::reset
with that regex (or, equivalently,
BoundedBacktracker::reset_cache
).
sourcepub fn create_captures(&self) -> Captures
pub fn create_captures(&self) -> Captures
Create a new empty set of capturing groups that is guaranteed to be
valid for the search APIs on this BoundedBacktracker
.
A Captures
value created for a specific BoundedBacktracker
cannot
be used with any other BoundedBacktracker
.
This is a convenience function for Captures::all
. See the
Captures
documentation for an explanation of its alternative
constructors that permit the BoundedBacktracker
to do less work
during a search, and thus might make it faster.
sourcepub fn reset_cache(&self, cache: &mut Cache)
pub fn reset_cache(&self, cache: &mut Cache)
Reset the given cache such that it can be used for searching with the
this BoundedBacktracker
(and only this BoundedBacktracker
).
A cache reset permits reusing memory already allocated in this cache
with a different BoundedBacktracker
.
§Example
This shows how to re-purpose a cache for use with a different
BoundedBacktracker
.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re1 = BoundedBacktracker::new(r"\w")?;
let re2 = BoundedBacktracker::new(r"\W")?;
let mut cache = re1.create_cache();
assert_eq!(
Some(Ok(Match::must(0, 0..2))),
re1.try_find_iter(&mut cache, "Δ").next(),
);
// Using 'cache' with re2 is not allowed. It may result in panics or
// incorrect results. In order to re-purpose the cache, we must reset
// it with the BoundedBacktracker we'd like to use it with.
//
// Similarly, after this reset, using the cache with 're1' is also not
// allowed.
cache.reset(&re2);
assert_eq!(
Some(Ok(Match::must(0, 0..3))),
re2.try_find_iter(&mut cache, "☃").next(),
);
sourcepub fn pattern_len(&self) -> usize
pub fn pattern_len(&self) -> usize
Returns the total number of patterns compiled into this
BoundedBacktracker
.
In the case of a BoundedBacktracker
that contains no patterns, this
returns 0
.
§Example
This example shows the pattern length for a BoundedBacktracker
that
never matches:
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::never_match()?;
assert_eq!(re.pattern_len(), 0);
And another example for a BoundedBacktracker
that matches at every
position:
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::always_match()?;
assert_eq!(re.pattern_len(), 1);
And finally, a BoundedBacktracker
that was constructed from multiple
patterns:
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::new_many(&["[0-9]+", "[a-z]+", "[A-Z]+"])?;
assert_eq!(re.pattern_len(), 3);
sourcepub fn get_config(&self) -> &Config
pub fn get_config(&self) -> &Config
Return the config for this BoundedBacktracker
.
sourcepub fn max_haystack_len(&self) -> usize
pub fn max_haystack_len(&self) -> usize
Returns the maximum haystack length supported by this backtracker.
This routine is a function of both Config::visited_capacity
and the
internal size of the backtracker’s NFA.
§Example
This example shows how the maximum haystack length can vary depending on the size of the regex itself. Note though that the specific maximum values here are not an API guarantee. The default visited capacity is subject to change and not covered by semver.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match, MatchError,
};
// If you're only using ASCII, you get a big budget.
let re = BoundedBacktracker::new(r"(?-u)\w+")?;
let mut cache = re.create_cache();
assert_eq!(re.max_haystack_len(), 299_592);
// Things work up to the max.
let mut haystack = "a".repeat(299_592);
let expected = Some(Ok(Match::must(0, 0..299_592)));
assert_eq!(expected, re.try_find_iter(&mut cache, &haystack).next());
// But you'll get an error if you provide a haystack that's too big.
// Notice that we use the 'try_find_iter' routine instead, which
// yields Result<Match, MatchError> instead of Match.
haystack.push('a');
let expected = Some(Err(MatchError::haystack_too_long(299_593)));
assert_eq!(expected, re.try_find_iter(&mut cache, &haystack).next());
// Unicode inflates the size of the underlying NFA quite a bit, and
// thus means that the backtracker can only handle smaller haystacks,
// assuming that the visited capacity remains unchanged.
let re = BoundedBacktracker::new(r"\w+")?;
assert!(re.max_haystack_len() <= 7_000);
// But we can increase the visited capacity to handle bigger haystacks!
let re = BoundedBacktracker::builder()
.configure(BoundedBacktracker::config().visited_capacity(1<<20))
.build(r"\w+")?;
assert!(re.max_haystack_len() >= 25_000);
assert!(re.max_haystack_len() <= 28_000);
source§impl BoundedBacktracker
impl BoundedBacktracker
sourcepub fn try_is_match<'h, I: Into<Input<'h>>>(
&self,
cache: &mut Cache,
input: I,
) -> Result<bool, MatchError>
pub fn try_is_match<'h, I: Into<Input<'h>>>( &self, cache: &mut Cache, input: I, ) -> Result<bool, MatchError>
Returns true if and only if this regex matches the given haystack.
In the case of a backtracking regex engine, and unlike most other regex engines in this crate, short circuiting isn’t practical. However, this routine may still be faster because it instructs backtracking to not keep track of any capturing groups.
§Errors
This routine only errors if the search could not complete. For this
backtracking regex engine, this only occurs when the haystack length
exceeds BoundedBacktracker::max_haystack_len
.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
use regex_automata::nfa::thompson::backtrack::BoundedBacktracker;
let re = BoundedBacktracker::new("foo[0-9]+bar")?;
let mut cache = re.create_cache();
assert!(re.try_is_match(&mut cache, "foo12345bar")?);
assert!(!re.try_is_match(&mut cache, "foobar")?);
§Example: consistency with search APIs
is_match
is guaranteed to return true
whenever find
returns a
match. This includes searches that are executed entirely within a
codepoint:
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Input,
};
let re = BoundedBacktracker::new("a*")?;
let mut cache = re.create_cache();
assert!(!re.try_is_match(&mut cache, Input::new("☃").span(1..2))?);
Notice that when UTF-8 mode is disabled, then the above reports a match because the restriction against zero-width matches that split a codepoint has been lifted:
use regex_automata::{
nfa::thompson::{backtrack::BoundedBacktracker, NFA},
Input,
};
let re = BoundedBacktracker::builder()
.thompson(NFA::config().utf8(false))
.build("a*")?;
let mut cache = re.create_cache();
assert!(re.try_is_match(&mut cache, Input::new("☃").span(1..2))?);
sourcepub fn try_find<'h, I: Into<Input<'h>>>(
&self,
cache: &mut Cache,
input: I,
) -> Result<Option<Match>, MatchError>
pub fn try_find<'h, I: Into<Input<'h>>>( &self, cache: &mut Cache, input: I, ) -> Result<Option<Match>, MatchError>
Executes a leftmost forward search and returns a Match
if one exists.
This routine only includes the overall match span. To get
access to the individual spans of each capturing group, use
BoundedBacktracker::try_captures
.
§Errors
This routine only errors if the search could not complete. For this
backtracking regex engine, this only occurs when the haystack length
exceeds BoundedBacktracker::max_haystack_len
.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match,
};
let re = BoundedBacktracker::new("foo[0-9]+")?;
let mut cache = re.create_cache();
let expected = Match::must(0, 0..8);
assert_eq!(Some(expected), re.try_find(&mut cache, "foo12345")?);
sourcepub fn try_captures<'h, I: Into<Input<'h>>>(
&self,
cache: &mut Cache,
input: I,
caps: &mut Captures,
) -> Result<(), MatchError>
pub fn try_captures<'h, I: Into<Input<'h>>>( &self, cache: &mut Cache, input: I, caps: &mut Captures, ) -> Result<(), MatchError>
Executes a leftmost forward search and writes the spans of capturing
groups that participated in a match into the provided Captures
value. If no match was found, then Captures::is_match
is guaranteed
to return false
.
§Errors
This routine only errors if the search could not complete. For this
backtracking regex engine, this only occurs when the haystack length
exceeds BoundedBacktracker::max_haystack_len
.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Span,
};
let re = BoundedBacktracker::new(
r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.try_captures(&mut cache, "2010-03-14", &mut caps)?;
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));
sourcepub fn try_find_iter<'r, 'c, 'h, I: Into<Input<'h>>>(
&'r self,
cache: &'c mut Cache,
input: I,
) -> TryFindMatches<'r, 'c, 'h> ⓘ
pub fn try_find_iter<'r, 'c, 'h, I: Into<Input<'h>>>( &'r self, cache: &'c mut Cache, input: I, ) -> TryFindMatches<'r, 'c, 'h> ⓘ
Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.
If the regex engine returns an error at any point, then the iterator will yield that error.
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match, MatchError,
};
let re = BoundedBacktracker::new("foo[0-9]+")?;
let mut cache = re.create_cache();
let text = "foo1 foo12 foo123";
let result: Result<Vec<Match>, MatchError> = re
.try_find_iter(&mut cache, text)
.collect();
let matches = result?;
assert_eq!(matches, vec![
Match::must(0, 0..4),
Match::must(0, 5..10),
Match::must(0, 11..17),
]);
sourcepub fn try_captures_iter<'r, 'c, 'h, I: Into<Input<'h>>>(
&'r self,
cache: &'c mut Cache,
input: I,
) -> TryCapturesMatches<'r, 'c, 'h> ⓘ
pub fn try_captures_iter<'r, 'c, 'h, I: Into<Input<'h>>>( &'r self, cache: &'c mut Cache, input: I, ) -> TryCapturesMatches<'r, 'c, 'h> ⓘ
Returns an iterator over all non-overlapping Captures
values. If no
match exists, then the iterator yields no elements.
This yields the same matches as BoundedBacktracker::try_find_iter
,
but it includes the spans of all capturing groups that participate in
each match.
If the regex engine returns an error at any point, then the iterator will yield that error.
Tip: See util::iter::Searcher
for
how to correctly iterate over all matches in a haystack while avoiding
the creation of a new Captures
value for every match. (Which you are
forced to do with an Iterator
.)
§Example
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Span,
};
let re = BoundedBacktracker::new("foo(?P<numbers>[0-9]+)")?;
let mut cache = re.create_cache();
let text = "foo1 foo12 foo123";
let mut spans = vec![];
for result in re.try_captures_iter(&mut cache, text) {
let caps = result?;
// The unwrap is OK since 'numbers' matches if the pattern matches.
spans.push(caps.get_group_by_name("numbers").unwrap());
}
assert_eq!(spans, vec![
Span::from(3..4),
Span::from(8..10),
Span::from(14..17),
]);
source§impl BoundedBacktracker
impl BoundedBacktracker
sourcepub fn try_search(
&self,
cache: &mut Cache,
input: &Input<'_>,
caps: &mut Captures,
) -> Result<(), MatchError>
pub fn try_search( &self, cache: &mut Cache, input: &Input<'_>, caps: &mut Captures, ) -> Result<(), MatchError>
Executes a leftmost forward search and writes the spans of capturing
groups that participated in a match into the provided Captures
value. If no match was found, then Captures::is_match
is guaranteed
to return false
.
This is like BoundedBacktracker::try_captures
, but it accepts a
concrete &Input
instead of an Into<Input>
.
§Errors
This routine only errors if the search could not complete. For this
backtracking regex engine, this only occurs when the haystack length
exceeds BoundedBacktracker::max_haystack_len
.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example: specific pattern search
This example shows how to build a multi bounded backtracker that permits searching for specific patterns.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Anchored, Input, Match, PatternID,
};
let re = BoundedBacktracker::new_many(&[
"[a-z0-9]{6}",
"[a-z][a-z0-9]{5}",
])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123";
// Since we are using the default leftmost-first match and both
// patterns match at the same starting position, only the first pattern
// will be returned in this case when doing a search for any of the
// patterns.
let expected = Some(Match::must(0, 0..6));
re.try_search(&mut cache, &Input::new(haystack), &mut caps)?;
assert_eq!(expected, caps.get_match());
// But if we want to check whether some other pattern matches, then we
// can provide its pattern ID.
let expected = Some(Match::must(1, 0..6));
let input = Input::new(haystack)
.anchored(Anchored::Pattern(PatternID::must(1)));
re.try_search(&mut cache, &input, &mut caps)?;
assert_eq!(expected, caps.get_match());
§Example: specifying the bounds of a search
This example shows how providing the bounds of a search can produce different results than simply sub-slicing the haystack.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
Match, Input,
};
let re = BoundedBacktracker::new(r"\b[0-9]{3}\b")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123bar";
// Since we sub-slice the haystack, the search doesn't know about
// the larger context and assumes that `123` is surrounded by word
// boundaries. And of course, the match position is reported relative
// to the sub-slice as well, which means we get `0..3` instead of
// `3..6`.
let expected = Some(Match::must(0, 0..3));
re.try_search(&mut cache, &Input::new(&haystack[3..6]), &mut caps)?;
assert_eq!(expected, caps.get_match());
// But if we provide the bounds of the search within the context of the
// entire haystack, then the search can take the surrounding context
// into account. (And if we did find a match, it would be reported
// as a valid offset into `haystack` instead of its sub-slice.)
let expected = None;
re.try_search(
&mut cache, &Input::new(haystack).range(3..6), &mut caps,
)?;
assert_eq!(expected, caps.get_match());
sourcepub fn try_search_slots(
&self,
cache: &mut Cache,
input: &Input<'_>,
slots: &mut [Option<NonMaxUsize>],
) -> Result<Option<PatternID>, MatchError>
pub fn try_search_slots( &self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>], ) -> Result<Option<PatternID>, MatchError>
Executes a leftmost forward search and writes the spans of capturing
groups that participated in a match into the provided slots
, and
returns the matching pattern ID. The contents of the slots for patterns
other than the matching pattern are unspecified. If no match was found,
then None
is returned and the contents of all slots
is unspecified.
This is like BoundedBacktracker::try_search
, but it accepts a raw
slots slice instead of a Captures
value. This is useful in contexts
where you don’t want or need to allocate a Captures
.
It is legal to pass any number of slots to this routine. If the regex engine would otherwise write a slot offset that doesn’t fit in the provided slice, then it is simply skipped. In general though, there are usually three slice lengths you might want to use:
- An empty slice, if you only care about which pattern matched.
- A slice with
pattern_len() * 2
slots, if you only care about the overall match spans for each matching pattern. - A slice with
slot_len()
slots, which permits recording match offsets for every capturing group in every pattern.
§Errors
This routine only errors if the search could not complete. For this
backtracking regex engine, this only occurs when the haystack length
exceeds BoundedBacktracker::max_haystack_len
.
When a search cannot complete, callers cannot know whether a match exists or not.
§Example
This example shows how to find the overall match offsets in a
multi-pattern search without allocating a Captures
value. Indeed, we
can put our slots right on the stack.
use regex_automata::{
nfa::thompson::backtrack::BoundedBacktracker,
PatternID, Input,
};
let re = BoundedBacktracker::new_many(&[
r"\pL+",
r"\d+",
])?;
let mut cache = re.create_cache();
let input = Input::new("!@#123");
// We only care about the overall match offsets here, so we just
// allocate two slots for each pattern. Each slot records the start
// and end of the match.
let mut slots = [None; 4];
let pid = re.try_search_slots(&mut cache, &input, &mut slots)?;
assert_eq!(Some(PatternID::must(1)), pid);
// The overall match offsets are always at 'pid * 2' and 'pid * 2 + 1'.
// See 'GroupInfo' for more details on the mapping between groups and
// slot indices.
let slot_start = pid.unwrap().as_usize() * 2;
let slot_end = slot_start + 1;
assert_eq!(Some(3), slots[slot_start].map(|s| s.get()));
assert_eq!(Some(6), slots[slot_end].map(|s| s.get()));
sourcefn try_search_slots_imp(
&self,
cache: &mut Cache,
input: &Input<'_>,
slots: &mut [Option<NonMaxUsize>],
) -> Result<Option<HalfMatch>, MatchError>
fn try_search_slots_imp( &self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>], ) -> Result<Option<HalfMatch>, MatchError>
This is the actual implementation of try_search_slots_imp
that
doesn’t account for the special case when 1) the NFA has UTF-8 mode
enabled, 2) the NFA can match the empty string and 3) the caller has
provided an insufficient number of slots to record match offsets.
sourcefn search_imp(
&self,
cache: &mut Cache,
input: &Input<'_>,
slots: &mut [Option<NonMaxUsize>],
) -> Result<Option<HalfMatch>, MatchError>
fn search_imp( &self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>], ) -> Result<Option<HalfMatch>, MatchError>
The implementation of standard leftmost backtracking search.
Capturing group spans are written to ‘caps’, but only if requested. ‘caps’ can be one of three things: 1) totally empty, in which case, we only report the pattern that matched or 2) only has slots for recording the overall match offsets for any pattern or 3) has all slots available for recording the spans of any groups participating in a match.
sourcefn backtrack(
&self,
cache: &mut Cache,
input: &Input<'_>,
at: usize,
start_id: StateID,
slots: &mut [Option<NonMaxUsize>],
) -> Option<HalfMatch>
fn backtrack( &self, cache: &mut Cache, input: &Input<'_>, at: usize, start_id: StateID, slots: &mut [Option<NonMaxUsize>], ) -> Option<HalfMatch>
Look for a match starting at at
in input
and write the matching
pattern ID and group spans to caps
. The search uses start_id
as its
starting state in the underlying NFA.
If no match was found, then the caller should increment at
and try
at the next position.
sourcefn step(
&self,
cache: &mut Cache,
input: &Input<'_>,
sid: StateID,
at: usize,
slots: &mut [Option<NonMaxUsize>],
) -> Option<HalfMatch>
fn step( &self, cache: &mut Cache, input: &Input<'_>, sid: StateID, at: usize, slots: &mut [Option<NonMaxUsize>], ) -> Option<HalfMatch>
Execute a “step” in the backtracing algorithm.
A “step” is somewhat of a misnomer, because this routine keeps going until it either runs out of things to try or fins a match. In the former case, it may have pushed some things on to the backtracking stack, in which case, those will be tried next as part of the ‘backtrack’ routine above.
Trait Implementations§
source§impl Clone for BoundedBacktracker
impl Clone for BoundedBacktracker
source§fn clone(&self) -> BoundedBacktracker
fn clone(&self) -> BoundedBacktracker
1.0.0 · source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source
. Read more