Struct regex_automata::meta::regex::Regex
pub struct Regex {
imp: Arc<RegexI>,
pool: Pool<Cache, Box<dyn Fn() -> Cache + Send + Sync + UnwindSafe + RefUnwindSafe>>,
}
A regex matcher that works by composing several other regex matchers automatically.
In effect, a meta regex papers over a lot of the quirks or performance problems in each of the regex engines in this crate. Its goal is to provide an infallible and simple API that “just does the right thing” in the common case.
A meta regex is the implementation of a Regex in the regex crate. Indeed, the regex crate API is essentially just a light wrapper over this type. This includes the regex crate’s RegexSet API!
§Composition
This is called a “meta” matcher precisely because it uses other regex matchers to provide a convenient high level regex API. Here are some examples of how other regex matchers are composed:
- When calling Regex::captures, instead of immediately running a slower but more capable regex engine like the PikeVM, the meta regex engine will usually first look for the bounds of a match with a higher throughput regex engine like a lazy DFA. Only when a match is found is a slower engine like the PikeVM used to find the matching span for each capture group.
- While higher throughput engines like the lazy DFA cannot handle Unicode word boundaries in general, they can still be used on pure ASCII haystacks by pretending that Unicode word boundaries are just plain ASCII word boundaries. However, if a haystack is not ASCII, the meta regex engine will automatically switch to a (possibly slower) regex engine that supports Unicode word boundaries in general.
- In some cases where a regex pattern is just a simple literal or a small set of literals, an actual regex engine won’t be used at all. Instead, substring or multi-substring search algorithms will be employed.
There are many other forms of composition happening too, but the above should give a general idea. In particular, it may perhaps be surprising that multiple regex engines might get executed for a single search. That is, the decision of what regex engine to use is not just based on the pattern, but also based on the dynamic execution of the search itself.
The primary reason for this composition is performance. The fundamental tension is that the faster engines tend to be less capable, and the more capable engines tend to be slower.
Note that the forms of composition that are allowed are determined by compile time crate features and configuration. For example, if the hybrid feature isn’t enabled, or if Config::hybrid has been disabled, then the meta regex engine will never use a lazy DFA.
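The two-phase strategy from the first bullet above can be modeled with a toy, standard-library-only sketch. It is not how the crate implements composition: fast_bounds, is_date_shaped, slow_groups and captures are hypothetical stand-ins, and the only "pattern" supported is a fixed dddd-dd-dd date shape. The point is only the control flow: a cheap pass finds the bounds of a match, and the more expensive per-group work runs only when there is a match.

```rust
use std::ops::Range;

/// Hypothetical "fast" pass: reports only the bounds of the first substring
/// shaped like `dddd-dd-dd`. It records no capture group offsets.
fn fast_bounds(hay: &str) -> Option<Range<usize>> {
    hay.as_bytes()
        .windows(10)
        .position(is_date_shaped)
        .map(|start| start..start + 10)
}

/// Returns true if the 10-byte window looks like `dddd-dd-dd`.
fn is_date_shaped(window: &[u8]) -> bool {
    window.iter().enumerate().all(|(i, b)| match i {
        4 | 7 => *b == b'-',
        _ => b.is_ascii_digit(),
    })
}

/// Hypothetical "slow" pass: computes the span of each of the three groups,
/// but only for a span that the fast pass has already confirmed.
fn slow_groups(m: Range<usize>) -> [Range<usize>; 3] {
    let s = m.start;
    [s..s + 4, s + 5..s + 7, s + 8..s + 10]
}

/// Composes the two passes: the slow pass never runs when there is no match.
fn captures(hay: &str) -> Option<[Range<usize>; 3]> {
    fast_bounds(hay).map(slow_groups)
}
```

For a haystack with no date in it, captures returns None without ever calling slow_groups, which mirrors the "only when a match is found" behavior described above.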
§Synchronization and cloning
Most of the regex engines in this crate require some kind of mutable “scratch” space to read and write from while performing a search. Since a meta regex composes these regex engines, a meta regex also requires mutable scratch space. This scratch space is called a Cache.
Most regex engines also usually have a read-only component, typically a Thompson NFA.
In order to make the Regex API convenient, most of the routines hide the fact that a Cache is needed at all. To achieve this, a memory pool is used internally to retrieve Cache values in a thread safe way that also permits reuse. This in turn implies that every such search call requires some form of synchronization. Usually this synchronization is fast enough to not notice, but in some cases, it can be a bottleneck. This typically occurs when all of the following are true:
- The same Regex is shared across multiple threads simultaneously, usually via a util::lazy::Lazy or something similar from the once_cell or lazy_static crates.
- The primary unit of work in each thread is a regex search.
- Searches are run on very short haystacks.
This particular case can lead to high contention on the pool used by a Regex internally, which can in turn noticeably increase latency. This cost can be mitigated in one of the following ways:
- Use a distinct copy of a Regex in each thread, usually by cloning it. Cloning a Regex does not do a deep copy of its read-only component. But it does lead to each Regex having its own memory pool, which in turn eliminates the problem of contention. In general, this technique should not result in any additional memory usage when compared to sharing the same Regex across multiple threads simultaneously.
- Use lower level APIs, like Regex::search_with, which permit passing a Cache explicitly. In this case, it is up to you to determine how best to provide a Cache. For example, you might put a Cache in thread-local storage if your use case allows for it.
Overall, this is an issue that happens rarely in practice, but it can happen.
§Warning: spin-locks may be used in alloc-only mode
When this crate is built without the std feature and the high level APIs on a Regex are used, then a spin-lock will be used to synchronize access to an internal pool of Cache values. This may be undesirable because a spin-lock is effectively impossible to implement correctly in user space. More concretely, the spin-lock could result in a deadlock.
To avoid the use of spin-locks when the std feature is disabled, you must use APIs that accept a Cache value explicitly, such as Regex::search_with.
§Example
use regex_automata::meta::Regex;
let re = Regex::new(r"^[0-9]{4}-[0-9]{2}-[0-9]{2}$")?;
assert!(re.is_match("2010-03-14"));
§Example: anchored search
This example shows how to use Input::anchored to run an anchored search, even when the regex pattern itself isn’t anchored. An anchored search guarantees that if a match is found, then the start offset of the match corresponds to the offset at which the search was started.
use regex_automata::{meta::Regex, Anchored, Input, Match};
let re = Regex::new(r"\bfoo\b")?;
let input = Input::new("xx foo xx").range(3..).anchored(Anchored::Yes);
// The offsets are in terms of the original haystack.
assert_eq!(Some(Match::must(0, 3..6)), re.find(input));
// Notice that no match occurs here, because \b still takes the
// surrounding context into account, even if it means looking back
// before the start of your search.
let hay = "xxfoo xx";
let input = Input::new(hay).range(2..).anchored(Anchored::Yes);
assert_eq!(None, re.find(input));
// Indeed, you cannot achieve the above by simply slicing the
// haystack itself, since the regex engine can't see the
// surrounding context. This is why 'Input' permits setting
// the bounds of a search!
let input = Input::new(&hay[2..]).anchored(Anchored::Yes);
// WRONG!
assert_eq!(Some(Match::must(0, 0..3)), re.find(input));
§Example: earliest search
This example shows how to use Input::earliest to run a search that might stop before finding the typical leftmost match.
use regex_automata::{meta::Regex, Input, Match};
let re = Regex::new(r"[a-z]{3}|b")?;
let input = Input::new("abc").earliest(true);
assert_eq!(Some(Match::must(0, 1..2)), re.find(input));
// Note that "earliest" isn't really a match semantic unto itself.
// Instead, it is merely an instruction to whatever regex engine
// gets used internally to quit as soon as it can. For example,
// this regex uses a different search technique, and winds up
// producing a different (but valid) match!
let re = Regex::new(r"abc|b")?;
let input = Input::new("abc").earliest(true);
assert_eq!(Some(Match::must(0, 0..3)), re.find(input));
§Example: change the line terminator
This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
.syntax(syntax::Config::new().multi_line(true))
.configure(Regex::config().line_terminator(b'\x00'))
.build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
§Fields
imp: Arc<RegexI>
The actual regex implementation.
pool: Pool<Cache, Box<dyn Fn() -> Cache + Send + Sync + UnwindSafe + RefUnwindSafe>>
A thread safe pool of caches.
For the higher level search APIs, a Cache is automatically plucked from this pool before running a search. The lower level with methods (those whose names end in _with, such as search_with) permit the caller to provide their own cache, thereby bypassing accesses to this pool.
Note that we put this outside the Arc so that cloning a Regex results in creating a fresh CachePool. This in turn permits callers to clone regexes into separate threads where each such regex gets the pool’s “thread owner” optimization. Otherwise, if one shares the Regex directly, then the pool will go through a slower mutex path for all threads except for the “owner.”
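The effect of keeping the pool outside the Arc can be illustrated with a small standard-library model. This is only a sketch of the field layout described above, not the crate's code: Program, Scratch and Handle are hypothetical stand-ins for RegexI, Cache and Regex, and a plain Mutex<Vec<_>> replaces the real pool (so the “thread owner” fast path is not modeled).

```rust
use std::sync::{Arc, Mutex};

/// Stand-in for the read-only compiled regex (`RegexI` in the fields above).
pub struct Program {
    pub pattern: String,
}

/// Stand-in for the mutable scratch space (`Cache`).
#[derive(Default)]
pub struct Scratch {
    pub slots: Vec<Option<usize>>,
}

/// Toy model of the field layout: a shared program and a per-handle pool.
pub struct Handle {
    imp: Arc<Program>,
    pool: Mutex<Vec<Scratch>>,
}

impl Handle {
    pub fn new(pattern: &str) -> Handle {
        Handle {
            imp: Arc::new(Program { pattern: pattern.to_string() }),
            pool: Mutex::new(Vec::new()),
        }
    }

    /// Takes a scratch value from the pool, creating one if the pool is empty.
    pub fn get_scratch(&self) -> Scratch {
        self.pool.lock().unwrap().pop().unwrap_or_default()
    }

    /// Returns a scratch value to the pool so that a later search can reuse it.
    pub fn put_scratch(&self, scratch: Scratch) {
        self.pool.lock().unwrap().push(scratch);
    }

    /// Number of scratch values currently sitting in this handle's pool.
    pub fn pooled(&self) -> usize {
        self.pool.lock().unwrap().len()
    }

    /// True if both handles point at the same read-only program.
    pub fn shares_program_with(&self, other: &Handle) -> bool {
        Arc::ptr_eq(&self.imp, &other.imp)
    }
}

impl Clone for Handle {
    fn clone(&self) -> Handle {
        Handle {
            // Cheap: bumps a reference count instead of copying the program.
            imp: Arc::clone(&self.imp),
            // Fresh: the clone never contends with the original's pool.
            pool: Mutex::new(Vec::new()),
        }
    }
}
```

In this model, a clone reports shares_program_with as true against its original while starting with an empty pool of its own, which is the property that lets one clone per thread avoid contention.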
§Implementations
impl Regex
Convenience constructors for a Regex using the default configuration.
pub fn new(pattern: &str) -> Result<Regex, BuildError>
Builds a Regex from a single pattern string using the default configuration.
If there was a problem parsing the pattern or a problem turning it into a regex matcher, then an error is returned.
If you want to change the configuration of a Regex, use a Builder with a Config.
§Example
use regex_automata::{meta::Regex, Match};
let re = Regex::new(r"(?Rm)^foo$")?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));
pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<Regex, BuildError>
Builds a Regex from many pattern strings using the default configuration.
If there was a problem parsing any of the patterns or a problem turning them into a regex matcher, then an error is returned.
If you want to change the configuration of a Regex, use a Builder with a Config.
§Example: simple lexer
This simplistic example leverages the multi-pattern support to build a simple little lexer. The pattern ID in the match tells you which regex matched, which in turn might be used to map back to the “type” of the token returned by the lexer.
use regex_automata::{meta::Regex, Match};
let re = Regex::new_many(&[
r"[[:space:]]",
r"[A-Za-z0-9][A-Za-z0-9_]+",
r"->",
r".",
])?;
let haystack = "fn is_boss(bruce: i32, springsteen: String) -> bool;";
let matches: Vec<Match> = re.find_iter(haystack).collect();
assert_eq!(matches, vec![
Match::must(1, 0..2), // 'fn'
Match::must(0, 2..3), // ' '
Match::must(1, 3..10), // 'is_boss'
Match::must(3, 10..11), // '('
Match::must(1, 11..16), // 'bruce'
Match::must(3, 16..17), // ':'
Match::must(0, 17..18), // ' '
Match::must(1, 18..21), // 'i32'
Match::must(3, 21..22), // ','
Match::must(0, 22..23), // ' '
Match::must(1, 23..34), // 'springsteen'
Match::must(3, 34..35), // ':'
Match::must(0, 35..36), // ' '
Match::must(1, 36..42), // 'String'
Match::must(3, 42..43), // ')'
Match::must(0, 43..44), // ' '
Match::must(2, 44..46), // '->'
Match::must(0, 46..47), // ' '
Match::must(1, 47..51), // 'bool'
Match::must(3, 51..52), // ';'
]);
One can write a lexer like the above using a regex like (?P<space>[[:space:]])|(?P<ident>[A-Za-z0-9][A-Za-z0-9_]+)|..., but then you need to ask which capture group matched to determine which branch in the regex matched, and thus, which token the match corresponds to. In contrast, the above example includes the pattern ID in the match. There’s no need to use capture groups at all.
§Example: finding the pattern that caused an error
When a syntax error occurs, it is possible to ask which pattern caused the syntax error.
use regex_automata::{meta::Regex, PatternID};
let err = Regex::new_many(&["a", "b", r"\p{Foo}", "c"]).unwrap_err();
assert_eq!(Some(PatternID::must(2)), err.pattern());
§Example: zero patterns is valid
Building a regex with zero patterns results in a regex that never matches anything. Because this routine is generic, passing an empty slice usually requires a turbo-fish (or something else to help type inference).
use regex_automata::meta::Regex;
let re = Regex::new_many::<&str>(&[])?;
assert_eq!(None, re.find(""));
pub fn config() -> Config
Return a default configuration for a Regex.
This is a convenience routine to avoid needing to import the Config type when customizing the construction of a Regex.
§Example: lower the NFA size limit
In some cases, the default size limit might be too big. The size limit can be lowered, which will prevent large regex patterns from compiling.
use regex_automata::meta::Regex;
let result = Regex::builder()
.configure(Regex::config().nfa_size_limit(Some(20 * (1<<10))))
// Not even 20KB is enough to build a single large Unicode class!
.build(r"\pL");
assert!(result.is_err());
pub fn builder() -> Builder
Return a builder for configuring the construction of a Regex.
This is a convenience routine to avoid needing to import the Builder type in common cases.
§Example: change the line terminator
This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:
use regex_automata::{meta::Regex, util::syntax, Match};
let re = Regex::builder()
.syntax(syntax::Config::new().multi_line(true))
.configure(Regex::config().line_terminator(b'\x00'))
.build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));
impl Regex
High level convenience routines for using a regex to search a haystack.
pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool
Returns true if and only if this regex matches the given haystack.
This routine may short circuit if it knows that scanning future input will never lead to a different result. (Consider how this might make a difference given the regex a+ on the haystack aaaaaaaaaaaaaaa. This routine may stop after it sees the first a, but routines like find need to continue searching because + is greedy by default.)
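To make the difference in work concrete, here is a toy model of the a+ case using only the standard library. It is a sketch under stated assumptions, not the crate's search code: is_match_a_plus and find_a_plus are hypothetical hand-written scanners for the single pattern a+, and each returns the number of bytes it examined alongside its answer.

```rust
use std::ops::Range;

/// Toy `is_match` for the pattern `a+`: it can stop at the first `a`.
/// Returns the answer and the number of bytes examined.
fn is_match_a_plus(hay: &[u8]) -> (bool, usize) {
    for (i, &b) in hay.iter().enumerate() {
        if b == b'a' {
            return (true, i + 1);
        }
    }
    (false, hay.len())
}

/// Toy `find` for the pattern `a+`: because `+` is greedy, it must keep
/// scanning past the first `a` to learn where the match ends.
fn find_a_plus(hay: &[u8]) -> (Option<Range<usize>>, usize) {
    let start = match hay.iter().position(|&b| b == b'a') {
        Some(i) => i,
        None => return (None, hay.len()),
    };
    let mut end = start;
    while end < hay.len() && hay[end] == b'a' {
        end += 1;
    }
    // Bytes examined: everything up to `end`, plus the one non-`a` byte
    // (if any) that terminated the run.
    let examined = (end + 1).min(hay.len());
    (Some(start..end), examined)
}
```

On a haystack of fifteen a bytes, the toy is_match examines one byte while the toy find examines all fifteen before it can report the span 0..15.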
§Example
use regex_automata::meta::Regex;
let re = Regex::new("foo[0-9]+bar")?;
assert!(re.is_match("foo12345bar"));
assert!(!re.is_match("foobar"));
§Example: consistency with search APIs
is_match is guaranteed to return true whenever find returns a match. This includes searches that are executed entirely within a codepoint:
use regex_automata::{meta::Regex, Input};
let re = Regex::new("a*")?;
// This doesn't match because the default configuration bans empty
// matches from splitting a codepoint.
assert!(!re.is_match(Input::new("☃").span(1..2)));
assert_eq!(None, re.find(Input::new("☃").span(1..2)));
Notice that when UTF-8 mode is disabled, then the above reports a match because the restriction against zero-width matches that split a codepoint has been lifted:
use regex_automata::{meta::Regex, Input, Match};
let re = Regex::builder()
.configure(Regex::config().utf8_empty(false))
.build("a*")?;
assert!(re.is_match(Input::new("☃").span(1..2)));
assert_eq!(
Some(Match::must(0, 1..1)),
re.find(Input::new("☃").span(1..2)),
);
A similar idea applies when using line anchors with CRLF mode enabled, which prevents them from matching between a \r and a \n.
use regex_automata::{meta::Regex, Input, Match};
let re = Regex::new(r"(?Rm:$)")?;
assert!(!re.is_match(Input::new("\r\n").span(1..1)));
// A regular line anchor, which only considers \n as a
// line terminator, will match.
let re = Regex::new(r"(?m:$)")?;
assert!(re.is_match(Input::new("\r\n").span(1..1)));
pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>
Executes a leftmost search and returns the first match that is found, if one exists.
§Example
use regex_automata::{meta::Regex, Match};
let re = Regex::new("foo[0-9]+")?;
assert_eq!(Some(Match::must(0, 0..8)), re.find("foo12345"));
pub fn captures<'h, I: Into<Input<'h>>>(&self, input: I, caps: &mut Captures)
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided Captures value. If no match was found, then Captures::is_match is guaranteed to return false.
§Example
use regex_automata::{meta::Regex, Span};
let re = Regex::new(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")?;
let mut caps = re.create_captures();
re.captures("2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));
pub fn find_iter<'r, 'h, I: Into<Input<'h>>>(&'r self, input: I) -> FindMatches<'r, 'h>
Returns an iterator over all non-overlapping leftmost matches in the given haystack. If no match exists, then the iterator yields no elements.
§Example
use regex_automata::{meta::Regex, Match};
let re = Regex::new("foo[0-9]+")?;
let haystack = "foo1 foo12 foo123";
let matches: Vec<Match> = re.find_iter(haystack).collect();
assert_eq!(matches, vec![
Match::must(0, 0..4),
Match::must(0, 5..10),
Match::must(0, 11..17),
]);
pub fn captures_iter<'r, 'h, I: Into<Input<'h>>>(&'r self, input: I) -> CapturesMatches<'r, 'h>
Returns an iterator over all non-overlapping Captures values. If no match exists, then the iterator yields no elements.
This yields the same matches as Regex::find_iter, but it includes the spans of all capturing groups that participate in each match.
Tip: See util::iter::Searcher for how to correctly iterate over all matches in a haystack while avoiding the creation of a new Captures value for every match. (Which you are forced to do with an Iterator.)
§Example
use regex_automata::{meta::Regex, Span};
let re = Regex::new("foo(?P<numbers>[0-9]+)")?;
let haystack = "foo1 foo12 foo123";
let matches: Vec<Span> = re
.captures_iter(haystack)
// The unwrap is OK since 'numbers' matches if the pattern matches.
.map(|caps| caps.get_group_by_name("numbers").unwrap())
.collect();
assert_eq!(matches, vec![
Span::from(3..4),
Span::from(8..10),
Span::from(14..17),
]);
pub fn split<'r, 'h, I: Into<Input<'h>>>(&'r self, input: I) -> Split<'r, 'h>
Returns an iterator of spans of the haystack given, delimited by a match of the regex. Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression.
§Example
To split a string delimited by arbitrary amounts of spaces or tabs:
use regex_automata::meta::Regex;
let re = Regex::new(r"[ \t]+")?;
let hay = "a b \t c\td e";
let fields: Vec<&str> = re.split(hay).map(|span| &hay[span]).collect();
assert_eq!(fields, vec!["a", "b", "c", "d", "e"]);
§Example: more cases
Basic usage:
use regex_automata::meta::Regex;
let re = Regex::new(r" ")?;
let hay = "Mary had a little lamb";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["Mary", "had", "a", "little", "lamb"]);
let re = Regex::new(r"X")?;
let hay = "";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec![""]);
let re = Regex::new(r"X")?;
let hay = "lionXXtigerXleopard";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["lion", "", "tiger", "leopard"]);
let re = Regex::new(r"::")?;
let hay = "lion::tiger::leopard";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["lion", "tiger", "leopard"]);
If a haystack contains multiple contiguous matches, you will end up with empty spans yielded by the iterator:
use regex_automata::meta::Regex;
let re = Regex::new(r"X")?;
let hay = "XXXXaXXbXc";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["", "", "", "", "a", "", "b", "c"]);
let re = Regex::new(r"/")?;
let hay = "(///)";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["(", "", "", ")"]);
Separators at the start or end of a haystack are neighbored by empty spans.
use regex_automata::meta::Regex;
let re = Regex::new(r"0")?;
let hay = "010";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["", "1", ""]);
When the empty string is used as a regex, it splits at every valid UTF-8 boundary by default (which includes the beginning and end of the haystack):
use regex_automata::meta::Regex;
let re = Regex::new(r"")?;
let hay = "rust";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["", "r", "u", "s", "t", ""]);
// Splitting by an empty string is UTF-8 aware by default!
let re = Regex::new(r"")?;
let hay = "☃";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["", "☃", ""]);
But note that UTF-8 mode for empty strings can be disabled, which will then result in a match at every byte offset in the haystack, including between every UTF-8 code unit.
use regex_automata::meta::Regex;
let re = Regex::builder()
.configure(Regex::config().utf8_empty(false))
.build(r"")?;
let hay = "☃".as_bytes();
let got: Vec<&[u8]> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec![
// Writing byte string slices is just brutal. The problem is that
// b"foo" has type &[u8; 3] instead of &[u8].
&[][..], &[b'\xE2'][..], &[b'\x98'][..], &[b'\x83'][..], &[][..],
]);
Contiguous separators (which commonly show up with whitespace) can lead to possibly surprising behavior. For example, this code is correct:
use regex_automata::meta::Regex;
let re = Regex::new(r" ")?;
let hay = " a b c";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["", "", "", "", "a", "", "b", "c"]);
It does not give you ["a", "b", "c"]. For that behavior, you’d want to match contiguous space characters:
use regex_automata::meta::Regex;
let re = Regex::new(r" +")?;
let hay = " a b c";
let got: Vec<&str> = re.split(hay).map(|sp| &hay[sp]).collect();
// N.B. This does still include a leading empty span because ' +'
// matches at the beginning of the haystack.
assert_eq!(got, vec!["", "a", "b", "c"]);
pub fn splitn<'r, 'h, I: Into<Input<'h>>>(&'r self, input: I, limit: usize) -> SplitN<'r, 'h>
Returns an iterator of at most limit spans of the haystack given, delimited by a match of the regex. (A limit of 0 will return no spans.) Namely, each element of the iterator corresponds to a part of the haystack that isn’t matched by the regular expression. The remainder of the haystack that is not split will be the last element in the iterator.
§Example
Get the first two words in some haystack:
use regex_automata::meta::Regex;
let re = Regex::new(r"\W+").unwrap();
let hay = "Hey! How are you?";
let fields: Vec<&str> =
re.splitn(hay, 3).map(|span| &hay[span]).collect();
assert_eq!(fields, vec!["Hey", "How", "are you?"]);
§Examples: more cases
use regex_automata::meta::Regex;
let re = Regex::new(r" ")?;
let hay = "Mary had a little lamb";
let got: Vec<&str> = re.splitn(hay, 3).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["Mary", "had", "a little lamb"]);
let re = Regex::new(r"X")?;
let hay = "";
let got: Vec<&str> = re.splitn(hay, 3).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec![""]);
let re = Regex::new(r"X")?;
let hay = "lionXXtigerXleopard";
let got: Vec<&str> = re.splitn(hay, 3).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["lion", "", "tigerXleopard"]);
let re = Regex::new(r"::")?;
let hay = "lion::tiger::leopard";
let got: Vec<&str> = re.splitn(hay, 2).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["lion", "tiger::leopard"]);
let re = Regex::new(r"X")?;
let hay = "abcXdef";
let got: Vec<&str> = re.splitn(hay, 1).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["abcXdef"]);
let re = Regex::new(r"X")?;
let hay = "abcdef";
let got: Vec<&str> = re.splitn(hay, 2).map(|sp| &hay[sp]).collect();
assert_eq!(got, vec!["abcdef"]);
let re = Regex::new(r"X")?;
let hay = "abcXdef";
let got: Vec<&str> = re.splitn(hay, 0).map(|sp| &hay[sp]).collect();
assert!(got.is_empty());
impl Regex
Lower level search routines that give more control.
pub fn search(&self, input: &Input<'_>) -> Option<Match>
Returns the start and end offset of the leftmost match. If no match exists, then None is returned.
This is like Regex::find, but it accepts a concrete &Input instead of an Into<Input>.
§Example
use regex_automata::{meta::Regex, Input, Match};
let re = Regex::new(r"Samwise|Sam")?;
let input = Input::new(
"one of the chief characters, Samwise the Brave",
);
assert_eq!(Some(Match::must(0, 29..36)), re.search(&input));
pub fn search_half(&self, input: &Input<'_>) -> Option<HalfMatch>
Returns the end offset of the leftmost match. If no match exists, then None is returned.
This is distinct from Regex::search in that it only returns the end of a match and not the start of the match. Depending on a variety of implementation details, this may permit the regex engine to do less overall work. For example, if a DFA is being used to execute a search, then finding the start of a match usually requires running a separate DFA in reverse. If one only needs the end of a match, then the separate reverse scan to find the start of a match can be skipped. (Note that the reverse scan is avoided even when using Regex::search when possible, for example, in the case of an anchored search.)
§Example
use regex_automata::{meta::Regex, Input, HalfMatch};
let re = Regex::new(r"Samwise|Sam")?;
let input = Input::new(
"one of the chief characters, Samwise the Brave",
);
assert_eq!(Some(HalfMatch::must(0, 36)), re.search_half(&input));
pub fn search_captures(&self, input: &Input<'_>, caps: &mut Captures)
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided Captures value. If no match was found, then Captures::is_match is guaranteed to return false.
This is like Regex::captures, but it accepts a concrete &Input instead of an Into<Input>.
§Example: specific pattern search
This example shows how to build a multi-pattern Regex that permits searching for specific patterns.
use regex_automata::{
meta::Regex,
Anchored, Match, PatternID, Input,
};
let re = Regex::new_many(&["[a-z0-9]{6}", "[a-z][a-z0-9]{5}"])?;
let mut caps = re.create_captures();
let haystack = "foo123";
// Since we are using the default leftmost-first match and both
// patterns match at the same starting position, only the first pattern
// will be returned in this case when doing a search for any of the
// patterns.
let expected = Some(Match::must(0, 0..6));
re.search_captures(&Input::new(haystack), &mut caps);
assert_eq!(expected, caps.get_match());
// But if we want to check whether some other pattern matches, then we
// can provide its pattern ID.
let expected = Some(Match::must(1, 0..6));
let input = Input::new(haystack)
.anchored(Anchored::Pattern(PatternID::must(1)));
re.search_captures(&input, &mut caps);
assert_eq!(expected, caps.get_match());
§Example: specifying the bounds of a search
This example shows how providing the bounds of a search can produce different results than simply sub-slicing the haystack.
use regex_automata::{meta::Regex, Match, Input};
let re = Regex::new(r"\b[0-9]{3}\b")?;
let mut caps = re.create_captures();
let haystack = "foo123bar";
// Since we sub-slice the haystack, the search doesn't know about
// the larger context and assumes that `123` is surrounded by word
// boundaries. And of course, the match position is reported relative
// to the sub-slice as well, which means we get `0..3` instead of
// `3..6`.
let expected = Some(Match::must(0, 0..3));
let input = Input::new(&haystack[3..6]);
re.search_captures(&input, &mut caps);
assert_eq!(expected, caps.get_match());
// But if we provide the bounds of the search within the context of the
// entire haystack, then the search can take the surrounding context
// into account. (And if we did find a match, it would be reported
// as a valid offset into `haystack` instead of its sub-slice.)
let expected = None;
let input = Input::new(haystack).range(3..6);
re.search_captures(&input, &mut caps);
assert_eq!(expected, caps.get_match());
pub fn search_slots(&self, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>]) -> Option<PatternID>
Executes a leftmost forward search and writes the spans of capturing groups that participated in a match into the provided slots, and returns the matching pattern ID. The contents of the slots for patterns other than the matching pattern are unspecified. If no match was found, then None is returned and the contents of slots are unspecified.
This is like Regex::search_captures, but it accepts a raw slots slice instead of a Captures value. This is useful in contexts where you don’t want or need to allocate a Captures.
It is legal to pass any number of slots to this routine. If the regex engine would otherwise write a slot offset that doesn’t fit in the provided slice, then it is simply skipped. In general though, there are usually three slice lengths you might want to use:
- An empty slice, if you only care about which pattern matched.
- A slice with pattern_len() * 2 slots, if you only care about the overall match spans for each matching pattern.
- A slice with slot_len() slots, which permits recording match offsets for every capturing group in every pattern.
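The slice lengths listed above follow from the slot layout: the overall match for pattern pid occupies slots pid * 2 and pid * 2 + 1. A minimal standard-library sketch of reading such a slice back, including the case where the caller passed too few slots, might look like the following. The helper overall_match is hypothetical, and Option<usize> is used here as a stand-in for Option<NonMaxUsize>.

```rust
use std::ops::Range;

/// Reads the overall match span for pattern `pid` out of a slots slice.
///
/// Returns `None` if either slot is unset or if the slice is too short to
/// contain both slots, which is what happens when the caller provided fewer
/// slots than `pattern_len() * 2`.
fn overall_match(slots: &[Option<usize>], pid: usize) -> Option<Range<usize>> {
    let start = slots.get(pid * 2).copied().flatten()?;
    let end = slots.get(pid * 2 + 1).copied().flatten()?;
    Some(start..end)
}
```

With an empty slice, every lookup yields None, so only the returned pattern ID carries information in that case.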
§Example
This example shows how to find the overall match offsets in a multi-pattern search without allocating a Captures value. Indeed, we can put our slots right on the stack.
use regex_automata::{meta::Regex, PatternID, Input};
let re = Regex::new_many(&[
r"\pL+",
r"\d+",
])?;
let input = Input::new("!@#123");
// We only care about the overall match offsets here, so we just
// allocate two slots for each pattern. Each slot records the start
// and end of the match.
let mut slots = [None; 4];
let pid = re.search_slots(&input, &mut slots);
assert_eq!(Some(PatternID::must(1)), pid);
// The overall match offsets are always at 'pid * 2' and 'pid * 2 + 1'.
// See 'GroupInfo' for more details on the mapping between groups and
// slot indices.
let slot_start = pid.unwrap().as_usize() * 2;
let slot_end = slot_start + 1;
assert_eq!(Some(3), slots[slot_start].map(|s| s.get()));
assert_eq!(Some(6), slots[slot_end].map(|s| s.get()));
pub fn which_overlapping_matches(&self, input: &Input<'_>, patset: &mut PatternSet)
Writes the set of patterns that match anywhere in the given search configuration to patset. If multiple patterns match at the same position and this Regex was configured with MatchKind::All semantics, then all matching patterns are written to the given set.
Unless all of the patterns in this Regex are anchored, this will generally scan the entire haystack.
This search routine does not clear the pattern set. This gives some flexibility to the caller (e.g., running multiple searches with the same pattern set), but does make the API bug-prone if you’re reusing the same pattern set for multiple searches but intended them to be independent.
If a pattern ID matched but the given PatternSet does not have sufficient capacity to store it, then it is not inserted and silently dropped.
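The pitfall of reusing a set that is never cleared can be shown with a toy model built on the standard library. This is only an illustration of the calling pattern, not of the crate's types: which_match is a hypothetical stand-in that treats each pattern as a plain literal, and a BTreeSet<usize> plays the role of the pattern set.

```rust
use std::collections::BTreeSet;

/// Toy stand-in for a multi-pattern search over literal patterns: inserts
/// the index of every pattern that occurs in `hay` into `set`. Like the
/// routine described above, it never clears `set` first.
fn which_match(patterns: &[&str], hay: &str, set: &mut BTreeSet<usize>) {
    for (id, pat) in patterns.iter().enumerate() {
        if hay.contains(*pat) {
            set.insert(id);
        }
    }
}

/// Runs one independent search per haystack while reusing a single set.
/// Clearing the set between searches is the caller's responsibility.
fn independent_searches(patterns: &[&str], hays: &[&str]) -> Vec<Vec<usize>> {
    let mut set = BTreeSet::new();
    hays.iter()
        .map(|hay| {
            set.clear();
            which_match(patterns, hay, &mut set);
            set.iter().copied().collect()
        })
        .collect()
}
```

Calling which_match twice on the same set without clearing accumulates the union of both results, which is useful when that is the intent and a bug when the searches were meant to be independent.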
§Example
This example shows how to find all matching patterns in a haystack, even when some patterns match at the same position as other patterns. It is important that we configure the Regex with MatchKind::All semantics here, or else overlapping matches will not be reported.
use regex_automata::{meta::Regex, Input, MatchKind, PatternSet};
let patterns = &[
r"\w+", r"\d+", r"\pL+", r"foo", r"bar", r"barfoo", r"foobar",
];
let re = Regex::builder()
.configure(Regex::config().match_kind(MatchKind::All))
.build_many(patterns)?;
let input = Input::new("foobar");
let mut patset = PatternSet::new(re.pattern_len());
re.which_overlapping_matches(&input, &mut patset);
let expected = vec![0, 2, 3, 4, 6];
let got: Vec<usize> = patset.iter().map(|p| p.as_usize()).collect();
assert_eq!(expected, got);
impl Regex
Lower level search routines that give more control, and require the caller to provide an explicit Cache parameter.
pub fn search_with(&self, cache: &mut Cache, input: &Input<'_>) -> Option<Match>
This is like Regex::search, but requires the caller to explicitly pass a Cache.
§Why pass a Cache explicitly?
Passing a Cache explicitly will bypass the use of an internal memory pool used by Regex to get a Cache for a search. The use of this pool can be slower in some cases when a Regex is used from multiple threads simultaneously. Typically, performance only becomes an issue when there is heavy contention, which in turn usually only occurs when each thread’s primary unit of work is a regex search on a small haystack.
§Example
use regex_automata::{meta::Regex, Input, Match};
let re = Regex::new(r"Samwise|Sam")?;
let mut cache = re.create_cache();
let input = Input::new(
"one of the chief characters, Samwise the Brave",
);
assert_eq!(
Some(Match::must(0, 29..36)),
re.search_with(&mut cache, &input),
);
pub fn search_half_with( &self, cache: &mut Cache, input: &Input<'_>, ) -> Option<HalfMatch>
This is like Regex::search_half, but requires the caller to explicitly pass a Cache.
§Why pass a Cache explicitly?
Passing a Cache explicitly will bypass the use of an internal memory pool used by Regex to get a Cache for a search. The use of this pool can be slower in some cases when a Regex is used from multiple threads simultaneously. Typically, performance only becomes an issue when there is heavy contention, which in turn usually only occurs when each thread’s primary unit of work is a regex search on a small haystack.
§Example
use regex_automata::{meta::Regex, Input, HalfMatch};
let re = Regex::new(r"Samwise|Sam")?;
let mut cache = re.create_cache();
let input = Input::new(
"one of the chief characters, Samwise the Brave",
);
assert_eq!(
Some(HalfMatch::must(0, 36)),
re.search_half_with(&mut cache, &input),
);
pub fn search_captures_with( &self, cache: &mut Cache, input: &Input<'_>, caps: &mut Captures, )
This is like Regex::search_captures, but requires the caller to explicitly pass a Cache.
§Why pass a Cache explicitly?
Passing a Cache explicitly will bypass the use of an internal memory pool used by Regex to get a Cache for a search. The use of this pool can be slower in some cases when a Regex is used from multiple threads simultaneously. Typically, performance only becomes an issue when there is heavy contention, which in turn usually only occurs when each thread’s primary unit of work is a regex search on a small haystack.
§Example: specific pattern search
This example shows how to build a multi-pattern Regex that permits searching for specific patterns.
use regex_automata::{
meta::Regex,
Anchored, Match, PatternID, Input,
};
let re = Regex::new_many(&["[a-z0-9]{6}", "[a-z][a-z0-9]{5}"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123";
// Since we are using the default leftmost-first match and both
// patterns match at the same starting position, only the first pattern
// will be returned in this case when doing a search for any of the
// patterns.
let expected = Some(Match::must(0, 0..6));
re.search_captures_with(&mut cache, &Input::new(haystack), &mut caps);
assert_eq!(expected, caps.get_match());
// But if we want to check whether some other pattern matches, then we
// can provide its pattern ID.
let expected = Some(Match::must(1, 0..6));
let input = Input::new(haystack)
.anchored(Anchored::Pattern(PatternID::must(1)));
re.search_captures_with(&mut cache, &input, &mut caps);
assert_eq!(expected, caps.get_match());
§Example: specifying the bounds of a search
This example shows how providing the bounds of a search can produce different results than simply sub-slicing the haystack.
use regex_automata::{meta::Regex, Match, Input};
let re = Regex::new(r"\b[0-9]{3}\b")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
let haystack = "foo123bar";
// Since we sub-slice the haystack, the search doesn't know about
// the larger context and assumes that `123` is surrounded by word
// boundaries. And of course, the match position is reported relative
// to the sub-slice as well, which means we get `0..3` instead of
// `3..6`.
let expected = Some(Match::must(0, 0..3));
let input = Input::new(&haystack[3..6]);
re.search_captures_with(&mut cache, &input, &mut caps);
assert_eq!(expected, caps.get_match());
// But if we provide the bounds of the search within the context of the
// entire haystack, then the search can take the surrounding context
// into account. (And if we did find a match, it would be reported
// as a valid offset into `haystack` instead of its sub-slice.)
let expected = None;
let input = Input::new(haystack).range(3..6);
re.search_captures_with(&mut cache, &input, &mut caps);
assert_eq!(expected, caps.get_match());
pub fn search_slots_with( &self, cache: &mut Cache, input: &Input<'_>, slots: &mut [Option<NonMaxUsize>], ) -> Option<PatternID>
This is like Regex::search_slots, but requires the caller to explicitly pass a Cache.
§Why pass a Cache explicitly?
Passing a Cache explicitly will bypass the use of an internal memory pool used by Regex to get a Cache for a search. The use of this pool can be slower in some cases when a Regex is used from multiple threads simultaneously. Typically, performance only becomes an issue when there is heavy contention, which in turn usually only occurs when each thread’s primary unit of work is a regex search on a small haystack.
§Example
This example shows how to find the overall match offsets in a multi-pattern search without allocating a Captures value. Indeed, we can put our slots right on the stack.
use regex_automata::{meta::Regex, PatternID, Input};
let re = Regex::new_many(&[
r"\pL+",
r"\d+",
])?;
let mut cache = re.create_cache();
let input = Input::new("!@#123");
// We only care about the overall match offsets here, so we just
// allocate two slots for each pattern. Each slot records the start
// and end of the match.
let mut slots = [None; 4];
let pid = re.search_slots_with(&mut cache, &input, &mut slots);
assert_eq!(Some(PatternID::must(1)), pid);
// The overall match offsets are always at 'pid * 2' and 'pid * 2 + 1'.
// See 'GroupInfo' for more details on the mapping between groups and
// slot indices.
let slot_start = pid.unwrap().as_usize() * 2;
let slot_end = slot_start + 1;
assert_eq!(Some(3), slots[slot_start].map(|s| s.get()));
assert_eq!(Some(6), slots[slot_end].map(|s| s.get()));
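The slot arithmetic used in the example above can be isolated into a small standard-library-only sketch. All function names here are hypothetical, and plain `Option<usize>` stands in for `Option<NonMaxUsize>` purely to keep the sketch free of crate dependencies.

```rust
/// Slot indices (start, end) of the overall match for pattern `pid`.
fn overall_match_slots(pid: usize) -> (usize, usize) {
    (pid * 2, pid * 2 + 1)
}

/// Number of slots needed to record the overall match of every pattern:
/// two per pattern.
fn implicit_slot_len(pattern_len: usize) -> usize {
    pattern_len * 2
}

/// Reads the overall match span for `pid` out of a slot array. Returns
/// None if the slots are out of range or were not set by the search.
fn overall_span(slots: &[Option<usize>], pid: usize) -> Option<(usize, usize)> {
    let (start, end) = overall_match_slots(pid);
    let s = slots.get(start).copied().flatten()?;
    let e = slots.get(end).copied().flatten()?;
    Some((s, e))
}

fn main() {
    // Mirrors the example above: two patterns, pattern 1 matched at 3..6.
    let slots: [Option<usize>; 4] = [None, None, Some(3), Some(6)];
    assert_eq!(4, implicit_slot_len(2));
    assert_eq!(Some((3, 6)), overall_span(&slots, 1));
    assert_eq!(None, overall_span(&slots, 0));
}
```

This covers only the implicit (overall match) slots. The layout of slots for explicit groups is described by GroupInfo and is not modeled here.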
pub fn which_overlapping_matches_with( &self, cache: &mut Cache, input: &Input<'_>, patset: &mut PatternSet, )
This is like Regex::which_overlapping_matches, but requires the caller to explicitly pass a Cache.
§Why pass a Cache explicitly?
Passing a Cache explicitly will bypass the use of an internal memory pool used by Regex to get a Cache for a search. The use of this pool can be slower in some cases when a Regex is used from multiple threads simultaneously. Typically, performance only becomes an issue when there is heavy contention, which in turn usually only occurs when each thread’s primary unit of work is a regex search on a small haystack.
§Example
use regex_automata::{meta::Regex, Input, MatchKind, PatternSet};
let patterns = &[
r"\w+", r"\d+", r"\pL+", r"foo", r"bar", r"barfoo", r"foobar",
];
let re = Regex::builder()
.configure(Regex::config().match_kind(MatchKind::All))
.build_many(patterns)?;
let mut cache = re.create_cache();
let input = Input::new("foobar");
let mut patset = PatternSet::new(re.pattern_len());
re.which_overlapping_matches_with(&mut cache, &input, &mut patset);
let expected = vec![0, 2, 3, 4, 6];
let got: Vec<usize> = patset.iter().map(|p| p.as_usize()).collect();
assert_eq!(expected, got);
impl Regex
Various non-search routines for querying properties of a Regex and convenience routines for creating Captures and Cache values.
pub fn create_captures(&self) -> Captures
Creates a new object for recording capture group offsets. This is used in search APIs like Regex::captures and Regex::search_captures.
This is a convenience routine for Captures::all(re.group_info().clone()). Callers may build other types of Captures values that record less information (and thus require less work from the regex engine) using Captures::matches and Captures::empty.
§Example
This shows some alternatives to Regex::create_captures:
use regex_automata::{
meta::Regex,
util::captures::Captures,
Match, PatternID, Span,
};
let re = Regex::new(r"(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)")?;
// This is equivalent to Regex::create_captures. It stores matching
// offsets for all groups in the regex.
let mut all = Captures::all(re.group_info().clone());
re.captures("Bruce Springsteen", &mut all);
assert_eq!(Some(Match::must(0, 0..17)), all.get_match());
assert_eq!(Some(Span::from(0..5)), all.get_group_by_name("first"));
assert_eq!(Some(Span::from(6..17)), all.get_group_by_name("last"));
// In this version, we only care about the implicit groups, which
// means offsets for the explicit groups will be unavailable. It can
// sometimes be faster to ask for fewer groups, since the underlying
// regex engine needs to do less work to keep track of them.
let mut matches = Captures::matches(re.group_info().clone());
re.captures("Bruce Springsteen", &mut matches);
// We still get the overall match info.
assert_eq!(Some(Match::must(0, 0..17)), matches.get_match());
// But now the explicit groups are unavailable.
assert_eq!(None, matches.get_group_by_name("first"));
assert_eq!(None, matches.get_group_by_name("last"));
// Finally, in this version, we don't ask to keep track of offsets for
// *any* groups. All we get back is whether a match occurred, and if
// so, the ID of the pattern that matched.
let mut empty = Captures::empty(re.group_info().clone());
re.captures("Bruce Springsteen", &mut empty);
// it's a match!
assert!(empty.is_match());
// for pattern ID 0
assert_eq!(Some(PatternID::ZERO), empty.pattern());
// Match offsets are unavailable.
assert_eq!(None, empty.get_match());
// And of course, explicit groups are unavailable too.
assert_eq!(None, empty.get_group_by_name("first"));
assert_eq!(None, empty.get_group_by_name("last"));
pub fn create_cache(&self) -> Cache
Creates a new cache for use with lower level search APIs like Regex::search_with.
The cache returned should only be used for searches for this Regex. If you want to reuse the cache for another Regex, then you must call Cache::reset with that Regex.
This is a convenience routine for Cache::new.
§Example
use regex_automata::{meta::Regex, Input, Match};
let re = Regex::new(r"(?-u)m\w+\s+m\w+")?;
let mut cache = re.create_cache();
let input = Input::new("crazy janey and her mission man");
assert_eq!(
Some(Match::must(0, 20..31)),
re.search_with(&mut cache, &input),
);
pub fn pattern_len(&self) -> usize
Returns the total number of patterns in this regex.
The standard Regex::new constructor always results in a Regex with a single pattern, but Regex::new_many permits building a multi-pattern regex.
A Regex guarantees that the maximum possible PatternID returned in any match is Regex::pattern_len() - 1. In the case where the number of patterns is 0, a match is impossible.
§Example
use regex_automata::meta::Regex;
let re = Regex::new(r"(?m)^[a-z]$")?;
assert_eq!(1, re.pattern_len());
let re = Regex::new_many::<&str>(&[])?;
assert_eq!(0, re.pattern_len());
let re = Regex::new_many(&["a", "b", "c"])?;
assert_eq!(3, re.pattern_len());
pub fn captures_len(&self) -> usize
Returns the total number of capturing groups.
This includes the implicit capturing group corresponding to the entire match. Therefore, the minimum value returned is 1.
§Example
This shows a few patterns and how many capture groups they have.
use regex_automata::meta::Regex;
let len = |pattern| {
Regex::new(pattern).map(|re| re.captures_len())
};
assert_eq!(1, len("a")?);
assert_eq!(2, len("(a)")?);
assert_eq!(3, len("(a)|(b)")?);
assert_eq!(5, len("(a)(b)|(c)(d)")?);
assert_eq!(2, len("(a)|b")?);
assert_eq!(2, len("a|(b)")?);
assert_eq!(2, len("(b)*")?);
assert_eq!(2, len("(b)+")?);
§Example: multiple patterns
This routine also works for multiple patterns. The total number is the sum of the capture groups of each pattern.
use regex_automata::meta::Regex;
let len = |patterns| {
Regex::new_many(patterns).map(|re| re.captures_len())
};
assert_eq!(2, len(&["a", "b"])?);
assert_eq!(4, len(&["(a)", "(b)"])?);
assert_eq!(6, len(&["(a)|(b)", "(c)|(d)"])?);
assert_eq!(8, len(&["(a)(b)|(c)(d)", "(x)(y)"])?);
assert_eq!(3, len(&["(a)", "b"])?);
assert_eq!(3, len(&["a", "(b)"])?);
assert_eq!(4, len(&["(a)", "(b)*"])?);
assert_eq!(4, len(&["(a)+", "(b)+"])?);
pub fn static_captures_len(&self) -> Option<usize>
Returns the total number of capturing groups that appear in every possible match.
If the number of capture groups can vary depending on the match, then this returns None. That is, a value is only returned when the number of matching groups is invariant or “static.”
Note that like Regex::captures_len, this does include the implicit capturing group corresponding to the entire match. Therefore, when a non-None value is returned, it is guaranteed to be at least 1. Stated differently, a return value of Some(0) is impossible.
§Example
This shows a few cases where a static number of capture groups is available and a few cases where it is not.
use regex_automata::meta::Regex;
let len = |pattern| {
Regex::new(pattern).map(|re| re.static_captures_len())
};
assert_eq!(Some(1), len("a")?);
assert_eq!(Some(2), len("(a)")?);
assert_eq!(Some(2), len("(a)|(b)")?);
assert_eq!(Some(3), len("(a)(b)|(c)(d)")?);
assert_eq!(None, len("(a)|b")?);
assert_eq!(None, len("a|(b)")?);
assert_eq!(None, len("(b)*")?);
assert_eq!(Some(2), len("(b)+")?);
§Example: multiple patterns
This property extends to regexes with multiple patterns as well. In order for there to be a static number of capture groups in this case, every pattern must have the same static number.
use regex_automata::meta::Regex;
let len = |patterns| {
Regex::new_many(patterns).map(|re| re.static_captures_len())
};
assert_eq!(Some(1), len(&["a", "b"])?);
assert_eq!(Some(2), len(&["(a)", "(b)"])?);
assert_eq!(Some(2), len(&["(a)|(b)", "(c)|(d)"])?);
assert_eq!(Some(3), len(&["(a)(b)|(c)(d)", "(x)(y)"])?);
assert_eq!(None, len(&["(a)", "b"])?);
assert_eq!(None, len(&["a", "(b)"])?);
assert_eq!(None, len(&["(a)", "(b)*"])?);
assert_eq!(Some(2), len(&["(a)+", "(b)+"])?);
pub fn group_info(&self) -> &GroupInfo
Return information about the capture groups in this Regex.
A GroupInfo is an immutable object that can be cheaply cloned. It is responsible for maintaining a mapping between the capture groups in the concrete syntax of zero or more regex patterns and their internal representation used by some of the regex matchers. It is also responsible for maintaining a mapping between the name of each group (if one exists) and its corresponding group index.
A GroupInfo is ultimately what is used to build a Captures value, which is some mutable space where group offsets are stored as a result of a search.
§Example
This shows some alternatives to Regex::create_captures:
use regex_automata::{
meta::Regex,
util::captures::Captures,
Match, PatternID, Span,
};
let re = Regex::new(r"(?<first>[A-Z][a-z]+) (?<last>[A-Z][a-z]+)")?;
// This is equivalent to Regex::create_captures. It stores matching
// offsets for all groups in the regex.
let mut all = Captures::all(re.group_info().clone());
re.captures("Bruce Springsteen", &mut all);
assert_eq!(Some(Match::must(0, 0..17)), all.get_match());
assert_eq!(Some(Span::from(0..5)), all.get_group_by_name("first"));
assert_eq!(Some(Span::from(6..17)), all.get_group_by_name("last"));
// In this version, we only care about the implicit groups, which
// means offsets for the explicit groups will be unavailable. It can
// sometimes be faster to ask for fewer groups, since the underlying
// regex engine needs to do less work to keep track of them.
let mut matches = Captures::matches(re.group_info().clone());
re.captures("Bruce Springsteen", &mut matches);
// We still get the overall match info.
assert_eq!(Some(Match::must(0, 0..17)), matches.get_match());
// But now the explicit groups are unavailable.
assert_eq!(None, matches.get_group_by_name("first"));
assert_eq!(None, matches.get_group_by_name("last"));
// Finally, in this version, we don't ask to keep track of offsets for
// *any* groups. All we get back is whether a match occurred, and if
// so, the ID of the pattern that matched.
let mut empty = Captures::empty(re.group_info().clone());
re.captures("Bruce Springsteen", &mut empty);
// it's a match!
assert!(empty.is_match());
// for pattern ID 0
assert_eq!(Some(PatternID::ZERO), empty.pattern());
// Match offsets are unavailable.
assert_eq!(None, empty.get_match());
// And of course, explicit groups are unavailable too.
assert_eq!(None, empty.get_group_by_name("first"));
assert_eq!(None, empty.get_group_by_name("last"));
pub fn get_config(&self) -> &Config
Returns the configuration object used to build this Regex.
If no configuration object was explicitly passed, then the configuration returned represents the default.
pub fn is_accelerated(&self) -> bool
Returns true if this regex has a high chance of being “accelerated.”
The precise meaning of “accelerated” is specifically left unspecified, but the general meaning is that the search has a high likelihood of running faster than a character-at-a-time loop inside a standard regex engine.
When a regex is accelerated, it is only a probabilistic claim. That is, just because the regex is believed to be accelerated, that doesn’t mean it will definitely execute searches very fast. Similarly, if a regex is not accelerated, that is also a probabilistic claim. That is, a regex for which is_accelerated returns false could still run searches more quickly than a regex for which is_accelerated returns true.
Whether a regex is marked as accelerated or not is dependent on implementation details that may change in a semver compatible release. That is, a regex that is accelerated in an x.y.1 release might not be accelerated in an x.y.2 release.
Basically, the value of acceleration boils down to a hedge: a hodge podge of internal heuristics combine to make a probabilistic guess that this regex search may run “fast.” The value in knowing this from a caller’s perspective is that it may act as a signal that no further work should be done to accelerate a search. For example, a grep-like tool might try to do some extra work extracting literals from a regex to create its own heuristic acceleration strategies. But it might choose to defer to this crate’s acceleration strategy if one exists. This routine permits querying whether such a strategy is active for a particular regex.
§Example
use regex_automata::meta::Regex;
// A simple literal is very likely to be accelerated.
let re = Regex::new(r"foo")?;
assert!(re.is_accelerated());
// A regex with no literals is likely to not be accelerated.
let re = Regex::new(r"\w")?;
assert!(!re.is_accelerated());
pub fn memory_usage(&self) -> usize
Return the total approximate heap memory, in bytes, used by this Regex.
Note that currently, there is no high level configuration for setting a limit on the specific value returned by this routine. Instead, the following routines can be used to control heap memory at a bit of a lower level:
- Config::nfa_size_limit controls how big any of the NFAs are allowed to be.
- Config::onepass_size_limit controls how big the one-pass DFA is allowed to be.
- Config::hybrid_cache_capacity controls how much memory the lazy DFA is permitted to allocate to store its transition table.
- Config::dfa_size_limit controls how big a fully compiled DFA is allowed to be.
- Config::dfa_state_limit controls the conditions under which the meta regex engine will even attempt to build a fully compiled DFA.