Struct regex_automata::util::captures::Captures
pub struct Captures {
    group_info: GroupInfo,
    pid: Option<PatternID>,
    slots: Vec<Option<NonMaxUsize>>,
}
The span offsets of capturing groups after a match has been found.
This type represents the output of regex engines that can report the offsets at which capturing group matches, or “submatches”, occur (for example, the PikeVM). When a match occurs, it will at minimum contain the PatternID of the pattern that matched. Depending upon how it was constructed, it may also contain the start/end offsets of the entire match of the pattern and the start/end offsets of each capturing group that participated in the match.
Values of this type are always created for a specific GroupInfo. It is unspecified behavior to use a Captures value in a search with any regex engine that has a different GroupInfo than the one the Captures were created with.
§Constructors
There are three constructors for this type that control what kind of information is available upon a match:
Captures::all: Will store overall pattern match offsets in addition to the offsets of capturing groups that participated in the match.
Captures::matches: Will store only the overall pattern match offsets. The offsets of capturing groups (even ones that participated in the match) are not available.
Captures::empty: Will only store the pattern ID that matched. No match offsets are available at all.
If you aren’t sure which to choose, then pick the first one. The first one is what convenience routines like PikeVM::create_captures will use automatically.
The main difference between these choices is performance. Namely, if you ask for less information, then the regex search may be able to run more quickly.
§Notes
It is worth pointing out that this type is not coupled to any one specific regex engine. Instead, its coupling is with GroupInfo, which is the thing that is responsible for mapping capturing groups to “slot” offsets. Slot offsets are indices into a single sequence of memory at which matching haystack offsets for the corresponding group are written by regex engines.
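To make the idea of slot offsets concrete, here is a minimal sketch of one possible group-to-slot layout. The helper `slot_pair` is hypothetical and not part of regex-automata. The layout it assumes (implicit groups for every pattern first, explicit groups afterwards) is an assumption that agrees with the slot values shown in the examples on this page; consult GroupInfo for the mapping that is actually guaranteed.

```rust
/// Hypothetical helper (not part of regex-automata): computes the pair of
/// slot indices for `group` of `pattern`, where `groups_per_pattern[i]` is
/// the number of groups in pattern `i`, counting the implicit group 0.
///
/// Assumed layout: the implicit (index 0) groups of all patterns come first,
/// two slots per pattern, followed by the explicit groups of each pattern.
fn slot_pair(
    groups_per_pattern: &[usize],
    pattern: usize,
    group: usize,
) -> Option<(usize, usize)> {
    let group_len = *groups_per_pattern.get(pattern)?;
    if group >= group_len {
        return None;
    }
    if group == 0 {
        return Some((pattern * 2, pattern * 2 + 1));
    }
    // Explicit groups start after the implicit slots of every pattern.
    let implicit_slots = groups_per_pattern.len() * 2;
    // Explicit groups of all earlier patterns precede this pattern's groups.
    let earlier_explicit: usize = groups_per_pattern[..pattern]
        .iter()
        .map(|&n| n.saturating_sub(1))
        .sum();
    let start = implicit_slots + (earlier_explicit + (group - 1)) * 2;
    Some((start, start + 1))
}
```

Under these assumptions, a single pattern with three groups places group 2 at slots 4 and 5, and the second of two group-less patterns places its overall match at slots 2 and 3.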
§Example
This example shows how to parse a simple date and extract the components of the date via capturing groups:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));
§Example: named capturing groups
This example is like the one above, but leverages the ability to name capturing groups in order to make the code a bit clearer:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(r"^(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group_by_name("y"));
assert_eq!(Some(Span::from(5..7)), caps.get_group_by_name("m"));
assert_eq!(Some(Span::from(8..10)), caps.get_group_by_name("d"));
§Fields
group_info: GroupInfo
The group info that these capture groups are coupled to. This is what gives the “convenience” of the Captures API. Namely, it provides the slot mapping and the name-to-index mapping for capture lookups by name.
pid: Option<PatternID>
The ID of the pattern that matched. Regex engines must set this to None when no match occurs.
slots: Vec<Option<NonMaxUsize>>
The slot values, i.e., submatch offsets.
In theory, the smallest sequence of slots would be something like max(groups(pattern) for pattern in regex) * 2, but instead, we use sum(groups(pattern) for pattern in regex) * 2. Why?
Well, the former could be used in theory, because we don’t generally have any overlapping APIs that involve capturing groups. Therefore, there’s technically never any need to have slots set for multiple patterns. However, this might change some day, in which case, we would need to have slots available.
The other reason is that during the execution of some regex engines, there exists a point in time where multiple slots for different patterns may be written to before knowing which pattern has matched. Therefore, the regex engines themselves, in order to support multiple patterns correctly, must have all slots available. If Captures doesn’t have all slots available, then regex engines can’t write directly into the caller provided Captures and must instead write into some other storage and then copy the slots involved in the match at the end of the search.
So overall, at least as of the time of writing, it seems like the path of least resistance is to just require allocating all possible slots instead of the conceptual minimum. Another way to justify this is that the most common case is a single pattern, in which case, there is no inefficiency here since the ‘max’ and ‘sum’ calculations above are equivalent in that case.
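The two slot counts discussed above can be compared with a small sketch. The helpers `min_slots` and `allocated_slots` are hypothetical names introduced for illustration; they only restate the 'max' and 'sum' formulas and do not reflect how the crate computes its allocation internally.

```rust
/// Hypothetical helper: the conceptual minimum number of slots, i.e. the
/// 'max' formula. `groups_per_pattern[i]` is the number of capturing groups
/// in pattern `i`, counting the implicit group 0.
fn min_slots(groups_per_pattern: &[usize]) -> usize {
    groups_per_pattern.iter().copied().max().unwrap_or(0) * 2
}

/// Hypothetical helper: the number of slots when every pattern gets its own
/// slots, i.e. the 'sum' formula.
fn allocated_slots(groups_per_pattern: &[usize]) -> usize {
    groups_per_pattern.iter().sum::<usize>() * 2
}
```

For a single pattern with four groups both functions return 8, which is the common case described above. For three patterns with 2, 3 and 1 groups, the minimum would be 6 slots while the 'sum' formula gives 12.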
N.B. The mapping from group index to slot is maintained by GroupInfo and is considered an API guarantee. See GroupInfo for more details on that mapping.
N.B. Option<NonMaxUsize> has the same size as a usize.
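The size claim can be illustrated with the standard library. This sketch uses std::num::NonZeroUsize as a stand-in, on the assumption that NonMaxUsize relies on the same kind of niche (one forbidden value that Option can reuse to represent None). The function names are hypothetical.

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

/// True when wrapping a niche type in `Option` costs no extra space.
/// `NonZeroUsize` is a stand-in here; `NonMaxUsize` is assumed to behave
/// the same way.
fn option_of_niche_is_free() -> bool {
    size_of::<Option<NonZeroUsize>>() == size_of::<usize>()
}

/// For contrast: `usize` has no spare value, so `Option<usize>` needs
/// additional space for its discriminant.
fn option_of_usize_is_larger() -> bool {
    size_of::<Option<usize>>() > size_of::<usize>()
}
```

This is why storing offsets as Option<NonMaxUsize> rather than Option<usize> keeps the slot vector as compact as a plain vector of usize.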
§Implementations
impl Captures
pub fn all(group_info: GroupInfo) -> Captures
Create new storage for the offsets of all matching capturing groups.
This routine provides the most information for matches—namely, the spans of matching capturing groups—but also requires the regex search routines to do the most work.
It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.
§Example
This example shows that all capturing groups—but only ones that participated in a match—are available to query after a match has been found:
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    Span, Match,
};
let re = PikeVM::new(
    r"^(?:(?P<lower>[a-z]+)|(?P<upper>[A-Z]+))(?P<digits>[0-9]+)$",
)?;
let mut cache = re.create_cache();
let mut caps = Captures::all(re.get_nfa().group_info().clone());
re.captures(&mut cache, "ABC123", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
// The 'lower' group didn't match, so it won't have any offsets.
assert_eq!(None, caps.get_group_by_name("lower"));
assert_eq!(Some(Span::from(0..3)), caps.get_group_by_name("upper"));
assert_eq!(Some(Span::from(3..6)), caps.get_group_by_name("digits"));
pub fn matches(group_info: GroupInfo) -> Captures
Create new storage for only the full match spans of a pattern. This does not include any capturing group offsets.
It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.
§Example
This example shows that only overall match offsets are reported when this constructor is used. Accessing any capturing groups other than the 0th will always return None.
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    Match,
};
let re = PikeVM::new(
    r"^(?:(?P<lower>[a-z]+)|(?P<upper>[A-Z]+))(?P<digits>[0-9]+)$",
)?;
let mut cache = re.create_cache();
let mut caps = Captures::matches(re.get_nfa().group_info().clone());
re.captures(&mut cache, "ABC123", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
// We didn't ask for capturing group offsets, so they aren't available.
assert_eq!(None, caps.get_group_by_name("lower"));
assert_eq!(None, caps.get_group_by_name("upper"));
assert_eq!(None, caps.get_group_by_name("digits"));
pub fn empty(group_info: GroupInfo) -> Captures
Create new storage for only tracking which pattern matched. No offsets are stored at all.
It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.
§Example
This example shows that only the pattern that matched can be accessed from a Captures value created via this constructor.
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    PatternID,
};
let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());
re.captures(&mut cache, "aABCz", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(PatternID::must(0)), caps.pattern());
// We didn't ask for any offsets, so they aren't available.
assert_eq!(None, caps.get_match());
re.captures(&mut cache, &"aABCz"[1..], &mut caps);
assert!(caps.is_match());
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// We didn't ask for any offsets, so they aren't available.
assert_eq!(None, caps.get_match());
pub fn is_match(&self) -> bool
Returns true if and only if this Captures value represents a match.
This is a convenience routine for caps.pattern().is_some().
§Example
When using the PikeVM (for example), the lightest weight way of detecting whether a match exists is to create capturing groups that only track the ID of the pattern that matched (if any):
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
};
let re = PikeVM::new(r"[a-z]+")?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());
re.captures(&mut cache, "aABCz", &mut caps);
assert!(caps.is_match());
pub fn pattern(&self) -> Option<PatternID>
Returns the identifier of the pattern that matched when this Captures value represents a match. If no match was found, then this always returns None.
This returns a pattern ID in precisely the cases in which is_match returns true. Similarly, the pattern ID returned is always the same pattern ID found in the Match returned by get_match.
§Example
When using the PikeVM (for example), the lightest weight way of detecting which pattern matched is to create capturing groups that only track the ID of the pattern that matched (if any):
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    PatternID,
};
let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());
re.captures(&mut cache, "ABC", &mut caps);
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// Recall that offsets are only available when using a non-empty
// Captures value. So even though a match occurred, this returns None!
assert_eq!(None, caps.get_match());
pub fn get_match(&self) -> Option<Match>
Returns the pattern ID and the span of the match, if one occurred.
This always returns None when Captures was created with Captures::empty, even if a match was found.
If this routine returns a non-None value, then is_match is guaranteed to return true and pattern is also guaranteed to return a non-None value.
§Example
This example shows how to get the full match from a search:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Match};
let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "ABC", &mut caps);
assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());
pub fn get_group(&self, index: usize) -> Option<Span>
Returns the span of a capturing group match corresponding to the group index given, only if both the overall pattern matched and the capturing group participated in that match.
This returns None if index is invalid. index is valid if and only if it’s less than Captures::group_len for the matching pattern.
This always returns None when Captures was created with Captures::empty, even if a match was found. This also always returns None for any index > 0 when Captures was created with Captures::matches.
If this routine returns a non-None value, then is_match is guaranteed to return true, pattern is guaranteed to return a non-None value and get_match is guaranteed to return a non-None value.
By convention, the 0th capture group will always return the same span as the span returned by get_match. This is because the 0th capture group always corresponds to the entirety of the pattern’s match. (It is similarly always unnamed because it is implicit.) This isn’t necessarily true of all regex engines. For example, one can hand-compile a thompson::NFA via a thompson::Builder, which isn’t technically forced to make the 0th capturing group always correspond to the entire match.
§Example
This example shows how to get the capturing groups, by index, from a match:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span, Match};
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group(1));
assert_eq!(Some(Span::from(6..17)), caps.get_group(2));
// Looking for a non-existent capturing group will return None:
assert_eq!(None, caps.get_group(3));
assert_eq!(None, caps.get_group(9944060567225171988));
pub fn get_group_by_name(&self, name: &str) -> Option<Span>
Returns the span of a capturing group match corresponding to the group name given, only if both the overall pattern matched and the capturing group participated in that match.
This returns None if name does not correspond to a valid capturing group for the pattern that matched.
This always returns None when Captures was created with Captures::empty, even if a match was found. This also always returns None when Captures was created with Captures::matches, since every named group has an index greater than 0.
If this routine returns a non-None value, then is_match is guaranteed to return true, pattern is guaranteed to return a non-None value and get_match is guaranteed to return a non-None value.
§Example
This example shows how to get the capturing groups, by name, from a match:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span, Match};
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group_by_name("first"));
assert_eq!(Some(Span::from(6..17)), caps.get_group_by_name("last"));
// Looking for a non-existent capturing group will return None:
assert_eq!(None, caps.get_group_by_name("middle"));
pub fn iter(&self) -> CapturesPatternIter<'_>
Returns an iterator of possible spans for every capturing group in the matching pattern.
If this Captures value does not correspond to a match, then the iterator returned yields no elements.
Note that the iterator returned yields elements of type Option<Span>. A span is present if and only if it corresponds to a capturing group that participated in a match.
§Example
This example shows how to collect all capturing groups:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(
    // Matches first/last names, with an optional middle name.
    r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Harry James Potter", &mut caps);
assert!(caps.is_match());
let groups: Vec<Option<Span>> = caps.iter().collect();
assert_eq!(groups, vec![
    Some(Span::from(0..18)),
    Some(Span::from(0..5)),
    Some(Span::from(6..11)),
    Some(Span::from(12..18)),
]);
This example uses the same regex as the previous example, but with a haystack that omits the middle name. This results in a capturing group that is present in the elements yielded by the iterator but without a match:
use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};
let re = PikeVM::new(
    // Matches first/last names, with an optional middle name.
    r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Harry Potter", &mut caps);
assert!(caps.is_match());
let groups: Vec<Option<Span>> = caps.iter().collect();
assert_eq!(groups, vec![
    Some(Span::from(0..12)),
    Some(Span::from(0..5)),
    None,
    Some(Span::from(6..12)),
]);
pub fn group_len(&self) -> usize
Return the total number of capturing groups for the matching pattern.
If this Captures value does not correspond to a match, then this always returns 0.
This always returns the same number of elements yielded by Captures::iter. That is, the number includes capturing groups even if they don’t participate in the match.
§Example
This example shows how to count the total number of capturing groups associated with a pattern. Notice that it includes groups that did not participate in a match (just like Captures::iter does).
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(
    // Matches first/last names, with an optional middle name.
    r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Harry Potter", &mut caps);
assert_eq!(4, caps.group_len());
pub fn group_info(&self) -> &GroupInfo
Returns a reference to the underlying group info on which these captures are based.
The difference between GroupInfo and Captures is that the former defines the structure of capturing groups whereas the latter is what stores the actual match information. So whereas Captures only gives you access to the current match, GroupInfo lets you query any information about all capturing groups, even ones for patterns that weren’t involved in a match.
Note that a GroupInfo uses reference counting internally, so it may be cloned cheaply.
§Example
This example shows how to get all capturing group names from the underlying GroupInfo. Notice that we don’t even need to run a search.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
    r"(?P<foo>a)",
    r"(a)(b)",
    r"ab",
    r"(?P<bar>a)(?P<quux>a)",
    r"(?P<foo>z)",
])?;
let caps = re.create_captures();
let expected = vec![
    (PatternID::must(0), 0, None),
    (PatternID::must(0), 1, Some("foo")),
    (PatternID::must(1), 0, None),
    (PatternID::must(1), 1, None),
    (PatternID::must(1), 2, None),
    (PatternID::must(2), 0, None),
    (PatternID::must(3), 0, None),
    (PatternID::must(3), 1, Some("bar")),
    (PatternID::must(3), 2, Some("quux")),
    (PatternID::must(4), 0, None),
    (PatternID::must(4), 1, Some("foo")),
];
// We could also just use 're.get_nfa().group_info()'.
let got: Vec<(PatternID, usize, Option<&str>)> =
    caps.group_info().all_names().collect();
assert_eq!(expected, got);
pub fn interpolate_string(&self, haystack: &str, replacement: &str) -> String
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated string is returned.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = "year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = "On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_string(hay, replacement);
assert_eq!("year=2010, month=03, day=14", result);
// And this matches the second pattern.
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_string(hay, replacement);
assert_eq!("year=2010, month=03, day=14", result);
pub fn interpolate_string_into(
    &self,
    haystack: &str,
    replacement: &str,
    dst: &mut String,
)
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated string is written to dst.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = "year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = "On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = String::new();
caps.interpolate_string_into(hay, replacement, &mut dst);
assert_eq!("year=2010, month=03, day=14", dst);
// And this matches the second pattern.
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = String::new();
caps.interpolate_string_into(hay, replacement, &mut dst);
assert_eq!("year=2010, month=03, day=14", dst);
pub fn interpolate_bytes(&self, haystack: &[u8], replacement: &[u8]) -> Vec<u8>
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated byte string is returned.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = b"year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_bytes(hay, replacement);
assert_eq!(&b"year=2010, month=03, day=14"[..], result);
// And this matches the second pattern.
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_bytes(hay, replacement);
assert_eq!(&b"year=2010, month=03, day=14"[..], result);
pub fn interpolate_bytes_into(
    &self,
    haystack: &[u8],
    replacement: &[u8],
    dst: &mut Vec<u8>,
)
Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated byte string is written to dst.
See the interpolate module for documentation on the format of the replacement string.
§Example
This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.
use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};
let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let replacement = b"year=$year, month=$month, day=$day";
// This matches the first pattern.
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = vec![];
caps.interpolate_bytes_into(hay, replacement, &mut dst);
assert_eq!(&b"year=2010, month=03, day=14"[..], dst);
// And this matches the second pattern.
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = vec![];
caps.interpolate_bytes_into(hay, replacement, &mut dst);
assert_eq!(&b"year=2010, month=03, day=14"[..], dst);
pub fn extract<'h, const N: usize>(
    &self,
    haystack: &'h str,
) -> (&'h str, [&'h str; N])
This is a convenience routine for extracting the substrings corresponding to matching capture groups in the given haystack. The haystack should be the same substring used to find the match spans in this Captures value.
This is identical to Captures::extract_bytes, except it works with &str instead of &[u8].
§Panics
This panics if the number of explicit matching groups in this Captures value is less than N. This also panics if this Captures value does not correspond to a match.
Note that this does not panic if the number of explicit matching groups is bigger than N. In that case, only the first N matching groups are extracted.
§Example
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
assert!(caps.is_match());
let (full, [year, month, day]) = caps.extract(hay);
assert_eq!("2010-03-14", full);
assert_eq!("2010", year);
assert_eq!("03", month);
assert_eq!("14", day);
// We can also ask for fewer than all capture groups.
let (full, [year]) = caps.extract(hay);
assert_eq!("2010-03-14", full);
assert_eq!("2010", year);
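The panic conditions described above can be modeled without the crate. The following is a simplified sketch, not the crate's implementation: `try_extract` is a hypothetical function that takes the group spans as plain `(start, end)` tuples (index 0 being the overall match) and returns None in the situations where extract is documented to panic. It assumes that "matching groups" means groups whose span is present, so groups that did not participate are skipped.

```rust
/// Hypothetical, non-panicking model of the extraction logic.
/// `groups[0]` is the overall match; later entries are explicit groups,
/// with `None` for groups that did not participate in the match.
fn try_extract<'h, const N: usize>(
    haystack: &'h str,
    groups: &[Option<(usize, usize)>],
) -> Option<(&'h str, [&'h str; N])> {
    // Only groups that actually matched are considered.
    let mut matched = groups.iter().flatten();
    // No overall match at all: the real routine is documented to panic.
    let &(start, end) = matched.next()?;
    let whole = &haystack[start..end];
    let mut out = [""; N];
    for dst in out.iter_mut() {
        // Fewer than N matching groups: the real routine is documented
        // to panic.
        let &(start, end) = matched.next()?;
        *dst = &haystack[start..end];
    }
    // Any matching groups beyond the first N are simply ignored.
    Some((whole, out))
}
```

With the date example above, asking for three groups or for one group succeeds, while asking for four groups yields None in this model, mirroring the case where extract would panic.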
pub fn extract_bytes<'h, const N: usize>(
    &self,
    haystack: &'h [u8],
) -> (&'h [u8], [&'h [u8]; N])
This is a convenience routine for extracting the substrings corresponding to matching capture groups in the given haystack. The haystack should be the same substring used to find the match spans in this Captures value.
This is identical to Captures::extract, except it works with &[u8] instead of &str.
§Panics
This panics if the number of explicit matching groups in this Captures value is less than N. This also panics if this Captures value does not correspond to a match.
Note that this does not panic if the number of explicit matching groups is bigger than N. In that case, only the first N matching groups are extracted.
§Example
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
assert!(caps.is_match());
let (full, [year, month, day]) = caps.extract_bytes(hay);
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
assert_eq!(b"03", month);
assert_eq!(b"14", day);
// We can also ask for fewer than all capture groups.
let (full, [year]) = caps.extract_bytes(hay);
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
impl Captures
Lower level “slot” oriented APIs. One does not typically need to use these when executing a search. They are instead mostly intended for folks that are writing their own regex engine while reusing this Captures type.
pub fn clear(&mut self)
Clear this Captures value.
After clearing, all slots inside this Captures value will be set to None. Similarly, any pattern ID that it was previously associated with (for a match) is erased.
It is not usually necessary to call this routine. Namely, a Captures value only provides high level access to the capturing groups of the pattern that matched, and only low level access to individual slots. Thus, even if slots corresponding to groups that aren’t associated with the matching pattern are set, then it won’t impact the higher level APIs. Namely, higher level APIs like Captures::get_group will return None if no pattern ID is present, even if there are spans set in the underlying slots.
Thus, to “clear” a Captures value of a match, it is usually only necessary to call Captures::set_pattern with None.
§Example
This example shows what happens when a Captures value is cleared.
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert!(caps.is_match());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
    Some(0),
    Some(17),
    Some(0),
    Some(5),
    Some(6),
    Some(17),
]);
// Now clear the slots. Everything is gone and it is no longer a match.
caps.clear();
assert!(!caps.is_match());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
assert_eq!(slots, vec![
    None,
    None,
    None,
    None,
    None,
    None,
]);
pub fn set_pattern(&mut self, pid: Option<PatternID>)
Set the pattern on this Captures value.
When the pattern ID is None, then this Captures value does not correspond to a match (is_match will return false). Otherwise, it corresponds to a match.
This is useful in search implementations where you might want to initially call set_pattern(None) in order to avoid the cost of calling clear() if it turns out to not be necessary.
§Example
This example shows that set_pattern merely overwrites the pattern ID. It does not actually change the underlying slot values.
use regex_automata::nfa::thompson::pikevm::PikeVM;
let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert!(caps.is_match());
assert!(caps.pattern().is_some());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
    Some(0),
    Some(17),
    Some(0),
    Some(5),
    Some(6),
    Some(17),
]);
// Now set the pattern to None. Note that the slot values remain.
caps.set_pattern(None);
assert!(!caps.is_match());
assert!(caps.pattern().is_none());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
    Some(0),
    Some(17),
    Some(0),
    Some(5),
    Some(6),
    Some(17),
]);
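The relationship between clear and set_pattern(None) can also be seen in a toy model. `ToyCaptures` below is a hypothetical, single-pattern stand-in written only for illustration; it assumes group i occupies slots 2i and 2i+1 and is not how the crate is implemented. It shows why resetting only the pattern ID is enough for the high level accessors to report no match.

```rust
/// A toy stand-in for `Captures` (hypothetical, single pattern only).
struct ToyCaptures {
    pid: Option<usize>,
    slots: Vec<Option<usize>>,
}

impl ToyCaptures {
    /// High level accessor: consults the pattern ID before the slots.
    fn get_group(&self, index: usize) -> Option<(usize, usize)> {
        // Without a pattern ID there is no match, whatever the slots hold.
        self.pid?;
        let start = (*self.slots.get(index * 2)?)?;
        let end = (*self.slots.get(index * 2 + 1)?)?;
        Some((start, end))
    }

    /// Cheap reset: only the pattern ID changes, slots are left alone.
    fn set_pattern(&mut self, pid: Option<usize>) {
        self.pid = pid;
    }

    /// Full reset: the pattern ID and every slot are erased.
    fn clear(&mut self) {
        self.pid = None;
        for slot in self.slots.iter_mut() {
            *slot = None;
        }
    }
}
```

In this model, set_pattern(None) does constant work while clear touches every slot, which is the cost difference that motivates calling set_pattern(None) first in a search implementation.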
pub fn slots(&self) -> &[Option<NonMaxUsize>]
Returns the underlying slots, where each slot stores a single offset.
Every matching capturing group generally corresponds to two slots: one slot for the starting position and another for the ending position. Typically, either both are present or neither is. (The weasel word “typically” is used here because it really depends on the regex engine implementation. Every sensible regex engine likely adheres to this invariant, and every regex engine in this crate is sensible.)
Generally speaking, callers should prefer to use higher level routines like Captures::get_match or Captures::get_group.
An important note here is that a regex engine may not reset all of the slots to None values when no match occurs, or even when a match of a different pattern occurs. But this depends on how the regex engine implementation deals with slots.
§Example
This example shows how to get the underlying slots from a regex match.
use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::primitives::{PatternID, NonMaxUsize},
};
let re = PikeVM::new_many(&[
    r"[a-z]+",
    r"[0-9]+",
])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());
re.captures(&mut cache, "123", &mut caps);
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// Note that the only guarantee we have here is that slots 2 and 3
// are set to correct values. The contents of the first two slots are
// unspecified since the 0th pattern did not match.
let expected = &[
    None,
    None,
    NonMaxUsize::new(0),
    NonMaxUsize::new(3),
];
assert_eq!(expected, caps.slots());
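Since each group corresponds to a start slot and an end slot, a flat slot slice can be read back as a list of optional spans. The sketch below is a hypothetical helper operating on plain `Option<usize>` values (as produced by mapping each slot through `get()` in the examples on this page), not on the crate's own types. It treats a pair with either side missing as absent.

```rust
use std::ops::Range;

/// Hypothetical helper: turns a flat slot slice into one optional span per
/// start/end pair. A pair with either side missing yields `None`. A trailing
/// unpaired slot, if any, is ignored.
fn spans_from_slots(slots: &[Option<usize>]) -> Vec<Option<Range<usize>>> {
    slots
        .chunks_exact(2)
        .map(|pair| match (pair[0], pair[1]) {
            (Some(start), Some(end)) => Some(start..end),
            _ => None,
        })
        .collect()
}
```

Applied to the slot values from the example above, this produces no span for the first pair and the span 0..3 for the second pair.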
pub fn slots_mut(&mut self) -> &mut [Option<NonMaxUsize>]
Returns the underlying slots as a mutable slice, where each slot stores a single offset.
This tends to be most useful for regex engine implementations for writing offsets for matching capturing groups to slots.
See Captures::slots for more information about slots.