Struct regex_automata::util::captures::Captures

pub struct Captures {
    group_info: GroupInfo,
    pid: Option<PatternID>,
    slots: Vec<Option<NonMaxUsize>>,
}
The span offsets of capturing groups after a match has been found.

This type represents the output of regex engines that can report the offsets at which capturing group matches, or “submatches,” occur. The PikeVM is one such engine. When a match occurs, a Captures value contains at minimum the PatternID of the pattern that matched. Depending on how it was constructed, it may also contain the start/end offsets of the overall match and the start/end offsets of each capturing group that participated in the match.

Values of this type are always created for a specific GroupInfo. It is unspecified behavior to use a Captures value in a search with any regex engine that has a different GroupInfo than the one the Captures were created with.

§Constructors

There are three constructors for this type that control what kind of information is available upon a match:

  • Captures::all: Will store overall pattern match offsets in addition to the offsets of capturing groups that participated in the match.
  • Captures::matches: Will store only the overall pattern match offsets. The offsets of capturing groups (even ones that participated in the match) are not available.
  • Captures::empty: Will only store the pattern ID that matched. No match offsets are available at all.

If you aren’t sure which to choose, pick the first one. It is what convenience routines like PikeVM::create_captures use automatically.

The main difference between these choices is performance. Namely, if you ask for less information, then a regex search may be able to run more quickly.

§Notes

It is worth pointing out that this type is not coupled to any one specific regex engine. Instead, its coupling is with GroupInfo, which is the thing that is responsible for mapping capturing groups to “slot” offsets. Slot offsets are indices into a single sequence of memory at which matching haystack offsets for the corresponding group are written by regex engines.

§Example

This example shows how to parse a simple date and extract the components of the date via capturing groups:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};

let re = PikeVM::new(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));

§Example: named capturing groups

This example is like the one above, but leverages the ability to name capturing groups in order to make the code a bit clearer:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};

let re = PikeVM::new(r"^(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group_by_name("y"));
assert_eq!(Some(Span::from(5..7)), caps.get_group_by_name("m"));
assert_eq!(Some(Span::from(8..10)), caps.get_group_by_name("d"));

Fields§

§group_info: GroupInfo

The group info that these capture groups are coupled to. This is what gives the “convenience” of the Captures API. Namely, it provides the slot mapping and the name-to-index mapping for capture lookups by name.

§pid: Option<PatternID>

The ID of the pattern that matched. Regex engines must set this to None when no match occurs.

§slots: Vec<Option<NonMaxUsize>>

The slot values, i.e., submatch offsets.

In theory, the smallest sequence of slots would be something like max(groups(pattern) for pattern in regex) * 2, but instead, we use sum(groups(pattern) for pattern in regex) * 2. Why?

Well, the former could be used in theory, because we don’t generally have any overlapping APIs that involve capturing groups. Therefore, there’s technically never any need to have slots set for multiple patterns. However, this might change some day, in which case, we would need to have slots available.

The other reason is that during the execution of some regex engines, there exists a point in time where multiple slots for different patterns may be written to before knowing which pattern has matched. Therefore, the regex engines themselves, in order to support multiple patterns correctly, must have all slots available. If Captures doesn’t have all slots available, then regex engines can’t write directly into the caller provided Captures and must instead write into some other storage and then copy the slots involved in the match at the end of the search.

So overall, at least as of the time of writing, it seems like the path of least resistance is to just require allocating all possible slots instead of the conceptual minimum. Another way to justify this is that the most common case is a single pattern, in which case, there is no inefficiency here since the ‘max’ and ‘sum’ calculations above are equivalent in that case.
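The difference between the two calculations can be made concrete with a small, std-only sketch. The function names here are hypothetical and are not part of regex-automata; each entry of `groups_per_pattern` is assumed to count every group of a pattern, including its implicit group 0.

```rust
/// Conceptual minimum: enough slots for the largest single pattern.
/// (Hypothetical helper, not part of regex-automata.)
fn conceptual_min_slots(groups_per_pattern: &[usize]) -> usize {
    groups_per_pattern.iter().copied().max().unwrap_or(0) * 2
}

/// What is described above as actually being allocated: two slots for
/// every group of every pattern.
fn allocated_slots(groups_per_pattern: &[usize]) -> usize {
    groups_per_pattern.iter().sum::<usize>() * 2
}
```

For five patterns with 2, 3, 1, 3 and 2 groups, the first function gives 6 and the second gives 22. For a single pattern the two results are equal, which is the "no inefficiency" case mentioned above.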

N.B. The mapping from group index to slot is maintained by GroupInfo and is considered an API guarantee. See GroupInfo for more details on that mapping.
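As a rough illustration of such a mapping, the sketch below assumes a layout in which the implicit group 0 of every pattern comes first, followed by the explicit groups pattern by pattern. That layout is an assumption made for this sketch (it is consistent with the slot values shown in the examples on this page); the GroupInfo documentation is the authority. `slot_pair` is a hypothetical name.

```rust
/// Returns the (start, end) slot indices for `group` of pattern `pid`,
/// or None if either index is out of range.
///
/// ASSUMPTION: implicit groups occupy the first `2 * pattern_count`
/// slots, and explicit groups follow in pattern order.
/// `groups_per_pattern[i]` counts all groups of pattern i, including
/// its implicit group 0.
fn slot_pair(
    groups_per_pattern: &[usize],
    pid: usize,
    group: usize,
) -> Option<(usize, usize)> {
    let group_count = *groups_per_pattern.get(pid)?;
    if group >= group_count {
        return None;
    }
    if group == 0 {
        return Some((pid * 2, pid * 2 + 1));
    }
    // Skip all implicit slots, then the explicit slots of earlier patterns.
    let implicit_slots = groups_per_pattern.len() * 2;
    let earlier_explicit: usize = groups_per_pattern[..pid]
        .iter()
        .map(|&n| n.saturating_sub(1) * 2)
        .sum();
    let start = implicit_slots + earlier_explicit + (group - 1) * 2;
    Some((start, start + 1))
}
```

Under this assumption, a single pattern with three groups places group 1 at slots 2 and 3, and a second pattern with no explicit groups places its overall match at slots 2 and 3.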

N.B. Option<NonMaxUsize> has the same size as a usize.
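The size claim relies on the compiler reusing a forbidden bit pattern to represent None. The sketch below shows one way a "non-max" integer could be built on top of the standard library's NonZeroUsize. `ToyNonMax` is a hypothetical stand-in written for this illustration, not the crate's NonMaxUsize.

```rust
use std::num::NonZeroUsize;

/// A usize that can never be usize::MAX, stored internally as `n + 1`.
/// (Hypothetical illustration only.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(transparent)]
struct ToyNonMax(NonZeroUsize);

impl ToyNonMax {
    /// Returns None only for usize::MAX, because `MAX + 1` wraps to 0.
    fn new(n: usize) -> Option<ToyNonMax> {
        NonZeroUsize::new(n.wrapping_add(1)).map(ToyNonMax)
    }

    /// Recovers the original value.
    fn get(self) -> usize {
        self.0.get() - 1
    }
}
```

Because zero is never a valid NonZeroUsize, `Option<ToyNonMax>` can use it for None, so `std::mem::size_of::<Option<ToyNonMax>>()` equals `std::mem::size_of::<usize>()`.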

Implementations§

impl Captures

pub fn all(group_info: GroupInfo) -> Captures

Create new storage for the offsets of all matching capturing groups.

This routine provides the most information for matches—namely, the spans of matching capturing groups—but also requires the regex search routines to do the most work.

It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.

§Example

This example shows that all capturing groups—but only ones that participated in a match—are available to query after a match has been found:

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    Span, Match,
};

let re = PikeVM::new(
    r"^(?:(?P<lower>[a-z]+)|(?P<upper>[A-Z]+))(?P<digits>[0-9]+)$",
)?;
let mut cache = re.create_cache();
let mut caps = Captures::all(re.get_nfa().group_info().clone());

re.captures(&mut cache, "ABC123", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
// The 'lower' group didn't match, so it won't have any offsets.
assert_eq!(None, caps.get_group_by_name("lower"));
assert_eq!(Some(Span::from(0..3)), caps.get_group_by_name("upper"));
assert_eq!(Some(Span::from(3..6)), caps.get_group_by_name("digits"));

pub fn matches(group_info: GroupInfo) -> Captures

Create new storage for only the full match spans of a pattern. This does not include any capturing group offsets.

It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.

§Example

This example shows that only overall match offsets are reported when this constructor is used. Accessing any capturing groups other than the 0th will always return None.

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    Match,
};

let re = PikeVM::new(
    r"^(?:(?P<lower>[a-z]+)|(?P<upper>[A-Z]+))(?P<digits>[0-9]+)$",
)?;
let mut cache = re.create_cache();
let mut caps = Captures::matches(re.get_nfa().group_info().clone());

re.captures(&mut cache, "ABC123", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Match::must(0, 0..6)), caps.get_match());
// We didn't ask for capturing group offsets, so they aren't available.
assert_eq!(None, caps.get_group_by_name("lower"));
assert_eq!(None, caps.get_group_by_name("upper"));
assert_eq!(None, caps.get_group_by_name("digits"));

pub fn empty(group_info: GroupInfo) -> Captures

Create new storage for only tracking which pattern matched. No offsets are stored at all.

It is unspecified behavior to use the returned Captures value in a search with a GroupInfo other than the one that is provided to this constructor.

§Example

This example shows that only the pattern that matched can be accessed from a Captures value created via this constructor.

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    PatternID,
};

let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());

re.captures(&mut cache, "aABCz", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(PatternID::must(0)), caps.pattern());
// We didn't ask for any offsets, so they aren't available.
assert_eq!(None, caps.get_match());

re.captures(&mut cache, &"aABCz"[1..], &mut caps);
assert!(caps.is_match());
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// We didn't ask for any offsets, so they aren't available.
assert_eq!(None, caps.get_match());

pub fn is_match(&self) -> bool

Returns true if and only if this Captures value represents a match.

This is a convenience routine for caps.pattern().is_some().

§Example

When using the PikeVM (for example), the lightest weight way of detecting whether a match exists is to create a Captures value that only tracks the ID of the pattern that matched (if any):

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
};

let re = PikeVM::new(r"[a-z]+")?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());

re.captures(&mut cache, "aABCz", &mut caps);
assert!(caps.is_match());

pub fn pattern(&self) -> Option<PatternID>

Returns the identifier of the pattern that matched when this Captures value represents a match. If no match was found, then this always returns None.

This returns a pattern ID in precisely the cases in which is_match returns true. Similarly, the pattern ID returned is always the same pattern ID found in the Match returned by get_match.

§Example

When using the PikeVM (for example), the lightest weight way of detecting which pattern matched is to create a Captures value that only tracks the ID of the pattern that matched (if any):

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::captures::Captures,
    PatternID,
};

let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let mut cache = re.create_cache();
let mut caps = Captures::empty(re.get_nfa().group_info().clone());

re.captures(&mut cache, "ABC", &mut caps);
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// Recall that offsets are only available when using a non-empty
// Captures value. So even though a match occurred, this returns None!
assert_eq!(None, caps.get_match());

pub fn get_match(&self) -> Option<Match>

Returns the pattern ID and the span of the match, if one occurred.

This always returns None when Captures was created with Captures::empty, even if a match was found.

If this routine returns a non-None value, then is_match is guaranteed to return true and pattern is also guaranteed to return a non-None value.

§Example

This example shows how to get the full match from a search:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Match};

let re = PikeVM::new_many(&[r"[a-z]+", r"[A-Z]+"])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "ABC", &mut caps);
assert_eq!(Some(Match::must(1, 0..3)), caps.get_match());

pub fn get_group(&self, index: usize) -> Option<Span>

Returns the span of a capturing group match corresponding to the group index given, only if both the overall pattern matched and the capturing group participated in that match.

This returns None if index is invalid. index is valid if and only if it’s less than Captures::group_len for the matching pattern.

This always returns None when Captures was created with Captures::empty, even if a match was found. This also always returns None for any index > 0 when Captures was created with Captures::matches.

If this routine returns a non-None value, then is_match is guaranteed to return true, pattern is guaranteed to return a non-None value and get_match is guaranteed to return a non-None value.

By convention, the 0th capture group will always return the same span as the span returned by get_match. This is because the 0th capture group always corresponds to the entirety of the pattern’s match. (It is similarly always unnamed because it is implicit.) This isn’t necessarily true of all regex engines. For example, one can hand-compile a thompson::NFA via a thompson::Builder, which isn’t technically forced to make the 0th capturing group always correspond to the entire match.

§Example

This example shows how to get the capturing groups, by index, from a match:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span, Match};

let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group(1));
assert_eq!(Some(Span::from(6..17)), caps.get_group(2));
// Looking for a non-existent capturing group will return None:
assert_eq!(None, caps.get_group(3));
assert_eq!(None, caps.get_group(9944060567225171988));

pub fn get_group_by_name(&self, name: &str) -> Option<Span>

Returns the span of a capturing group match corresponding to the group name given, only if both the overall pattern matched and the capturing group participated in that match.

This returns None if name does not correspond to a valid capturing group for the pattern that matched.

This always returns None when Captures was created with Captures::empty, even if a match was found. This also always returns None when Captures was created with Captures::matches, since only the implicit (and always unnamed) 0th group is available in that case.

If this routine returns a non-None value, then is_match is guaranteed to return true, pattern is guaranteed to return a non-None value and get_match is guaranteed to return a non-None value.

§Example

This example shows how to get the capturing groups, by name, from a match:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span, Match};

let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert_eq!(Some(Match::must(0, 0..17)), caps.get_match());
assert_eq!(Some(Span::from(0..5)), caps.get_group_by_name("first"));
assert_eq!(Some(Span::from(6..17)), caps.get_group_by_name("last"));
// Looking for a non-existent capturing group will return None:
assert_eq!(None, caps.get_group_by_name("middle"));

pub fn iter(&self) -> CapturesPatternIter<'_>

Returns an iterator of possible spans for every capturing group in the matching pattern.

If this Captures value does not correspond to a match, then the iterator returned yields no elements.

Note that the iterator returned yields elements of type Option<Span>. A span is present if and only if it corresponds to a capturing group that participated in a match.

§Example

This example shows how to collect all capturing groups:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};

let re = PikeVM::new(
    // Matches first/last names, with an optional middle name.
    r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Harry James Potter", &mut caps);
assert!(caps.is_match());
let groups: Vec<Option<Span>> = caps.iter().collect();
assert_eq!(groups, vec![
    Some(Span::from(0..18)),
    Some(Span::from(0..5)),
    Some(Span::from(6..11)),
    Some(Span::from(12..18)),
]);

This example uses the same regex as the previous example, but with a haystack that omits the middle name. This results in a capturing group that is present in the elements yielded by the iterator but without a match:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};

let re = PikeVM::new(
    // Matches first/last names, with an optional middle name.
    r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Harry Potter", &mut caps);
assert!(caps.is_match());
let groups: Vec<Option<Span>> = caps.iter().collect();
assert_eq!(groups, vec![
    Some(Span::from(0..12)),
    Some(Span::from(0..5)),
    None,
    Some(Span::from(6..12)),
]);

pub fn group_len(&self) -> usize

Return the total number of capturing groups for the matching pattern.

If this Captures value does not correspond to a match, then this always returns 0.

This always equals the number of elements yielded by Captures::iter. That is, the number includes capturing groups even if they don’t participate in the match.

§Example

This example shows how to count the total number of capturing groups associated with a pattern. Notice that it includes groups that did not participate in a match (just like Captures::iter does).

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new(
    // Matches first/last names, with an optional middle name.
    r"^(?P<first>\pL+)\s+(?:(?P<middle>\pL+)\s+)?(?P<last>\pL+)$",
)?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Harry Potter", &mut caps);
assert_eq!(4, caps.group_len());

pub fn group_info(&self) -> &GroupInfo

Returns a reference to the underlying group info on which these captures are based.

The difference between GroupInfo and Captures is that the former defines the structure of capturing groups whereas the latter is what stores the actual match information. So whereas Captures only gives you access to the current match, GroupInfo lets you query any information about all capturing groups, even ones for patterns that weren’t involved in a match.

Note that a GroupInfo uses reference counting internally, so it may be cloned cheaply.

§Example

This example shows how to get all capturing group names from the underlying GroupInfo. Notice that we don’t even need to run a search.

use regex_automata::{nfa::thompson::pikevm::PikeVM, PatternID};

let re = PikeVM::new_many(&[
    r"(?P<foo>a)",
    r"(a)(b)",
    r"ab",
    r"(?P<bar>a)(?P<quux>a)",
    r"(?P<foo>z)",
])?;
let caps = re.create_captures();

let expected = vec![
    (PatternID::must(0), 0, None),
    (PatternID::must(0), 1, Some("foo")),
    (PatternID::must(1), 0, None),
    (PatternID::must(1), 1, None),
    (PatternID::must(1), 2, None),
    (PatternID::must(2), 0, None),
    (PatternID::must(3), 0, None),
    (PatternID::must(3), 1, Some("bar")),
    (PatternID::must(3), 2, Some("quux")),
    (PatternID::must(4), 0, None),
    (PatternID::must(4), 1, Some("foo")),
];
// We could also just use 're.get_nfa().group_info()'.
let got: Vec<(PatternID, usize, Option<&str>)> =
    caps.group_info().all_names().collect();
assert_eq!(expected, got);

pub fn interpolate_string(&self, haystack: &str, replacement: &str) -> String

Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated string is returned.

See the interpolate module for documentation on the format of the replacement string.
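To give a feel for what interpolation does, here is a deliberately simplified, std-only sketch. It is not the crate's implementation: it assumes that `$name` and `${name}` refer to named groups, that `$$` is a literal dollar sign, and that an unknown name expands to the empty string. It takes a plain map of already-extracted substrings instead of a Captures value, and does not handle numeric group references. The function name is hypothetical.

```rust
use std::collections::HashMap;

/// Toy interpolation over a map of group name -> matched substring.
/// (Simplified illustration; see the interpolate module for the real rules.)
fn toy_interpolate(replacement: &str, groups: &HashMap<&str, &str>) -> String {
    let mut dst = String::new();
    let mut rest = replacement;
    while let Some(i) = rest.find('$') {
        // Copy everything before the '$' verbatim, then step past it.
        dst.push_str(&rest[..i]);
        rest = &rest[i + 1..];
        if let Some(after) = rest.strip_prefix('$') {
            // `$$` is an escaped literal dollar sign.
            dst.push('$');
            rest = after;
        } else if let Some(after) = rest.strip_prefix('{') {
            match after.find('}') {
                Some(end) => {
                    dst.push_str(groups.get(&after[..end]).copied().unwrap_or(""));
                    rest = &after[end + 1..];
                }
                // Unclosed brace: keep the '$' literally.
                None => dst.push('$'),
            }
        } else {
            // An unbraced name is the longest run of [_0-9A-Za-z].
            let len = rest
                .find(|c: char| !(c.is_ascii_alphanumeric() || c == '_'))
                .unwrap_or(rest.len());
            if len == 0 {
                dst.push('$');
            } else {
                dst.push_str(groups.get(&rest[..len]).copied().unwrap_or(""));
                rest = &rest[len..];
            }
        }
    }
    dst.push_str(rest);
    dst
}
```

With groups `year=2010`, `month=03` and `day=14`, the replacement `"year=$year, month=$month, day=$day"` expands to `"year=2010, month=03, day=14"` in this sketch.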

§Example

This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

let replacement = "year=$year, month=$month, day=$day";

// This matches the first pattern.
let hay = "On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_string(hay, replacement);
assert_eq!("year=2010, month=03, day=14", result);

// And this matches the second pattern.
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_string(hay, replacement);
assert_eq!("year=2010, month=03, day=14", result);

pub fn interpolate_string_into( &self, haystack: &str, replacement: &str, dst: &mut String, )

Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated string is written to dst.

See the interpolate module for documentation on the format of the replacement string.

§Example

This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

let replacement = "year=$year, month=$month, day=$day";

// This matches the first pattern.
let hay = "On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = String::new();
caps.interpolate_string_into(hay, replacement, &mut dst);
assert_eq!("year=2010, month=03, day=14", dst);

// And this matches the second pattern.
let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = String::new();
caps.interpolate_string_into(hay, replacement, &mut dst);
assert_eq!("year=2010, month=03, day=14", dst);

pub fn interpolate_bytes(&self, haystack: &[u8], replacement: &[u8]) -> Vec<u8>

Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated byte string is returned.

See the interpolate module for documentation on the format of the replacement string.

§Example

This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

let replacement = b"year=$year, month=$month, day=$day";

// This matches the first pattern.
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_bytes(hay, replacement);
assert_eq!(&b"year=2010, month=03, day=14"[..], result);

// And this matches the second pattern.
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let result = caps.interpolate_bytes(hay, replacement);
assert_eq!(&b"year=2010, month=03, day=14"[..], result);

pub fn interpolate_bytes_into( &self, haystack: &[u8], replacement: &[u8], dst: &mut Vec<u8>, )

Interpolates the capture references in replacement with the corresponding substrings in haystack matched by each reference. The interpolated byte string is written to dst.

See the interpolate module for documentation on the format of the replacement string.

§Example

This example shows how to use interpolation, and also shows how it can work with multi-pattern regexes.

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new_many(&[
    r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
    r"(?<year>[0-9]{4})-(?<month>[0-9]{2})-(?<day>[0-9]{2})",
])?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

let replacement = b"year=$year, month=$month, day=$day";

// This matches the first pattern.
let hay = b"On 14-03-2010, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = vec![];
caps.interpolate_bytes_into(hay, replacement, &mut dst);
assert_eq!(&b"year=2010, month=03, day=14"[..], dst);

// And this matches the second pattern.
let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
let mut dst = vec![];
caps.interpolate_bytes_into(hay, replacement, &mut dst);
assert_eq!(&b"year=2010, month=03, day=14"[..], dst);

pub fn extract<'h, const N: usize>( &self, haystack: &'h str, ) -> (&'h str, [&'h str; N])

This is a convenience routine for extracting the substrings corresponding to matching capture groups in the given haystack. The haystack should be the same substring used to find the match spans in this Captures value.

This is identical to Captures::extract_bytes, except it works with &str instead of &[u8].

§Panics

This panics if the number of explicit matching groups in this Captures value is less than N. This also panics if this Captures value does not correspond to a match.

Note that this does not panic if the number of explicit matching groups is bigger than N. In that case, only the first N matching groups are extracted.
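The panic and truncation behavior described above can be modeled with a short, std-only sketch. It is an illustration under assumptions, not the crate's code: spans are plain `(start, end)` tuples, entry 0 is the overall match, groups that did not participate are skipped, and `toy_extract` is a hypothetical name.

```rust
/// Toy model of const-generic extraction from a list of optional spans.
///
/// Panics if there is no overall match or if fewer than N further
/// matching groups exist. Extra groups beyond N are ignored.
fn toy_extract<'h, const N: usize>(
    haystack: &'h str,
    groups: &[Option<(usize, usize)>],
) -> (&'h str, [&'h str; N]) {
    // Only groups that participated in the match are considered.
    let mut matched = groups.iter().flatten();
    let &(start, end) = matched.next().expect("a match");
    let whole = &haystack[start..end];
    // Fill exactly N entries, panicking if we run out of groups.
    let rest = [(); N].map(|_| {
        let &(s, e) = matched.next().expect("too few matching groups");
        &haystack[s..e]
    });
    (whole, rest)
}
```

The array length N is inferred from the destructuring pattern at the call site, so `let (full, [year]) = toy_extract(hay, &groups);` asks for one group while `let (full, [y, m, d]) = ...` asks for three.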

§Example

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

let hay = "On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
assert!(caps.is_match());
let (full, [year, month, day]) = caps.extract(hay);
assert_eq!("2010-03-14", full);
assert_eq!("2010", year);
assert_eq!("03", month);
assert_eq!("14", day);

// We can also ask for fewer than all capture groups.
let (full, [year]) = caps.extract(hay);
assert_eq!("2010-03-14", full);
assert_eq!("2010", year);

pub fn extract_bytes<'h, const N: usize>( &self, haystack: &'h [u8], ) -> (&'h [u8], [&'h [u8]; N])

This is a convenience routine for extracting the substrings corresponding to matching capture groups in the given haystack. The haystack should be the same substring used to find the match spans in this Captures value.

This is identical to Captures::extract, except it works with &[u8] instead of &str.

§Panics

This panics if the number of explicit matching groups in this Captures value is less than N. This also panics if this Captures value does not correspond to a match.

Note that this does not panic if the number of explicit matching groups is bigger than N. In that case, only the first N matching groups are extracted.

§Example

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})")?;
let mut cache = re.create_cache();
let mut caps = re.create_captures();

let hay = b"On 2010-03-14, I became a Tennessee lamb.";
re.captures(&mut cache, hay, &mut caps);
assert!(caps.is_match());
let (full, [year, month, day]) = caps.extract_bytes(hay);
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);
assert_eq!(b"03", month);
assert_eq!(b"14", day);

// We can also ask for fewer than all capture groups.
let (full, [year]) = caps.extract_bytes(hay);
assert_eq!(b"2010-03-14", full);
assert_eq!(b"2010", year);

impl Captures

Lower level “slot” oriented APIs. One does not typically need to use these when executing a search. They are instead mostly intended for folks that are writing their own regex engine while reusing this Captures type.

pub fn clear(&mut self)

Clear this Captures value.

After clearing, all slots inside this Captures value will be set to None. Similarly, any pattern ID that it was previously associated with (for a match) is erased.

It is not usually necessary to call this routine. Namely, a Captures value only provides high level access to the capturing groups of the pattern that matched, and only low level access to individual slots. Thus, even if slots corresponding to groups that aren’t associated with the matching pattern are set, then it won’t impact the higher level APIs. Namely, higher level APIs like Captures::get_group will return None if no pattern ID is present, even if there are spans set in the underlying slots.

Thus, to “clear” a Captures value of a match, it is usually only necessary to call Captures::set_pattern with None.

§Example

This example shows what happens when a Captures value is cleared.

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert!(caps.is_match());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
    Some(0),
    Some(17),
    Some(0),
    Some(5),
    Some(6),
    Some(17),
]);

// Now clear the slots. Everything is gone and it is no longer a match.
caps.clear();
assert!(!caps.is_match());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
assert_eq!(slots, vec![
    None,
    None,
    None,
    None,
    None,
    None,
]);

pub fn set_pattern(&mut self, pid: Option<PatternID>)

Set the pattern on this Captures value.

When the pattern ID is None, then this Captures value does not correspond to a match (is_match will return false). Otherwise, it corresponds to a match.

This is useful in search implementations where you might want to initially call set_pattern(None) in order to avoid the cost of calling clear() if it turns out to not be necessary.
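The cost difference is easy to see in a toy model. The sketch below is a hypothetical, single-pattern stand-in for Captures (plain `usize` offsets, group i at slots 2i and 2i+1), written only to show that resetting the pattern ID is constant time while clearing touches every slot.

```rust
/// Hypothetical single-pattern stand-in for a Captures value.
struct ToyCaptures {
    pid: Option<usize>,
    slots: Vec<Option<usize>>,
}

impl ToyCaptures {
    fn is_match(&self) -> bool {
        self.pid.is_some()
    }

    /// O(1): only the pattern ID changes; slots are left as they are.
    fn set_pattern(&mut self, pid: Option<usize>) {
        self.pid = pid;
    }

    /// O(slots): resets the pattern ID and every slot.
    fn clear(&mut self) {
        self.pid = None;
        for slot in self.slots.iter_mut() {
            *slot = None;
        }
    }

    /// High-level access is gated on the pattern ID, so stale slot
    /// values are invisible once the pattern ID is None.
    fn get_group(&self, index: usize) -> Option<(usize, usize)> {
        self.pid?;
        let i = index.checked_mul(2)?;
        let start = (*self.slots.get(i)?)?;
        let end = (*self.slots.get(i.checked_add(1)?)?)?;
        Some((start, end))
    }
}
```

In this model, calling `set_pattern(None)` makes `get_group` return None for every index even though the slot vector still holds the old offsets.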

§Example

This example shows that set_pattern merely overwrites the pattern ID. It does not actually change the underlying slot values.

use regex_automata::nfa::thompson::pikevm::PikeVM;

let re = PikeVM::new(r"^(?P<first>\pL+)\s+(?P<last>\pL+)$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "Bruce Springsteen", &mut caps);
assert!(caps.is_match());
assert!(caps.pattern().is_some());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
    Some(0),
    Some(17),
    Some(0),
    Some(5),
    Some(6),
    Some(17),
]);

// Now set the pattern to None. Note that the slot values remain.
caps.set_pattern(None);
assert!(!caps.is_match());
assert!(caps.pattern().is_none());
let slots: Vec<Option<usize>> =
    caps.slots().iter().map(|s| s.map(|x| x.get())).collect();
// Note that the following ordering is considered an API guarantee.
assert_eq!(slots, vec![
    Some(0),
    Some(17),
    Some(0),
    Some(5),
    Some(6),
    Some(17),
]);

pub fn slots(&self) -> &[Option<NonMaxUsize>]

Returns the underlying slots, where each slot stores a single offset.

Every matching capturing group generally corresponds to two slots: one slot for the starting position and another for the ending position. Typically, either both are present or neither are. (The weasel word “typically” is used here because it really depends on the regex engine implementation. Every sensible regex engine likely adheres to this invariant, and every regex engine in this crate is sensible.)

Generally speaking, callers should prefer to use higher level routines like Captures::get_match or Captures::get_group.

An important note here is that a regex engine may not reset all of the slots to None values when no match occurs, or even when a match of a different pattern occurs. But this depends on how the regex engine implementation deals with slots.
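The "two slots per group" relationship can be sketched as follows. This is a std-only illustration that assumes a single-pattern layout, where group i occupies slots 2i and 2i+1, and uses plain `usize` offsets in place of NonMaxUsize. `spans_from_slots` is a hypothetical helper.

```rust
/// Pairs up consecutive slots into optional (start, end) spans.
/// A span is produced only when both of its slots are set.
/// ASSUMPTION: single-pattern layout, group i at slots 2i and 2i+1.
fn spans_from_slots(slots: &[Option<usize>]) -> Vec<Option<(usize, usize)>> {
    slots
        .chunks_exact(2)
        .map(|pair| match (pair[0], pair[1]) {
            (Some(start), Some(end)) => Some((start, end)),
            _ => None,
        })
        .collect()
}
```

A group whose two slots are both None, such as an optional group that did not participate in the match, shows up as None in the result.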

§Example

This example shows how to get the underlying slots from a regex match.

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    util::primitives::{PatternID, NonMaxUsize},
};

let re = PikeVM::new_many(&[
    r"[a-z]+",
    r"[0-9]+",
])?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "123", &mut caps);
assert_eq!(Some(PatternID::must(1)), caps.pattern());
// Note that the only guarantee we have here is that slots 2 and 3
// are set to correct values. The contents of the first two slots are
// unspecified since the 0th pattern did not match.
let expected = &[
    None,
    None,
    NonMaxUsize::new(0),
    NonMaxUsize::new(3),
];
assert_eq!(expected, caps.slots());

pub fn slots_mut(&mut self) -> &mut [Option<NonMaxUsize>]

Returns the underlying slots as a mutable slice, where each slot stores a single offset.

This tends to be most useful for regex engine implementations for writing offsets for matching capturing groups to slots.

See Captures::slots for more information about slots.

Trait Implementations§

impl Clone for Captures

fn clone(&self) -> Captures

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Captures

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.
