Struct Captures

pub struct Captures {
    group_info: GroupInfo,
    pid: Option<PatternID>,
    slots: Vec<Option<NonMaxUsize>>,
}

The span offsets of capturing groups after a match has been found.

This type represents the output of regex engines that can report the offsets at which capturing group matches, or “submatches,” occur. The PikeVM is one such engine. When a match occurs, a value of this type will at minimum contain the PatternID of the pattern that matched. Depending upon how it was constructed, it may also contain the start/end offsets of the entire match of the pattern and the start/end offsets of each capturing group that participated in the match.

Values of this type are always created for a specific GroupInfo. It is unspecified behavior to use a Captures value in a search with any regex engine that has a different GroupInfo than the one the Captures were created with.

§Constructors

There are three constructors for this type that control what kind of information is available upon a match:

Captures::all: Will store overall pattern match offsets in addition to the offsets of capturing groups that participated in the match.
Captures::matches: Will store only the overall pattern match offsets. The offsets of capturing groups (even ones that participated in the match) are not available.
Captures::empty: Will only store the pattern ID that matched. No match offsets are available at all.

If you aren’t sure which to choose, then pick the first one. The first one is what convenience routines like PikeVM::create_captures will use automatically.

The main difference between these choices is performance. Namely, if you ask for less information, then the regex search may be able to run more quickly.

§Notes

It is worth pointing out that this type is not coupled to any one specific regex engine. Instead, its coupling is with GroupInfo, which is the thing that is responsible for mapping capturing groups to “slot” offsets. Slot offsets are indices into a single sequence of memory at which matching haystack offsets for the corresponding group are written by regex engines.

§Example

This example shows how to parse a simple date and extract the components of the date via capturing groups:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};

let re = PikeVM::new(r"^([0-9]{4})-([0-9]{2})-([0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group(1));
assert_eq!(Some(Span::from(5..7)), caps.get_group(2));
assert_eq!(Some(Span::from(8..10)), caps.get_group(3));

§Example: named capturing groups

This example is like the one above, but leverages the ability to name capturing groups in order to make the code a bit clearer:

use regex_automata::{nfa::thompson::pikevm::PikeVM, Span};

let re = PikeVM::new(r"^(?P<y>[0-9]{4})-(?P<m>[0-9]{2})-(?P<d>[0-9]{2})$")?;
let (mut cache, mut caps) = (re.create_cache(), re.create_captures());

re.captures(&mut cache, "2010-03-14", &mut caps);
assert!(caps.is_match());
assert_eq!(Some(Span::from(0..4)), caps.get_group_by_name("y"));
assert_eq!(Some(Span::from(5..7)), caps.get_group_by_name("m"));
assert_eq!(Some(Span::from(8..10)), caps.get_group_by_name("d"));

Fields§

§group_info: GroupInfo

The group info that these capture groups are coupled to. This is what gives the “convenience” of the Captures API. Namely, it provides the slot mapping and the name-to-index mapping for capture lookups by name.

§pid: Option<PatternID>

The ID of the pattern that matched. Regex engines must set this to None when no match occurs.

§slots: Vec<Option<NonMaxUsize>>

The slot values, i.e., submatch offsets.

In theory, the smallest sequence of slots would be something like max(groups(pattern) for pattern in regex) * 2, but instead, we use sum(groups(pattern) for pattern in regex) * 2. Why?

Well, the former could be used in theory, because we don’t generally have any overlapping APIs that involve capturing groups. Therefore, there’s technically never any need to have slots set for multiple patterns. However, this might change some day, in which case, we would need to have slots available.

The other reason is that during the execution of some regex engines, there exists a point in time where multiple slots for different patterns may be written to before knowing which pattern has matched. Therefore, the regex engines themselves, in order to support multiple patterns correctly, must have all slots available. If Captures doesn’t have all slots available, then regex engines can’t write directly into the caller provided Captures and must instead write into some other storage and then copy the slots involved in the match at the end of the search.

So overall, at least as of the time of writing, it seems like the path of least resistance is to just require allocating all possible slots instead of the conceptual minimum. Another way to justify this is that the most common case is a single pattern, in which case, there is no inefficiency here since the ‘max’ and ‘sum’ calculations above are equivalent in that case.
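
The two slot-count formulas discussed above can be compared with a small standalone calculation. The function names and the per-pattern group counts are made up for illustration. Each count includes the implicit group for the overall match:

```rust
// Hypothetical helper: the conceptual minimum, i.e. enough slots for the
// single pattern with the most groups.
fn min_slots(groups_per_pattern: &[usize]) -> usize {
    groups_per_pattern.iter().copied().max().unwrap_or(0) * 2
}

// Hypothetical helper: what is actually allocated, i.e. enough slots for
// every group of every pattern at once.
fn allocated_slots(groups_per_pattern: &[usize]) -> usize {
    groups_per_pattern.iter().sum::<usize>() * 2
}
```

With group counts of 2, 1, 5 and 3 across four patterns, the minimum would be 10 slots while 22 are allocated. With a single pattern, both calculations give the same answer.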

N.B. The mapping from group index to slot is maintained by GroupInfo and is considered an API guarantee. See GroupInfo for more details on that mapping.

N.B. Option<NonMaxUsize> has the same size as a usize.
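
One way such a type can avoid growing when wrapped in an Option is to store the value plus one inside a NonZeroUsize, which leaves zero free to represent None. The sketch below uses only the standard library, and the NonMax type is a hypothetical stand-in. It is not claimed to be how NonMaxUsize itself is implemented:

```rust
use std::mem::size_of;
use std::num::NonZeroUsize;

// Hypothetical stand-in for a "non-max" integer: it can hold any usize
// except usize::MAX, by storing `value + 1` as a non-zero integer.
#[repr(transparent)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NonMax(NonZeroUsize);

impl NonMax {
    // Returns None only for usize::MAX, since MAX + 1 wraps around to 0.
    fn new(value: usize) -> Option<NonMax> {
        NonZeroUsize::new(value.wrapping_add(1)).map(NonMax)
    }

    // Undoes the +1 applied in `new`.
    fn get(self) -> usize {
        self.0.get().wrapping_sub(1)
    }
}

// Reports whether wrapping `NonMax` in an Option costs no extra space.
fn option_is_free() -> bool {
    size_of::<Option<NonMax>>() == size_of::<usize>()
}
```

The practical effect is that a slot can be "unset" without a separate tag, so a vector of slots takes the same space as a vector of plain offsets.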

Implementations§

impl Captures

pub fn all(group_info: GroupInfo) -> Captures

§Example

pub fn matches(group_info: GroupInfo) -> Captures

§Example

pub fn empty(group_info: GroupInfo) -> Captures

§Example

pub fn is_match(&self) -> bool

§Example

pub fn pattern(&self) -> Option<PatternID>

§Example

pub fn get_match(&self) -> Option<Match>

§Example

pub fn get_group(&self, index: usize) -> Option<Span>

§Example

pub fn get_group_by_name(&self, name: &str) -> Option<Span>

§Example

pub fn iter(&self) -> CapturesPatternIter<'_>

§Example

pub fn group_len(&self) -> usize

§Example

pub fn group_info(&self) -> &GroupInfo

§Example

pub fn interpolate_string(&self, haystack: &str, replacement: &str) -> String

§Example

pub fn interpolate_string_into(&self, haystack: &str, replacement: &str, dst: &mut String)

§Example

pub fn interpolate_bytes(&self, haystack: &[u8], replacement: &[u8]) -> Vec<u8>

§Example

pub fn interpolate_bytes_into(&self, haystack: &[u8], replacement: &[u8], dst: &mut Vec<u8>)

§Example

pub fn extract<'h, const N: usize>(&self, haystack: &'h str) -> (&'h str, [&'h str; N])

§Panics

§Example

pub fn extract_bytes<'h, const N: usize>(&self, haystack: &'h [u8]) -> (&'h [u8], [&'h [u8]; N])

§Panics

§Example

impl Captures

pub fn clear(&mut self)

§Example

pub fn set_pattern(&mut self, pid: Option<PatternID>)

§Example

pub fn slots(&self) -> &[Option<NonMaxUsize>]

§Example

pub fn slots_mut(&mut self) -> &mut [Option<NonMaxUsize>]

Trait Implementations§

impl Clone for Captures

fn clone(&self) -> Captures

fn clone_from(&mut self, source: &Self)

impl Debug for Captures

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations§

impl Freeze for Captures

impl RefUnwindSafe for Captures

impl Send for Captures

impl Sync for Captures

impl Unpin for Captures

impl UnwindSafe for Captures

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dest: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,