Enum regex_automata::util::search::MatchKind

source ·

#[non_exhaustive]pub enum MatchKind {
    All,
    LeftmostFirst,
}

Expand description

The kind of match semantics to use for a regex pattern.

The default match kind is LeftmostFirst, and this corresponds to the match semantics used by most backtracking engines, such as Perl.

Leftmost first or “preference order” match semantics

Leftmost-first semantics determine which match to report when there are multiple paths through a regex that match at the same position. The tie is essentially broken by how a backtracker would behave. For example, consider running the regex foofoofoo|foofoo|foo on the haystack foofoo. In this case, both the foofoo and foo branches match at position 0. So should the end of the match be 3 or 6?

A backtracker will conceptually work by trying foofoofoo and failing. Then it will try foofoo, find the match and stop there. Thus, the leftmost-first match position is 6. This is called “leftmost-first” or “preference order” because the order of the branches as written in the regex pattern is what determines how to break the tie.

(Note that leftmost-longest match semantics, which break ties by always taking the longest matching string, are not currently supported by this crate. These match semantics tend to be found in POSIX regex engines.)

This example shows how leftmost-first semantics work, and how it even applies to multi-pattern regexes:

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Match,
};

let re = PikeVM::new_many(&[
    r"foofoofoo",
    r"foofoo",
    r"foo",
])?;
let mut cache = re.create_cache();
let got: Vec<Match> = re.find_iter(&mut cache, "foofoo").collect();
let expected = vec![Match::must(1, 0..6)];
assert_eq!(expected, got);

All matches

The All match semantics report any and all matches, and generally will attempt to match as much as possible. It doesn’t respect any sort of match priority at all, so things like non-greedy matching don’t work in this mode.

The fact that non-greedy matching doesn’t work generally makes most forms of unanchored non-overlapping searches have unintuitive behavior. Namely, unanchored searches behave as if there is a (?s-u:.)*? prefix at the beginning of the pattern, which is specifically non-greedy. Since it will be treated as greedy in All match semantics, this generally means that it will first attempt to consume all of the haystack and is likely to wind up skipping matches.

Generally speaking, All should only be used in two circumstances:

When running an anchored search and there is a desire to match as much as possible. For example, when building a reverse regex matcher to find the start of a match after finding the end. In this case, the reverse search is anchored to the end of the match found by the forward search.
When running overlapping searches. Since All encodes all possible matches, this is generally what you want for an overlapping search. If you try to use leftmost-first in an overlapping search, it is likely to produce counter-intuitive results since leftmost-first specifically excludes some matches from its underlying finite state machine.

This example demonstrates the counter-intuitive behavior of All semantics when using a standard leftmost unanchored search:

use regex_automata::{
    nfa::thompson::pikevm::PikeVM,
    Match, MatchKind,
};

let re = PikeVM::builder()
    .configure(PikeVM::config().match_kind(MatchKind::All))
    .build("foo")?;
let hay = "first foo second foo wat";
let mut cache = re.create_cache();
let got: Vec<Match> = re.find_iter(&mut cache, hay).collect();
// Notice that it completely skips the first 'foo'!
let expected = vec![Match::must(0, 17..20)];
assert_eq!(expected, got);

This second example shows how All semantics are useful for an overlapping search. Note that we use lower level lazy DFA APIs here since the NFA engines only currently support a very limited form of overlapping search.

use regex_automata::{
    hybrid::dfa::{DFA, OverlappingState},
    HalfMatch, Input, MatchKind,
};

let re = DFA::builder()
    // If we didn't set 'All' semantics here, then the regex would only
    // match 'foo' at offset 3 and nothing else. Why? Because the state
    // machine implements preference order and knows that the 'foofoo' and
    // 'foofoofoo' branches can never match since 'foo' will always match
    // when they match and take priority.
    .configure(DFA::config().match_kind(MatchKind::All))
    .build(r"foo|foofoo|foofoofoo")?;
let mut cache = re.create_cache();
let mut state = OverlappingState::start();
let input = Input::new("foofoofoo");
let mut got = vec![];
loop {
    re.try_search_overlapping_fwd(&mut cache, &input, &mut state)?;
    let m = match state.get_match() {
        None => break,
        Some(m) => m,
    };
    got.push(m);
}
let expected = vec![
    HalfMatch::must(0, 3),
    HalfMatch::must(0, 6),
    HalfMatch::must(0, 9),
];
assert_eq!(expected, got);

Variants (Non-exhaustive)§

This enum is marked as non-exhaustive

Non-exhaustive enums could have additional variants added in future. Therefore, when matching against variants of non-exhaustive enums, an extra wildcard arm must be added to account for any future variants.

§

All

Report all possible matches.

§

LeftmostFirst

Report only the leftmost matches. When multiple leftmost matches exist, report the match corresponding to the part of the regex that appears first in the syntax.

Enum regex_automata::util::search::MatchKind

Variants (Non-exhaustive)§

All

LeftmostFirst

Implementations§

impl MatchKind

pub(crate) fn continue_past_first_match(&self) -> bool

Trait Implementations§

impl Clone for MatchKind

fn clone(&self) -> MatchKind

fn clone_from(&mut self, source: &Self)

impl Debug for MatchKind

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for MatchKind

fn default() -> MatchKind

impl PartialEq<MatchKind> for MatchKind

fn eq(&self, other: &MatchKind) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Copy for MatchKind

impl Eq for MatchKind

impl StructuralEq for MatchKind

impl StructuralPartialEq for MatchKind

Auto Trait Implementations§

impl RefUnwindSafe for MatchKind

impl Send for MatchKind

impl Sync for MatchKind

impl Unpin for MatchKind

impl UnwindSafe for MatchKind

Blanket Implementations§

impl<T> Any for Twhere T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for Twhere T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for Twhere T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for Twhere U: From<T>,

fn into(self) -> U

impl<T> ToOwned for Twhere T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>