Struct regex_automata::hybrid::regex::Regex

pub struct Regex {
    forward: DFA,
    reverse: DFA,
}

A regular expression that uses hybrid NFA/DFAs (also called “lazy DFAs”) for searching.

A regular expression is composed of two lazy DFAs, a “forward” DFA and a “reverse” DFA. The forward DFA is responsible for detecting the end of a match while the reverse DFA is responsible for detecting the start of a match. Thus, in order to find the bounds of any given match, a forward search must first be run followed by a reverse search. A match found by the forward DFA guarantees that the reverse DFA will also find a match.

§Fallibility

Most of the search routines defined on this type will panic when the underlying search fails. This can happen because the DFA quit after seeing a quit byte, whether configured explicitly or via heuristic Unicode word boundary support, although neither is enabled by default. The search can also fail if the underlying DFA determines it isn’t making effective use of the cache (which also never happens by default). Or it can fail because an invalid Input configuration is given, for example, with an unsupported Anchored mode.

If you need to handle these error cases instead of allowing them to trigger a panic, then the lower level Regex::try_search provides a fallible API that never panics.

§Example

This example shows how to cause a search to terminate if it sees a \n byte, and handle the error returned. This could be useful if, for example, you wanted to prevent a user supplied pattern from matching across a line boundary.

use regex_automata::{hybrid::{dfa, regex::Regex}, Input, MatchError};

let re = Regex::builder()
    .dfa(dfa::Config::new().quit(b'\n', true))
    .build(r"foo\p{any}+bar")?;
let mut cache = re.create_cache();

let input = Input::new("foo\nbar");
// Normally this would produce a match, since \p{any} contains '\n'.
// But since we instructed the automaton to enter a quit state if a
// '\n' is observed, this produces a match error instead.
let expected = MatchError::quit(b'\n', 3);
let got = re.try_search(&mut cache, &input).unwrap_err();
assert_eq!(expected, got);

Fields§

§forward: DFA

The forward lazy DFA. This can only find the end of a match.

§reverse: DFA

The reverse lazy DFA. This can only find the start of a match.

This is built with ‘all’ match semantics (instead of leftmost-first) so that it always finds the longest possible match (which corresponds to the leftmost starting position). It is also compiled as an anchored matcher and has ‘starts_for_each_pattern’ enabled. Including starting states for each pattern is necessary to ensure that we only look for matches of a pattern that matched in the forward direction. Otherwise, we might wind up finding the “leftmost” starting position of a totally different pattern!

Implementations§

impl Regex

Convenience routines for regex and cache construction.

pub fn new(pattern: &str) -> Result<Regex, BuildError>

Parse the given regular expression using the default configuration and return the corresponding regex.

If you want a non-default configuration, then use the Builder to set your own configuration.

§Example
use regex_automata::{hybrid::regex::Regex, Match};

let re = Regex::new("foo[0-9]+bar")?;
let mut cache = re.create_cache();
assert_eq!(
    Some(Match::must(0, 3..14)),
    re.find(&mut cache, "zzzfoo12345barzzz"),
);

pub fn new_many<P: AsRef<str>>(patterns: &[P]) -> Result<Regex, BuildError>

Like new, but parses multiple patterns into a single “multi regex.” This similarly uses the default regex configuration.

§Example
use regex_automata::{hybrid::regex::Regex, Match};

let re = Regex::new_many(&["[a-z]+", "[0-9]+"])?;
let mut cache = re.create_cache();

let mut it = re.find_iter(&mut cache, "abc 1 foo 4567 0 quux");
assert_eq!(Some(Match::must(0, 0..3)), it.next());
assert_eq!(Some(Match::must(1, 4..5)), it.next());
assert_eq!(Some(Match::must(0, 6..9)), it.next());
assert_eq!(Some(Match::must(1, 10..14)), it.next());
assert_eq!(Some(Match::must(1, 15..16)), it.next());
assert_eq!(Some(Match::must(0, 17..21)), it.next());
assert_eq!(None, it.next());

pub fn builder() -> Builder

Return a builder for configuring the construction of a Regex.

This is a convenience routine to avoid needing to import the Builder type in common cases.

§Example

This example shows how to use the builder to disable UTF-8 mode everywhere.

use regex_automata::{
    hybrid::regex::Regex, nfa::thompson, util::syntax, Match,
};

let re = Regex::builder()
    .syntax(syntax::Config::new().utf8(false))
    .thompson(thompson::Config::new().utf8(false))
    .build(r"foo(?-u:[^b])ar.*")?;
let mut cache = re.create_cache();

let haystack = b"\xFEfoo\xFFarzz\xE2\x98\xFF\n";
let expected = Some(Match::must(0, 1..9));
let got = re.find(&mut cache, haystack);
assert_eq!(expected, got);

pub fn create_cache(&self) -> Cache

Create a new cache for this Regex.

The cache returned should only be used for searches for this Regex. If you want to reuse the cache for another Regex, then you must call Cache::reset with that Regex (or, equivalently, Regex::reset_cache).

pub fn reset_cache(&self, cache: &mut Cache)

Reset the given cache such that it can be used for searching with this Regex (and only this Regex).

A cache reset permits reusing memory already allocated in this cache with a different Regex.

Resetting a cache sets its “clear count” to 0. This is relevant if the Regex has been configured to “give up” after it has cleared the cache a certain number of times.

§Example

This shows how to re-purpose a cache for use with a different Regex.

use regex_automata::{hybrid::regex::Regex, Match};

let re1 = Regex::new(r"\w")?;
let re2 = Regex::new(r"\W")?;

let mut cache = re1.create_cache();
assert_eq!(
    Some(Match::must(0, 0..2)),
    re1.find(&mut cache, "Δ"),
);

// Using 'cache' with re2 is not allowed. It may result in panics or
// incorrect results. In order to re-purpose the cache, we must reset
// it with the Regex we'd like to use it with.
//
// Similarly, after this reset, using the cache with 're1' is also not
// allowed.
re2.reset_cache(&mut cache);
assert_eq!(
    Some(Match::must(0, 0..3)),
    re2.find(&mut cache, "☃"),
);

impl Regex

Standard infallible search routines for finding and iterating over matches.

pub fn is_match<'h, I: Into<Input<'h>>>( &self, cache: &mut Cache, input: I, ) -> bool

Returns true if and only if this regex matches the given haystack.

This routine may short circuit if it knows that scanning future input will never lead to a different result. In particular, if the underlying DFA enters a match state or a dead state, then this routine will return true or false, respectively, without inspecting any future input.

§Panics

This routine panics if the search could not complete. This can occur in a number of circumstances:

  • The configuration of the lazy DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the lazy DFA quitting.
  • The configuration of the lazy DFA may also permit it to “give up” on a search if it makes ineffective use of its transition table cache. The default configuration does not enable this, although enabling it is typically a good idea.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search panics, callers cannot know whether a match exists or not.

Use Regex::try_search if you want to handle these error conditions.

§Example
use regex_automata::hybrid::regex::Regex;

let re = Regex::new("foo[0-9]+bar")?;
let mut cache = re.create_cache();

assert!(re.is_match(&mut cache, "foo12345bar"));
assert!(!re.is_match(&mut cache, "foobar"));

pub fn find<'h, I: Into<Input<'h>>>( &self, cache: &mut Cache, input: I, ) -> Option<Match>

Returns the start and end offset of the leftmost match. If no match exists, then None is returned.

§Panics

This routine panics if the search could not complete. This can occur in a number of circumstances:

  • The configuration of the lazy DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the lazy DFA quitting.
  • The configuration of the lazy DFA may also permit it to “give up” on a search if it makes ineffective use of its transition table cache. The default configuration does not enable this, although enabling it is typically a good idea.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search panics, callers cannot know whether a match exists or not.

Use Regex::try_search if you want to handle these error conditions.

§Example
use regex_automata::{Match, hybrid::regex::Regex};

let re = Regex::new("foo[0-9]+")?;
let mut cache = re.create_cache();
assert_eq!(
    Some(Match::must(0, 3..11)),
    re.find(&mut cache, "zzzfoo12345zzz"),
);

// Even though a match is found after reading the first byte (`a`),
// the default leftmost-first match semantics demand that we find the
// earliest match that prefers earlier parts of the pattern over later
// parts.
let re = Regex::new("abc|a")?;
let mut cache = re.create_cache();
assert_eq!(Some(Match::must(0, 0..3)), re.find(&mut cache, "abc"));

pub fn find_iter<'r, 'c, 'h, I: Into<Input<'h>>>( &'r self, cache: &'c mut Cache, input: I, ) -> FindMatches<'r, 'c, 'h>

Returns an iterator over all non-overlapping leftmost matches in the given bytes. If no match exists, then the iterator yields no elements.

§Panics

This routine panics if the search could not complete. This can occur in a number of circumstances:

  • The configuration of the lazy DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the lazy DFA quitting.
  • The configuration of the lazy DFA may also permit it to “give up” on a search if it makes ineffective use of its transition table cache. The default configuration does not enable this, although enabling it is typically a good idea.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search panics, callers cannot know whether a match exists or not.

The above conditions also apply to the returned iterator. For example, if the lazy DFA gives up or quits during a search using this method, then a panic will occur during iteration.

Use Regex::try_search with util::iter::Searcher if you want to handle these error conditions.

§Example
use regex_automata::{hybrid::regex::Regex, Match};

let re = Regex::new("foo[0-9]+")?;
let mut cache = re.create_cache();

let text = "foo1 foo12 foo123";
let matches: Vec<Match> = re.find_iter(&mut cache, text).collect();
assert_eq!(matches, vec![
    Match::must(0, 0..4),
    Match::must(0, 5..10),
    Match::must(0, 11..17),
]);

impl Regex

Lower level “search” primitives that accept a &Input for cheap reuse and return an error if one occurs instead of panicking.

pub fn try_search( &self, cache: &mut Cache, input: &Input<'_>, ) -> Result<Option<Match>, MatchError>

Returns the start and end offset of the leftmost match. If no match exists, then None is returned.

This is like Regex::find but with two differences:

  1. It is not generic over Into<Input> and instead accepts a &Input. This permits reusing the same Input for multiple searches without needing to create a new one. This may help with latency.
  2. It returns an error if the search could not complete, whereas Regex::find will panic.
§Errors

This routine errors if the search could not complete. This can occur in a number of circumstances:

  • The configuration of the lazy DFA may permit it to “quit” the search. For example, setting quit bytes or enabling heuristic support for Unicode word boundaries. The default configuration does not enable any option that could result in the lazy DFA quitting.
  • The configuration of the lazy DFA may also permit it to “give up” on a search if it makes ineffective use of its transition table cache. The default configuration does not enable this, although enabling it is typically a good idea.
  • When the provided Input configuration is not supported. For example, by providing an unsupported anchor mode.

When a search returns an error, callers cannot know whether a match exists or not.

fn is_anchored(&self, input: &Input<'_>) -> bool

Returns true if either the given input specifies an anchored search or if the underlying NFA is always anchored.

impl Regex

Non-search APIs for querying information about the regex and setting a prefilter.

pub fn forward(&self) -> &DFA

Return the underlying lazy DFA responsible for forward matching.

This is useful for accessing the underlying lazy DFA and using it directly if the situation calls for it.

pub fn reverse(&self) -> &DFA

Return the underlying lazy DFA responsible for reverse matching.

This is useful for accessing the underlying lazy DFA and using it directly if the situation calls for it.

pub fn pattern_len(&self) -> usize

Returns the total number of patterns matched by this regex.

§Example
use regex_automata::hybrid::regex::Regex;

let re = Regex::new_many(&[r"[a-z]+", r"[0-9]+", r"\w+"])?;
assert_eq!(3, re.pattern_len());

Trait Implementations§

impl Debug for Regex

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl Freeze for Regex

impl RefUnwindSafe for Regex

impl Send for Regex

impl Sync for Regex

impl Unpin for Regex

impl UnwindSafe for Regex

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.