Struct regex_syntax::hir::literal::Extractor

pub struct Extractor {
    kind: ExtractKind,
    limit_class: usize,
    limit_repeat: usize,
    limit_literal_len: usize,
    limit_total: usize,
}

Extracts prefix or suffix literal sequences from Hir expressions.

Literal extraction is based on the following observations:

Many regexes start with one or a small number of literals.
Substring search for literals is often much faster (sometimes by an order of magnitude) than a regex search.

Thus, in many cases, one can search for literals to find candidate starting locations of a match, and then only run the full regex engine at each such location instead of over the full haystack.
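
The candidate-then-confirm strategy described above can be sketched with the standard library alone. This is only an illustration under stated assumptions: `find_with_prefilter` and its `confirm` callback are hypothetical names, `str::find` stands in for a fast substring searcher, and `confirm` stands in for a regex engine anchored at a candidate position. It is not how this crate or any regex engine is implemented.

```rust
/// Hypothetical helper: returns the first offset where `literal` occurs and
/// `confirm` (a stand-in for an anchored regex match) accepts the candidate.
fn find_with_prefilter<F>(haystack: &str, literal: &str, confirm: F) -> Option<usize>
where
    F: Fn(&str, usize) -> bool,
{
    let mut start = 0;
    while start <= haystack.len() {
        // Fast path: plain substring search for the extracted literal.
        let candidate = start + haystack[start..].find(literal)?;
        // Slow path: run the expensive check only at the candidate offset.
        if confirm(haystack, candidate) {
            return Some(candidate);
        }
        // False positive: resume just past the first character of the
        // candidate so the next slice still begins on a char boundary.
        let step = haystack[candidate..].chars().next().map_or(1, |c| c.len_utf8());
        start = candidate + step;
    }
    None
}

/// Stand-in for searching with the regex `foo[0-9]`, whose prefix literal
/// is "foo". The closure plays the role of the regex engine.
fn find_foo_digit(haystack: &str) -> Option<usize> {
    let confirm = |h: &str, at: usize| {
        h[at + 3..].chars().next().map_or(false, |c| c.is_ascii_digit())
    };
    find_with_prefilter(haystack, "foo", confirm)
}
```

In the haystack `foo bar foox foo7`, the literal search yields three candidates, and only the last one passes confirmation.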

The main downside of literal extraction is that it can wind up making a search slower overall. For example, if there are many matches, or many candidates that don’t ultimately lead to a match, then a lot of overhead is spent shuffling back and forth between substring search and the regex engine. This is the fundamental reason why literal optimizations for regex patterns are sometimes considered a “black art.”

§Look-around assertions

Literal extraction treats all look-around assertions as if they match every empty string. So for example, the regex \bquux\b will yield a sequence containing a single exact literal quux. However, not all occurrences of quux correspond to a match of the regex. For example, \bquux\b does not match ZquuxZ anywhere because quux does not fall on a word boundary.

In effect, if your regex contains look-around assertions, then a match of an exact literal does not necessarily mean the regex overall matches. So you may still need to run the regex engine in such cases to confirm the match.

The precise guarantee you get from a literal sequence is: if every literal in the sequence is exact and the original regex contains zero look-around assertions, then a preference-order multi-substring search of those literals will precisely match a preference-order search of the original regex.
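
To make the \bquux\b case concrete, here is a minimal std-only sketch of confirming a literal candidate. The function names are hypothetical, and the boundary check assumes ASCII-only word characters, which is a simplification: a real regex engine would normally perform this confirmation and may use Unicode-aware word boundaries.

```rust
/// Returns true if `b` is an ASCII word byte, i.e. matches `[0-9A-Za-z_]`.
fn is_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Hypothetical stand-in for running `\bquux\b` anchored at `at`, assuming
/// the literal (of length `len`) is already known to occur at `at`.
fn confirm_word_boundaries(haystack: &str, at: usize, len: usize) -> bool {
    let bytes = haystack.as_bytes();
    let before = at.checked_sub(1).map_or(false, |i| is_word_byte(bytes[i]));
    let after = bytes.get(at + len).map_or(false, |&b| is_word_byte(b));
    // "quux" begins and ends with word bytes, so both assertions hold only
    // when each neighbouring byte is absent or is not a word byte.
    !before && !after
}

/// Scans for occurrences of the literal and confirms each candidate.
fn find_quux_word(haystack: &str) -> Option<usize> {
    let literal = "quux";
    haystack
        .match_indices(literal)
        .map(|(at, _)| at)
        .find(|&at| confirm_word_boundaries(haystack, at, literal.len()))
}
```

Here `find_quux_word("ZquuxZ")` finds the literal but rejects it, which is why an exact literal alone is not enough when look-around assertions are present.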

§Example

This shows how to extract prefixes:

use regex_syntax::{hir::literal::{Extractor, Literal, Seq}, parse};

let hir = parse(r"(a|b|c)(x|y|z)[A-Z]+foo")?;
let got = Extractor::new().extract(&hir);
// All literals returned are "inexact" because none of them reach the
// match state.
let expected = Seq::from_iter([
    Literal::inexact("ax"),
    Literal::inexact("ay"),
    Literal::inexact("az"),
    Literal::inexact("bx"),
    Literal::inexact("by"),
    Literal::inexact("bz"),
    Literal::inexact("cx"),
    Literal::inexact("cy"),
    Literal::inexact("cz"),
]);
assert_eq!(expected, got);
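
One way to see where these nine prefixes come from is as a cross product of the literals in the two leading alternations. The sketch below is a toy model using only the standard library. `cross_product` and `toy_prefixes` are hypothetical names and do not reproduce the crate's own (private) logic, which also tracks exactness and enforces limits.

```rust
/// Toy model: concatenates every literal in `left` with every literal in
/// `right`, iterating `left` in the outer loop so its order is preserved.
fn cross_product(left: &[&str], right: &[&str]) -> Vec<String> {
    let mut out = Vec::with_capacity(left.len() * right.len());
    for l in left {
        for r in right {
            out.push(format!("{}{}", l, r));
        }
    }
    out
}

/// Prefixes of `(a|b|c)(x|y|z)[A-Z]+foo` under this toy model. It assumes
/// extraction stops at `[A-Z]+` rather than expanding the class, so only
/// the first two groups contribute.
fn toy_prefixes() -> Vec<String> {
    cross_product(&["a", "b", "c"], &["x", "y", "z"])
}
```

Because the toy model stops before `[A-Z]+foo`, none of its outputs cover a full match, which mirrors why the real literals above are all inexact.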

This shows how to extract suffixes:

use regex_syntax::{
    hir::literal::{Extractor, ExtractKind, Literal, Seq},
    parse,
};

let hir = parse(r"foo|[A-Z]+bar")?;
let got = Extractor::new().kind(ExtractKind::Suffix).extract(&hir);
// Since 'foo' gets to a match state, it is considered exact. But 'bar'
// does not because of the '[A-Z]+', and thus is marked inexact.
let expected = Seq::from_iter([
    Literal::exact("foo"),
    Literal::inexact("bar"),
]);
assert_eq!(expected, got);

Fields§

§kind: ExtractKind

§limit_class: usize

§limit_repeat: usize

§limit_literal_len: usize

§limit_total: usize

Implementations§

impl Extractor

pub fn new() -> Extractor

pub fn extract(&self, hir: &Hir) -> Seq

pub fn kind(&mut self, kind: ExtractKind) -> &mut Extractor

pub fn limit_class(&mut self, limit: usize) -> &mut Extractor

§Example

pub fn limit_repeat(&mut self, limit: usize) -> &mut Extractor

§Example

pub fn limit_literal_len(&mut self, limit: usize) -> &mut Extractor

§Example

pub fn limit_total(&mut self, limit: usize) -> &mut Extractor

§Example

fn extract_concat<'a, I: Iterator<Item = &'a Hir>>(&self, it: I) -> Seq

fn extract_alternation<'a, I: Iterator<Item = &'a Hir>>(&self, it: I) -> Seq

fn extract_repetition(&self, rep: &Repetition) -> Seq

fn extract_class_unicode(&self, cls: &ClassUnicode) -> Seq

fn extract_class_bytes(&self, cls: &ClassBytes) -> Seq

fn class_over_limit_unicode(&self, cls: &ClassUnicode) -> bool

fn class_over_limit_bytes(&self, cls: &ClassBytes) -> bool

fn cross(&self, seq1: Seq, seq2: &mut Seq) -> Seq

fn union(&self, seq1: Seq, seq2: &mut Seq) -> Seq

fn enforce_literal_len(&self, seq: &mut Seq)

Trait Implementations§

impl Clone for Extractor

fn clone(&self) -> Extractor

fn clone_from(&mut self, source: &Self)

impl Debug for Extractor

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl Default for Extractor

fn default() -> Extractor

Auto Trait Implementations§

impl Freeze for Extractor

impl RefUnwindSafe for Extractor

impl Send for Extractor

impl Sync for Extractor

impl Unpin for Extractor

impl UnwindSafe for Extractor

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut T)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T
where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
