Struct regex_automata::util::look::LookMatcher

pub struct LookMatcher {
    lineterm: DebugByte,
}

A matcher for look-around assertions.

This matcher permits configuring aspects of how look-around assertions are matched.

§Example

A LookMatcher can change the line terminator used for matching multi-line anchors such as (?m:^) and (?m:$).

use regex_automata::{
    nfa::thompson::{self, pikevm::PikeVM},
    util::look::LookMatcher,
    Match,
};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut lookm = LookMatcher::new();
    lookm.set_line_terminator(b'\x00');

    let re = PikeVM::builder()
        .thompson(thompson::Config::new().look_matcher(lookm))
        .build(r"(?m)^[a-z]+$")?;
    let mut cache = re.create_cache();

    // Multi-line assertions now use NUL as a terminator.
    assert_eq!(
        Some(Match::must(0, 1..4)),
        re.find(&mut cache, b"\x00abc\x00"),
    );
    // ... and \n is no longer recognized as a terminator.
    assert_eq!(
        None,
        re.find(&mut cache, b"\nabc\n"),
    );
    Ok(())
}

Fields§

§lineterm: DebugByte

Implementations§

impl LookMatcher

pub fn new() -> LookMatcher

Creates a new default matcher for look-around assertions.

pub fn set_line_terminator(&mut self, byte: u8) -> &mut LookMatcher

Sets the line terminator for use with (?m:^) and (?m:$).

Namely, instead of ^ matching immediately after a \n and $ matching immediately before a \n, this causes them to match immediately after and immediately before the given byte, respectively.

It can occasionally be useful to use this to configure the line terminator to the NUL byte when searching binary data.

Note that this does not apply to CRLF-aware line anchors such as (?Rm:^) and (?Rm:$). CRLF-aware line anchors are hard-coded to use \r and \n.
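
The effect of the configured terminator on (?m:^) and (?m:$) can be modelled in a few lines of standalone Rust. This is only a sketch of the documented behaviour: the LineAnchors type and its is_start_line and is_end_line methods are hypothetical names chosen for illustration, not the crate's implementation.

```rust
/// Hypothetical stand-in for the line-terminator part of a look matcher.
/// It models the documented behaviour only; it is not regex-automata's code.
struct LineAnchors {
    lineterm: u8,
}

impl LineAnchors {
    /// Starts with `\n`, the documented default terminator.
    fn new() -> LineAnchors {
        LineAnchors { lineterm: b'\n' }
    }

    /// Replaces the terminator byte used by both line anchors.
    fn set_line_terminator(&mut self, byte: u8) -> &mut LineAnchors {
        self.lineterm = byte;
        self
    }

    /// Models `(?m:^)`: start of haystack, or immediately after the terminator.
    /// Indexing panics when `at > haystack.len()`.
    fn is_start_line(&self, haystack: &[u8], at: usize) -> bool {
        at == 0 || haystack[at - 1] == self.lineterm
    }

    /// Models `(?m:$)`: end of haystack, or immediately before the terminator.
    /// Indexing panics when `at > haystack.len()`.
    fn is_end_line(&self, haystack: &[u8], at: usize) -> bool {
        at == haystack.len() || haystack[at] == self.lineterm
    }
}
```

With the terminator set to NUL, position 1 of b"\x00abc\x00" is a line start in this model, while position 1 of b"\nabc\n" no longer is.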

pub fn get_line_terminator(&self) -> u8

Returns the line terminator that was configured for this matcher.

If no line terminator was configured, then this returns \n.

Note that the line terminator should only be used for matching (?m:^) and (?m:$) assertions. It specifically should not be used for matching the CRLF-aware assertions (?Rm:^) and (?Rm:$).

pub fn matches(&self, look: Look, haystack: &[u8], at: usize) -> bool

Returns true when the position at in haystack satisfies the given look-around assertion.

§Panics

This panics when testing a Unicode word boundary assertion while the Unicode word data is not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

Since it’s generally expected that this routine is called inside of a matching engine, callers should check the error condition when building the matching engine. If there is a Unicode word boundary in the matcher and the data isn’t available, then the matcher should fail to build.

Callers can check the error condition with LookSet::available.

This also may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub(crate) fn matches_inline(&self, look: Look, haystack: &[u8], at: usize) -> bool

Like matches, but forcefully inlined.

§Panics

This panics when testing a Unicode word boundary assertion while the Unicode word data is not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

Since it’s generally expected that this routine is called inside of a matching engine, callers should check the error condition when building the matching engine. If there is a Unicode word boundary in the matcher and the data isn’t available, then the matcher should fail to build.

Callers can check the error condition with LookSet::available.

This also may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn matches_set(&self, set: LookSet, haystack: &[u8], at: usize) -> bool

Returns true when all of the assertions in the given set match at the given position in the haystack.

§Panics

This panics when testing any Unicode word boundary assertion in this set while the Unicode word data is not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

Since it’s generally expected that this routine is called inside of a matching engine, callers should check the error condition when building the matching engine. If there is a Unicode word boundary in the matcher and the data isn’t available, then the matcher should fail to build.

Callers can check the error condition with LookSet::available.

This also may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.
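
The "all of the assertions" rule can be illustrated with a small standalone model. This is a sketch under stated assumptions: the Assertion enum and the matches_one and matches_all functions are hypothetical, the set is represented as a plain slice rather than a LookSet, and only two assertions are modelled.

```rust
/// Hypothetical two-variant model of a look-around assertion.
#[derive(Clone, Copy)]
enum Assertion {
    Start,
    End,
}

/// Tests a single assertion at position `at` (model only).
fn matches_one(look: Assertion, haystack: &[u8], at: usize) -> bool {
    match look {
        Assertion::Start => at == 0,
        Assertion::End => at == haystack.len(),
    }
}

/// Returns true only when every assertion in `set` holds at `at`.
/// An empty set is vacuously satisfied in this model.
fn matches_all(set: &[Assertion], haystack: &[u8], at: usize) -> bool {
    set.iter().all(|&look| matches_one(look, haystack, at))
}
```

In this model, a set containing both Start and End is satisfied only at position 0 of an empty haystack.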

pub(crate) fn matches_set_inline(&self, set: LookSet, haystack: &[u8], at: usize) -> bool

Like matches_set, but forcefully inlined for performance.

pub(crate) fn add_to_byteset(&self, look: Look, set: &mut ByteClassSet)

Split up the given byte classes into equivalence classes in a way that is consistent with this look-around assertion.

pub fn is_start(&self, _haystack: &[u8], at: usize) -> bool

Returns true when Look::Start is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_end(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::End is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_start_lf(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::StartLF is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_end_lf(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::EndLF is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_start_crlf(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::StartCRLF is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_end_crlf(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::EndCRLF is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.
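
The CRLF-aware anchors treat \r\n as a single terminator, so neither anchor matches between the \r and the \n. The following standalone sketch models that rule. The function names crlf_line_start and crlf_line_end are hypothetical, and the code illustrates the semantics rather than reproducing the crate's implementation.

```rust
/// Models `(?Rm:^)`: start of haystack, after `\n`, or after a `\r`
/// that is not immediately followed by `\n`.
/// Indexing panics when `at > haystack.len()`.
fn crlf_line_start(haystack: &[u8], at: usize) -> bool {
    at == 0
        || haystack[at - 1] == b'\n'
        || (haystack[at - 1] == b'\r'
            && (at >= haystack.len() || haystack[at] != b'\n'))
}

/// Models `(?Rm:$)`: end of haystack, before `\r`, or before a `\n`
/// that is not immediately preceded by `\r`.
/// Indexing panics when `at > haystack.len()`.
fn crlf_line_end(haystack: &[u8], at: usize) -> bool {
    at == haystack.len()
        || haystack[at] == b'\r'
        || (haystack[at] == b'\n' && (at == 0 || haystack[at - 1] != b'\r'))
}
```

For the haystack b"a\r\nb", this model reports a line end at position 1 and a line start at position 3, but neither at position 2.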

pub fn is_word_ascii(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::WordAscii is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_word_ascii_negate(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::WordAsciiNegate is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.
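
An ASCII word boundary compares the bytes on either side of the position: it holds when exactly one of them is a word byte. The sketch below models this, assuming a word byte is one of [0-9A-Za-z_]. The names is_ascii_word_byte, ascii_word_boundary and ascii_word_boundary_negate are hypothetical and not part of the crate.

```rust
/// Assumed definition of an ASCII word byte: `[0-9A-Za-z_]`.
fn is_ascii_word_byte(b: u8) -> bool {
    b.is_ascii_alphanumeric() || b == b'_'
}

/// Models an ASCII `\b`: true when exactly one side of `at` is a word byte.
/// The edges of the haystack count as non-word.
/// Indexing panics when `at > haystack.len()`.
fn ascii_word_boundary(haystack: &[u8], at: usize) -> bool {
    let word_before = at > 0 && is_ascii_word_byte(haystack[at - 1]);
    let word_after = at < haystack.len() && is_ascii_word_byte(haystack[at]);
    word_before != word_after
}

/// Models an ASCII `\B`: the negation of the boundary test.
fn ascii_word_boundary_negate(haystack: &[u8], at: usize) -> bool {
    !ascii_word_boundary(haystack, at)
}
```

Because the edges count as non-word, an empty haystack has no boundary at position 0 in this model, so the negated assertion holds there.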

pub fn is_word_unicode(
    &self,
    haystack: &[u8],
    at: usize,
) -> Result<bool, UnicodeWordBoundaryError>

Returns true when Look::WordUnicode is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

§Errors

This returns an error when Unicode word boundary tables are not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

pub fn is_word_unicode_negate(
    &self,
    haystack: &[u8],
    at: usize,
) -> Result<bool, UnicodeWordBoundaryError>

Returns true when Look::WordUnicodeNegate is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

§Errors

This returns an error when Unicode word boundary tables are not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

pub fn is_word_start_ascii(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::WordStartAscii is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_word_end_ascii(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::WordEndAscii is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_word_start_unicode(
    &self,
    haystack: &[u8],
    at: usize,
) -> Result<bool, UnicodeWordBoundaryError>

Returns true when Look::WordStartUnicode is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

§Errors

This returns an error when Unicode word boundary tables are not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

pub fn is_word_end_unicode(
    &self,
    haystack: &[u8],
    at: usize,
) -> Result<bool, UnicodeWordBoundaryError>

Returns true when Look::WordEndUnicode is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

§Errors

This returns an error when Unicode word boundary tables are not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

pub fn is_word_start_half_ascii(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::WordStartHalfAscii is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

pub fn is_word_end_half_ascii(&self, haystack: &[u8], at: usize) -> bool

Returns true when Look::WordEndHalfAscii is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.
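
The ASCII start, end and half assertions differ only in which side of the position they inspect. The following standalone sketch contrasts them, assuming a word byte is one of [0-9A-Za-z_]. All names here (word_sides, word_start, word_end, word_start_half, word_end_half) are hypothetical and serve only to illustrate the semantics.

```rust
/// Returns (word byte before `at`, word byte at `at`), treating the
/// edges of the haystack as non-word. Assumes word bytes are `[0-9A-Za-z_]`.
/// Indexing panics when `at > haystack.len()`.
fn word_sides(haystack: &[u8], at: usize) -> (bool, bool) {
    let is_word = |b: u8| b.is_ascii_alphanumeric() || b == b'_';
    let before = at > 0 && is_word(haystack[at - 1]);
    let after = at < haystack.len() && is_word(haystack[at]);
    (before, after)
}

/// Start of a word: non-word before and word after.
fn word_start(haystack: &[u8], at: usize) -> bool {
    let (before, after) = word_sides(haystack, at);
    !before && after
}

/// End of a word: word before and non-word after.
fn word_end(haystack: &[u8], at: usize) -> bool {
    let (before, after) = word_sides(haystack, at);
    before && !after
}

/// Start half: only requires that no word byte precedes `at`.
fn word_start_half(haystack: &[u8], at: usize) -> bool {
    !word_sides(haystack, at).0
}

/// End half: only requires that no word byte follows `at`.
fn word_end_half(haystack: &[u8], at: usize) -> bool {
    !word_sides(haystack, at).1
}
```

The half variants are strictly weaker than the full ones: in a haystack of two spaces, position 1 satisfies both halves in this model even though no word starts or ends there.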

pub fn is_word_start_half_unicode(
    &self,
    haystack: &[u8],
    at: usize,
) -> Result<bool, UnicodeWordBoundaryError>

Returns true when Look::WordStartHalfUnicode is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

§Errors

This returns an error when Unicode word boundary tables are not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

pub fn is_word_end_half_unicode(
    &self,
    haystack: &[u8],
    at: usize,
) -> Result<bool, UnicodeWordBoundaryError>

Returns true when Look::WordEndHalfUnicode is satisfied at the given position in haystack.

§Panics

This may panic when at > haystack.len(). Note that at == haystack.len() is legal and guaranteed not to panic.

§Errors

This returns an error when Unicode word boundary tables are not available. Specifically, this only occurs when the unicode-word-boundary feature is not enabled.

Trait Implementations§

impl Clone for LookMatcher

fn clone(&self) -> LookMatcher

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for LookMatcher

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for LookMatcher

fn default() -> LookMatcher

Returns the “default value” for a type.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.