regex_automata

Struct PatternSet

source
pub struct PatternSet {
    len: usize,
    which: Box<[bool]>,
}
Expand description

A set of PatternIDs.

A set of pattern identifiers is useful for recording which patterns have matched a particular haystack. A pattern set only includes pattern identifiers. It does not include offset information.

§Example

This shows basic usage of a set.

use regex_automata::{PatternID, PatternSet};

let pid1 = PatternID::must(5);
let pid2 = PatternID::must(8);
// Create a new empty set.
let mut set = PatternSet::new(10);
// Insert pattern IDs.
set.insert(pid1);
set.insert(pid2);
// Test membership.
assert!(set.contains(pid1));
assert!(set.contains(pid2));
// Get all members.
assert_eq!(
    vec![5, 8],
    set.iter().map(|p| p.as_usize()).collect::<Vec<usize>>(),
);
// Clear the set.
set.clear();
// Test that it is indeed empty.
assert!(set.is_empty());

Fields§

§len: usize

The number of patterns set to ‘true’ in this set.

§which: Box<[bool]>

A map from PatternID to boolean of whether a pattern matches or not.

This should probably be a bitset, but it’s probably unlikely to matter much in practice.

The main downside of this representation (and similarly for a bitset) is that iteration scales with the capacity of the set instead of the length of the set. This doesn’t seem likely to be a problem in practice.

Another alternative is to just use a ‘SparseSet’ for this. It does use more memory (quite a bit more), but that seems fine I think compared to the memory being used by the regex engine. The real hiccup with it is that it yields pattern IDs in the order they were inserted. Which is actually kind of nice, but at the time of writing, pattern IDs are yielded in ascending order in the regex crate RegexSet API. If we did change to ‘SparseSet’, we could provide an additional ‘iter_match_order’ iterator, but keep the ascending order one for compatibility.

Implementations§

source§

impl PatternSet

source

pub fn new(capacity: usize) -> PatternSet

Create a new set of pattern identifiers with the given capacity.

The given capacity typically corresponds to (at least) the number of patterns in a compiled regex object.

§Panics

This panics if the given capacity exceeds PatternID::LIMIT. This is impossible if you use the pattern_len() method as defined on any of the regex engines in this crate. Namely, a regex will fail to build by returning an error if the number of patterns given to it exceeds the limit. Therefore, the number of patterns in a valid regex is always a correct capacity to provide here.

source

pub fn clear(&mut self)

Clear this set such that it contains no pattern IDs.

source

pub fn contains(&self, pid: PatternID) -> bool

Return true if and only if the given pattern identifier is in this set.

source

pub fn insert(&mut self, pid: PatternID) -> bool

Insert the given pattern identifier into this set and return true if the given pattern ID was not previously in this set.

If the pattern identifier is already in this set, then this is a no-op.

Use PatternSet::try_insert for a fallible version of this routine.

§Panics

This panics if this pattern set has insufficient capacity to store the given pattern ID.

source

pub fn try_insert( &mut self, pid: PatternID, ) -> Result<bool, PatternSetInsertError>

Insert the given pattern identifier into this set and return true if the given pattern ID was not previously in this set.

If the pattern identifier is already in this set, then this is a no-op.

§Errors

This returns an error if this pattern set has insufficient capacity to store the given pattern ID.

source

pub fn is_empty(&self) -> bool

Return true if and only if this set has no pattern identifiers in it.

source

pub fn is_full(&self) -> bool

Return true if and only if this set has the maximum number of pattern identifiers in the set. This occurs precisely when PatternSet::len() == PatternSet::capacity().

This particular property is useful to test because it may allow one to stop a search earlier than you might otherwise. Namely, if a search is only reporting which patterns match a haystack and if you know all of the patterns match at a given point, then there’s no new information that can be learned by continuing the search. (Because a pattern set does not keep track of offset information.)

source

pub fn len(&self) -> usize

Returns the total number of pattern identifiers in this set.

source

pub fn capacity(&self) -> usize

Returns the total number of pattern identifiers that may be stored in this set.

This is guaranteed to be less than or equal to PatternID::LIMIT.

Typically, the capacity of a pattern set matches the number of patterns in a regex object with which you are searching.

source

pub fn iter(&self) -> PatternSetIter<'_>

Returns an iterator over all pattern identifiers in this set.

The iterator yields pattern identifiers in ascending order, starting at zero.

Trait Implementations§

source§

impl Clone for PatternSet

source§

fn clone(&self) -> PatternSet

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for PatternSet

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl PartialEq for PatternSet

source§

fn eq(&self, other: &PatternSet) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Eq for PatternSet

source§

impl StructuralPartialEq for PatternSet

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.