Struct regex_automata::util::search::PatternSet
pub struct PatternSet {
    len: usize,
    which: Box<[bool]>,
}
A set of PatternIDs.
A set of pattern identifiers is useful for recording which patterns have matched a particular haystack. A pattern set only includes pattern identifiers. It does not include offset information.
§Example
This shows basic usage of a set.
use regex_automata::{PatternID, PatternSet};
let pid1 = PatternID::must(5);
let pid2 = PatternID::must(8);
// Create a new empty set.
let mut set = PatternSet::new(10);
// Insert pattern IDs.
set.insert(pid1);
set.insert(pid2);
// Test membership.
assert!(set.contains(pid1));
assert!(set.contains(pid2));
// Get all members.
assert_eq!(
    vec![5, 8],
    set.iter().map(|p| p.as_usize()).collect::<Vec<usize>>(),
);
// Clear the set.
set.clear();
// Test that it is indeed empty.
assert!(set.is_empty());
Fields§
§len: usize
The number of patterns set to ‘true’ in this set.
§which: Box<[bool]>
A map from PatternID to boolean of whether a pattern matches or not.
This should probably be a bitset, but the difference is unlikely to matter much in practice.
The main downside of this representation (and similarly for a bitset) is that iteration scales with the capacity of the set instead of the length of the set. This doesn’t seem likely to be a problem in practice.
Another alternative is to use a ‘SparseSet’. It uses considerably more memory, but that seems acceptable compared to the memory used by the regex engine. The real hiccup is that it yields pattern IDs in the order they were inserted. That is actually kind of nice, but at the time of writing, pattern IDs are yielded in ascending order in the regex crate RegexSet API. If we did change to ‘SparseSet’, we could provide an additional ‘iter_match_order’ iterator, but keep the ascending order one for compatibility.
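The trade-off described above can be seen in a small standalone sketch. The type below is hypothetical (BoolSet is not part of this crate) and uses plain usize IDs instead of PatternID. It only mirrors the documented layout, a count plus one boolean per possible ID, to show why iteration visits every slot and why IDs come out in ascending order.

```rust
/// Hypothetical stand-in for the documented layout: a count plus one
/// boolean per possible ID. This is a sketch, not the crate's code.
#[derive(Clone, Debug, PartialEq)]
pub struct BoolSet {
    /// Number of slots currently set to true.
    len: usize,
    /// One flag per possible ID; the slice length is the capacity.
    which: Box<[bool]>,
}

impl BoolSet {
    /// Creates an empty set able to hold IDs in `0..capacity`.
    pub fn new(capacity: usize) -> BoolSet {
        BoolSet { len: 0, which: vec![false; capacity].into_boxed_slice() }
    }

    /// Returns true if `id` was newly inserted.
    /// Panics (via slice indexing) if `id >= capacity`.
    pub fn insert(&mut self, id: usize) -> bool {
        if self.which[id] {
            return false;
        }
        self.which[id] = true;
        self.len += 1;
        true
    }

    /// Number of IDs in the set, tracked separately so it is O(1).
    pub fn len(&self) -> usize {
        self.len
    }

    /// Scans every slot, so the cost is proportional to the capacity,
    /// not to `len`. IDs are yielded in ascending order as a side effect.
    pub fn iter(&self) -> impl Iterator<Item = usize> + '_ {
        self.which
            .iter()
            .enumerate()
            .filter_map(|(id, &present)| if present { Some(id) } else { None })
    }
}
```

Inserting 8 and then 5 into a BoolSet of capacity 10 yields 5 and then 8 from iter(), regardless of insertion order. A sparse set would instead yield them in insertion order.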
Implementations§
impl PatternSet
pub fn new(capacity: usize) -> PatternSet
Create a new set of pattern identifiers with the given capacity.
The given capacity typically corresponds to (at least) the number of patterns in a compiled regex object.
§Panics
This panics if the given capacity exceeds PatternID::LIMIT. This is impossible if you use the pattern_len() method as defined on any of the regex engines in this crate. Namely, a regex will fail to build by returning an error if the number of patterns given to it exceeds the limit. Therefore, the number of patterns in a valid regex is always a correct capacity to provide here.
pub fn contains(&self, pid: PatternID) -> bool
Return true if and only if the given pattern identifier is in this set.
pub fn insert(&mut self, pid: PatternID) -> bool
Insert the given pattern identifier into this set and return true if the given pattern ID was not previously in this set.
If the pattern identifier is already in this set, then this is a no-op.
Use PatternSet::try_insert for a fallible version of this routine.
§Panics
This panics if this pattern set has insufficient capacity to store the given pattern ID.
pub fn try_insert(&mut self, pid: PatternID) -> Result<bool, PatternSetInsertError>
Insert the given pattern identifier into this set and return true if the given pattern ID was not previously in this set.
If the pattern identifier is already in this set, then this is a no-op.
§Errors
This returns an error if this pattern set has insufficient capacity to store the given pattern ID.
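The relationship between the panicking and fallible insert routines can be sketched with standard library types only. Everything named here is hypothetical: InsertError stands in for PatternSetInsertError, IDs are plain usize values, and the functions operate on a bare slice of flags rather than on a PatternSet. The sketch assumes the panicking variant is a thin wrapper over the fallible one, which is one natural way to arrange it.

```rust
/// Hypothetical error type, standing in for PatternSetInsertError.
#[derive(Debug, PartialEq)]
pub struct InsertError {
    /// The ID that did not fit.
    pub attempted: usize,
    /// The capacity of the set at the time of the attempt.
    pub capacity: usize,
}

/// Fallible insert: Ok(true) if `id` was newly added, Ok(false) if it
/// was already present, Err if `id` is outside the set's capacity.
pub fn try_insert(which: &mut [bool], id: usize) -> Result<bool, InsertError> {
    let capacity = which.len();
    match which.get_mut(id) {
        None => Err(InsertError { attempted: id, capacity }),
        Some(slot) => {
            let newly_inserted = !*slot;
            *slot = true;
            Ok(newly_inserted)
        }
    }
}

/// Panicking insert, written as a wrapper over the fallible version.
pub fn insert(which: &mut [bool], id: usize) -> bool {
    try_insert(which, id).expect("set should have sufficient capacity")
}
```

The fallible form is the better fit when the pattern ID comes from a source whose pattern count is not known to match the set's capacity.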
pub fn is_empty(&self) -> bool
Return true if and only if this set has no pattern identifiers in it.
pub fn is_full(&self) -> bool
Return true if and only if this set has the maximum number of pattern identifiers in the set. This occurs precisely when PatternSet::len() == PatternSet::capacity().
This particular property is useful to test because it may allow you to stop a search earlier than you otherwise would. Namely, if a search is only reporting which patterns match a haystack and you know that all of the patterns have matched at a given point, then no new information can be learned by continuing the search. (A pattern set does not keep track of offset information.)
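The early-exit idea can be illustrated without a regex engine. In the sketch below, the stream of matching pattern IDs is just an iterator of usize values, and collect_until_full is a hypothetical helper, not a function of this crate. It stops pulling match events as soon as the number of distinct IDs seen equals the number of patterns, which is the same len == capacity condition that is_full tests.

```rust
/// Records which pattern IDs appear in `matches`, stopping as soon as
/// every ID in `0..pattern_len` has been seen.
///
/// Returns the per-pattern flags and the number of match events that
/// were actually consumed. Panics if an event carries an ID that is
/// `>= pattern_len`.
pub fn collect_until_full<I>(pattern_len: usize, matches: I) -> (Vec<bool>, usize)
where
    I: IntoIterator<Item = usize>,
{
    let mut which = vec![false; pattern_len];
    let mut len = 0; // number of distinct IDs seen so far
    let mut consumed = 0; // number of events pulled from the stream
    let mut events = matches.into_iter();

    // `len < pattern_len` is the negation of the "is full" condition.
    while len < pattern_len {
        let id = match events.next() {
            Some(id) => id,
            None => break, // the search ended before the set filled up
        };
        consumed += 1;
        if !which[id] {
            which[id] = true;
            len += 1;
        }
    }
    (which, consumed)
}
```

With two patterns and the event stream 0, 0, 1, 0, 1, the loop stops after the third event, because both patterns have been seen by then and the remaining events cannot change the result.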
pub fn capacity(&self) -> usize
Returns the total number of pattern identifiers that may be stored in this set.
This is guaranteed to be less than or equal to PatternID::LIMIT.
Typically, the capacity of a pattern set matches the number of patterns in a regex object with which you are searching.
pub fn iter(&self) -> PatternSetIter<'_>
Returns an iterator over all pattern identifiers in this set.
The iterator yields pattern identifiers in ascending order, starting at zero.
Trait Implementations§
impl Clone for PatternSet
fn clone(&self) -> PatternSet
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
impl Debug for PatternSet
impl PartialEq for PatternSet
fn eq(&self, other: &PatternSet) -> bool
Tests for self and other values to be equal, and is used by ==.