Struct regex_automata::nfa::thompson::map::Utf8BoundedMap

source ·
pub struct Utf8BoundedMap {
    version: u16,
    capacity: usize,
    map: Vec<Utf8BoundedEntry>,
}
Expand description

A bounded hash map where the key is a sequence of NFA transitions and the value is a pre-existing NFA state ID.

std’s hashmap can be used for this, however, this map has two important advantages. Firstly, it has lower overhead. Secondly, it permits us to control our memory usage by limited the number of slots. In general, the cost here is that this map acts as a cache. That is, inserting a new entry may remove an old entry. We are okay with this, since it does not impact correctness in the cases where it is used. The only effect that dropping states from the cache has is that the resulting NFA generated may be bigger than it otherwise would be.

This improves benchmarks that compile large Unicode character classes, since it makes the generation of (almost) minimal UTF-8 automaton faster. Specifically, one could observe the difference with std’s hashmap via something like the following benchmark:

hyperfine “regex-cli debug thompson -qr –captures none ‘\w{90} ecurB’”

But to observe that difference, you’d have to modify the code to use std’s hashmap.

It is quite possible that there is a better way to approach this problem. For example, if there happens to be a very common state that collides with a lot of less frequent states, then we could wind up with very poor caching behavior. Alas, the effectiveness of this cache has not been measured. Instead, ad hoc experiments suggest that it is “good enough.” Additional smarts (such as an LRU eviction policy) have to be weighed against the amount of extra time they cost.

Fields§

§version: u16

The current version of this map. Only entries with matching versions are considered during lookups. If an entry is found with a mismatched version, then the map behaves as if the entry does not exist.

This makes it possible to clear the map by simply incrementing the version number instead of actually deallocating any storage.

§capacity: usize

The total number of entries this map can store.

§map: Vec<Utf8BoundedEntry>

The actual entries, keyed by hash. Collisions between different states result in the old state being dropped.

Implementations§

source§

impl Utf8BoundedMap

source

pub fn new(capacity: usize) -> Utf8BoundedMap

Create a new bounded map with the given capacity. The map will never grow beyond the given size.

Note that this does not allocate. Instead, callers must call clear before using this map. clear will allocate space if necessary.

This avoids the need to pay for the allocation of this map when compiling regexes that lack large Unicode character classes.

source

pub fn clear(&mut self)

Clear this map of all entries, but permit the reuse of allocation if possible.

This must be called before the map can be used.

source

pub fn hash(&self, key: &[Transition]) -> usize

Return a hash of the given transitions.

source

pub fn get(&mut self, key: &[Transition], hash: usize) -> Option<StateID>

Retrieve the cached state ID corresponding to the given key. The hash given must have been computed with hash using the same key value.

If there is no cached state with the given transitions, then None is returned.

source

pub fn set(&mut self, key: Vec<Transition>, hash: usize, state_id: StateID)

Add a cached state to this map with the given key. Callers should ensure that state_id points to a state that contains precisely the NFA transitions given.

hash must have been computed using the hash method with the same key.

Trait Implementations§

source§

impl Clone for Utf8BoundedMap

source§

fn clone(&self) -> Utf8BoundedMap

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Utf8BoundedMap

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more

Auto Trait Implementations§

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.