Struct Utf8BoundedMap

pub struct Utf8BoundedMap {
    version: u16,
    capacity: usize,
    map: Vec<Utf8BoundedEntry>,
}

A bounded hash map where the key is a sequence of NFA transitions and the value is a pre-existing NFA state ID.

std’s hashmap could be used for this, but this map has two important advantages. First, it has lower overhead. Second, it lets us control our memory usage by limiting the number of slots. The cost is that this map acts as a cache: inserting a new entry may remove an old entry. We are okay with this, since it does not impact correctness in the cases where the map is used. The only effect of dropping states from the cache is that the generated NFA may be bigger than it otherwise would be.
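The cache behavior described above can be sketched with a toy type. This is a minimal illustration, not the crate's implementation: `ToyBoundedCache`, its byte-slice keys, its `u32` values, and the FNV-1a hash are all assumptions made for the example.

```rust
/// A toy fixed-size cache: one slot per hash bucket, no chaining.
/// (Hypothetical type for illustration; not the crate's implementation.)
#[derive(Debug)]
pub struct ToyBoundedCache {
    slots: Vec<Option<(Vec<u8>, u32)>>,
}

impl ToyBoundedCache {
    pub fn new(capacity: usize) -> ToyBoundedCache {
        assert!(capacity > 0, "capacity must be non-zero");
        ToyBoundedCache { slots: vec![None; capacity] }
    }

    /// FNV-1a over the key bytes, reduced to a slot index.
    fn slot(&self, key: &[u8]) -> usize {
        let mut h: u64 = 0xcbf29ce484222325;
        for &b in key {
            h ^= u64::from(b);
            h = h.wrapping_mul(0x100000001b3);
        }
        (h % self.slots.len() as u64) as usize
    }

    /// Returns the value only if the slot holds exactly this key.
    pub fn get(&self, key: &[u8]) -> Option<u32> {
        match &self.slots[self.slot(key)] {
            Some((k, v)) if k.as_slice() == key => Some(*v),
            _ => None,
        }
    }

    /// Unconditionally overwrites whatever occupied the slot, so a
    /// colliding key evicts the previous entry.
    pub fn set(&mut self, key: &[u8], value: u32) {
        let i = self.slot(key);
        self.slots[i] = Some((key.to_vec(), value));
    }
}
```

With a capacity of 1, every key lands in the same slot, so a second `set` with a different key makes the first key's `get` return `None`. A caller must therefore treat a miss as "rebuild the value", never as "the value cannot exist".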

This improves benchmarks that compile large Unicode character classes, since it makes the generation of (almost) minimal UTF-8 automata faster. Specifically, one could observe the difference with std’s hashmap via something like the following benchmark:

hyperfine "regex-cli debug thompson -qr --captures none '\w{90} ecurB'"

But to observe that difference, you’d have to modify the code to use std’s hashmap.

It is quite possible that there is a better way to approach this problem. For example, if there happens to be a very common state that collides with a lot of less frequent states, then we could wind up with very poor caching behavior. Alas, the effectiveness of this cache has not been measured. Instead, ad hoc experiments suggest that it is “good enough.” Additional smarts (such as an LRU eviction policy) have to be weighed against the amount of extra time they cost.

Fields§

§version: u16

The current version of this map. Only entries with matching versions are considered during lookups. If an entry is found with a mismatched version, then the map behaves as if the entry does not exist.

This makes it possible to clear the map by simply incrementing the version number instead of actually deallocating any storage.
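The version trick can be sketched as follows. This is a simplified illustration under stated assumptions: `VersionedCache`, `Slot`, the byte-slice keys, and the multiply-and-add hash are hypothetical, and the handling of version wraparound (wiping the slots when the `u16` counter overflows) is one plausible way to keep stale entries from matching again, not a statement of what the crate does.

```rust
/// One cache slot, stamped with the version under which it was written.
#[derive(Clone, Debug, Default)]
struct Slot {
    version: u16,
    key: Vec<u8>,
    value: u32,
}

/// Hypothetical cache that clears itself by bumping a version counter.
#[derive(Debug)]
pub struct VersionedCache {
    version: u16,
    slots: Vec<Slot>,
}

impl VersionedCache {
    pub fn new(capacity: usize) -> VersionedCache {
        assert!(capacity > 0, "capacity must be non-zero");
        // The live version starts at 1; default slots carry version 0,
        // so an untouched slot never matches a lookup.
        VersionedCache { version: 1, slots: vec![Slot::default(); capacity] }
    }

    fn index(&self, key: &[u8]) -> usize {
        let h = key
            .iter()
            .fold(0usize, |h, &b| h.wrapping_mul(31).wrapping_add(usize::from(b)));
        h % self.slots.len()
    }

    /// A slot counts as present only if its version matches the map's.
    pub fn get(&self, key: &[u8]) -> Option<u32> {
        let slot = &self.slots[self.index(key)];
        if slot.version == self.version && slot.key.as_slice() == key {
            Some(slot.value)
        } else {
            None
        }
    }

    pub fn set(&mut self, key: &[u8], value: u32) {
        let i = self.index(key);
        self.slots[i] = Slot { version: self.version, key: key.to_vec(), value };
    }

    /// Constant-time clear in the common case: old entries stay in
    /// memory but no longer match the current version.
    pub fn clear(&mut self) {
        self.version = self.version.wrapping_add(1);
        if self.version == 0 {
            // The counter wrapped, so very old entries could match again.
            // Wipe the storage for real and restart at version 1.
            let capacity = self.slots.len();
            self.slots = vec![Slot::default(); capacity];
            self.version = 1;
        }
    }
}
```

The design choice here is to pay one extra `u16` comparison per lookup in exchange for a `clear` that usually touches no slot storage at all, which matters when the map is cleared once per compiled character class.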

§capacity: usize

The total number of entries this map can store.

§map: Vec<Utf8BoundedEntry>

The actual entries, keyed by hash. Collisions between different states result in the old state being dropped.

Implementations§

impl Utf8BoundedMap

pub fn new(capacity: usize) -> Utf8BoundedMap

pub fn clear(&mut self)

pub fn hash(&self, key: &[Transition]) -> usize

pub fn get(&mut self, key: &[Transition], hash: usize) -> Option<StateID>

pub fn set(&mut self, key: Vec<Transition>, hash: usize, state_id: StateID)
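Because `get` and `set` both accept a precomputed `hash`, the signatures above suggest a pattern in which the caller hashes a key once and reuses the result for the lookup and the insertion. The sketch below mirrors those signatures with stand-in types. Everything in it is an assumption for illustration: `Transition` and `StateId` are simplified placeholders for the crate's types, and `SketchMap` and `get_or_add` are hypothetical names.

```rust
// Stand-in types; the real `Transition` and `StateID` live in the crate.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Transition {
    pub start: u8,
    pub end: u8,
    pub next: u32,
}
pub type StateId = u32;

#[derive(Clone, Debug, Default)]
struct Entry {
    key: Vec<Transition>,
    val: StateId,
    occupied: bool,
}

/// Hypothetical map with the same method shapes as the listing above.
pub struct SketchMap {
    slots: Vec<Entry>,
}

impl SketchMap {
    pub fn new(capacity: usize) -> SketchMap {
        assert!(capacity > 0, "capacity must be non-zero");
        SketchMap { slots: vec![Entry::default(); capacity] }
    }

    /// Hashes the transitions (FNV-1a style) down to a slot index.
    pub fn hash(&self, key: &[Transition]) -> usize {
        let mut h: u64 = 0xcbf29ce484222325;
        for t in key {
            for b in [u64::from(t.start), u64::from(t.end), u64::from(t.next)] {
                h = (h ^ b).wrapping_mul(0x100000001b3);
            }
        }
        (h % self.slots.len() as u64) as usize
    }

    pub fn get(&self, key: &[Transition], hash: usize) -> Option<StateId> {
        let e = &self.slots[hash];
        if e.occupied && e.key.as_slice() == key { Some(e.val) } else { None }
    }

    pub fn set(&mut self, key: Vec<Transition>, hash: usize, id: StateId) {
        self.slots[hash] = Entry { key, val: id, occupied: true };
    }
}

/// Returns the cached state for `node`, or allocates a fresh ID from
/// `next_id` and records it. The hash is computed exactly once.
pub fn get_or_add(
    map: &mut SketchMap,
    node: Vec<Transition>,
    next_id: &mut StateId,
) -> StateId {
    let hash = map.hash(&node);
    if let Some(id) = map.get(&node, hash) {
        return id;
    }
    let id = *next_id;
    *next_id += 1;
    map.set(node, hash, id);
    id
}
```

Calling `get_or_add` twice with an equal `node` returns the same ID and allocates only once, provided no colliding key was inserted in between. If a collision evicts the entry, the second call allocates a duplicate state, which is the "bigger NFA" cost the description mentions.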

Trait Implementations§

impl Clone for Utf8BoundedMap

fn clone(&self) -> Utf8BoundedMap

fn clone_from(&mut self, source: &Self)

impl Debug for Utf8BoundedMap

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations§

impl Freeze for Utf8BoundedMap

impl RefUnwindSafe for Utf8BoundedMap

impl Send for Utf8BoundedMap

impl Sync for Utf8BoundedMap

impl Unpin for Utf8BoundedMap

impl UnwindSafe for Utf8BoundedMap

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
