Struct Cache

pub struct Cache {
    trans: Vec<LazyStateID>,
    starts: Vec<LazyStateID>,
    states: Vec<State>,
    states_to_id: HashMap<State, LazyStateID>,
    sparses: SparseSets,
    stack: Vec<StateID>,
    scratch_state_builder: StateBuilderEmpty,
    state_saver: StateSaver,
    memory_usage_state: usize,
    clear_count: usize,
    bytes_searched: usize,
    progress: Option<SearchProgress>,
}

A cache represents a partially computed DFA.

A cache is the key component that differentiates a classical DFA from a hybrid NFA/DFA (also called a “lazy DFA”). Where a classical DFA builds a complete transition table that can handle all possible inputs, a hybrid NFA/DFA starts with an empty transition table and builds only the parts required during search. The parts that are built are stored in a cache. For this reason, a cache is a required parameter for nearly every operation on a DFA.

Caches can be created from their corresponding DFA via DFA::create_cache. A cache can only be used with either the DFA that created it, or the DFA that was most recently used to reset it with Cache::reset. Using a cache with any other DFA may result in panics or incorrect results.

Fields§

§trans: Vec<LazyStateID>

The transition table.

Given a current LazyStateID and an input byte, the next state can be computed via trans[untagged(current) + equiv_class(input)]. Notice that no multiplication is used. That’s because state identifiers are “premultiplied.”

Note that the next state may be the “unknown” state. In this case, the next state is not known and determinization for current on input must be performed.
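The premultiplied lookup described above can be illustrated with a toy table. This is only a sketch under stated assumptions: ToyTable, UNKNOWN and the methods below are hypothetical names that are not part of this crate, state IDs are plain usize values, and the tag bits carried by a real LazyStateID are left out entirely.

```rust
/// Hypothetical stand-in for the "unknown" lazy state ID (an assumption
/// made for this sketch, not the crate's actual representation).
const UNKNOWN: usize = usize::MAX;

/// A toy transition table that uses premultiplied state IDs.
struct ToyTable {
    /// Number of byte equivalence classes, i.e. the width of one row.
    stride: usize,
    /// Flat table. The row for a state begins at the state's ID.
    trans: Vec<usize>,
}

impl ToyTable {
    fn new(stride: usize) -> ToyTable {
        ToyTable { stride, trans: Vec::new() }
    }

    /// Adds a state whose transitions are all unknown and returns its
    /// premultiplied ID, which is the offset of its row in `trans`.
    fn add_state(&mut self) -> usize {
        let id = self.trans.len();
        self.trans.extend(std::iter::repeat(UNKNOWN).take(self.stride));
        id
    }

    /// Records a transition that has been determinized.
    fn set(&mut self, from: usize, class: usize, to: usize) {
        self.trans[from + class] = to;
    }

    /// Looks up the next state with an addition only. `None` means the
    /// transition is unknown and would have to be determinized first.
    fn next(&self, current: usize, class: usize) -> Option<usize> {
        match self.trans[current + class] {
            UNKNOWN => None,
            id => Some(id),
        }
    }
}
```

Because every ID handed out by add_state is already a multiple of the stride, next never has to compute current * stride on the hot path.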

§starts: Vec<LazyStateID>

The starting states for this DFA.

These are computed lazily. Initially, these are all set to “unknown” lazy state IDs.

When ‘starts_for_each_pattern’ is disabled (the default), the size of this is constrained to the possible starting configurations based on the search parameters. (At time of writing, that’s 4.) However, when starting states for each pattern are enabled, there are N additional groups of starting states, where each group reflects the different possible configurations and N is the number of patterns.

§states: Vec<State>

A sequence of NFA/DFA powerset states that have been computed for this lazy DFA. This sequence is indexable by untagged LazyStateIDs. (Every tagged LazyStateID can be used to index this sequence by converting it to its untagged form.)

§states_to_id: HashMap<State, LazyStateID>

A map from states to their corresponding IDs. This map may be accessed via the raw byte representation of a state, which means that a State does not need to be allocated to determine whether it already exists in this map. Indeed, the existence of such a state is what determines whether we allocate a new State or not.

The higher level idea here is that we do just enough determinization for a state to check whether we’ve already computed it. If we have, then we can save a little (albeit not much) work. The real savings is in memory usage. If we never checked for trivially duplicate states, then our memory usage would explode to unreasonable levels.
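The lookup by raw bytes can be sketched with standard library types alone. The names below (ToyStateCache, get_or_add) are hypothetical, and a state is modeled as a plain byte string held in an Arc<[u8]>. That is an assumption for illustration, not a description of how State is actually defined.

```rust
use std::collections::HashMap;
use std::sync::Arc;

/// A toy cache of states keyed by their byte representation.
struct ToyStateCache {
    states: Vec<Arc<[u8]>>,
    states_to_id: HashMap<Arc<[u8]>, usize>,
}

impl ToyStateCache {
    fn new() -> ToyStateCache {
        ToyStateCache { states: Vec::new(), states_to_id: HashMap::new() }
    }

    /// Returns the ID for the state encoded in `scratch` and whether a
    /// new state had to be allocated for it.
    fn get_or_add(&mut self, scratch: &[u8]) -> (usize, bool) {
        // `Arc<[u8]>` implements `Borrow<[u8]>`, so the map can be
        // queried with a borrowed slice and no allocation happens when
        // the state already exists.
        if let Some(&id) = self.states_to_id.get(scratch) {
            return (id, false);
        }
        // Only a state that has never been seen pays for an allocation.
        let state: Arc<[u8]> = Arc::from(scratch);
        let id = self.states.len();
        self.states.push(Arc::clone(&state));
        self.states_to_id.insert(state, id);
        (id, true)
    }
}
```

In this sketch the scratch buffer plays the role of the reusable state builder: it can be overwritten for every candidate state, and only genuinely new states are copied into the cache.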

§sparses: SparseSets

Sparse sets used to track which NFA states have been visited during various traversals.

§stack: Vec<StateID>

Scratch space for traversing the NFA graph. (We use space on the heap instead of the call stack.)
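How a sparse set and a heap-allocated stack work together during a traversal can be sketched as follows. Everything here is hypothetical: the SparseSet below is a generic textbook sparse set, not the crate's SparseSets type, and the NFA is reduced to a list of epsilon edges per state.

```rust
/// A sparse set of small integers with constant time insertion,
/// membership testing and clearing.
struct SparseSet {
    len: usize,
    dense: Vec<usize>,
    sparse: Vec<usize>,
}

impl SparseSet {
    fn new(capacity: usize) -> SparseSet {
        SparseSet { len: 0, dense: vec![0; capacity], sparse: vec![0; capacity] }
    }

    fn contains(&self, id: usize) -> bool {
        let i = self.sparse[id];
        i < self.len && self.dense[i] == id
    }

    /// Inserts `id` and returns true if it was not already present.
    fn insert(&mut self, id: usize) -> bool {
        if self.contains(id) {
            return false;
        }
        self.dense[self.len] = id;
        self.sparse[id] = self.len;
        self.len += 1;
        true
    }

    /// Clearing only resets the length, so the set is cheap to reuse.
    fn clear(&mut self) {
        self.len = 0;
    }

    /// The members in insertion order.
    fn as_slice(&self) -> &[usize] {
        &self.dense[..self.len]
    }
}

/// Collects every state reachable from `start` through epsilon edges,
/// using caller-provided scratch space instead of recursion.
fn epsilon_closure(
    eps: &[Vec<usize>],
    start: usize,
    stack: &mut Vec<usize>,
    set: &mut SparseSet,
) {
    set.clear();
    stack.clear();
    stack.push(start);
    while let Some(id) = stack.pop() {
        // Skip states that were already visited in this traversal.
        if !set.insert(id) {
            continue;
        }
        // Push in reverse so that the first edge is explored first.
        for &next in eps[id].iter().rev() {
            stack.push(next);
        }
    }
}
```

Keeping both the stack and the set outside the function means repeated traversals reuse the same allocations, and deep NFA graphs cannot overflow the call stack.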

§scratch_state_builder: StateBuilderEmpty

Scratch space for building an NFA/DFA powerset state. This is used to help amortize allocation since not every powerset state generated is added to the cache. In particular, if it already exists in the cache, then there is no need to allocate a new State for it.

§state_saver: StateSaver

A simple abstraction for handling the saving of at most a single state across a cache clearing. This is required for correctness. Namely, if adding a new state after clearing the cache fails, then the caller must retain the ability to continue using the state ID given. The state corresponding to the state ID is what we preserve across cache clearings.

§memory_usage_state: usize

The memory usage, in bytes, used by ‘states’ and ‘states_to_id’. We track this as new states are added since states use a variable amount of heap. Tracking this as we add states makes it possible to compute the total amount of memory used by the determinizer in constant time.
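The bookkeeping idea can be shown with a minimal sketch. ToyStates and its methods are hypothetical, states are modeled as byte vectors, and only the payload bytes are counted. How the real cache accounts for map and allocator overhead is not specified here.

```rust
/// A toy collection of variable-sized states with a running byte count.
struct ToyStates {
    states: Vec<Vec<u8>>,
    /// Running total of heap bytes used by the states (payload only in
    /// this sketch).
    memory_usage_state: usize,
}

impl ToyStates {
    fn new() -> ToyStates {
        ToyStates { states: Vec::new(), memory_usage_state: 0 }
    }

    /// Adds a state and updates the running total as it goes.
    fn add(&mut self, state: Vec<u8>) {
        self.memory_usage_state += state.len();
        self.states.push(state);
    }

    /// Constant time: just reads the counter.
    fn memory_usage(&self) -> usize {
        self.memory_usage_state
    }

    /// Linear time alternative that the counter makes unnecessary.
    fn memory_usage_recomputed(&self) -> usize {
        self.states.iter().map(|s| s.len()).sum()
    }
}
```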

§clear_count: usize

The number of times the cache has been cleared. When a minimum cache clear count is set, then the cache will return an error instead of clearing the cache if the count has been exceeded.
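The interaction between the clear count and a configured limit might look roughly like the sketch below. The names ToyCache, min_clear_count, GaveUp and try_clear are hypothetical, and the exact comparison used (here, refusing once the count has reached the limit) is an assumption rather than something this page states.

```rust
/// Hypothetical error returned when the cache refuses to clear itself.
#[derive(Debug, PartialEq)]
struct GaveUp;

/// A toy cache that counts how often it has been cleared.
struct ToyCache {
    clear_count: usize,
    /// Hypothetical limit. `None` means the cache may always be cleared.
    min_clear_count: Option<usize>,
    states: Vec<Vec<u8>>,
}

impl ToyCache {
    /// Clears the cache, or returns an error if the limit was reached.
    fn try_clear(&mut self) -> Result<(), GaveUp> {
        if let Some(min) = self.min_clear_count {
            // Assumption: the cache gives up once it has already been
            // cleared `min` times.
            if self.clear_count >= min {
                return Err(GaveUp);
            }
        }
        self.states.clear();
        self.clear_count += 1;
        Ok(())
    }
}
```

Returning an error instead of clearing lets a caller fall back to a different search strategy when the cache is being thrashed.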

§bytes_searched: usize

The total number of bytes searched since the last time this cache was cleared, not including the current search.

This can be added to the length of the current search to get the true total number of bytes searched.

This is generally only non-zero when the Cache::search_{start,update,finish} APIs are used to track search progress.

§progress: Option<SearchProgress>

The progress of the current search.

This is only non-None when callers utilize the Cache::search_start, Cache::search_update and Cache::search_finish APIs.
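One way the bytes_searched counter and the optional progress value could fit together is sketched below. The types Progress and ToyProgressCache are hypothetical, and the behavior shown (including folding an unfinished search into the total when a new one starts) is an assumption for illustration, not the crate's confirmed implementation.

```rust
/// Hypothetical record of where the current search started and where
/// it currently is.
struct Progress {
    start: usize,
    at: usize,
}

impl Progress {
    /// Bytes covered so far. `abs_diff` also handles reverse searches,
    /// where `at` moves below `start`.
    fn len(&self) -> usize {
        self.at.abs_diff(self.start)
    }
}

/// A toy cache that only tracks search progress.
struct ToyProgressCache {
    /// Bytes searched by finished searches, excluding the current one.
    bytes_searched: usize,
    progress: Option<Progress>,
}

impl ToyProgressCache {
    fn search_start(&mut self, at: usize) {
        // Assumption: an unfinished search is folded into the total.
        if let Some(p) = self.progress.take() {
            self.bytes_searched += p.len();
        }
        self.progress = Some(Progress { start: at, at });
    }

    /// Panics if no search is in progress.
    fn search_update(&mut self, at: usize) {
        let p = self.progress.as_mut().expect("no in-progress search to update");
        p.at = at;
    }

    /// Panics if no search is in progress.
    fn search_finish(&mut self, at: usize) {
        let mut p = self.progress.take().expect("no in-progress search to finish");
        p.at = at;
        self.bytes_searched += p.len();
    }

    /// Finished bytes plus the length of the current search, if any.
    fn search_total_len(&self) -> usize {
        self.bytes_searched + self.progress.as_ref().map_or(0, |p| p.len())
    }
}
```

With this shape, search_total_len gives the true total at any point, whether or not a search is currently in flight.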

The purpose of recording search progress is to be able to make a determination about the efficiency of the cache. Namely, by keeping track of the

Implementations§

impl Cache

pub fn new(dfa: &DFA) -> Cache

pub fn reset(&mut self, dfa: &DFA)

§Example

pub fn search_start(&mut self, at: usize)

pub fn search_update(&mut self, at: usize)

§Panics

pub fn search_finish(&mut self, at: usize)

§Panics

pub fn search_total_len(&self) -> usize

pub fn clear_count(&self) -> usize

pub fn memory_usage(&self) -> usize

Trait Implementations§

impl Clone for Cache

fn clone(&self) -> Cache

fn clone_from(&mut self, source: &Self)

impl Debug for Cache

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations§

impl Freeze for Cache

impl RefUnwindSafe for Cache

impl Send for Cache

impl Sync for Cache

impl Unpin for Cache

impl UnwindSafe for Cache

Blanket Implementations§

impl<T> Any for T where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for T where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T where U: From<T>,

fn into(self) -> U

impl<T> ToOwned for T where T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for T where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
