Struct Utf8Compiler


struct Utf8Compiler<'a> {
    builder: &'a mut Builder,
    state: &'a mut Utf8State,
    target: StateID,
}


A UTF-8 compiler based on Daciuk’s algorithm for compiling minimal DFAs from a lexicographically sorted sequence of strings in linear time.

The trick here is that any Unicode codepoint range can be converted to one or more sequences of byte ranges that form a UTF-8 automaton. Connecting them together via an alternation is trivial, and indeed, it works. However, many UTF-8 automata contain a lot of redundant structure. Since our UTF-8 ranges are in lexicographic order, we can use Daciuk’s algorithm to build nearly minimal DFAs in linear time. (They are not guaranteed to be minimal because we use a bounded cache of previously built DFA states.)
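A minimal sketch of that conversion, assuming a hypothetical ByteRange type and utf8_sequences function (neither is part of this page's API). The splitting strategy is modeled on the Utf8Sequences iterator in the regex-syntax crate: cut out the surrogates, split wherever the encoded length changes, then keep splitting until every piece is a product of byte ranges.

```rust
// Hypothetical stand-in for a Utf8Range: matches one byte in start..=end.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ByteRange {
    start: u8,
    end: u8,
}

/// Splits the scalar value range `start..=end` into byte-range sequences whose
/// union matches exactly the UTF-8 encodings of those scalar values, in
/// ascending order. Surrogates (U+D800..=U+DFFF) have no encoding and are skipped.
fn utf8_sequences(start: char, end: char) -> Vec<Vec<ByteRange>> {
    let mut out: Vec<Vec<ByteRange>> = Vec::new();
    // Pending ranges. The right-hand piece of every split is pushed and the
    // left-hand piece is handled first, so output stays in ascending order.
    let mut stack = vec![(start as u32, end as u32)];
    'outer: while let Some((s, mut e)) = stack.pop() {
        'inner: loop {
            // Cut out the surrogate block.
            if s < 0xE000 && e > 0xD7FF {
                stack.push((0xE000, e));
                e = 0xD7FF;
                continue 'inner;
            }
            if s > e {
                continue 'outer;
            }
            // Split where the encoded length changes (1 -> 2 -> 3 -> 4 bytes).
            for max in [0x7F, 0x7FF, 0xFFFF] {
                if s <= max && max < e {
                    stack.push((max + 1, e));
                    e = max;
                    continue 'inner;
                }
            }
            // ASCII needs no further splitting.
            if e <= 0x7F {
                out.push(vec![ByteRange { start: s as u8, end: e as u8 }]);
                continue 'outer;
            }
            // Split until the trailing continuation bytes of each piece span
            // whole blocks, so the piece is a product of byte ranges.
            for i in 1..4 {
                let m: u32 = (1u32 << (6 * i)) - 1;
                if (s & !m) != (e & !m) {
                    if (s & m) != 0 {
                        stack.push(((s | m) + 1, e));
                        e = s | m;
                        continue 'inner;
                    }
                    if (e & m) != m {
                        stack.push((e & !m, e));
                        e = (e & !m) - 1;
                        continue 'inner;
                    }
                }
            }
            // Both endpoints now have the same encoded length: pair the bytes.
            let (mut a, mut b) = ([0u8; 4], [0u8; 4]);
            let len = char::from_u32(s).unwrap().encode_utf8(&mut a).len();
            char::from_u32(e).unwrap().encode_utf8(&mut b);
            out.push((0..len).map(|k| ByteRange { start: a[k], end: b[k] }).collect());
            continue 'outer;
        }
    }
    out
}
```

For '\u{80}'..='\u{FFFF}' this yields five sequences: [C2-DF][80-BF], [E0][A0-BF][80-BF], [E1-EC][80-BF][80-BF], [ED][80-9F][80-BF] and [EE-EF][80-BF][80-BF]. They come out in lexicographic order, and all five end in the same [80-BF] transition, which is exactly the kind of shared suffix the minimization step can merge.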

The drawback is that this sadly doesn’t work for reverse automata, since the ranges are no longer in lexicographic order. For that, we invented the range trie (which gets its own module). Once a range trie is built, we then use this same Utf8Compiler to build a reverse UTF-8 automaton.
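A small check makes the ordering problem concrete. Reversing each forward sequence for U+0080..=U+FFFF breaks lexicographic order, and sorting the reversed sequences is not enough either: some of them then share a prefix and diverge on ranges that overlap without being equal (for example [80-BF][80-9F][ED] and [80-BF][80-BF][E1-EC]), which prefix sharing alone cannot merge deterministically. All names below are hypothetical helpers for this illustration.

```rust
// Hypothetical stand-in for a Utf8Range: matches one byte in start..=end.
#[derive(Clone, Copy, Debug, PartialEq, Eq, PartialOrd, Ord)]
struct ByteRange {
    start: u8,
    end: u8,
}

// Hypothetical helper: forward UTF-8 sequences for U+0080..=U+FFFF
// (surrogates excluded), in lexicographic order.
fn bmp_non_ascii() -> Vec<Vec<ByteRange>> {
    let r = |start, end| ByteRange { start, end };
    vec![
        vec![r(0xC2, 0xDF), r(0x80, 0xBF)],
        vec![r(0xE0, 0xE0), r(0xA0, 0xBF), r(0x80, 0xBF)],
        vec![r(0xE1, 0xEC), r(0x80, 0xBF), r(0x80, 0xBF)],
        vec![r(0xED, 0xED), r(0x80, 0x9F), r(0x80, 0xBF)],
        vec![r(0xEE, 0xEF), r(0x80, 0xBF), r(0x80, 0xBF)],
    ]
}

// Reverses every sequence, as a reverse automaton would need.
fn reversed(seqs: &[Vec<ByteRange>]) -> Vec<Vec<ByteRange>> {
    seqs.iter()
        .map(|s| s.iter().rev().copied().collect::<Vec<_>>())
        .collect()
}

// True if the sequences are in strictly increasing lexicographic order,
// the precondition the forward construction relies on.
fn is_lex_sorted(seqs: &[Vec<ByteRange>]) -> bool {
    seqs.windows(2).all(|w| w[0] < w[1])
}

// True if two sequences share a prefix of equal ranges and then continue with
// ranges that overlap without being equal; the overlapping bytes would have to
// be split before such a pair could be merged deterministically.
fn has_overlap_conflict(seqs: &[Vec<ByteRange>]) -> bool {
    for (i, a) in seqs.iter().enumerate() {
        for b in &seqs[i + 1..] {
            // Length of the common prefix of identical ranges.
            let p = a.iter().zip(b).take_while(|(x, y)| x == y).count();
            if let (Some(x), Some(y)) = (a.get(p), b.get(p)) {
                if x.start <= y.end && y.start <= x.end {
                    return true;
                }
            }
        }
    }
    false
}
```

For the forward sequences, is_lex_sorted holds and has_overlap_conflict is false. For the reversed sequences, is_lex_sorted fails, and after sorting them has_overlap_conflict reports a conflict.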

The high level idea is described here: https://blog.burntsushi.net/transducers/#finite-state-machines-as-data-structures

There is also another implementation of this in the fst crate.

Fields

builder: &'a mut Builder
state: &'a mut Utf8State
target: StateID


Implementations

impl<'a> Utf8Compiler<'a>

fn new(
    builder: &'a mut Builder,
    state: &'a mut Utf8State,
) -> Result<Utf8Compiler<'a>, BuildError>

fn finish(&mut self) -> Result<ThompsonRef, BuildError>

fn add(&mut self, ranges: &[Utf8Range]) -> Result<(), BuildError>

fn compile_from(&mut self, from: usize) -> Result<(), BuildError>

fn compile(&mut self, node: Vec<Transition>) -> Result<StateID, BuildError>

fn add_suffix(&mut self, ranges: &[Utf8Range])

fn add_empty(&mut self)

fn pop_freeze(&mut self, next: StateID) -> Vec<Transition>

fn pop_root(&mut self) -> Vec<Transition>

fn top_last_freeze(&mut self, next: StateID)
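The methods above are listed without descriptions. As a rough sketch only, the following shows how an incremental, Daciuk-style builder can combine methods with similar roles (add, compile_from, compile, add_suffix, finish). It is a simplified stand-in, not this type's implementation: it keeps an unbounded HashMap register instead of a bounded cache, treats the state with no transitions as the match state instead of linking to a target StateID, and all types are hypothetical. Fed the five UTF-8 sequences for U+0080..=U+FFFF in order, it builds 6 states, versus 15 for a trie with no suffix sharing.

```rust
use std::collections::HashMap;

// Hypothetical stand-ins for Utf8Range, StateID and Transition.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct ByteRange { start: u8, end: u8 }
type StateId = usize;
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Transition { range: ByteRange, next: StateId }

// An unfrozen node: frozen transitions plus at most one pending
// transition whose target is the next node on the stack.
struct Node { trans: Vec<Transition>, last: Option<ByteRange> }

struct Compiler {
    // Frozen states; the one with no transitions is the match state.
    states: Vec<Vec<Transition>>,
    // Unbounded register of frozen states (an assumption of this sketch).
    register: HashMap<Vec<Transition>, StateId>,
    // Unfrozen nodes along the most recently added sequence.
    stack: Vec<Node>,
}

impl Compiler {
    fn new() -> Self {
        let root = Node { trans: vec![], last: None };
        Compiler { states: vec![], register: HashMap::new(), stack: vec![root] }
    }

    // Adds one sequence. Input must be lexicographically sorted and no
    // sequence may be a prefix of another (true for UTF-8 sequences).
    fn add(&mut self, ranges: &[ByteRange]) {
        // Length of the prefix shared with the previous sequence.
        let prefix = ranges.iter().zip(&self.stack)
            .take_while(|(r, node)| node.last == Some(**r)).count();
        assert!(prefix < ranges.len(), "duplicate or prefix sequence");
        self.compile_from(prefix);
        self.add_suffix(&ranges[prefix..]);
    }

    // Freezes every node deeper than `from`, deepest first, and wires
    // each frozen state into its parent's pending transition.
    fn compile_from(&mut self, from: usize) {
        while self.stack.len() > from + 1 {
            let node = self.stack.pop().unwrap();
            debug_assert!(node.last.is_none());
            let id = self.compile(node.trans);
            let parent = self.stack.last_mut().unwrap();
            let range = parent.last.take().unwrap();
            parent.trans.push(Transition { range, next: id });
        }
    }

    // Reuses an identical frozen state if one exists, else adds one.
    fn compile(&mut self, trans: Vec<Transition>) -> StateId {
        if let Some(&id) = self.register.get(&trans) {
            return id;
        }
        let id = self.states.len();
        self.states.push(trans.clone());
        self.register.insert(trans, id);
        id
    }

    // Pushes the unshared suffix as fresh nodes ending in a leaf.
    fn add_suffix(&mut self, ranges: &[ByteRange]) {
        self.stack.last_mut().unwrap().last = Some(ranges[0]);
        for &r in &ranges[1..] {
            self.stack.push(Node { trans: vec![], last: Some(r) });
        }
        self.stack.push(Node { trans: vec![], last: None });
    }

    // Freezes whatever is left; returns all states and the start state.
    fn finish(mut self) -> (Vec<Vec<Transition>>, StateId) {
        self.compile_from(0);
        let root = self.stack.pop().unwrap();
        let start = self.compile(root.trans);
        (self.states, start)
    }
}

// Input matches if it ends in the state with no transitions.
fn is_match(states: &[Vec<Transition>], start: StateId, bytes: &[u8]) -> bool {
    let mut s = start;
    for &b in bytes {
        match states[s].iter().find(|t| t.range.start <= b && b <= t.range.end) {
            Some(t) => s = t.next,
            None => return false,
        }
    }
    states[s].is_empty()
}
```

Freezing happens only once a node can no longer gain transitions, which is what sorted input guarantees; that is why a single register lookup per node suffices and the whole build stays linear in the input size.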

Trait Implementations

impl<'a> Debug for Utf8Compiler<'a>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Auto Trait Implementations

impl<'a> Freeze for Utf8Compiler<'a>

impl<'a> RefUnwindSafe for Utf8Compiler<'a>

impl<'a> Send for Utf8Compiler<'a>

impl<'a> Sync for Utf8Compiler<'a>

impl<'a> Unpin for Utf8Compiler<'a>

impl<'a> !UnwindSafe for Utf8Compiler<'a>

Blanket Implementations

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>
