Trait aho_corasick::packed::teddy::builder::SearcherT

source ·
trait SearcherT: Debug + Send + Sync + UnwindSafe + RefUnwindSafe + 'static {
    // Required method
    unsafe fn find(&self, start: *const u8, end: *const u8) -> Option<Match>;
}
Expand description

A trait that provides dynamic dispatch over the different possible Teddy variants on the same algorithm.

On x86_64 for example, it isn’t known until runtime which of 12 possible variants will be used. One might use one of the four slim 128-bit vector variants, or one of the four 256-bit vector variants or even one of the four fat 256-bit vector variants.

Since this choice is generally made when the Teddy searcher is constructed and this choice is based on the patterns given and what the current CPU supports, it follows that there must be some kind of indirection at search time that “selects” the variant chosen at build time.

There are a few different ways to go about this. One approach is to use an enum. It works fine, but in my experiments, this generally results in worse codegen. Another approach, which is what we use here, is dynamic dispatch via a trait object. We basically implement this trait for each possible variant, select the variant we want at build time and convert it to a trait object for use at search time.

Another approach is to use function pointers and stick each of the possible variants into a union. This is essentially isomorphic to the dynamic dispatch approach, but doesn’t require any allocations. Since this crate requires alloc, there’s no real reason (AFAIK) to go down this path. (The memchr crate does this.)

Required Methods§

source

unsafe fn find(&self, start: *const u8, end: *const u8) -> Option<Match>

Execute a search on the given haystack (identified by start and end raw pointers).

§Safety

Essentially, the start and end pointers must be valid and point to a haystack one can read. As long as you derive them from, for example, a &[u8], they should automatically satisfy all of the safety obligations:

  • Both start and end must be valid for reads.
  • Both start and end must point to an initialized value.
  • Both start and end must point to the same allocated object and must either be in bounds or at most one byte past the end of the allocated object.
  • Both start and end must be derived from a pointer to the same object.
  • The distance between start and end must not overflow isize.
  • The distance being in bounds must not rely on “wrapping around” the address space.
  • It must be the case that start <= end.
  • end - start must be greater than the minimum length for this searcher.

Also, it is expected that implementations of this trait will tag this method with a target_feature attribute. Callers must ensure that they are executing this method in an environment where that attribute is valid.

Implementors§