Module regex_automata::util::prefilter
source · Expand description
Defines a prefilter for accelerating regex searches.
A prefilter can be created by building a Prefilter
value.
A prefilter represents one of the most important optimizations available for accelerating regex searches. The idea of a prefilter is to very quickly find candidate locations in a haystack where a regex could match. Once a candidate is found, it is then intended for the regex engine to run at that position to determine whether the candidate is a match or a false positive.
In the aforementioned description of the prefilter optimization also lay its demise. Namely, if a prefilter has a high false positive rate and it produces lots of candidates, then a prefilter can overall make a regex search slower. It can run more slowly because more time is spent ping-ponging between the prefilter search and the regex engine attempting to confirm each candidate as a match. This ping-ponging has overhead that adds up, and is exacerbated by a high false positive rate.
Nevertheless, the optimization is still generally worth performing in most cases. Particularly given just how much throughput can be improved. (It is not uncommon for prefilter optimizations to improve throughput by one or two orders of magnitude.)
Typically a prefilter is used to find occurrences of literal prefixes from a regex pattern, but this isn’t required. A prefilter can be used to look for suffixes or even inner literals.
Note that as of now, prefilters throw away information about which pattern each literal comes from. In other words, when a prefilter finds a match, there’s no way to know which pattern (or patterns) it came from. Therefore, in order to confirm a match, you’ll have to check all of the patterns by running the full regex engine.
Modules§
Structs§
- A prefilter for accelerating regex searches.
Enums§
- Choice 🔒A type that encapsulates the selection of a prefilter algorithm from a sequence of needles.
Traits§
- A trait for abstracting over prefilters. Basically, a prefilter is something that do an unanchored and an anchored search in a haystack within a given span.
Functions§
- prefixes 🔒Extracts all of the prefix literals from the given HIR expressions into a single
Seq
. The literals in the sequence are ordered with respect to the order of the given HIR expressions and consistent with the match semantics given. - suffixes 🔒Like
prefixes
, but for all suffixes of all matches for the given HIRs.