Struct regex_syntax::hir::literal::Seq
source · pub struct Seq {
literals: Option<Vec<Literal>>,
}
Expand description
A sequence of literals.
A Seq
is very much like a set in that it represents a union of its
members. That is, it corresponds to a set of literals where at least one
must match in order for a particular Hir
expression to match. (Whether
this corresponds to the entire Hir
expression, a prefix of it or a suffix
of it depends on how the Seq
was extracted from the Hir
.)
It is also unlike a set in that multiple identical literals may appear,
and that the order of the literals in the Seq
matters. For example, if
the sequence is [sam, samwise]
and leftmost-first matching is used, then
samwise
can never match and the sequence is equivalent to [sam]
.
§States of a sequence
A Seq
has a few different logical states to consider:
- The sequence can represent “any” literal. When this happens, the set does
not have a finite size. The purpose of this state is to inhibit callers
from making assumptions about what literals are required in order to match
a particular
Hir
expression. Generally speaking, when a set is in this state, literal optimizations are inhibited. A good example of a regex that will cause this sort of set to appear is[A-Za-z]
. The character class is just too big (and also too narrow) to be usefully expanded into 52 different literals. (Note that the decision for when a seq should become infinite is determined by the caller. A seq itself has no hard-coded limits.) - The sequence can be empty, in which case, it is an affirmative statement
that there are no literals that can match the corresponding
Hir
. Consequently, theHir
never matches any input. For example,[a&&b]
. - The sequence can be non-empty, in which case, at least one of the
literals must match in order for the corresponding
Hir
to match.
§Example
This example shows how literal sequences can be simplified by stripping suffixes and minimizing while maintaining preference order.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq = Seq::new(&[
"farm",
"appliance",
"faraway",
"apple",
"fare",
"gap",
"applicant",
"applaud",
]);
seq.keep_first_bytes(3);
seq.minimize_by_preference();
// Notice that 'far' comes before 'app', which matches the order in the
// original sequence. This guarantees that leftmost-first semantics are
// not altered by simplifying the set.
let expected = Seq::from_iter([
Literal::inexact("far"),
Literal::inexact("app"),
Literal::exact("gap"),
]);
assert_eq!(expected, seq);
Fields§
§literals: Option<Vec<Literal>>
The members of this seq.
When None
, the seq represents all possible literals. That is, it
prevents one from making assumptions about specific literals in the
seq, and forces one to treat it as if any literal might be in the seq.
Note that Some(vec![])
is valid and corresponds to the empty seq of
literals, i.e., a regex that can never match. For example, [a&&b]
.
It is distinct from Some(vec![""])
, which corresponds to the seq
containing an empty string, which matches at every position.
Implementations§
source§impl Seq
impl Seq
sourcepub fn empty() -> Seq
pub fn empty() -> Seq
Returns an empty sequence.
An empty sequence matches zero literals, and thus corresponds to a regex that itself can never match.
sourcepub fn infinite() -> Seq
pub fn infinite() -> Seq
Returns a sequence of literals without a finite size and may contain any literal.
A sequence without finite size does not reveal anything about the characteristics of the literals in its set. There are no fixed prefixes or suffixes, nor are lower or upper bounds on the length of the literals in the set known.
This is useful to represent constructs in a regex that are “too big”
to useful represent as a sequence of literals. For example, [A-Za-z]
.
When sequences get too big, they lose their discriminating nature and
are more likely to produce false positives, which in turn makes them
less likely to speed up searches.
More pragmatically, for many regexes, enumerating all possible literals is itself not possible or might otherwise use too many resources. So constraining the size of sets during extraction is a practical trade off to make.
sourcepub fn new<I, B>(it: I) -> Seq
pub fn new<I, B>(it: I) -> Seq
Returns a sequence of exact literals from the given byte strings.
sourcepub fn literals(&self) -> Option<&[Literal]>
pub fn literals(&self) -> Option<&[Literal]>
If this is a finite sequence, return its members as a slice of literals.
The slice returned may be empty, in which case, there are no literals that can match this sequence.
sourcepub fn push(&mut self, lit: Literal)
pub fn push(&mut self, lit: Literal)
Push a literal to the end of this sequence.
If this sequence is not finite, then this is a no-op.
Similarly, if the most recently added item of this sequence is
equivalent to the literal given, then it is not added. This reflects
a Seq
’s “set like” behavior, and represents a practical trade off.
Namely, there is never any need to have two adjacent and equivalent
literals in the same sequence, and it is easy to detect in some
cases.
sourcepub fn make_inexact(&mut self)
pub fn make_inexact(&mut self)
Make all of the literals in this sequence inexact.
This is a no-op if this sequence is not finite.
sourcepub fn make_infinite(&mut self)
pub fn make_infinite(&mut self)
Converts this sequence to an infinite sequence.
This is a no-op if the sequence is already infinite.
sourcepub fn cross_forward(&mut self, other: &mut Seq)
pub fn cross_forward(&mut self, other: &mut Seq)
Modify this sequence to contain the cross product between it and the sequence given.
The cross product only considers literals in this sequence that are exact. That is, inexact literals are not extended.
The literals are always drained from other
, even if none are used.
This permits callers to reuse the sequence allocation elsewhere.
If this sequence is infinite, then this is a no-op, regardless of what
other
contains (and in this case, the literals are still drained from
other
). If other
is infinite and this sequence is finite, then this
is a no-op, unless this sequence contains a zero-length literal. In
which case, the infiniteness of other
infects this sequence, and this
sequence is itself made infinite.
Like Seq::union
, this may attempt to deduplicate literals. See
Seq::dedup
for how deduplication deals with exact and inexact
literals.
§Example
This example shows basic usage and how exact and inexact literals interact.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("bar"),
]);
let mut seq2 = Seq::from_iter([
Literal::inexact("quux"),
Literal::exact("baz"),
]);
seq1.cross_forward(&mut seq2);
// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());
let expected = Seq::from_iter([
Literal::inexact("fooquux"),
Literal::exact("foobaz"),
Literal::inexact("bar"),
]);
assert_eq!(expected, seq1);
This example shows the behavior of when other
is an infinite
sequence.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_forward(&mut seq2);
// When seq2 is infinite, cross product doesn't add anything, but
// ensures all members of seq1 are inexact.
let expected = Seq::from_iter([
Literal::inexact("foo"),
Literal::inexact("bar"),
]);
assert_eq!(expected, seq1);
This example is like the one above, but shows what happens when this
sequence contains an empty string. In this case, an infinite other
sequence infects this sequence (because the empty string means that
there are no finite prefixes):
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::from_iter([
Literal::exact("foo"),
Literal::exact(""), // inexact provokes same behavior
Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_forward(&mut seq2);
// seq1 is now infinite!
assert!(!seq1.is_finite());
This example shows the behavior of this sequence is infinite.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::infinite();
let mut seq2 = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("bar"),
]);
seq1.cross_forward(&mut seq2);
// seq1 remains unchanged.
assert!(!seq1.is_finite());
// Even though the literals in seq2 weren't used, it was still drained.
assert_eq!(Some(0), seq2.len());
sourcepub fn cross_reverse(&mut self, other: &mut Seq)
pub fn cross_reverse(&mut self, other: &mut Seq)
Modify this sequence to contain the cross product between it and
the sequence given, where the sequences are treated as suffixes
instead of prefixes. Namely, the sequence other
is prepended
to self
(as opposed to other
being appended to self
in
Seq::cross_forward
).
The cross product only considers literals in this sequence that are exact. That is, inexact literals are not extended.
The literals are always drained from other
, even if none are used.
This permits callers to reuse the sequence allocation elsewhere.
If this sequence is infinite, then this is a no-op, regardless of what
other
contains (and in this case, the literals are still drained from
other
). If other
is infinite and this sequence is finite, then this
is a no-op, unless this sequence contains a zero-length literal. In
which case, the infiniteness of other
infects this sequence, and this
sequence is itself made infinite.
Like Seq::union
, this may attempt to deduplicate literals. See
Seq::dedup
for how deduplication deals with exact and inexact
literals.
§Example
This example shows basic usage and how exact and inexact literals interact.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("bar"),
]);
let mut seq2 = Seq::from_iter([
Literal::inexact("quux"),
Literal::exact("baz"),
]);
seq1.cross_reverse(&mut seq2);
// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());
let expected = Seq::from_iter([
Literal::inexact("quuxfoo"),
Literal::inexact("bar"),
Literal::exact("bazfoo"),
]);
assert_eq!(expected, seq1);
This example shows the behavior of when other
is an infinite
sequence.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_reverse(&mut seq2);
// When seq2 is infinite, cross product doesn't add anything, but
// ensures all members of seq1 are inexact.
let expected = Seq::from_iter([
Literal::inexact("foo"),
Literal::inexact("bar"),
]);
assert_eq!(expected, seq1);
This example is like the one above, but shows what happens when this
sequence contains an empty string. In this case, an infinite other
sequence infects this sequence (because the empty string means that
there are no finite suffixes):
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::from_iter([
Literal::exact("foo"),
Literal::exact(""), // inexact provokes same behavior
Literal::inexact("bar"),
]);
let mut seq2 = Seq::infinite();
seq1.cross_reverse(&mut seq2);
// seq1 is now infinite!
assert!(!seq1.is_finite());
This example shows the behavior when this sequence is infinite.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq1 = Seq::infinite();
let mut seq2 = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("bar"),
]);
seq1.cross_reverse(&mut seq2);
// seq1 remains unchanged.
assert!(!seq1.is_finite());
// Even though the literals in seq2 weren't used, it was still drained.
assert_eq!(Some(0), seq2.len());
sourcefn cross_preamble<'a>(
&'a mut self,
other: &'a mut Seq,
) -> Option<(&'a mut Vec<Literal>, &'a mut Vec<Literal>)>
fn cross_preamble<'a>( &'a mut self, other: &'a mut Seq, ) -> Option<(&'a mut Vec<Literal>, &'a mut Vec<Literal>)>
A helper function the corresponds to the subtle preamble for both
cross_forward
and cross_reverse
. In effect, it handles the cases
of infinite sequences for both self
and other
, as well as ensuring
that literals from other
are drained even if they aren’t used.
sourcepub fn union(&mut self, other: &mut Seq)
pub fn union(&mut self, other: &mut Seq)
Unions the other
sequence into this one.
The literals are always drained out of the given other
sequence,
even if they are being unioned into an infinite sequence. This permits
the caller to reuse the other
sequence in another context.
Some literal deduping may be performed. If any deduping happens, any leftmost-first or “preference” order match semantics will be preserved.
§Example
This example shows basic usage.
use regex_syntax::hir::literal::Seq;
let mut seq1 = Seq::new(&["foo", "bar"]);
let mut seq2 = Seq::new(&["bar", "quux", "foo"]);
seq1.union(&mut seq2);
// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());
// Adjacent literals are deduped, but non-adjacent literals may not be.
assert_eq!(Seq::new(&["foo", "bar", "quux", "foo"]), seq1);
This example shows that literals are drained from other
even when
they aren’t necessarily used.
use regex_syntax::hir::literal::Seq;
let mut seq1 = Seq::infinite();
// Infinite sequences have no finite length.
assert_eq!(None, seq1.len());
let mut seq2 = Seq::new(&["bar", "quux", "foo"]);
seq1.union(&mut seq2);
// seq1 is still infinite and seq2 has been drained.
assert_eq!(None, seq1.len());
assert_eq!(Some(0), seq2.len());
sourcepub fn union_into_empty(&mut self, other: &mut Seq)
pub fn union_into_empty(&mut self, other: &mut Seq)
Unions the other
sequence into this one by splice the other
sequence at the position of the first zero-length literal.
This is useful for preserving preference order semantics when combining
two literal sequences. For example, in the regex (a||f)+foo
, the
correct preference order prefix sequence is [a, foo, f]
.
The literals are always drained out of the given other
sequence,
even if they are being unioned into an infinite sequence. This permits
the caller to reuse the other
sequence in another context. Note that
the literals are drained even if no union is performed as well, i.e.,
when this sequence does not contain a zero-length literal.
Some literal deduping may be performed. If any deduping happens, any leftmost-first or “preference” order match semantics will be preserved.
§Example
This example shows basic usage.
use regex_syntax::hir::literal::Seq;
let mut seq1 = Seq::new(&["a", "", "f", ""]);
let mut seq2 = Seq::new(&["foo"]);
seq1.union_into_empty(&mut seq2);
// The literals are pulled out of seq2.
assert_eq!(Some(0), seq2.len());
// 'foo' gets spliced into seq1 where the first empty string occurs.
assert_eq!(Seq::new(&["a", "foo", "f"]), seq1);
This example shows that literals are drained from other
even when
they aren’t necessarily used.
use regex_syntax::hir::literal::Seq;
let mut seq1 = Seq::new(&["foo", "bar"]);
let mut seq2 = Seq::new(&["bar", "quux", "foo"]);
seq1.union_into_empty(&mut seq2);
// seq1 has no zero length literals, so no splicing happens.
assert_eq!(Seq::new(&["foo", "bar"]), seq1);
// Even though no splicing happens, seq2 is still drained.
assert_eq!(Some(0), seq2.len());
sourcepub fn dedup(&mut self)
pub fn dedup(&mut self)
Deduplicate adjacent equivalent literals in this sequence.
If adjacent literals are equivalent strings but one is exact and the other inexact, the inexact literal is kept and the exact one is removed.
Deduping an infinite sequence is a no-op.
§Example
This example shows how literals that are duplicate byte strings but are not equivalent with respect to exactness are resolved.
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq = Seq::from_iter([
Literal::exact("foo"),
Literal::inexact("foo"),
]);
seq.dedup();
assert_eq!(Seq::from_iter([Literal::inexact("foo")]), seq);
sourcepub fn sort(&mut self)
pub fn sort(&mut self)
Sorts this sequence of literals lexicographically.
Note that if, before sorting, if a literal that is a prefix of another
literal appears after it, then after sorting, the sequence will not
represent the same preference order match semantics. For example,
sorting the sequence [samwise, sam]
yields the sequence [sam, samwise]
. Under preference order semantics, the latter sequence will
never match samwise
where as the first sequence can.
§Example
This example shows basic usage.
use regex_syntax::hir::literal::Seq;
let mut seq = Seq::new(&["foo", "quux", "bar"]);
seq.sort();
assert_eq!(Seq::new(&["bar", "foo", "quux"]), seq);
sourcepub fn reverse_literals(&mut self)
pub fn reverse_literals(&mut self)
Reverses all of the literals in this sequence.
The order of the sequence itself is preserved.
§Example
This example shows basic usage.
use regex_syntax::hir::literal::Seq;
let mut seq = Seq::new(&["oof", "rab"]);
seq.reverse_literals();
assert_eq!(Seq::new(&["foo", "bar"]), seq);
sourcepub fn minimize_by_preference(&mut self)
pub fn minimize_by_preference(&mut self)
Shrinks this seq to its minimal size while respecting the preference order of its literals.
While this routine will remove duplicate literals from this seq, it
will also remove literals that can never match in a leftmost-first or
“preference order” search. Similar to Seq::dedup
, if a literal is
deduped, then the one that remains is made inexact.
This is a no-op on seqs that are empty or not finite.
§Example
This example shows the difference between {sam, samwise}
and
{samwise, sam}
.
use regex_syntax::hir::literal::{Literal, Seq};
// If 'sam' comes before 'samwise' and a preference order search is
// executed, then 'samwise' can never match.
let mut seq = Seq::new(&["sam", "samwise"]);
seq.minimize_by_preference();
assert_eq!(Seq::from_iter([Literal::inexact("sam")]), seq);
// But if they are reversed, then it's possible for 'samwise' to match
// since it is given higher preference.
let mut seq = Seq::new(&["samwise", "sam"]);
seq.minimize_by_preference();
assert_eq!(Seq::new(&["samwise", "sam"]), seq);
This example shows that if an empty string is in this seq, then anything that comes after it can never match.
use regex_syntax::hir::literal::{Literal, Seq};
// An empty string is a prefix of all strings, so it automatically
// inhibits any subsequent strings from matching.
let mut seq = Seq::new(&["foo", "bar", "", "quux", "fox"]);
seq.minimize_by_preference();
let expected = Seq::from_iter([
Literal::exact("foo"),
Literal::exact("bar"),
Literal::inexact(""),
]);
assert_eq!(expected, seq);
// And of course, if it's at the beginning, then it makes it impossible
// for anything else to match.
let mut seq = Seq::new(&["", "foo", "quux", "fox"]);
seq.minimize_by_preference();
assert_eq!(Seq::from_iter([Literal::inexact("")]), seq);
sourcepub fn keep_first_bytes(&mut self, len: usize)
pub fn keep_first_bytes(&mut self, len: usize)
Trims all literals in this seq such that only the first len
bytes
remain. If a literal has less than or equal to len
bytes, then it
remains unchanged. Otherwise, it is trimmed and made inexact.
§Example
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq = Seq::new(&["a", "foo", "quux"]);
seq.keep_first_bytes(2);
let expected = Seq::from_iter([
Literal::exact("a"),
Literal::inexact("fo"),
Literal::inexact("qu"),
]);
assert_eq!(expected, seq);
sourcepub fn keep_last_bytes(&mut self, len: usize)
pub fn keep_last_bytes(&mut self, len: usize)
Trims all literals in this seq such that only the last len
bytes
remain. If a literal has less than or equal to len
bytes, then it
remains unchanged. Otherwise, it is trimmed and made inexact.
§Example
use regex_syntax::hir::literal::{Literal, Seq};
let mut seq = Seq::new(&["a", "foo", "quux"]);
seq.keep_last_bytes(2);
let expected = Seq::from_iter([
Literal::exact("a"),
Literal::inexact("oo"),
Literal::inexact("ux"),
]);
assert_eq!(expected, seq);
sourcepub fn is_finite(&self) -> bool
pub fn is_finite(&self) -> bool
Returns true if this sequence is finite.
When false, this sequence is infinite and must be treated as if it contains every possible literal.
sourcepub fn is_empty(&self) -> bool
pub fn is_empty(&self) -> bool
Returns true if and only if this sequence is finite and empty.
An empty sequence never matches anything. It can only be produced by literal extraction when the corresponding regex itself cannot match.
sourcepub fn len(&self) -> Option<usize>
pub fn len(&self) -> Option<usize>
Returns the number of literals in this sequence if the sequence is
finite. If the sequence is infinite, then None
is returned.
sourcepub fn is_exact(&self) -> bool
pub fn is_exact(&self) -> bool
Returns true if and only if all literals in this sequence are exact.
This returns false if the sequence is infinite.
sourcepub fn is_inexact(&self) -> bool
pub fn is_inexact(&self) -> bool
Returns true if and only if all literals in this sequence are inexact.
This returns true if the sequence is infinite.
sourcepub fn max_union_len(&self, other: &Seq) -> Option<usize>
pub fn max_union_len(&self, other: &Seq) -> Option<usize>
Return the maximum length of the sequence that would result from
unioning self
with other
. If either set is infinite, then this
returns None
.
sourcepub fn max_cross_len(&self, other: &Seq) -> Option<usize>
pub fn max_cross_len(&self, other: &Seq) -> Option<usize>
Return the maximum length of the sequence that would result from the
cross product of self
with other
. If either set is infinite, then
this returns None
.
sourcepub fn min_literal_len(&self) -> Option<usize>
pub fn min_literal_len(&self) -> Option<usize>
Returns the length of the shortest literal in this sequence.
If the sequence is infinite or empty, then this returns None
.
sourcepub fn max_literal_len(&self) -> Option<usize>
pub fn max_literal_len(&self) -> Option<usize>
Returns the length of the longest literal in this sequence.
If the sequence is infinite or empty, then this returns None
.
sourcepub fn longest_common_prefix(&self) -> Option<&[u8]>
pub fn longest_common_prefix(&self) -> Option<&[u8]>
Returns the longest common prefix from this seq.
If the seq matches any literal or other contains no literals, then
there is no meaningful prefix and this returns None
.
§Example
This shows some example seqs and their longest common prefix.
use regex_syntax::hir::literal::Seq;
let seq = Seq::new(&["foo", "foobar", "fo"]);
assert_eq!(Some(&b"fo"[..]), seq.longest_common_prefix());
let seq = Seq::new(&["foo", "foo"]);
assert_eq!(Some(&b"foo"[..]), seq.longest_common_prefix());
let seq = Seq::new(&["foo", "bar"]);
assert_eq!(Some(&b""[..]), seq.longest_common_prefix());
let seq = Seq::new(&[""]);
assert_eq!(Some(&b""[..]), seq.longest_common_prefix());
let seq = Seq::infinite();
assert_eq!(None, seq.longest_common_prefix());
let seq = Seq::empty();
assert_eq!(None, seq.longest_common_prefix());
sourcepub fn longest_common_suffix(&self) -> Option<&[u8]>
pub fn longest_common_suffix(&self) -> Option<&[u8]>
Returns the longest common suffix from this seq.
If the seq matches any literal or other contains no literals, then
there is no meaningful suffix and this returns None
.
§Example
This shows some example seqs and their longest common suffix.
use regex_syntax::hir::literal::Seq;
let seq = Seq::new(&["oof", "raboof", "of"]);
assert_eq!(Some(&b"of"[..]), seq.longest_common_suffix());
let seq = Seq::new(&["foo", "foo"]);
assert_eq!(Some(&b"foo"[..]), seq.longest_common_suffix());
let seq = Seq::new(&["foo", "bar"]);
assert_eq!(Some(&b""[..]), seq.longest_common_suffix());
let seq = Seq::new(&[""]);
assert_eq!(Some(&b""[..]), seq.longest_common_suffix());
let seq = Seq::infinite();
assert_eq!(None, seq.longest_common_suffix());
let seq = Seq::empty();
assert_eq!(None, seq.longest_common_suffix());
sourcepub fn optimize_for_prefix_by_preference(&mut self)
pub fn optimize_for_prefix_by_preference(&mut self)
Optimizes this seq while treating its literals as prefixes and respecting the preference order of its literals.
The specific way “optimization” works is meant to be an implementation detail, as it essentially represents a set of heuristics. The goal that optimization tries to accomplish is to make the literals in this set reflect inputs that will result in a more effective prefilter. Principally by reducing the false positive rate of candidates found by the literals in this sequence. That is, when a match of a literal is found, we would like it to be a strong predictor of the overall match of the regex. If it isn’t, then much time will be spent starting and stopping the prefilter search and attempting to confirm the match only to have it fail.
Some of those heuristics might be:
- Identifying a common prefix from a larger sequence of literals, and shrinking the sequence down to that single common prefix.
- Rejecting the sequence entirely if it is believed to result in very high false positive rate. When this happens, the sequence is made infinite.
- Shrinking the sequence to a smaller number of literals representing prefixes, but not shrinking it so much as to make literals too short. (A sequence with very short literals, of 1 or 2 bytes, will typically result in a higher false positive rate.)
Optimization should only be run once extraction is complete. Namely,
optimization may make assumptions that do not compose with other
operations in the middle of extraction. For example, optimization will
reduce [E(sam), E(samwise)]
to [E(sam)]
, but such a transformation
is only valid if no other extraction will occur. If other extraction
may occur, then the correct transformation would be to [I(sam)]
.
The Seq::optimize_for_suffix_by_preference
does the same thing, but
for suffixes.
§Example
This shows how optimization might transform a sequence. Note that the specific behavior is not a documented guarantee. The heuristics used are an implementation detail and may change over time in semver compatible releases.
use regex_syntax::hir::literal::{Seq, Literal};
let mut seq = Seq::new(&[
"samantha",
"sam",
"samwise",
"frodo",
]);
seq.optimize_for_prefix_by_preference();
assert_eq!(Seq::from_iter([
Literal::exact("samantha"),
// Kept exact even though 'samwise' got pruned
// because optimization assumes literal extraction
// has finished.
Literal::exact("sam"),
Literal::exact("frodo"),
]), seq);
§Example: optimization may make the sequence infinite
If the heuristics deem that the sequence could cause a very high false positive rate, then it may make the sequence infinite, effectively disabling its use as a prefilter.
use regex_syntax::hir::literal::{Seq, Literal};
let mut seq = Seq::new(&[
"samantha",
// An empty string matches at every position,
// thus rendering the prefilter completely
// ineffective.
"",
"sam",
"samwise",
"frodo",
]);
seq.optimize_for_prefix_by_preference();
assert!(!seq.is_finite());
Do note that just because there is a " "
in the sequence, that
doesn’t mean the sequence will always be made infinite after it is
optimized. Namely, if the sequence is considered exact (any match
corresponds to an overall match of the original regex), then any match
is an overall match, and so the false positive rate is always 0
.
To demonstrate this, we remove samwise
from our sequence. This
results in no optimization happening and all literals remain exact.
Thus the entire sequence is exact, and it is kept as-is, even though
one is an ASCII space:
use regex_syntax::hir::literal::{Seq, Literal};
let mut seq = Seq::new(&[
"samantha",
" ",
"sam",
"frodo",
]);
seq.optimize_for_prefix_by_preference();
assert!(seq.is_finite());
sourcepub fn optimize_for_suffix_by_preference(&mut self)
pub fn optimize_for_suffix_by_preference(&mut self)
Optimizes this seq while treating its literals as suffixes and respecting the preference order of its literals.
Optimization should only be run once extraction is complete.
The Seq::optimize_for_prefix_by_preference
does the same thing, but
for prefixes. See its documentation for more explanation.