regex_automata::nfa::thompson::range_trie

Struct Split

Help

struct Split {
    partitions: [SplitRange; 3],
    len: usize,
}

Expand description

Split represents a partitioning of two ranges into one or more ranges. This is the secret sauce that makes a range trie work, as it’s what tells us how to deal with two overlapping but unequal ranges during insertion.

Essentially, either two ranges overlap or they don’t. If they don’t, then handling insertion is easy: just insert the new range into its lexicographically correct position. Since it does not overlap with anything else, no other transitions are impacted by the new range.

If they do overlap though, there are generally three possible cases to handle:

The part where the two ranges actually overlap. i.e., The intersection.
The part of the existing range that is not in the new range.
The part of the new range that is not in the old range.

(1) is guaranteed to always occur since all overlapping ranges have a non-empty intersection. If the two ranges are not equivalent, then at least one of (2) or (3) is guaranteed to occur as well. In some cases, e.g., [0-4] and [4-9], all three cases will occur.

This Split type is responsible for providing (1), (2) and (3) for any possible pair of byte ranges.

As for insertion, for the overlap in (1), the remaining ranges to insert should be added by following the corresponding transition. However, this should only be done for the overlapping parts of the range. If there was a part of the existing range that was not in the new range, then that existing part must be split off from the transition and duplicated. The remaining parts of the overlap can then be added to using the new ranges without disturbing the existing range.

Handling the case for the part of a new range that is not in an existing range is seemingly easy. Just treat it as if it were a non-overlapping range. The problem here is that if this new non-overlapping range occurs after both (1) and (2), then it’s possible that it can overlap with the next transition in the current state. If it does, then the whole process must be repeated!

§Details of the 3 cases

The following details the various cases that are implemented in code below. It’s plausible that the number of cases is not actually minimal, but it’s important for this code to remain at least somewhat readable.

Given [a,b] and [x,y], where a <= b, x <= y, b < 256 and y < 256, we define the follow distinct relationships where at least one must apply. The order of these matters, since multiple can match. The first to match applies.

b < x <=> [a,b] < [x,y]
y < a <=> [x,y] < [a,b]

In the case of (1) and (2), these are the only cases where there is no overlap. Or otherwise, the intersection of [a,b] and [x,y] is empty. In order to compute the intersection, one can do [max(a,x), min(b,y)]. The intersection in all of the following cases is non-empty.

a = x && b = y <=> [a,b] == [x,y]
a = x && b < y <=> [x,y] right-extends [a,b]
b = y && a > x <=> [x,y] left-extends [a,b]
x = a && y [a,b] right-extends [x,y]
y = b && x > a <=> [a,b] left-extends [x,y]
a > x && b < y <=> [x,y] covers [a,b]
x > a && y [a,b] covers [x,y]
b = x && a < y <=> [a,b] is left-adjacent to [x,y]
y = a && x [x,y] is left-adjacent to [a,b]
b > x && b < y <=> [a,b] left-overlaps [x,y]
y > a && y [x,y] left-overlaps [a,b]

In cases 3-13, we can form rules that partition the ranges into a non-overlapping ordered sequence of ranges:

[a,b]
[a,b], [b+1,y]
[x,a-1], [a,b]
[x,y], [y+1,b]
[a,x-1], [x,y]
[x,a-1], [a,b], [b+1,y]
[a,x-1], [x,y], [y+1,b]
[a,b-1], [b,b], [b+1,y]
[x,y-1], [y,y], [y+1,b]
[a,x-1], [x,b], [b+1,y]
[x,a-1], [a,y], [y+1,b]

In the code below, we go a step further and identify each of the above outputs as belonging either to the overlap of the two ranges or to one of [a,b] or [x,y] exclusively.

Fields§

§partitions: [SplitRange; 3]§len: usize

Struct SplitCopy item path

§Details of the 3 cases

Fields§

Implementations§

impl Split

fn new(o: Utf8Range, n: Utf8Range) -> Option<Split>

fn parts1(r1: SplitRange) -> Split

fn parts2(r1: SplitRange, r2: SplitRange) -> Split

fn parts3(r1: SplitRange, r2: SplitRange, r3: SplitRange) -> Split

fn as_slice(&self) -> &[SplitRange]

Trait Implementations§

impl Clone for Split

fn clone(&self) -> Split

fn clone_from(&mut self, source: &Self)

impl Debug for Split

fn fmt(&self, f: &mut Formatter<'_>) -> Result

impl PartialEq for Split

fn eq(&self, other: &Split) -> bool

fn ne(&self, other: &Rhs) -> bool

impl Eq for Split

impl StructuralPartialEq for Split

Auto Trait Implementations§

impl Freeze for Split

impl RefUnwindSafe for Split

impl Send for Split

impl Sync for Split

impl Unpin for Split

impl UnwindSafe for Split

Blanket Implementations§

impl<T> Any for Twhere T: 'static + ?Sized,

fn type_id(&self) -> TypeId

impl<T> Borrow<T> for Twhere T: ?Sized,

fn borrow(&self) -> &T

impl<T> BorrowMut<T> for Twhere T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

impl<T> CloneToUninit for Twhere T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut u8)

impl<T> From<T> for T

fn from(t: T) -> T

impl<T, U> Into<U> for Twhere U: From<T>,

fn into(self) -> U

impl<T> ToOwned for Twhere T: Clone,

type Owned = T

fn to_owned(&self) -> T

fn clone_into(&self, target: &mut T)

impl<T, U> TryFrom<U> for Twhere U: Into<T>,

type Error = Infallible

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

impl<T, U> TryInto<U> for Twhere U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Struct Split

impl<T> Any for T
where T: 'static + ?Sized,

impl<T> Borrow<T> for T
where T: ?Sized,

impl<T> BorrowMut<T> for T
where T: ?Sized,

impl<T> CloneToUninit for T
where T: Clone,

impl<T, U> Into<U> for T
where U: From<T>,

impl<T> ToOwned for T
where T: Clone,

impl<T, U> TryFrom<U> for T
where U: Into<T>,

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,