regex_automata::nfa::thompson::range_trie

Struct Split

source
struct Split {
    partitions: [SplitRange; 3],
    len: usize,
}
Expand description

Split represents a partitioning of two ranges into one or more ranges. This is the secret sauce that makes a range trie work, as it’s what tells us how to deal with two overlapping but unequal ranges during insertion.

Essentially, either two ranges overlap or they don’t. If they don’t, then handling insertion is easy: just insert the new range into its lexicographically correct position. Since it does not overlap with anything else, no other transitions are impacted by the new range.

If they do overlap though, there are generally three possible cases to handle:

  1. The part where the two ranges actually overlap. i.e., The intersection.
  2. The part of the existing range that is not in the new range.
  3. The part of the new range that is not in the old range.

(1) is guaranteed to always occur since all overlapping ranges have a non-empty intersection. If the two ranges are not equivalent, then at least one of (2) or (3) is guaranteed to occur as well. In some cases, e.g., [0-4] and [4-9], all three cases will occur.

This Split type is responsible for providing (1), (2) and (3) for any possible pair of byte ranges.

As for insertion, for the overlap in (1), the remaining ranges to insert should be added by following the corresponding transition. However, this should only be done for the overlapping parts of the range. If there was a part of the existing range that was not in the new range, then that existing part must be split off from the transition and duplicated. The remaining parts of the overlap can then be added to using the new ranges without disturbing the existing range.

Handling the case for the part of a new range that is not in an existing range is seemingly easy. Just treat it as if it were a non-overlapping range. The problem here is that if this new non-overlapping range occurs after both (1) and (2), then it’s possible that it can overlap with the next transition in the current state. If it does, then the whole process must be repeated!

§Details of the 3 cases

The following details the various cases that are implemented in code below. It’s plausible that the number of cases is not actually minimal, but it’s important for this code to remain at least somewhat readable.

Given [a,b] and [x,y], where a <= b, x <= y, b < 256 and y < 256, we define the follow distinct relationships where at least one must apply. The order of these matters, since multiple can match. The first to match applies.

  1. b < x <=> [a,b] < [x,y]
  2. y < a <=> [x,y] < [a,b]

In the case of (1) and (2), these are the only cases where there is no overlap. Or otherwise, the intersection of [a,b] and [x,y] is empty. In order to compute the intersection, one can do [max(a,x), min(b,y)]. The intersection in all of the following cases is non-empty.

  1. a = x && b = y <=> [a,b] == [x,y]
  2. a = x && b < y <=> [x,y] right-extends [a,b]
  3. b = y && a > x <=> [x,y] left-extends [a,b]
  4. x = a && y < b <=> [a,b] right-extends [x,y]
  5. y = b && x > a <=> [a,b] left-extends [x,y]
  6. a > x && b < y <=> [x,y] covers [a,b]
  7. x > a && y < b <=> [a,b] covers [x,y]
  8. b = x && a < y <=> [a,b] is left-adjacent to [x,y]
  9. y = a && x < b <=> [x,y] is left-adjacent to [a,b]
  10. b > x && b < y <=> [a,b] left-overlaps [x,y]
  11. y > a && y < b <=> [x,y] left-overlaps [a,b]

In cases 3-13, we can form rules that partition the ranges into a non-overlapping ordered sequence of ranges:

  1. [a,b]
  2. [a,b], [b+1,y]
  3. [x,a-1], [a,b]
  4. [x,y], [y+1,b]
  5. [a,x-1], [x,y]
  6. [x,a-1], [a,b], [b+1,y]
  7. [a,x-1], [x,y], [y+1,b]
  8. [a,b-1], [b,b], [b+1,y]
  9. [x,y-1], [y,y], [y+1,b]
  10. [a,x-1], [x,b], [b+1,y]
  11. [x,a-1], [a,y], [y+1,b]

In the code below, we go a step further and identify each of the above outputs as belonging either to the overlap of the two ranges or to one of [a,b] or [x,y] exclusively.

Fields§

§partitions: [SplitRange; 3]§len: usize

Implementations§

source§

impl Split

source

fn new(o: Utf8Range, n: Utf8Range) -> Option<Split>

Create a partitioning of the given ranges.

If the given ranges have an empty intersection, then None is returned.

source

fn parts1(r1: SplitRange) -> Split

Create a new split with a single partition. This only occurs when two ranges are equivalent.

source

fn parts2(r1: SplitRange, r2: SplitRange) -> Split

Create a new split with two partitions.

source

fn parts3(r1: SplitRange, r2: SplitRange, r3: SplitRange) -> Split

Create a new split with three partitions.

source

fn as_slice(&self) -> &[SplitRange]

Return the partitions in this split as a slice.

Trait Implementations§

source§

impl Clone for Split

source§

fn clone(&self) -> Split

Returns a copy of the value. Read more
1.0.0 · source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
source§

impl Debug for Split

source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
source§

impl PartialEq for Split

source§

fn eq(&self, other: &Split) -> bool

Tests for self and other values to be equal, and is used by ==.
1.0.0 · source§

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.
source§

impl Eq for Split

source§

impl StructuralPartialEq for Split

Auto Trait Implementations§

§

impl Freeze for Split

§

impl RefUnwindSafe for Split

§

impl Send for Split

§

impl Sync for Split

§

impl Unpin for Split

§

impl UnwindSafe for Split

Blanket Implementations§

source§

impl<T> Any for T
where T: 'static + ?Sized,

source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
source§

impl<T> Borrow<T> for T
where T: ?Sized,

source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
source§

impl<T> CloneToUninit for T
where T: Clone,

source§

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst. Read more
source§

impl<T> From<T> for T

source§

fn from(t: T) -> T

Returns the argument unchanged.

source§

impl<T, U> Into<U> for T
where U: From<T>,

source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

source§

impl<T> ToOwned for T
where T: Clone,

source§

type Owned = T

The resulting type after obtaining ownership.
source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

source§

type Error = Infallible

The type returned in the event of a conversion error.
source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.