Struct regex_automata::nfa::thompson::range_trie::Split
source · struct Split {
partitions: [SplitRange; 3],
len: usize,
}
Expand description
Split represents a partitioning of two ranges into one or more ranges. This is the secret sauce that makes a range trie work, as it’s what tells us how to deal with two overlapping but unequal ranges during insertion.
Essentially, either two ranges overlap or they don’t. If they don’t, then handling insertion is easy: just insert the new range into its lexicographically correct position. Since it does not overlap with anything else, no other transitions are impacted by the new range.
If they do overlap though, there are generally three possible cases to handle:
- The part where the two ranges actually overlap. i.e., The intersection.
- The part of the existing range that is not in the new range.
- The part of the new range that is not in the old range.
(1) is guaranteed to always occur since all overlapping ranges have a
non-empty intersection. If the two ranges are not equivalent, then at
least one of (2) or (3) is guaranteed to occur as well. In some cases,
e.g., [0-4]
and [4-9]
, all three cases will occur.
This Split
type is responsible for providing (1), (2) and (3) for any
possible pair of byte ranges.
As for insertion, for the overlap in (1), the remaining ranges to insert should be added by following the corresponding transition. However, this should only be done for the overlapping parts of the range. If there was a part of the existing range that was not in the new range, then that existing part must be split off from the transition and duplicated. The remaining parts of the overlap can then be added to using the new ranges without disturbing the existing range.
Handling the case for the part of a new range that is not in an existing range is seemingly easy. Just treat it as if it were a non-overlapping range. The problem here is that if this new non-overlapping range occurs after both (1) and (2), then it’s possible that it can overlap with the next transition in the current state. If it does, then the whole process must be repeated!
§Details of the 3 cases
The following details the various cases that are implemented in code below. It’s plausible that the number of cases is not actually minimal, but it’s important for this code to remain at least somewhat readable.
Given [a,b] and [x,y], where a <= b, x <= y, b < 256 and y < 256, we define the follow distinct relationships where at least one must apply. The order of these matters, since multiple can match. The first to match applies.
- b < x <=> [a,b] < [x,y]
- y < a <=> [x,y] < [a,b]
In the case of (1) and (2), these are the only cases where there is no overlap. Or otherwise, the intersection of [a,b] and [x,y] is empty. In order to compute the intersection, one can do [max(a,x), min(b,y)]. The intersection in all of the following cases is non-empty.
- a = x && b = y <=> [a,b] == [x,y]
- a = x && b < y <=> [x,y] right-extends [a,b]
- b = y && a > x <=> [x,y] left-extends [a,b]
- x = a && y < b <=> [a,b] right-extends [x,y]
- y = b && x > a <=> [a,b] left-extends [x,y]
- a > x && b < y <=> [x,y] covers [a,b]
- x > a && y < b <=> [a,b] covers [x,y]
- b = x && a < y <=> [a,b] is left-adjacent to [x,y]
- y = a && x < b <=> [x,y] is left-adjacent to [a,b]
- b > x && b < y <=> [a,b] left-overlaps [x,y]
- y > a && y < b <=> [x,y] left-overlaps [a,b]
In cases 3-13, we can form rules that partition the ranges into a non-overlapping ordered sequence of ranges:
- [a,b]
- [a,b], [b+1,y]
- [x,a-1], [a,b]
- [x,y], [y+1,b]
- [a,x-1], [x,y]
- [x,a-1], [a,b], [b+1,y]
- [a,x-1], [x,y], [y+1,b]
- [a,b-1], [b,b], [b+1,y]
- [x,y-1], [y,y], [y+1,b]
- [a,x-1], [x,b], [b+1,y]
- [x,a-1], [a,y], [y+1,b]
In the code below, we go a step further and identify each of the above outputs as belonging either to the overlap of the two ranges or to one of [a,b] or [x,y] exclusively.
Fields§
§partitions: [SplitRange; 3]
§len: usize
Implementations§
source§impl Split
impl Split
sourcefn new(o: Utf8Range, n: Utf8Range) -> Option<Split>
fn new(o: Utf8Range, n: Utf8Range) -> Option<Split>
Create a partitioning of the given ranges.
If the given ranges have an empty intersection, then None is returned.
sourcefn parts1(r1: SplitRange) -> Split
fn parts1(r1: SplitRange) -> Split
Create a new split with a single partition. This only occurs when two ranges are equivalent.
sourcefn parts2(r1: SplitRange, r2: SplitRange) -> Split
fn parts2(r1: SplitRange, r2: SplitRange) -> Split
Create a new split with two partitions.
sourcefn parts3(r1: SplitRange, r2: SplitRange, r3: SplitRange) -> Split
fn parts3(r1: SplitRange, r2: SplitRange, r3: SplitRange) -> Split
Create a new split with three partitions.
sourcefn as_slice(&self) -> &[SplitRange]
fn as_slice(&self) -> &[SplitRange]
Return the partitions in this split as a slice.