Struct icu_collator::elements::CollationElements
pub(crate) struct CollationElements<'data, I> {
iter: I,
pending: SmallVec<[CollationElement; 6]>,
pending_pos: usize,
prefix: [char; 2],
upcoming: SmallVec<[CharacterAndClassAndTrieValue; 10]>,
root: &'data CollationDataV1<'data>,
tailoring: &'data CollationDataV1<'data>,
jamo: &'data [<u32 as AsULE>::ULE; 256],
diacritics: &'data ZeroSlice<u16>,
trie: &'data CodePointTrie<'data, u32>,
scalars16: &'data ZeroSlice<u16>,
scalars32: &'data ZeroSlice<char>,
numeric_primary: Option<u8>,
lithuanian_dot_above: bool,
iter_exhausted: bool,
}
Iterator that transforms an iterator over char into an iterator over CollationElement with a tailoring.
Not a real Rust iterator: instead of None, it uses NO_CE to indicate the end of iteration, which optimizes comparison.
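The sentinel approach described above can be illustrated with a minimal sketch. The names Ce, NO_CE_SKETCH, SentinelIter and sequences_equal are hypothetical stand-ins and not part of icu_collator; the actual bit pattern of NO_CE is not shown on this page. The sketch assumes the sentinel value never occurs in real input.

```rust
/// Hypothetical stand-in for a collation element; the real type is richer.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
pub struct Ce(pub u64);

/// Hypothetical sentinel playing the role of NO_CE (value chosen for the sketch).
pub const NO_CE_SKETCH: Ce = Ce(u64::MAX);

/// Yields the elements of a slice and then the sentinel, indefinitely.
pub struct SentinelIter<'a> {
    items: &'a [Ce],
    pos: usize,
}

impl<'a> SentinelIter<'a> {
    pub fn new(items: &'a [Ce]) -> Self {
        Self { items, pos: 0 }
    }

    /// Returns the next element, or the sentinel once the input is used up.
    pub fn next(&mut self) -> Ce {
        match self.items.get(self.pos) {
            Some(&ce) => {
                self.pos += 1;
                ce
            }
            None => NO_CE_SKETCH,
        }
    }
}

/// Compares two sequences with plain value comparisons; no Option is
/// unwrapped inside the loop.
pub fn sequences_equal(a: &[Ce], b: &[Ce]) -> bool {
    let mut left = SentinelIter::new(a);
    let mut right = SentinelIter::new(b);
    loop {
        let l = left.next();
        let r = right.next();
        if l != r {
            return false;
        }
        if l == NO_CE_SKETCH {
            // Both sides reached the end at the same step.
            return true;
        }
    }
}
```

Because both sides return the same sentinel at the end, a length mismatch shows up as an ordinary inequality between an element and the sentinel.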
Fields
iter: I
pending: SmallVec<[CollationElement; 6]>
Already computed but not yet returned CollationElement values.
pending_pos: usize
The index of the next item to be returned from pending. The purpose of this index is to avoid moving the rest of the items.
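A minimal sketch of the cursor idea, assuming a plain Vec<u32> in place of the SmallVec of CollationElement so that the example depends only on the standard library. PendingSketch and its methods are hypothetical names, and the reset-when-drained step is an assumption of this sketch rather than documented behavior.

```rust
/// Buffer of already-computed items that is drained from the front
/// by advancing a cursor instead of removing elements.
pub struct PendingSketch {
    items: Vec<u32>,
    pos: usize,
}

impl PendingSketch {
    pub fn new() -> Self {
        Self { items: Vec::new(), pos: 0 }
    }

    /// Appends a computed item to the back of the buffer.
    pub fn push(&mut self, item: u32) {
        self.items.push(item);
    }

    /// Returns the next pending item without shifting the remaining ones.
    pub fn pop_front(&mut self) -> Option<u32> {
        let item = *self.items.get(self.pos)?;
        self.pos += 1;
        if self.pos == self.items.len() {
            // Everything has been returned: reset so the storage is reused.
            self.items.clear();
            self.pos = 0;
        }
        Some(item)
    }
}
```

Removing from the front of a vector would move every remaining item on each call; advancing pos is a single increment.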
prefix: [char; 2]
The most recently seen characters (or never-matching placeholders).
CLDR, as of 40, has two kinds of prefixes:
Prefixes that contain a single starter
Prefixes that contain a starter followed by either U+3099 or U+309A
Last-pushed is at index 0 and previously-pushed at index 1.
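The index convention for the two-character prefix window can be sketched as follows. PrefixSketch, its methods, and the PLACEHOLDER value are hypothetical; the placeholder the crate actually uses is not shown on this page.

```rust
/// Hypothetical never-matching placeholder (a noncharacter chosen for the sketch).
const PLACEHOLDER: char = '\u{FFFF}';

/// Two-character history: index 0 is the last-pushed character,
/// index 1 is the one pushed before it.
pub struct PrefixSketch {
    prefix: [char; 2],
}

impl PrefixSketch {
    pub fn new() -> Self {
        Self { prefix: [PLACEHOLDER; 2] }
    }

    /// Shifts the previous character to index 1 and stores `c` at index 0.
    pub fn push(&mut self, c: char) {
        self.prefix[1] = self.prefix[0];
        self.prefix[0] = c;
    }

    /// The most recently pushed character.
    pub fn last(&self) -> char {
        self.prefix[0]
    }

    /// The character pushed before the most recent one.
    pub fn previous(&self) -> char {
        self.prefix[1]
    }
}
```

After pushing a kana starter and then U+3099, last() is the voicing mark and previous() is the starter, which covers the second kind of prefix listed above.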
upcoming: SmallVec<[CharacterAndClassAndTrieValue; 10]>
upcoming holds the characters that have already been read from iter but haven’t yet been mapped to CollationElement values.
Typically, upcoming holds one character and corresponds semantically to pending_unnormalized_starter in icu::normalizer::Decomposition. This is why there isn’t a move avoidance optimization similar to pending_pos above for this buffer. A complex decomposition, a Hangul syllable followed by a non-starter, or lookahead can cause upcoming to hold more than one char.
Invariant: upcoming is allowed to become empty only after iter has been exhausted.
Invariant: (Checked by debug_assert!) At the start of a next() call, if upcoming isn’t empty (with iter having been exhausted), the first char in upcoming must have its decomposition start with a starter.
root: &'data CollationDataV1<'data>
The root collation data.
tailoring: &'data CollationDataV1<'data>
Tailoring if applicable.
jamo: &'data [<u32 as AsULE>::ULE; 256]
The CollationElement32 mapping for the Hangul Jamo block.
Note: in ICU4C the jamo table contains only modern jamo. Here, the jamo table contains the whole Unicode block.
diacritics: &'data ZeroSlice<u16>
The CollationElement32 mapping for the Combining Diacritical Marks block.
trie: &'data CodePointTrie<'data, u32>
NFD main trie.
scalars16: &'data ZeroSlice<u16>
NFD complex decompositions on the BMP
scalars32: &'data ZeroSlice<char>
NFD complex decompositions on supplementary planes
numeric_primary: Option<u8>
If numeric mode is enabled, the 8 high bits of the numeric primary. None if disabled.
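As a small illustration of "the 8 high bits", the sketch below assumes a 32-bit primary weight whose top byte comes from this field. The function name numeric_primary_base and the 32-bit layout are assumptions made for the example, not confirmed by this page.

```rust
/// Places the stored byte in bits 24..32 of a 32-bit value and leaves the
/// lower 24 bits zero. Returns None when numeric mode is disabled.
pub fn numeric_primary_base(numeric_primary: Option<u8>) -> Option<u32> {
    numeric_primary.map(|high| u32::from(high) << 24)
}
```

Keeping the field as Option<u8> lets one value carry both the on/off state of numeric mode and the byte needed to build numeric primaries.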
lithuanian_dot_above: bool
Whether the Lithuanian combining dot above handling is enabled.
iter_exhausted: bool
Whether iter has been exhausted.
Implementations
impl<'data, I> CollationElements<'data, I>
pub fn new( delegate: I, root: &'data CollationDataV1<'_>, tailoring: &'data CollationDataV1<'_>, jamo: &'data [<u32 as AsULE>::ULE; 256], diacritics: &'data ZeroSlice<u16>, decompositions: &'data DecompositionDataV1<'_>, tables: &'data DecompositionTablesV1<'_>, numeric_primary: Option<u8>, lithuanian_dot_above: bool, ) -> Self
fn iter_next(&mut self) -> Option<CharacterAndClassAndTrieValue>
fn next_internal(&mut self) -> Option<CharacterAndClassAndTrieValue>
fn maybe_gather_combining(&mut self)
fn push_decomposed_combining( &mut self, c: CharacterAndClassAndTrieValue, ) -> usize
fn push_decomposed_and_gather_combining( &mut self, c: CharacterAndClassAndTrieValue, )
fn look_ahead(&mut self, pos: usize) -> Option<CharacterAndClassAndTrieValue>
fn is_next_decomposition_starts_with_starter(&self) -> bool
fn prepend_and_sort_non_starter_prefix_of_suffix( &mut self, c: CharacterAndClassAndTrieValue, )
fn prefix_push(&mut self, c: char)
fn mark_prefix_unmatchable(&mut self)
Micro optimization for doing a simpler write when we know the most recent character was a non-starter that is not a kana voicing mark.