Enum icu_collator::elements::Tag
#[repr(u8)]
pub(crate) enum Tag {
Fallback = 0,
LongPrimary = 1,
LongSecondary = 2,
Reserved3 = 3,
LatinExpansion = 4,
Expansion32 = 5,
Expansion = 6,
BuilderData = 7,
Prefix = 8,
Contraction = 9,
Digit = 10,
U0000 = 11,
Hangul = 12,
LeadSurrogate = 13,
Offset = 14,
Implicit = 15,
}
Special-CE32 tags, from bits 3..0 of a special 32-bit CE. Bits 31..8 are available for tag-specific data. Bits 5..4: Reserved. May be used in the future to indicate lccc!=0 and tccc!=0.
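The split between tag bits and tag-specific data can be sketched as follows. The helper names `tag_bits` and `tag_data` are hypothetical and are not part of `icu_collator`; the sketch only applies the bit ranges stated above.

```rust
/// Hypothetical helper: the tag number stored in bits 3..0 of a special CE32.
fn tag_bits(ce32: u32) -> u8 {
    (ce32 & 0xF) as u8
}

/// Hypothetical helper: the tag-specific data stored in bits 31..8,
/// shifted down so that bit 8 becomes bit 0.
fn tag_data(ce32: u32) -> u32 {
    ce32 >> 8
}
```

With these helpers, a CE32 of `0xffffffff` yields tag number 15, which matches the `Implicit` variant documented below.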
Variants
Fallback = 0
Fall back to the base collator. This is the tag value in SPECIAL_CE32_LOW_BYTE and FALLBACK_CE32. Bits 31..8: Unused, 0.
LongPrimary = 1
Long-primary CE with COMMON_SEC_AND_TER_CE. Bits 31..8: Three-byte primary.
LongSecondary = 2
Long-secondary CE with zero primary. Bits 31..16: Secondary weight. Bits 15.. 8: Tertiary weight.
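As an illustration of the `LongPrimary` and `LongSecondary` layouts, the sketch below widens such a CE32 into a 64-bit CE. It assumes the ICU4C convention that a 64-bit CE holds the primary in bits 63..32, the secondary in bits 31..16, and the tertiary in bits 15..0, and that `COMMON_SEC_AND_TER_CE` is `0x05000500`; both are assumptions not stated on this page, and the function names are hypothetical.

```rust
/// Assumed value of the common secondary and tertiary weights (ICU4C convention).
const COMMON_SEC_AND_TER_CE: u64 = 0x0500_0500;

/// Hypothetical helper: bits 31..8 of the CE32 become the three-byte primary
/// in the upper half of the CE; the lower half gets the common weights.
fn ce_from_long_primary(ce32: u32) -> u64 {
    (u64::from(ce32 & 0xFFFF_FF00) << 32) | COMMON_SEC_AND_TER_CE
}

/// Hypothetical helper: the primary is zero, and bits 31..8 of the CE32
/// carry over unchanged as the secondary (31..16) and tertiary (15..8).
fn ce_from_long_secondary(ce32: u32) -> u64 {
    u64::from(ce32 & 0xFFFF_FF00)
}
```

Masking with `0xFFFF_FF00` drops the low byte, which holds the tag rather than weight data.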
Reserved3 = 3
Unused. May be used in the future for single-byte secondary CEs (SHORT_SECONDARY_TAG), storing the secondary in bits 31..24, the ccc in bits 23..16, and the tertiary in bits 15..8.
LatinExpansion = 4
Latin mini expansions of two simple CEs [pp, 05, tt] [00, ss, 05]. Bits 31..24: Single-byte primary weight pp of the first CE. Bits 23..16: Tertiary weight tt of the first CE. Bits 15.. 8: Secondary weight ss of the second CE. Unused by ICU4X; may get repurposed for jamo expansions in Korean search.
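A minimal sketch of pulling the three weights out of a `LatinExpansion` CE32, using only the bit ranges listed above. The function name is hypothetical, and assembling the two resulting CEs is left out.

```rust
/// Hypothetical helper: returns (pp, tt, ss), i.e. the primary and tertiary
/// weights of the first CE and the secondary weight of the second CE.
fn latin_expansion_fields(ce32: u32) -> (u8, u8, u8) {
    let pp = (ce32 >> 24) as u8; // bits 31..24
    let tt = (ce32 >> 16) as u8; // bits 23..16 (the cast keeps the low 8 bits)
    let ss = (ce32 >> 8) as u8; // bits 15..8
    (pp, tt, ss)
}
```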
Expansion32 = 5
Points to one or more simple/long-primary/long-secondary 32-bit CE32s. Bits 31..13: Index into uint32_t table. Bits 12.. 8: Length=1..31.
Expansion = 6
Points to one or more 64-bit CEs. Bits 31..13: Index into CE table. Bits 12.. 8: Length=1..31.
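`Expansion32` and `Expansion` share the same index and length layout, so one sketch covers both. The function names are hypothetical, and the table is passed in as a plain slice because this page does not describe how the runtime data is stored.

```rust
use core::ops::Range;

/// Hypothetical helper: the table range an expansion CE32 points to.
/// Bits 31..13 hold the start index, bits 12..8 hold the length.
fn expansion_range(ce32: u32) -> Range<usize> {
    let index = (ce32 >> 13) as usize;
    let len = ((ce32 >> 8) & 0x1F) as usize;
    index..index + len
}

/// Hypothetical helper: looks the range up in a table of CE32s (u32) or
/// CEs (u64), returning None if the range runs past the end of the table.
fn expansion_slice<T>(table: &[T], ce32: u32) -> Option<&[T]> {
    table.get(expansion_range(ce32))
}
```

Returning `Option` rather than indexing directly keeps malformed data from causing a panic; whether the real implementation handles this case the same way is not covered here.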
BuilderData = 7
Builder data, used only in the CollationDataBuilder, not in runtime data.
If bit 8 is 0: Builder context, points to a list of context-sensitive mappings. Bits 31..13: Index to the builder’s list of ConditionalCE32 for this character. Bits 12.. 9: Unused, 0.
If bit 8 is 1 (IS_BUILDER_JAMO_CE32): Builder-only jamoCE32 value. The builder fetches the Jamo CE32 from the trie. Bits 31..13: Jamo code point. Bits 12.. 9: Unused, 0.
Prefix = 8
Points to prefix trie. Bits 31..13: Index into prefix/contraction data. Bits 12.. 8: Unused, 0.
Contraction = 9
Points to contraction data. Bits 31..13: Index into prefix/contraction data. Bits 12..11: Unused, 0. Bit 10: CONTRACT_TRAILING_CCC flag. Bit 9: CONTRACT_NEXT_CCC flag. Bit 8: CONTRACT_SINGLE_CP_NO_MATCH flag.
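The contraction fields can be decoded as sketched below. The struct, its field names, and the function are hypothetical; the flag masks are derived directly from the bit positions listed above.

```rust
// Flag masks derived from the documented bit positions.
const CONTRACT_SINGLE_CP_NO_MATCH: u32 = 1 << 8;
const CONTRACT_NEXT_CCC: u32 = 1 << 9;
const CONTRACT_TRAILING_CCC: u32 = 1 << 10;

/// Hypothetical decoded view of a Contraction CE32.
#[derive(Debug, PartialEq, Eq)]
struct ContractionFields {
    index: usize,
    trailing_ccc: bool,
    next_ccc: bool,
    single_cp_no_match: bool,
}

/// Hypothetical helper: splits a Contraction CE32 into its index and flags.
fn decode_contraction(ce32: u32) -> ContractionFields {
    ContractionFields {
        index: (ce32 >> 13) as usize, // bits 31..13
        trailing_ccc: (ce32 & CONTRACT_TRAILING_CCC) != 0,
        next_ccc: (ce32 & CONTRACT_NEXT_CCC) != 0,
        single_cp_no_match: (ce32 & CONTRACT_SINGLE_CP_NO_MATCH) != 0,
    }
}
```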
Digit = 10
Decimal digit. Bits 31..13: Index into uint32_t table for non-numeric-collation CE32. Bit 12: Unused, 0. Bits 11.. 8: Digit value 0..9.
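For the `Digit` layout, a small sketch that separates the table index from the digit value. The function name is hypothetical, and the sketch does not check that the digit is in the range 0..9.

```rust
/// Hypothetical helper: returns (index, digit) for a Digit CE32.
/// The index (bits 31..13) locates the CE32 used when numeric collation
/// is off; the digit value sits in bits 11..8.
fn decode_digit(ce32: u32) -> (usize, u8) {
    let index = (ce32 >> 13) as usize;
    let digit = ((ce32 >> 8) & 0xF) as u8;
    (index, digit)
}
```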
U0000 = 11
Tag for U+0000, for moving the NUL-termination handling from the regular fastpath into specials-handling code. Bits 31..8: Unused, 0. Not used by ICU4X.
Hangul = 12
Tag for a Hangul syllable. Bits 31..9: Unused, 0. Bit 8: HANGUL_NO_SPECIAL_JAMO flag. Not used by ICU4X; may get reused for compressing Hanja expansions.
LeadSurrogate = 13
Tag for a lead surrogate code unit. Optional optimization for UTF-16 string processing. Bits 31..10: Unused, 0. Bits 9..8: 0 = all associated supplementary code points are unassigned-implicit; 1 = all associated supplementary code points fall back to the base data; any other value (normally 2) = look up the data for the supplementary code point. Not used by ICU4X.
Offset = 14
Tag for CEs with primary weights in code point order. Bits 31..13: Index into CE table, for one data “CE”. Bits 12.. 8: Unused, 0.
This data “CE” has the following bit fields: Bits 63..32: Three-byte primary pppppp00. Bits 31..8: Start/base code point of the in-order range. Bit 7: Flag isCompressible primary. Bits 6..0: Per-code point primary-weight increment.
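The data “CE” fields can be unpacked as in the sketch below. The struct and function names are hypothetical. `weight_steps` assumes that the number of primary-weight steps for a code point is its distance from the start of the range multiplied by the per-code point increment; the step that actually adds those steps to the three-byte primary is not shown, since it is not described on this page.

```rust
/// Hypothetical decoded view of the 64-bit data "CE" behind an Offset CE32.
#[derive(Debug, PartialEq, Eq)]
struct OffsetData {
    base_primary: u32, // bits 63..32, of the form pppppp00
    start: u32,        // bits 31..8, first code point of the in-order range
    compressible: bool, // bit 7
    increment: u32,    // bits 6..0
}

/// Hypothetical helper: splits the data "CE" into its bit fields.
fn decode_offset_data(data_ce: u64) -> OffsetData {
    let lower = data_ce as u32; // keeps bits 31..0
    OffsetData {
        base_primary: (data_ce >> 32) as u32,
        start: lower >> 8,
        compressible: (lower & 0x80) != 0,
        increment: lower & 0x7F,
    }
}

/// Assumed relationship: steps = (c - start) * increment.
/// Returns None if `c` lies before the start of the range or on overflow.
fn weight_steps(data: &OffsetData, c: u32) -> Option<u32> {
    c.checked_sub(data.start)?.checked_mul(data.increment)
}
```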
Implicit = 15
Implicit CE tag. Compute an unassigned-implicit CE. All bits are set (UNASSIGNED_CE32=0xffffffff).