Module icu_collator::elements
This module holds the 64-bit CollationElement struct used for the actual comparison, the 32-bit CollationElement32 struct used for storage (strictly speaking, the storage is RawBytesULE<4>), and the CollationElements iterator adapter that turns an iterator over char into an iterator over CollationElement. To match the structure of ICU4C, this adapter isn’t a real Rust Iterator: instead of signaling the end by returning None, it signals the end by returning NO_CE.
This module also declares various constants that are also used by the comparison module.
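The sentinel-instead-of-None convention described above can be illustrated with a minimal, self-contained sketch. Everything here is hypothetical: ToyElements, TOY_NO_CE, toy_compare, and the char-to-element mapping are invented for illustration and are not the crate's types, values, or algorithm. The sketch only shows why a sentinel end marker lets a comparison loop use a single inequality check per step.

```rust
use std::cmp::Ordering;

/// Hypothetical end marker for this sketch. It is chosen to be lower than
/// every toy element so that a shorter input sorts first.
const TOY_NO_CE: u64 = 0;

/// Toy adapter over an iterator of `char` (an assumption, not the real type).
struct ToyElements<I: Iterator<Item = char>> {
    chars: I,
}

impl<I: Iterator<Item = char>> ToyElements<I> {
    fn new(chars: I) -> Self {
        Self { chars }
    }

    /// Deliberately not `Iterator::next`: returns the sentinel at the end
    /// instead of `None`.
    fn next(&mut self) -> u64 {
        match self.chars.next() {
            // Made-up mapping: code point plus one in the high 32 bits,
            // so every real element is greater than the sentinel.
            Some(c) => (u64::from(u32::from(c)) + 1) << 32,
            None => TOY_NO_CE,
        }
    }
}

/// Compares two strings element by element using the sentinel convention.
fn toy_compare(left: &str, right: &str) -> Ordering {
    let mut l = ToyElements::new(left.chars());
    let mut r = ToyElements::new(right.chars());
    loop {
        let a = l.next();
        let b = r.next();
        if a != b {
            // Covers both "different elements" and "one side ended".
            return a.cmp(&b);
        }
        if a == TOY_NO_CE {
            // Both sides ended at the same time.
            return Ordering::Equal;
        }
    }
}
```

With Option-returning iterators, the loop would have to match on four Some/None combinations; with a sentinel, end-of-input is just another value that takes part in the ordinary comparison.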
Structs§
- Pack a char and a CanonicalCombiningClass in 32 bits (the former in the lower 24 bits and the latter in the high 8 bits). The latter can be initialized to 0xFF upon creation, in which case it can be actually set later by calling set_ccc_from_trie_if_not_already_set. This is a micro optimization to avoid the Canonical Combining Class trie lookup when there is only one combining character in a sequence. This type is intentionally non-Copy to get compiler help in making sure that the class is set on the instance on which it is intended to be set and not on a temporary copy.
- This struct makes the handling of the upcoming buffer easy so that trie lookups are done at most once. However, when upcoming[0] is an undecomposed starter, we don’t need the ccc yet, and when lookahead has already done the trie lookups, we don’t need trie_value, as it is implied by ccc.
- A collation element is a 64-bit value.
- A compressed form of a collation element as stored in the collation data.
- Iterator that transforms an iterator over char into an iterator over CollationElement with a tailoring. Not a real Rust iterator: Instead of None uses NO_CE to indicate end of iteration to optimize comparison.
- The purpose of grouping the non-primary bits into a struct is to allow for a future optimization that specializes code over whether storage for primary weights is needed or not. (I.e. whether to specialize on CollationElement or NonPrimary.)
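The 32-bit packing described in the first entry of this list can be sketched as follows. This is an illustration under stated assumptions, not the crate's code: PackedCharAndClass, CCC_NOT_SET, and the method names are hypothetical, the combining class is modeled as a plain u8, and the trie lookup is replaced by a caller-supplied closure.

```rust
/// Placeholder meaning "class not looked up yet" (0xFF, as described above).
const CCC_NOT_SET: u8 = 0xFF;

/// Hypothetical stand-in: char in the lower 24 bits, class in the high 8 bits.
/// Intentionally neither Copy nor Clone, so the class cannot be set on an
/// accidental temporary copy.
struct PackedCharAndClass(u32);

impl PackedCharAndClass {
    /// Packs `c` with the placeholder class.
    fn new_with_placeholder(c: char) -> Self {
        Self(u32::from(c) | (u32::from(CCC_NOT_SET) << 24))
    }

    /// Returns the stored character.
    fn character(&self) -> char {
        // The lower 24 bits were produced from a `char`, so the conversion
        // succeeds; the fallback only keeps this sketch panic-free.
        char::from_u32(self.0 & 0x00FF_FFFF).unwrap_or('\u{FFFD}')
    }

    /// Returns the stored class, or 0xFF if it has not been set yet.
    fn ccc(&self) -> u8 {
        (self.0 >> 24) as u8
    }

    /// Runs `lookup` (a stand-in for the trie lookup) only if the class is
    /// still the placeholder.
    fn set_ccc_if_not_already_set(&mut self, lookup: impl FnOnce(char) -> u8) {
        if self.ccc() == CCC_NOT_SET {
            let ccc = lookup(self.character());
            self.0 = (self.0 & 0x00FF_FFFF) | (u32::from(ccc) << 24);
        }
    }
}
```

Deferring the lookup this way means a sequence with a single combining character never pays for it, since the class is only needed once there are two or more marks to reorder.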
Enums§
- Tag 🔒 Special-CE32 tags, from bits 3..0 of a special 32-bit CE. Bits 31..8 are available for tag-specific data. Bits 5..4: Reserved. May be used in the future to indicate lccc!=0 and tccc!=0.
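The bit layout named in the Tag description can be read off with plain masks and shifts. The helper names below (tag_bits, reserved_bits, tag_data) are hypothetical and exist only to make the bit positions concrete; how a CE32 is recognized as special in the first place is not covered by this sketch.

```rust
/// Bits 3..0: the tag of a special 32-bit CE.
fn tag_bits(ce32: u32) -> u8 {
    (ce32 & 0xF) as u8
}

/// Bits 5..4: reserved according to the description above.
fn reserved_bits(ce32: u32) -> u8 {
    ((ce32 >> 4) & 0x3) as u8
}

/// Bits 31..8: tag-specific data.
fn tag_data(ce32: u32) -> u32 {
    ce32 >> 8
}
```

For example, for the arbitrary value 0x1234_56E9 these helpers yield tag 9, reserved bits 2, and data 0x12_3456.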
Constants§
- Marker for starters that decompose to themselves but may combine backwards under canonical composition. (Main trie only; not used in the supplementary trie.)
- Set if at least one contraction suffix contains a starter
- Set if there is no match for the single (no-suffix) character itself. This is only possible if there is a prefix. In this case, discontiguous contraction matching cannot add combining marks starting from an empty suffix. The default CE32 is used anyway if there is no suffix match. Set if the first character of every contraction suffix has lccc!=0.
- Set if any contraction suffix ends with lccc!=0.
- Marker value for U+FDFA in NFKD
- FFFD_CE 🔒
- Marker that a complex decomposition isn’t round-trippable under re-composition.
- NO_CE 🔒
- u16 version of the previous marker value.
Functions§
- Extracts a canonical combining class (possibly zero) from a trie value.
- Convert a u16 obtained from data provider data to char.
- Convert a u32 obtained from data provider data to char.
- Checks if a trie value signifies a character whose decomposition starts with a non-starter.
- Checks if a trie value carries a (non-zero) canonical combining class.
- Checks if the trie signifies a special non-starter decomposition.
- If opt is Some, unwrap it. If None, panic if debug assertions are enabled and return default if debug assertions are not enabled.
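The last entry describes a "panic in debug builds, fall back in release builds" unwrap. A minimal sketch of that behavior, using only the standard debug_assert! macro, might look like the following. The function names are hypothetical, and pairing the helper with a u32-to-char conversion that falls back to U+FFFD is an assumption made for illustration, not a statement about how the crate implements its conversions.

```rust
/// Hypothetical helper: unwraps `opt`; on `None`, panics when debug
/// assertions are enabled and returns `default` otherwise.
fn unwrap_or_default_in_release<T>(opt: Option<T>, default: T) -> T {
    match opt {
        Some(value) => value,
        None => {
            // Compiled out when debug assertions are disabled.
            debug_assert!(false, "unexpected None");
            default
        }
    }
}

/// Assumed usage: converts a u32 that should already be a valid scalar
/// value, falling back to U+FFFD in release builds if it is not.
fn char_from_u32_lossy(u: u32) -> char {
    unwrap_or_default_in_release(char::from_u32(u), '\u{FFFD}')
}
```

The design trades a hard failure during development, where invalid data indicates a bug, for graceful degradation in production, where panicking on bad data would be worse than a replacement character.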