Struct icu_casemap::casemapper::CaseMapper

pub struct CaseMapper {
    pub(crate) data: DataPayload<CaseMapV1Marker>,
}

A struct with the ability to convert characters and strings to uppercase or lowercase, or fold them to a normalized form for case-insensitive comparison.

§Examples

use icu::casemap::CaseMapper;
use icu::locid::langid;

let cm = CaseMapper::new();

assert_eq!(
    cm.uppercase_to_string("hello world", &langid!("und")),
    "HELLO WORLD"
);
assert_eq!(
    cm.lowercase_to_string("Γειά σου Κόσμε", &langid!("und")),
    "γειά σου κόσμε"
);

Fields§

§data: DataPayload<CaseMapV1Marker>

Implementations§


impl CaseMapper


pub const fn new() -> Self

Creates a CaseMapper using compiled data.

Enabled with the compiled_data Cargo feature.

📚 Help choosing a constructor

§Examples
use icu::casemap::CaseMapper;
use icu::locid::langid;

let cm = CaseMapper::new();

assert_eq!(
    cm.uppercase_to_string("hello world", &langid!("und")),
    "HELLO WORLD"
);

pub fn try_new_with_any_provider( provider: &(impl AnyProvider + ?Sized), ) -> Result<Self, DataError>

A version of Self::new that uses custom data provided by an AnyProvider.

📚 Help choosing a constructor


pub fn try_new_unstable<P>(provider: &P) -> Result<CaseMapper, DataError>

A version of Self::new that uses custom data provided by a DataProvider.

📚 Help choosing a constructor

⚠️ The bounds on provider may change over time, including in SemVer minor releases.

pub fn lowercase<'a>( &'a self, src: &'a str, langid: &LanguageIdentifier, ) -> impl Writeable + 'a

Returns the full lowercase mapping of the given string as a Writeable. This function is context and language sensitive. Callers should pass the text’s language as a LanguageIdentifier (usually the id field of the Locale) if available, or Default::default() for the root locale.

See Self::lowercase_to_string() for the equivalent convenience function that returns a String, as well as for an example.
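To illustrate what "context sensitive" means here, the following minimal sketch uses only the Rust standard library as a stand-in (it does not call CaseMapper, and std's mapping is not language sensitive). The helper names lowercase_per_char and lowercase_in_context are hypothetical. It contrasts mapping each char in isolation with mapping the whole string, where a word-final Greek capital sigma becomes ς rather than σ.

```rust
/// Lowercases each char independently, ignoring neighbouring characters.
/// (Hypothetical helper built on std; not part of icu_casemap.)
fn lowercase_per_char(src: &str) -> String {
    src.chars().flat_map(char::to_lowercase).collect()
}

/// Lowercases the whole string at once, so the mapping can inspect context.
/// std's `str::to_lowercase` applies the final-sigma rule.
fn lowercase_in_context(src: &str) -> String {
    src.to_lowercase()
}
```

With the input "ΣΑΣ", the per-char version yields "σασ", while the string-level version yields "σας".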


pub fn uppercase<'a>( &'a self, src: &'a str, langid: &LanguageIdentifier, ) -> impl Writeable + 'a

Returns the full uppercase mapping of the given string as a Writeable. This function is context and language sensitive. Callers should pass the text’s language as a LanguageIdentifier (usually the id field of the Locale) if available, or Default::default() for the root locale.

See Self::uppercase_to_string() for the equivalent convenience function that returns a String, as well as for an example.
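Returning a lazily evaluated value instead of a String lets the caller write the result into a buffer it already owns. The sketch below shows that pattern with std::fmt::Display as an assumed stand-in for Writeable. LazyUpper and append_upper are hypothetical names, and the mapping used is std's char::to_uppercase, not CaseMapper's.

```rust
use std::fmt::{self, Write};

/// Hypothetical stand-in for a lazily evaluated case mapping.
/// Nothing is computed until the value is formatted.
struct LazyUpper<'a>(&'a str);

impl fmt::Display for LazyUpper<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for c in self.0.chars() {
            // A single char may uppercase to several chars.
            for u in c.to_uppercase() {
                f.write_char(u)?;
            }
        }
        Ok(())
    }
}

/// Appends the uppercased text to an existing buffer without
/// building an intermediate String.
fn append_upper(buf: &mut String, src: &str) -> fmt::Result {
    write!(buf, "{}", LazyUpper(src))
}
```

Calling append_upper on a buffer that already holds "> " with the input "hello world" leaves "> HELLO WORLD" in the buffer.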


pub fn titlecase_segment_with_only_case_data<'a>( &'a self, src: &'a str, langid: &LanguageIdentifier, options: TitlecaseOptions, ) -> impl Writeable + 'a

Returns the full titlecase mapping of the given string as a Writeable, treating the string as a single segment (and thus only titlecasing the beginning of it). Performs the specified leading adjustment behavior from the options without loading additional data.

This should typically be used as a lower-level helper to construct the titlecasing operation desired by the application. For example, one can titlecase on a per-word basis by combining this with a WordSegmenter.

This function is context and language sensitive. Callers should pass the text’s language as a LanguageIdentifier (usually the id field of the Locale) if available, or Default::default() for the root locale.

This function performs “adjust to cased” leading adjustment behavior when LeadingAdjustment::Auto or LeadingAdjustment::ToCased is set. Auto mode cannot pick the “adjust to letter/number/symbol” behavior because this type does not load the data needed to do so; use TitlecaseMapper if that behavior is desired. See the docs of TitlecaseMapper for more information on what this means. When the leading adjustment mode is LeadingAdjustment::None, this function behaves identically to the equivalent ones on TitlecaseMapper.

See Self::titlecase_segment_with_only_case_data_to_string() for the equivalent convenience function that returns a String, as well as for an example.
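The per-word composition mentioned above can be sketched with standard-library stand-ins. Both helpers are hypothetical: titlecase_segment approximates a single-segment titlecase by uppercasing the first char and lowercasing the rest (std has no titlecase mapping and no leading adjustment), and splitting on spaces is only a crude substitute for a real word segmenter.

```rust
/// Hypothetical single-segment titlecase: uppercase the first char,
/// lowercase everything after it. Only the start of the segment changes case upward.
fn titlecase_segment(segment: &str) -> String {
    let mut chars = segment.chars();
    match chars.next() {
        Some(first) => first
            .to_uppercase()
            .chain(chars.flat_map(char::to_lowercase))
            .collect(),
        None => String::new(),
    }
}

/// Splits on spaces (assumed stand-in for a word segmenter), keeps the
/// separators, and titlecases each piece independently.
fn titlecase_words(src: &str) -> String {
    src.split_inclusive(' ').map(titlecase_segment).collect()
}
```

For "hEllO WorLd", titlecase_segment gives "Hello world" (one segment), while titlecase_words gives "Hello World".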


pub(crate) fn titlecase_segment_with_adjustment<'a>( &'a self, src: &'a str, langid: &LanguageIdentifier, options: TitlecaseOptions, char_is_lead: impl Fn(&CaseMapV1<'_>, char) -> bool, ) -> StringAndWriteable<'_, FullCaseWriteable<'a, true>>

Helper to support different leading adjustment behaviors. char_is_lead is a function that returns true for a character that is allowed to be the first relevant character in a titlecasing string when leading_adjustment is not None.

We return a concrete type instead of impl Trait so that the return value can be mixed with that of other calls to this function with different closures.
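The reasoning behind the concrete return type can be shown with a small standalone sketch. Everything here is hypothetical and built on std only (Adjusted, adjust, and titlecase_with_mode are illustrative names, not the crate's internals). Because adjust returns a named type, calls made with two different closures produce the same type and can sit in the two branches of one if expression. Had adjust returned impl Display, each closure type would yield a distinct opaque type and the branches would not unify.

```rust
use std::fmt;

/// Hypothetical concrete result: an untouched prefix plus the part to titlecase.
struct Adjusted<'a> {
    prefix: &'a str,
    rest: &'a str,
}

impl fmt::Display for Adjusted<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str(self.prefix)?;
        let mut chars = self.rest.chars();
        if let Some(first) = chars.next() {
            write!(f, "{}", first.to_uppercase())?;
        }
        for c in chars {
            write!(f, "{}", c.to_lowercase())?;
        }
        Ok(())
    }
}

/// Skips leading chars until `char_is_lead` accepts one. The closure is
/// used up here, so its type does not appear in the return type.
fn adjust<'a>(src: &'a str, char_is_lead: impl Fn(char) -> bool) -> Adjusted<'a> {
    let split = src.find(|c: char| char_is_lead(c)).unwrap_or(src.len());
    let (prefix, rest) = src.split_at(split);
    Adjusted { prefix, rest }
}

/// Both branches type-check because `adjust` always returns `Adjusted`.
fn titlecase_with_mode(src: &str, letters_only: bool) -> Adjusted<'_> {
    if letters_only {
        adjust(src, |c| c.is_alphabetic())
    } else {
        adjust(src, |c| c.is_alphanumeric())
    }
}
```

With letters_only set, "'twas" formats as "'Twas" because the apostrophe is skipped as a lead character.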


pub fn fold<'a>(&'a self, src: &'a str) -> impl Writeable + 'a

Case-folds the characters in the given string as a Writeable. This function is locale-independent and context-insensitive.

Can be used to test if two strings are case-insensitively equivalent.

See Self::fold_string() for the equivalent convenience function that returns a String, as well as for an example.


pub fn fold_turkic<'a>(&'a self, src: &'a str) -> impl Writeable + 'a

Case-folds the characters in the given string as a Writeable, using Turkic (T) mappings for dotted/dotless I. This function is locale-independent and context-insensitive.

Can be used to test if two strings are case-insensitively equivalent.

See Self::fold_turkic_string() for the equivalent convenience function that returns a String, as well as for an example.


pub fn lowercase_to_string( &self, src: &str, langid: &LanguageIdentifier, ) -> String

Returns the full lowercase mapping of the given string as a String.

This function is context and language sensitive. Callers should pass the text’s language as a LanguageIdentifier (usually the id field of the Locale) if available, or Default::default() for the root locale.

See Self::lowercase() for the equivalent lower-level function that returns a Writeable.

§Examples
use icu::casemap::CaseMapper;
use icu::locid::langid;

let cm = CaseMapper::new();
let root = langid!("und");

assert_eq!(cm.lowercase_to_string("hEllO WorLd", &root), "hello world");
assert_eq!(cm.lowercase_to_string("Γειά σου Κόσμε", &root), "γειά σου κόσμε");
assert_eq!(cm.lowercase_to_string("नमस्ते दुनिया", &root), "नमस्ते दुनिया");
assert_eq!(cm.lowercase_to_string("Привет мир", &root), "привет мир");

// Some behavior is language-sensitive
assert_eq!(cm.lowercase_to_string("CONSTANTINOPLE", &root), "constantinople");
assert_eq!(cm.lowercase_to_string("CONSTANTINOPLE", &langid!("tr")), "constantınople");

pub fn uppercase_to_string( &self, src: &str, langid: &LanguageIdentifier, ) -> String

Returns the full uppercase mapping of the given string as a String.

This function is context and language sensitive. Callers should pass the text’s language as a LanguageIdentifier (usually the id field of the Locale) if available, or Default::default() for the root locale.

See Self::uppercase() for the equivalent lower-level function that returns a Writeable.

§Examples
use icu::casemap::CaseMapper;
use icu::locid::langid;

let cm = CaseMapper::new();
let root = langid!("und");

assert_eq!(cm.uppercase_to_string("hEllO WorLd", &root), "HELLO WORLD");
assert_eq!(cm.uppercase_to_string("Γειά σου Κόσμε", &root), "ΓΕΙΆ ΣΟΥ ΚΌΣΜΕ");
assert_eq!(cm.uppercase_to_string("नमस्ते दुनिया", &root), "नमस्ते दुनिया");
assert_eq!(cm.uppercase_to_string("Привет мир", &root), "ПРИВЕТ МИР");

// Some behavior is language-sensitive
assert_eq!(cm.uppercase_to_string("istanbul", &root), "ISTANBUL");
assert_eq!(cm.uppercase_to_string("istanbul", &langid!("tr")), "İSTANBUL"); // Turkish dotted i

assert_eq!(cm.uppercase_to_string("և Երևանի", &root), "ԵՒ ԵՐԵՒԱՆԻ");
assert_eq!(cm.uppercase_to_string("և Երևանի", &langid!("hy")), "ԵՎ ԵՐԵՎԱՆԻ"); // Eastern Armenian ech-yiwn ligature

pub fn titlecase_segment_with_only_case_data_to_string( &self, src: &str, langid: &LanguageIdentifier, options: TitlecaseOptions, ) -> String

Returns the full titlecase mapping of the given string as a String, treating the string as a single segment (and thus only titlecasing the beginning of it). Performs the specified leading adjustment behavior from the options without loading additional data.

Note that TitlecaseMapper has better behavior; most users should consider using it instead. This method exists primarily for users who care about the amount of data being loaded.

This should typically be used as a lower-level helper to construct the titlecasing operation desired by the application. For example, one can titlecase on a per-word basis by combining this with a WordSegmenter.

This function is context and language sensitive. Callers should pass the text’s language as a LanguageIdentifier (usually the id field of the Locale) if available, or Default::default() for the root locale.

This function performs “adjust to cased” leading adjustment behavior when LeadingAdjustment::Auto or LeadingAdjustment::ToCased is set. Auto mode cannot pick the “adjust to letter/number/symbol” behavior because this type does not load the data needed to do so; use TitlecaseMapper if that behavior is desired. See the docs of TitlecaseMapper for more information on what this means. When the leading adjustment mode is LeadingAdjustment::None, this function behaves identically to the equivalent ones on TitlecaseMapper.

See Self::titlecase_segment_with_only_case_data() for the equivalent lower-level function that returns a Writeable.

§Examples
use icu::casemap::CaseMapper;
use icu::locid::langid;

let cm = CaseMapper::new();
let root = langid!("und");

let default_options = Default::default();

// Note that subsequent words are not titlecased: this function assumes
// that the entire string is a single segment and only titlecases at the beginning.
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("hEllO WorLd", &root, default_options), "Hello world");
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("Γειά σου Κόσμε", &root, default_options), "Γειά σου κόσμε");
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("नमस्ते दुनिया", &root, default_options), "नमस्ते दुनिया");
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("Привет мир", &root, default_options), "Привет мир");

// Some behavior is language-sensitive
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("istanbul", &root, default_options), "Istanbul");
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("istanbul", &langid!("tr"), default_options), "İstanbul"); // Turkish dotted i

assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("և Երևանի", &root, default_options), "Եւ երևանի");
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("և Երևանի", &langid!("hy"), default_options), "Եվ երևանի"); // Eastern Armenian ech-yiwn ligature

assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("ijkdijk", &root, default_options), "Ijkdijk");
assert_eq!(cm.titlecase_segment_with_only_case_data_to_string("ijkdijk", &langid!("nl"), default_options), "IJkdijk"); // Dutch IJ digraph

pub fn fold_string(&self, src: &str) -> String

Case-folds the characters in the given string as a String. This function is locale-independent and context-insensitive.

Can be used to test if two strings are case-insensitively equivalent.

See Self::fold() for the equivalent lower-level function that returns a Writeable.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

// Check if two strings are equivalent case insensitively
assert_eq!(cm.fold_string("hEllO WorLd"), cm.fold_string("HELLO worlD"));

assert_eq!(cm.fold_string("hEllO WorLd"), "hello world");
assert_eq!(cm.fold_string("Γειά σου Κόσμε"), "γειά σου κόσμε");
assert_eq!(cm.fold_string("नमस्ते दुनिया"), "नमस्ते दुनिया");
assert_eq!(cm.fold_string("Привет мир"), "привет мир");

pub fn fold_turkic_string(&self, src: &str) -> String

Case-folds the characters in the given string as a String, using Turkic (T) mappings for dotted/dotless I. This function is locale-independent and context-insensitive.

Can be used to test if two strings are case-insensitively equivalent.

See Self::fold_turkic() for the equivalent lower-level function that returns a Writeable.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

// Check if two strings are equivalent case insensitively
assert_eq!(cm.fold_turkic_string("İstanbul"), cm.fold_turkic_string("iSTANBUL"));

assert_eq!(cm.fold_turkic_string("İstanbul not Constantinople"), "istanbul not constantinople");
assert_eq!(cm.fold_turkic_string("Istanbul not Constantınople"), "ıstanbul not constantınople");

assert_eq!(cm.fold_turkic_string("hEllO WorLd"), "hello world");
assert_eq!(cm.fold_turkic_string("Γειά σου Κόσμε"), "γειά σου κόσμε");
assert_eq!(cm.fold_turkic_string("नमस्ते दुनिया"), "नमस्ते दुनिया");
assert_eq!(cm.fold_turkic_string("Привет мир"), "привет мир");

pub fn add_case_closure_to<S: ClosureSink>(&self, c: char, set: &mut S)

Adds all simple case mappings and the full case folding for c to set. Also adds special case closure mappings.

Identical to CaseMapCloser::add_case_closure_to(); see the docs there for more information. This method is duplicated here so that callers who need only this method, and not CaseMapCloser::add_string_case_closure_to(), do not have to load the extra unfold data.

§Examples
use icu::casemap::CaseMapper;
use icu::collections::codepointinvlist::CodePointInversionListBuilder;

let cm = CaseMapper::new();
let mut builder = CodePointInversionListBuilder::new();
cm.add_case_closure_to('s', &mut builder);

let set = builder.build();

assert!(set.contains('S'));
assert!(set.contains('ſ'));
assert!(!set.contains('s')); // does not contain itself
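As a rough intuition for what a case closure contains, the sketch below brute-forces one with the standard library only. This is an assumption-laden approximation, not the algorithm or data CaseMapper uses: the hypothetical case_closure_sketch scans every char and keeps those that share a std lowercase or uppercase mapping with c, excluding c itself.

```rust
use std::collections::BTreeSet;

/// Brute-force approximation of a case closure (hypothetical helper):
/// every other char whose std lowercase or uppercase mapping equals that of `c`.
fn case_closure_sketch(c: char) -> BTreeSet<char> {
    ('\0'..=char::MAX)
        .filter(|&other| other != c) // the closure does not contain `c` itself
        .filter(|&other| {
            other.to_lowercase().eq(c.to_lowercase())
                || other.to_uppercase().eq(c.to_uppercase())
        })
        .collect()
}
```

For 's' the resulting set includes 'S' (same lowercase) and 'ſ' (same uppercase), matching the members checked in the example above.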

pub fn simple_lowercase(&self, c: char) -> char

Returns the lowercase mapping of the given char. This function only implements simple and common mappings. Full mappings, which can map one char to a string, are not included. For full mappings, use CaseMapper::lowercase.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

assert_eq!(cm.simple_lowercase('C'), 'c');
assert_eq!(cm.simple_lowercase('c'), 'c');
assert_eq!(cm.simple_lowercase('Ć'), 'ć');
assert_eq!(cm.simple_lowercase('Γ'), 'γ');
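The reason a char-to-char function cannot cover full mappings is that some full mappings expand to more than one char. A minimal sketch using only the standard library's char::to_lowercase and char::to_uppercase as stand-ins (these are not CaseMapper calls, and full_lowercase and full_uppercase are hypothetical helper names):

```rust
/// Full lowercase mapping of one char, collected into a String (std stand-in).
fn full_lowercase(c: char) -> String {
    c.to_lowercase().collect()
}

/// Full uppercase mapping of one char, collected into a String (std stand-in).
fn full_uppercase(c: char) -> String {
    c.to_uppercase().collect()
}
```

Here full_lowercase('İ') is "i\u{307}" (two chars) and full_uppercase('ß') is "SS", neither of which fits in a single char return value.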

pub fn simple_uppercase(&self, c: char) -> char

Returns the uppercase mapping of the given char. This function only implements simple and common mappings. Full mappings, which can map one char to a string, are not included. For full mappings, use CaseMapper::uppercase.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

assert_eq!(cm.simple_uppercase('c'), 'C');
assert_eq!(cm.simple_uppercase('C'), 'C');
assert_eq!(cm.simple_uppercase('ć'), 'Ć');
assert_eq!(cm.simple_uppercase('γ'), 'Γ');

assert_eq!(cm.simple_uppercase('ǆ'), 'Ǆ'); // U+01C6 → U+01C4, single-code-point digraphs

pub fn simple_titlecase(&self, c: char) -> char

Returns the titlecase mapping of the given char. This function only implements simple and common mappings. Full mappings, which can map one char to a string, are not included.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

assert_eq!(cm.simple_titlecase('ǆ'), 'ǅ'); // U+01C6 → U+01C5, single-code-point digraphs

assert_eq!(cm.simple_titlecase('c'), 'C');
assert_eq!(cm.simple_titlecase('C'), 'C');
assert_eq!(cm.simple_titlecase('ć'), 'Ć');
assert_eq!(cm.simple_titlecase('γ'), 'Γ');

pub fn simple_fold(&self, c: char) -> char

Returns the simple case folding of the given char. For full mappings, use CaseMapper::fold.

This function can be used to perform caseless matches on individual characters.

Note: With Unicode 15.0 data, there are three pairs of characters for which equivalence under this function is inconsistent with equivalence of the one-character strings under CaseMapper::fold. This is resolved in Unicode 15.1 and later.

For compatibility applications where simple case folding of strings is required, this function can be applied to each character of a string. Note that the resulting equivalence relation is different from that obtained by CaseMapper::fold: The strings “Straße” and “STRASSE” are distinct under simple case folding, but are equivalent under default (full) case folding.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

// perform case insensitive checks
assert_eq!(cm.simple_fold('σ'), cm.simple_fold('ς'));
assert_eq!(cm.simple_fold('Σ'), cm.simple_fold('ς'));

assert_eq!(cm.simple_fold('c'), 'c');
assert_eq!(cm.simple_fold('Ć'), 'ć');
assert_eq!(cm.simple_fold('Γ'), 'γ');
assert_eq!(cm.simple_fold('ς'), 'σ');

assert_eq!(cm.simple_fold('ß'), 'ß');
assert_eq!(cm.simple_fold('I'), 'i');
assert_eq!(cm.simple_fold('İ'), 'İ');
assert_eq!(cm.simple_fold('ı'), 'ı');
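The “Straße” versus “STRASSE” distinction described above can be reproduced with standard-library stand-ins. Both helpers are hypothetical and only approximate Unicode case folding: per_char_fold_sketch maps each char to a single char (leaving it unchanged when the mapping would expand), and full_fold_sketch uppercases and then lowercases the whole string so that one-to-many expansions such as ß → SS are applied.

```rust
/// Rough per-char stand-in for simple folding: lowercase when the std mapping
/// is exactly one char, otherwise leave the char unchanged.
fn per_char_fold_sketch(s: &str) -> String {
    s.chars()
        .map(|c| {
            let mut it = c.to_lowercase();
            match (it.next(), it.next()) {
                (Some(lower), None) => lower,
                _ => c,
            }
        })
        .collect()
}

/// Rough string-level stand-in for full folding: uppercase, then lowercase,
/// so expansions like ß → SS take effect before comparison.
fn full_fold_sketch(s: &str) -> String {
    s.to_uppercase().to_lowercase()
}
```

Under the per-char sketch the two strings stay distinct ("straße" and "strasse"), while under the string-level sketch both become "strasse".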

pub fn simple_fold_turkic(&self, c: char) -> char

Returns the simple case folding of the given char, using Turkic (T) mappings for dotted/dotless i. This function does not fold i and I to the same character. Instead, I will fold to ı, and İ will fold to i. Otherwise, this is the same as CaseMapper::simple_fold().

You can use the case folding to perform Turkic caseless matches on characters provided they don’t full-casefold to strings. To avoid that situation, convert to a string and use CaseMapper::fold_turkic.

§Examples
use icu::casemap::CaseMapper;

let cm = CaseMapper::new();

assert_eq!(cm.simple_fold_turkic('I'), 'ı');
assert_eq!(cm.simple_fold_turkic('İ'), 'i');
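The Turkic special cases can be pictured as two overrides layered on an otherwise ordinary per-char fold. The following is a minimal sketch under that assumption, using std's char::to_lowercase as a rough substitute for the non-Turkic simple folding. simple_fold_turkic_sketch is a hypothetical name and does not reflect how CaseMapper stores or looks up its data.

```rust
/// Hypothetical per-char Turkic fold: two explicit overrides, then a
/// single-char lowercase fallback (chars whose mapping expands are left as-is).
fn simple_fold_turkic_sketch(c: char) -> char {
    match c {
        'I' => 'ı', // U+0049 folds to dotless U+0131
        'İ' => 'i', // U+0130 folds to dotted U+0069
        _ => {
            let mut it = c.to_lowercase();
            match (it.next(), it.next()) {
                (Some(lower), None) => lower,
                _ => c,
            }
        }
    }
}
```

Under this sketch 'I' and 'i' fold to different characters ('ı' and 'i'), which is the property the description above calls out.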

Trait Implementations§


impl AsRef<CaseMapper> for CaseMapper


fn as_ref(&self) -> &CaseMapper

Converts this type into a shared reference of the (usually inferred) input type.

impl Clone for CaseMapper


fn clone(&self) -> CaseMapper

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for CaseMapper


fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for CaseMapper


fn default() -> Self

Returns the “default value” for a type.

Auto Trait Implementations§

Blanket Implementations§


impl<T> Any for T
where T: 'static + ?Sized,


fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,


fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,


fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T


fn from(t: T) -> T

Returns the argument unchanged.


impl<T, U> Into<U> for T
where U: From<T>,


fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.


impl<T> ToOwned for T
where T: Clone,


type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,


type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,


type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.

impl<T> ErasedDestructor for T
where T: 'static,


impl<T> MaybeSendSync for T