Struct LanguageIdentifier

pub struct LanguageIdentifier {
    pub language: Language,
    pub script: Option<Script>,
    pub region: Option<Region>,
    pub variants: Variants,
}

A core struct representing a Unicode BCP47 Language Identifier.

§Examples

use icu::locid::{
    langid,
    subtags::{language, region},
};

let li = langid!("en-US");

assert_eq!(li.language, language!("en"));
assert_eq!(li.script, None);
assert_eq!(li.region, Some(region!("US")));
assert_eq!(li.variants.len(), 0);

§Parsing

Unicode recognizes three levels of standard conformance for any language identifier:

well-formed - syntactically correct
valid - well-formed and only uses registered language, region, script and variant subtags…
canonical - valid and no deprecated codes or structure.

At the moment, parsing normalizes a well-formed language identifier by converting _ separators to - and adjusting casing to conform to the Unicode standard.

Any syntactically malformed subtag causes parsing to fail with an error. Subtags are not validated against the registry.

§Examples

use icu::locid::{
    langid,
    subtags::{language, region, script, variant},
};

let li = langid!("eN_latn_Us-Valencia");

assert_eq!(li.language, language!("en"));
assert_eq!(li.script, Some(script!("Latn")));
assert_eq!(li.region, Some(region!("US")));
assert_eq!(li.variants.get(0), Some(&variant!("valencia")));
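
To make the normalization step concrete, here is a minimal std-only sketch of separator and casing normalization. It is not the icu::locid parser: normalize_sketch is a hypothetical helper, it assumes the input is already well-formed and shaped like language[-script][-region][-variant...], and it performs none of the error handling the real parser does.

```rust
/// Hypothetical helper (not part of `icu::locid`): normalizes the separators
/// and casing of an input that is assumed to be well-formed.
fn normalize_sketch(input: &str) -> String {
    let mut out: Vec<String> = Vec::new();
    // Once a region or a variant has been seen, later subtags are variants.
    let mut region_slot_open = true;

    for (i, subtag) in input.split(|c: char| c == '-' || c == '_').enumerate() {
        let all_alpha = subtag.bytes().all(|b| b.is_ascii_alphabetic());
        let all_digit = subtag.bytes().all(|b| b.is_ascii_digit());

        let normalized = if i == 0 {
            // Language: lowercase.
            subtag.to_ascii_lowercase()
        } else if i == 1 && subtag.len() == 4 && all_alpha {
            // Script: titlecase (first letter upper, rest lower).
            let lower = subtag.to_ascii_lowercase();
            let (head, tail) = lower.split_at(1);
            format!("{}{}", head.to_ascii_uppercase(), tail)
        } else if region_slot_open
            && ((subtag.len() == 2 && all_alpha) || (subtag.len() == 3 && all_digit))
        {
            // Region: uppercase.
            region_slot_open = false;
            subtag.to_ascii_uppercase()
        } else {
            // Variant: lowercase.
            region_slot_open = false;
            subtag.to_ascii_lowercase()
        };
        out.push(normalized);
    }

    // Always join with the canonical `-` separator.
    out.join("-")
}
```

Under these assumptions, normalize_sketch("eN_latn_Us-Valencia") yields "en-Latn-US-valencia", matching the field values asserted in the example above.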

Fields§

§language: Language

Language subtag of the language identifier.

§script: Option<Script>

Script subtag of the language identifier.

§region: Option<Region>

Region subtag of the language identifier.

§variants: Variants

Variant subtags of the language identifier.

Implementations§

impl LanguageIdentifier

pub const UND: Self

The default undefined language “und”. Same as default().

§Examples

use icu::locid::LanguageIdentifier;

assert_eq!(LanguageIdentifier::default(), LanguageIdentifier::UND);

pub fn try_from_bytes(v: &[u8]) -> Result<Self, ParserError>

A constructor that takes a UTF-8 byte slice, parses it, and produces a well-formed LanguageIdentifier.

§Examples

use icu::locid::LanguageIdentifier;

LanguageIdentifier::try_from_bytes(b"en-US").expect("Parsing failed");

pub fn try_from_locale_bytes(v: &[u8]) -> Result<Self, ParserError>

A constructor that takes a UTF-8 byte slice, which may contain extension keys, parses it, and produces a well-formed LanguageIdentifier.

§Examples

use icu::locid::{langid, LanguageIdentifier};

let li = LanguageIdentifier::try_from_locale_bytes(b"en-US-x-posix")
    .expect("Parsing failed.");

assert_eq!(li, langid!("en-US"));

Use this method for input that may be a locale identifier. Any extensions in the input are discarded.
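
As a rough illustration of what discarding extensions means at the string level, the following std-only sketch cuts a locale string at its first singleton subtag. strip_extensions_sketch is a hypothetical helper, not an icu::locid function; it assumes a well-formed input that uses - separators and does not begin with a singleton.

```rust
/// Hypothetical helper (not part of `icu::locid`): returns the prefix of
/// `locale` that precedes the first single-character ("singleton") subtag,
/// which is where extensions such as `-u-`, `-t-`, or `-x-` begin.
fn strip_extensions_sketch(locale: &str) -> &str {
    // Byte offset of the start of the current subtag.
    let mut offset = 0;
    for (i, subtag) in locale.split('-').enumerate() {
        if i > 0 && subtag.len() == 1 {
            // `offset - 1` is the position of the `-` before the singleton.
            return &locale[..offset - 1];
        }
        offset += subtag.len() + 1;
    }
    // No singleton found: the whole input is a language identifier.
    locale
}
```

With this sketch, "en-US-x-posix" is reduced to "en-US", which mirrors the result shown in the example for try_from_locale_bytes.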

pub fn canonicalize<S: AsRef<[u8]>>(input: S) -> Result<String, ParserError>

This is a best-effort operation that performs all available levels of canonicalization.

At the moment the operation will normalize casing and the separator, but in the future it may also validate and update from deprecated subtags to canonical ones.

§Examples

use icu::locid::LanguageIdentifier;

assert_eq!(
    LanguageIdentifier::canonicalize("pL_latn_pl").as_deref(),
    Ok("pl-Latn-PL")
);

pub fn strict_cmp(&self, other: &[u8]) -> Ordering

Compare this LanguageIdentifier with BCP-47 bytes.

The return value is equivalent to what would happen if you first converted this LanguageIdentifier to a BCP-47 string and then performed a byte comparison.

This function is case-sensitive and results in a total order, so it is appropriate for binary search. The only argument producing Ordering::Equal is self.to_string().

§Examples

use icu::locid::LanguageIdentifier;
use std::cmp::Ordering;

let bcp47_strings: &[&str] = &[
    "pl-Latn-PL",
    "und",
    "und-Adlm",
    "und-GB",
    "und-ZA",
    "und-fonipa",
    "zh",
];

for ab in bcp47_strings.windows(2) {
    let a = ab[0];
    let b = ab[1];
    assert!(a.cmp(b) == Ordering::Less);
    let a_langid = a.parse::<LanguageIdentifier>().unwrap();
    assert!(a_langid.strict_cmp(a.as_bytes()) == Ordering::Equal);
    assert!(a_langid.strict_cmp(b.as_bytes()) == Ordering::Less);
}
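
Because the ordering is total and agrees with byte order, a lookup in a byte-sorted list of BCP-47 strings can use binary search. The sketch below is std-only and models the documented semantics rather than calling icu::locid: CanonicalTag and find_in_sorted are hypothetical names, and CanonicalTag::strict_cmp is a stand-in that compares the canonical string's bytes.

```rust
use std::cmp::Ordering;

/// Hypothetical stand-in for a parsed identifier: it stores only the
/// canonical BCP-47 string.
struct CanonicalTag(String);

impl CanonicalTag {
    /// Models the documented behavior: a byte comparison against the
    /// canonical string form.
    fn strict_cmp(&self, other: &[u8]) -> Ordering {
        self.0.as_bytes().cmp(other)
    }
}

/// Finds `needle` in a list that is sorted by byte order.
fn find_in_sorted(sorted: &[&str], needle: &CanonicalTag) -> Option<usize> {
    sorted
        // `binary_search_by` wants the ordering of the probe relative to the
        // target, so the result of `needle.strict_cmp(probe)` is reversed.
        .binary_search_by(|probe| needle.strict_cmp(probe.as_bytes()).reverse())
        .ok()
}
```

The reversal is the part that is easy to get wrong: strict_cmp answers "how does self compare to the bytes", while the comparator must answer "how does the probed element compare to the target".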

pub(crate) fn as_tuple( &self, ) -> (Language, Option<Script>, Option<Region>, &Variants)

pub fn total_cmp(&self, other: &Self) -> Ordering

Compare this LanguageIdentifier with another LanguageIdentifier field-by-field. The result is a total ordering sufficient for use in a BTreeMap.

Unlike Self::strict_cmp, this function’s ordering may not equal string ordering.
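
The following std-only sketch shows one way a field-by-field ordering can disagree with string ordering. LangIdSketch is a hypothetical simplified model that uses plain strings and a derived Ord; the real subtag types and the exact ordering produced by total_cmp may differ.

```rust
use std::cmp::Ordering;

/// Hypothetical simplified model of a language identifier. The derived `Ord`
/// compares fields in declaration order, and `None` sorts before `Some(_)`.
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord)]
struct LangIdSketch {
    language: String,
    script: Option<String>,
    region: Option<String>,
    variants: Vec<String>,
}

impl LangIdSketch {
    /// Joins the present subtags with `-`.
    fn to_bcp47(&self) -> String {
        let mut parts: Vec<&str> = vec![self.language.as_str()];
        parts.extend(self.script.as_deref());
        parts.extend(self.region.as_deref());
        parts.extend(self.variants.iter().map(String::as_str));
        parts.join("-")
    }

    /// Field-by-field comparison.
    fn total_cmp(&self, other: &Self) -> Ordering {
        self.cmp(other)
    }

    /// Byte comparison against a BCP-47 string.
    fn strict_cmp(&self, other: &[u8]) -> Ordering {
        self.to_bcp47().as_bytes().cmp(other)
    }
}
```

In this model, und-fonipa sorts before und-GB field-by-field, because its region is absent, but after it in byte order, because lowercase f is greater than uppercase G in ASCII.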

pub fn strict_cmp_iter<'l, I>(&self, subtags: I) -> SubtagOrderingResult<I>
where I: Iterator<Item = &'l [u8]>,

Deprecated since 1.5.0: if you need this, please file an issue

Compare this LanguageIdentifier with an iterator of BCP-47 subtags.

This function has the same equality semantics as LanguageIdentifier::strict_cmp. It is intended as a more modular version that allows multiple subtag iterators to be chained together.

For an additional example, see SubtagOrderingResult.

§Examples

use icu::locid::LanguageIdentifier;
use std::cmp::Ordering;

let subtags: &[&[u8]] = &[b"ca", b"ES", b"valencia"];

let loc = "ca-ES-valencia".parse::<LanguageIdentifier>().unwrap();
assert_eq!(
    Ordering::Equal,
    loc.strict_cmp_iter(subtags.iter().copied()).end()
);

let loc = "ca-ES".parse::<LanguageIdentifier>().unwrap();
assert_eq!(
    Ordering::Less,
    loc.strict_cmp_iter(subtags.iter().copied()).end()
);

let loc = "ca-ZA".parse::<LanguageIdentifier>().unwrap();
assert_eq!(
    Ordering::Greater,
    loc.strict_cmp_iter(subtags.iter().copied()).end()
);

pub fn normalizing_eq(&self, other: &str) -> bool

Compare this LanguageIdentifier with a potentially unnormalized BCP-47 string.

The return value is equivalent to what would happen if you first parsed the BCP-47 string to a LanguageIdentifier and then performed a structural comparison.

§Examples

use icu::locid::LanguageIdentifier;

let bcp47_strings: &[&str] = &[
    "pl-LaTn-pL",
    "uNd",
    "UnD-adlm",
    "uNd-GB",
    "UND-FONIPA",
    "ZH",
];

for a in bcp47_strings {
    assert!(a.parse::<LanguageIdentifier>().unwrap().normalizing_eq(a));
}
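
For intuition, a normalizing comparison can be approximated without building a second identifier by walking both strings subtag by subtag. This std-only sketch is an assumption-laden model, not the icu::locid implementation: normalizing_eq_sketch is a hypothetical helper, it assumes `other` is well-formed with subtags in the same order, and it does not reject malformed input the way a real parse would.

```rust
/// Hypothetical helper (not part of `icu::locid`): compares a canonical
/// BCP-47 string with a possibly unnormalized one, ignoring ASCII case and
/// accepting `_` as a separator in `other`.
fn normalizing_eq_sketch(canonical: &str, other: &str) -> bool {
    let mut left = canonical.split('-');
    let mut right = other.split(|c: char| c == '-' || c == '_');
    loop {
        match (left.next(), right.next()) {
            // Both inputs ended at the same time: every subtag matched.
            (None, None) => return true,
            // Matching subtags: keep going.
            (Some(a), Some(b)) if a.eq_ignore_ascii_case(b) => continue,
            // Mismatch, or one input has more subtags than the other.
            _ => return false,
        }
    }
}
```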

pub(crate) fn for_each_subtag_str<E, F>(&self, f: &mut F) -> Result<(), E>
where F: FnMut(&str) -> Result<(), E>,

pub(crate) fn for_each_subtag_str_lowercased<E, F>( &self, f: &mut F, ) -> Result<(), E>
where F: FnMut(&str) -> Result<(), E>,

Executes f on each subtag string of this LanguageIdentifier, with every string in lowercase ASCII form.

The default canonicalization of language identifiers uses titlecase scripts and uppercase regions. However, this differs from RFC6497 (BCP 47 Extension T), which specifies:

The canonical form for all subtags in the extension is lowercase, with the fields ordered by the separators, alphabetically.

Hence, this method is used inside Transform Extensions to be able to get the correct canonicalization of the language identifier.

As an example, the canonical form of locale EN-LATN-CA-T-EN-LATN-CA is en-Latn-CA-t-en-latn-ca, with the script and region parts lowercased inside T extensions, but titlecased and uppercased outside T extensions respectively.

pub(crate) fn write_lowercased_to<W: Write + ?Sized>( &self, sink: &mut W, ) -> Result

Writes this LanguageIdentifier to a sink, replacing uppercase ASCII characters with lowercase ASCII characters.

The default canonicalization of language identifiers uses titlecase scripts and uppercase regions. However, this differs from RFC6497 (BCP 47 Extension T), which specifies:

The canonical form for all subtags in the extension is lowercase, with the fields ordered by the separators, alphabetically.

Hence, this method is used inside Transform Extensions to be able to get the correct canonicalization of the language identifier.

As an example, the canonical form of locale EN-LATN-CA-T-EN-LATN-CA is en-Latn-CA-t-en-latn-ca, with the script and region parts lowercased inside T extensions, but titlecased and uppercased outside T extensions respectively.
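
The lowercasing rule for the T extension can be sketched with a std-only writer. write_lowercased_sketch is a hypothetical free function that takes the canonical string form as input; it only illustrates the idea behind the crate-private methods described here and is not their implementation.

```rust
use std::fmt::{self, Write};

/// Hypothetical helper (not part of `icu::locid`): writes `canonical` to
/// `sink` with every uppercase ASCII character replaced by its lowercase form.
fn write_lowercased_sketch<W: Write + ?Sized>(canonical: &str, sink: &mut W) -> fmt::Result {
    for c in canonical.chars() {
        // Non-ASCII and already-lowercase characters pass through unchanged.
        sink.write_char(c.to_ascii_lowercase())?;
    }
    Ok(())
}
```

Appending the output for "en-Latn-CA" to a buffer that already holds "en-Latn-CA-t-" produces en-Latn-CA-t-en-latn-ca, the canonical form given in the example above.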

Trait Implementations§

impl AsMut<LanguageIdentifier> for LanguageIdentifier

fn as_mut(&mut self) -> &mut Self

Converts this type into a mutable reference of the (usually inferred) input type.

impl AsMut<LanguageIdentifier> for Locale

fn as_mut(&mut self) -> &mut LanguageIdentifier

Converts this type into a mutable reference of the (usually inferred) input type.

impl AsRef<LanguageIdentifier> for LanguageIdentifier

fn as_ref(&self) -> &Self

Converts this type into a shared reference of the (usually inferred) input type.

impl AsRef<LanguageIdentifier> for Locale

fn as_ref(&self) -> &LanguageIdentifier

Converts this type into a shared reference of the (usually inferred) input type.

impl Clone for LanguageIdentifier

fn clone(&self) -> LanguageIdentifier

Returns a duplicate of the value.

Since 1.0.0

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for LanguageIdentifier

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl Default for LanguageIdentifier

fn default() -> LanguageIdentifier

Returns the “default value” for a type.

impl Display for LanguageIdentifier

This trait is implemented for compatibility with fmt!. To create a string, Writeable::write_to_string is usually more efficient.

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

impl From<&LanguageIdentifier> for (Language, Option<Script>, Option<Region>)

Convert from a LanguageIdentifier to an LSR tuple.

§Examples

use icu::locid::{
    langid,
    subtags::{language, region, script},
};

let lid = langid!("en-Latn-US");
let (lang, script, region) = (&lid).into();

assert_eq!(lang, language!("en"));
assert_eq!(script, Some(script!("Latn")));
assert_eq!(region, Some(region!("US")));

fn from(langid: &LanguageIdentifier) -> Self

Converts to this type from the input type.

impl From<(Language, Option<Script>, Option<Region>)> for LanguageIdentifier

Convert from an LSR tuple to a LanguageIdentifier.

§Examples

use icu::locid::{
    langid,
    subtags::{language, region, script},
    LanguageIdentifier,
};

let lang = language!("en");
let script = script!("Latn");
let region = region!("US");
assert_eq!(
    LanguageIdentifier::from((lang, Some(script), Some(region))),
    langid!("en-Latn-US")
);

fn from(lsr: (Language, Option<Script>, Option<Region>)) -> Self

Converts to this type from the input type.

impl From<Language> for LanguageIdentifier

§Examples

use icu::locid::{langid, subtags::language, LanguageIdentifier};

assert_eq!(LanguageIdentifier::from(language!("en")), langid!("en"));

fn from(language: Language) -> Self

Converts to this type from the input type.

impl From<LanguageIdentifier> for Locale

fn from(id: LanguageIdentifier) -> Self

Converts to this type from the input type.

impl From<Locale> for LanguageIdentifier

fn from(loc: Locale) -> Self

Converts to this type from the input type.

impl From<Option<Region>> for LanguageIdentifier

§Examples

use icu::locid::{langid, subtags::region, LanguageIdentifier};

assert_eq!(
    LanguageIdentifier::from(Some(region!("US"))),
    langid!("und-US")
);

fn from(region: Option<Region>) -> Self

Converts to this type from the input type.

impl From<Option<Script>> for LanguageIdentifier

§Examples

use icu::locid::{langid, subtags::script, LanguageIdentifier};

assert_eq!(
    LanguageIdentifier::from(Some(script!("latn"))),
    langid!("und-Latn")
);

fn from(script: Option<Script>) -> Self

Converts to this type from the input type.

impl FromStr for LanguageIdentifier

type Err = ParserError

The associated error which can be returned from parsing.

fn from_str(source: &str) -> Result<Self, Self::Err>

Parses a string s to return a value of this type.

impl Hash for LanguageIdentifier

fn hash<H: Hasher>(&self, state: &mut H)

Feeds this value into the given Hasher.

Since 1.3.0

fn hash_slice<H>(data: &[Self], state: &mut H)
where H: Hasher, Self: Sized,

Feeds a slice of this type into the given Hasher.

impl PartialEq for LanguageIdentifier

fn eq(&self, other: &LanguageIdentifier) -> bool

Tests for self and other values to be equal, and is used by ==.

Since 1.0.0

fn ne(&self, other: &Rhs) -> bool

Tests for !=. The default implementation is almost always sufficient, and should not be overridden without very good reason.

impl Writeable for LanguageIdentifier

fn write_to<W: Write + ?Sized>(&self, sink: &mut W) -> Result

Writes a string to the given sink. Errors from the sink are bubbled up. The default implementation delegates to write_to_parts, and discards any Part annotations.

fn writeable_length_hint(&self) -> LengthHint

Returns a hint for the number of UTF-8 bytes that will be written to the sink.

fn write_to_string(&self) -> Cow<'_, str>

Creates a new String with the data from this Writeable. Like ToString, but smaller and faster.

fn write_to_parts<S>(&self, sink: &mut S) -> Result<(), Error>
where S: PartsWrite + ?Sized,

Write bytes and Part annotations to the given sink. Errors from the sink are bubbled up. The default implementation delegates to write_to, and doesn’t produce any Part annotations.
