Struct Uts46

pub struct Uts46 {
    data: Adapter,
}

An implementation of UTS #46.

Fields§

§data: Adapter

Implementations§

impl Uts46

pub const fn new() -> Self

Constructor using data compiled into the binary.

pub fn to_ascii<'a>( &self, domain_name: &'a [u8], ascii_deny_list: AsciiDenyList, hyphens: Hyphens, dns_length: DnsLength, ) -> Result<Cow<'a, str>, Errors>

Performs the ToASCII operation from UTS #46 with the options indicated.

§Arguments

domain_name - The input domain name as UTF-8 bytes. (The UTF-8ness is checked by this method and input that is not well-formed UTF-8 is treated as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
dns_length - The UTS 46 VerifyDNSLength flag.
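
For orientation, the VerifyDNSLength step in UTS 46 bounds the length of the ASCII form: 1 to 253 bytes for the name as a whole (not counting a root dot) and 1 to 63 bytes per label. The following is a minimal standalone sketch of that rule, not this crate's implementation; the function name dns_length_ok and the allow_root_dot parameter are hypothetical and exist only for illustration.

```rust
/// Sketch of the UTS 46 VerifyDNSLength rule for a name that is already in
/// ASCII form: 1 to 253 bytes overall (not counting a trailing root dot)
/// and 1 to 63 bytes per label.
///
/// `allow_root_dot` is a hypothetical knob for this sketch only.
fn dns_length_ok(ascii_domain: &str, allow_root_dot: bool) -> bool {
    let name = match ascii_domain.strip_suffix('.') {
        Some(stripped) if allow_root_dot => stripped,
        // A trailing dot means an empty root label, which fails the
        // per-label minimum when the root dot is not allowed.
        Some(_) => return false,
        None => ascii_domain,
    };
    if name.is_empty() || name.len() > 253 {
        return false;
    }
    // Every remaining label must be between 1 and 63 bytes long.
    name.split('.').all(|label| (1..=63).contains(&label.len()))
}
```

A check of this kind only makes sense on the ToASCII output, since the limits are stated in terms of the ASCII form.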

pub fn to_unicode<'a>( &self, domain_name: &'a [u8], ascii_deny_list: AsciiDenyList, hyphens: Hyphens, ) -> (Cow<'a, str>, Result<(), Errors>)

Performs the ToUnicode operation from UTS #46 according to the options given. When there are errors, there is still output, which may be rendered to the user, even though the output must not be used in networking protocols. Errors are denoted by U+FFFD REPLACEMENT CHARACTERs in the output. (That is, if the second item of the return tuple is Err, the first item of the return tuple is guaranteed to contain at least one U+FFFD.)

Most applications probably shouldn’t use this method and should be using Uts46::to_user_interface instead.

§Arguments

domain_name - The input domain name as UTF-8 bytes. (The UTF-8ness is checked by this method and input that is not well-formed UTF-8 is treated as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
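
Because errors are marked with U+FFFD in the output, a user interface can locate the offending labels by scanning the returned string. The sketch below shows one way to do that, assuming the labels in the output are separated by U+002E FULL STOP; the helper name erroneous_labels is hypothetical and not part of this crate.

```rust
/// Returns the zero-based indices of the dot-separated labels that contain
/// U+FFFD REPLACEMENT CHARACTER in a ToUnicode-style output string.
///
/// Hypothetical helper for illustration; it only inspects the string it is
/// given and does not perform any UTS #46 processing itself.
fn erroneous_labels(output: &str) -> Vec<usize> {
    output
        .split('.')
        .enumerate()
        .filter(|(_, label)| label.contains('\u{FFFD}'))
        .map(|(index, _)| index)
        .collect()
}
```

An empty result corresponds to output without error markers; a non-empty result can drive highlighting of the labels that need the user's attention.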

pub fn to_user_interface<'a, OutputUnicode: FnMut(&[char], &[char], bool) -> bool>( &self, domain_name: &'a [u8], ascii_deny_list: AsciiDenyList, hyphens: Hyphens, output_as_unicode: OutputUnicode, ) -> (Cow<'a, str>, Result<(), Errors>)

Performs the ToUnicode operation from UTS #46 according to the options given, except that some error-free Unicode labels are output according to ToASCII instead, as decided by the application policy implemented via the output_as_unicode closure. The purpose is to convert user-visible domains to the Unicode form in general but to render potentially misleading labels as Punycode.

This is an imperfect security mechanism, because the Punycode form itself may resemble a user-recognizable name. However, since this mechanism is common practice, this API provides support for the mechanism.

ASCII labels always pass through as ASCII and labels with errors always pass through as Unicode. For non-erroneous labels that contain at least one non-ASCII character (implies non-empty), output_as_unicode is called with the Unicode form of the label, the TLD (potentially empty), and a flag indicating whether the domain name as a whole is a bidi domain name. If the return value is true, the label passes through as Unicode. If the return value is false, the label is converted to Punycode.

When there are errors, there is still output, which may be rendered to the user, even though the output must not be used in networking protocols. Errors are denoted by U+FFFD REPLACEMENT CHARACTERs in the output. (That is, if the second item of the return tuple is Err, the first item of the return tuple is guaranteed to contain at least one U+FFFD.) Labels that contain errors are not converted to Punycode.

§Arguments

domain_name - The input domain name as UTF-8 bytes. (The UTF-8ness is checked by this method and input that is not well-formed UTF-8 is treated as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
output_as_unicode - A closure for deciding if a label should be output as Unicode (as opposed to Punycode). The first argument is the label for which a decision is needed (always non-empty slice). The second argument is the TLD (potentially empty). The third argument is true iff the domain name as a whole is a bidi domain name. Only non-erroneous labels that contain at least one non-ASCII character are passed to the closure as the first argument. The second and third argument values are guaranteed to remain the same during a single call to process, and the closure may cache computations derived from the second and third argument (hence the FnMut type).
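
To make the closure contract concrete, here is a minimal sketch of a policy with the FnMut(&[char], &[char], bool) -> bool shape that caches a TLD-derived decision across invocations. The policy itself (the TLD allow-list, the bidi rejection, and the code point range) is a made-up illustration, not a recommendation from this crate, and the name make_policy is hypothetical.

```rust
/// Builds a hypothetical display policy for use as `output_as_unicode`.
///
/// The rules below are illustrative assumptions only: show a label as
/// Unicode when the TLD is on a small allow-list, the domain is not a bidi
/// domain name, and every non-ASCII character falls in U+00C0..=U+017F.
fn make_policy() -> impl FnMut(&[char], &[char], bool) -> bool {
    // Hypothetical allow-list of TLDs under which Unicode labels are shown.
    const UNICODE_TLDS: [&str; 2] = ["de", "fi"];
    // Cached TLD decision. Caching is valid because the TLD argument stays
    // the same for every invocation within a single conversion call.
    let mut tld_allowed: Option<bool> = None;
    move |label: &[char], tld: &[char], is_bidi: bool| {
        let tld_ok = *tld_allowed.get_or_insert_with(|| {
            UNICODE_TLDS
                .iter()
                .any(|t| t.chars().eq(tld.iter().copied()))
        });
        let latin_only = label
            .iter()
            .all(|&c| c.is_ascii() || ('\u{00C0}'..='\u{017F}').contains(&c));
        // true: keep the label as Unicode; false: convert it to Punycode.
        tld_ok && !is_bidi && latin_only
    }
}
```

Since the cache is tied to one domain name, a fresh closure from make_policy() would be created for each conversion call rather than reused across different domain names.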

pub fn process<W: Write + ?Sized, OutputUnicode: FnMut(&[char], &[char], bool) -> bool>( &self, domain_name: &[u8], ascii_deny_list: AsciiDenyList, hyphens: Hyphens, error_policy: ErrorPolicy, output_as_unicode: OutputUnicode, sink: &mut W, ascii_sink: Option<&mut W>, ) -> Result<ProcessingSuccess, ProcessingError>

The lower-level function that Uts46::to_ascii, Uts46::to_unicode, and Uts46::to_user_interface are built on to allow support for output types other than Cow<'a, str> (e.g. string types in a non-Rust programming language).

§Arguments

domain_name - The input domain name as UTF-8 bytes. (The UTF-8ness is checked by this method and input that is not well-formed UTF-8 is treated as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
error_policy - Whether to fail fast or to produce output that may be rendered for the user to examine in case of errors.
output_as_unicode - A closure for deciding if a label should be output as Unicode (as opposed to Punycode). The first argument is the label for which a decision is needed (always non-empty slice). The second argument is the TLD (potentially empty). The third argument is true iff the domain name as a whole is a bidi domain name. Only non-erroneous labels that contain at least one non-ASCII character are passed to the closure as the first argument. The second and third argument values are guaranteed to remain the same during a single call to process, and the closure may cache computations derived from the second and third argument (hence the FnMut type). To perform the ToASCII operation, |_, _, _| false must be passed as the closure. To perform the ToUnicode operation, |_, _, _| true must be passed as the closure. A more complex closure may be used to prepare a domain name for display in a user interface so that labels are converted to the Unicode form in general but potentially misleading labels are converted to the Punycode form.
sink - The object that receives the output (in the non-passthrough case).
ascii_sink - A second sink that receives the ToASCII form only if there were no errors and sink received at least one character of non-ASCII output. The purpose of this argument is to enable a user interface display form of the domain and the ToASCII form of the domain to be computed efficiently together. This argument is useless when output_as_unicode always returns false, in which case the ToASCII form ends up in sink already. If ascii_sink receives no output and the return value is Ok(ProcessingSuccess::WroteToSink), use the output received by sink also as the ToASCII result.
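
As an illustration of what a custom sink can look like, the sketch below implements a size-capped sink. It assumes that the Write bound on W refers to core::fmt::Write, which the mention of core::fmt::Error in the return value documentation suggests; the type BoundedSink and its fields are hypothetical.

```rust
use core::fmt::{self, Write};

/// Hypothetical sink that accepts at most `limit` bytes of output and
/// reports an error once a write would exceed that cap.
struct BoundedSink {
    buf: String,
    limit: usize,
}

impl Write for BoundedSink {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        if self.buf.len() + s.len() > self.limit {
            // A sink error of this kind is what a caller of `process`
            // would see surfaced as a sink-related failure.
            return Err(fmt::Error);
        }
        self.buf.push_str(s);
        Ok(())
    }
}
```

A sink that cannot fail, such as a plain String, never produces core::fmt::Error, so the error path only matters for sinks like this one that can refuse output.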

§Return value

Ok(ProcessingSuccess::Passthrough) - The caller must treat unsafe { core::str::from_utf8_unchecked(domain_name) } as the output. (This return value asserts that calling core::str::from_utf8_unchecked(domain_name) is safe.)
Ok(ProcessingSuccess::WroteToSink) - The caller must treat what was written to sink as the output. If another sink was passed as ascii_sink but it did not receive output, the caller must treat what was written to sink also as the ToASCII output. Otherwise, if ascii_sink received output, the caller must treat what was written to ascii_sink as the ToASCII output.
Err(ProcessingError::ValidityError) - The input was in error and must not be used for DNS lookup or otherwise in a network protocol. If error_policy was ErrorPolicy::MarkErrors, the output written to sink may be displayed to the user as an illustration of where the error was or the errors were.
Err(ProcessingError::SinkError) - Either sink or ascii_sink returned core::fmt::Error. The partial output written to sink or ascii_sink must not be used. If W never returns core::fmt::Error, this method never returns Err(ProcessingError::SinkError).
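
The rule for picking the ToASCII form after Ok(ProcessingSuccess::WroteToSink) can be sketched as a small helper. It assumes the caller has collected the contents of both sinks as strings; the function name to_ascii_form is hypothetical.

```rust
/// Picks the ToASCII form after a call that returned `WroteToSink` and was
/// given a second sink: if `ascii_sink` received nothing, the content of
/// `sink` doubles as the ToASCII form; otherwise `ascii_sink` holds it.
///
/// Hypothetical helper; both arguments are the collected sink contents.
fn to_ascii_form<'a>(sink_output: &'a str, ascii_sink_output: &'a str) -> &'a str {
    if ascii_sink_output.is_empty() {
        sink_output
    } else {
        ascii_sink_output
    }
}
```

This helper does not apply to the Passthrough case, where neither sink holds the output, or to the error cases, where the output must not be used for networking at all.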

§Safety-usable invariant

If the return value is Ok(ProcessingSuccess::Passthrough), domain_name is ASCII and core::str::from_utf8_unchecked(domain_name) is safe. (Note: Other return values do not imply that domain_name wasn’t ASCII!)
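
A caller can consume the Passthrough case in two ways, sketched below: relying on the invariant with an unchecked conversion, or re-validating with the checked conversion to stay in safe code. Both function names are hypothetical, and the sketch operates on a byte slice only; it does not call into this crate.

```rust
/// Converts the input bytes to `&str` without re-validation.
///
/// # Safety
/// Call this only with the `domain_name` of a call that returned
/// `Ok(ProcessingSuccess::Passthrough)`, which asserts the bytes are ASCII.
unsafe fn passthrough_unchecked(domain_name: &[u8]) -> &str {
    debug_assert!(domain_name.is_ascii());
    // SAFETY: the caller guarantees the bytes are ASCII, and ASCII is
    // always well-formed UTF-8.
    unsafe { core::str::from_utf8_unchecked(domain_name) }
}

/// Safe alternative that pays for a second validation pass and returns
/// `None` if the bytes are not well-formed UTF-8.
fn passthrough_checked(domain_name: &[u8]) -> Option<&str> {
    core::str::from_utf8(domain_name).ok()
}
```

The checked variant trades a redundant scan of the input for not needing unsafe code at the call site.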

§Security considerations

Showing labels whose Unicode form might mislead the user as Punycode instead is an imperfect security mechanism, because the Punycode form itself may resemble a user-recognizable name. However, since this mechanism is common practice, this API provides support for the mechanism.

Punycode processing is quadratic, so to avoid denial of service, this method imposes length limits on Punycode, treating especially long inputs as being in error. These limits are well above the DNS length limits and are not more restrictive than the limits imposed by ICU4C.

fn process_inner<'a>( &self, domain_name: &'a [u8], ascii_deny_list: AsciiDenyList, hyphens: Hyphens, fail_fast: bool, domain_buffer: &mut SmallVec<[char; 253]>, already_punycode: &mut SmallVec<[AlreadyAsciiLabel<'a>; 8]>, ) -> (usize, bool, bool)

The part of process that doesn’t need to be generic over the sink.

fn process_innermost<'a>( &self, domain_name: &'a [u8], ascii_deny_list: AsciiDenyList, hyphens: Hyphens, fail_fast: bool, domain_buffer: &mut SmallVec<[char; 253]>, already_punycode: &mut SmallVec<[AlreadyAsciiLabel<'a>; 8]>, tail: &'a [u8], ) -> (usize, bool, bool)

The part of process that doesn’t need to be generic over the sink and can avoid monomorphizing in the interest of code size. Separating this into a different stack frame compared to process_inner improves performance in the ICU4X case.
