pub struct Uts46 {
    data: Adapter,
}
An implementation of UTS #46.
Fields
data: Adapter
Implementations
impl Uts46
pub fn to_ascii<'a>(
    &self,
    domain_name: &'a [u8],
    ascii_deny_list: AsciiDenyList,
    hyphens: Hyphens,
    dns_length: DnsLength,
) -> Result<Cow<'a, str>, Errors>
Performs the ToASCII operation from UTS #46 with the options indicated.
Arguments
domain_name - The input domain name as UTF-8 bytes. (This method checks that the input is well-formed UTF-8 and treats input that is not as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
dns_length - The UTS 46 VerifyDNSLength flag.
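For a sense of what the VerifyDNSLength check covers: UTS #46 requires the domain name, excluding the root label and its dot, to be 1 to 253 characters long, and each label to be 1 to 63 characters long. The standalone function below is a sketch of those two rules applied to an ASCII (ToASCII) result. It is not this crate's code, `verify_dns_length` is a hypothetical name, and treating exactly one trailing dot as the root label is this sketch's own assumption.

```rust
/// Hypothetical sketch of the UTS #46 VerifyDNSLength rules on an ASCII domain.
/// Assumption: a single trailing '.' marks the root label and is excluded.
fn verify_dns_length(ascii_domain: &str) -> bool {
    // Drop the root label's dot, if present.
    let name = ascii_domain.strip_suffix('.').unwrap_or(ascii_domain);
    // Whole name: 1..=253 characters; every label: 1..=63 characters.
    (1..=253).contains(&name.len())
        && name.split('.').all(|label| (1..=63).contains(&label.len()))
}
```

Empty labels (as in `a..b`) and labels longer than 63 characters both fail the per-label rule.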
pub fn to_unicode<'a>(
    &self,
    domain_name: &'a [u8],
    ascii_deny_list: AsciiDenyList,
    hyphens: Hyphens,
) -> (Cow<'a, str>, Result<(), Errors>)
Performs the ToUnicode operation from UTS #46 according to the options given. When there are errors, there is still output, which may be rendered to the user, even though the output must not be used in networking protocols. Errors are denoted by U+FFFD REPLACEMENT CHARACTERs in the output. (That is, if the second item of the return tuple is Err, the first item of the return tuple is guaranteed to contain at least one U+FFFD.)
Most applications probably shouldn’t use this method and should use Uts46::to_user_interface instead.
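The output contract above can be illustrated with a small caller-side helper. This is a sketch, not part of the crate: `Errors` is a local stand-in for the crate's error type, and `network_usable` is a hypothetical name. It takes the `(Cow<str>, Result<(), Errors>)` shape that to_unicode returns and only hands back a name for protocol use when there were no errors.

```rust
use std::borrow::Cow;

/// Local stand-in for the crate's `Errors` type (hypothetical, for illustration).
#[derive(Debug)]
pub struct Errors;

/// Hypothetical helper: returns the name only if it may be used in network
/// protocols. On error, the output is suitable for display only.
fn network_usable<'r>(result: &'r (Cow<'_, str>, Result<(), Errors>)) -> Option<&'r str> {
    let (output, status) = result;
    match status {
        Ok(()) => Some(&**output),
        Err(_) => {
            // Documented guarantee: an Err status implies at least one U+FFFD in the output.
            debug_assert!(output.contains('\u{FFFD}'));
            None
        }
    }
}
```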
Arguments
domain_name - The input domain name as UTF-8 bytes. (This method checks that the input is well-formed UTF-8 and treats input that is not as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
pub fn to_user_interface<'a, OutputUnicode: FnMut(&[char], &[char], bool) -> bool>(
    &self,
    domain_name: &'a [u8],
    ascii_deny_list: AsciiDenyList,
    hyphens: Hyphens,
    output_as_unicode: OutputUnicode,
) -> (Cow<'a, str>, Result<(), Errors>)
Performs the ToUnicode operation from UTS #46 according to the options given, except that some error-free Unicode labels are output according to ToASCII instead, as decided by application policy implemented via the output_as_unicode closure. The purpose is to convert user-visible domains to the Unicode form in general but to render potentially misleading labels as Punycode.
This is an imperfect security mechanism, because the Punycode form itself may resemble a user-recognizable name. However, since this mechanism is common practice, this API provides support for it.
ASCII labels always pass through as ASCII, and labels with errors always pass through as Unicode. For non-erroneous labels that contain at least one non-ASCII character (which implies they are non-empty), output_as_unicode is called with the Unicode form of the label, the TLD (potentially empty), and a flag indicating whether the domain name as a whole is a bidi domain name. If the return value is true, the label passes through as Unicode. If the return value is false, the label is converted to Punycode.
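The per-label rules can be modeled as a small decision function. This is a simplified sketch, not the crate's implementation: `LabelForm`, `label_form`, and the `has_error` flag are hypothetical, and the real mapping, validation, and Punycode conversion all happen inside process. The sketch checks for errors first, on the assumption that an erroneous label is reported in Unicode form (it carries U+FFFD) even if its input was ASCII.

```rust
/// How a single label ends up in the output (hypothetical model).
#[derive(Debug, PartialEq)]
enum LabelForm {
    Ascii,
    Unicode,
    Punycode,
}

/// Simplified model of the documented per-label rules. `has_error` is a
/// hypothetical input standing in for the crate's own validation.
fn label_form<F>(
    label: &[char],
    tld: &[char],
    is_bidi: bool,
    has_error: bool,
    output_as_unicode: &mut F,
) -> LabelForm
where
    F: FnMut(&[char], &[char], bool) -> bool,
{
    if has_error {
        // Labels with errors are never converted to Punycode.
        LabelForm::Unicode
    } else if label.iter().all(char::is_ascii) {
        // ASCII labels pass through unchanged; the closure is not consulted.
        LabelForm::Ascii
    } else if output_as_unicode(label, tld, is_bidi) {
        LabelForm::Unicode
    } else {
        LabelForm::Punycode
    }
}
```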
When there are errors, there is still output, which may be rendered to the user, even though the output must not be used in networking protocols. Errors are denoted by U+FFFD REPLACEMENT CHARACTERs in the output. (That is, if the second item of the return tuple is Err, the first item of the return tuple is guaranteed to contain at least one U+FFFD.) Labels that contain errors are not converted to Punycode.
Arguments
domain_name - The input domain name as UTF-8 bytes. (This method checks that the input is well-formed UTF-8 and treats input that is not as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
output_as_unicode - A closure for deciding whether a label should be output as Unicode (as opposed to Punycode). The first argument is the label for which a decision is needed (always a non-empty slice). The second argument is the TLD (potentially empty). The third argument is true iff the domain name as a whole is a bidi domain name. Only non-erroneous labels that contain at least one non-ASCII character are passed to the closure as the first argument. The second and third argument values are guaranteed to remain the same during a single call, and the closure may cache computations derived from them (hence the FnMut type).
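A closure along these lines could serve as an application policy. It is a deliberately crude sketch, not a recommended security policy: `make_policy` is hypothetical, the TLD allowlist is supplied by the caller, bidi domain names are always shown as Punycode, and the "confusability" test simply restricts labels to alphanumeric characters below U+0100 plus hyphens. It caches the TLD lookup in captured state, which is what the FnMut bound permits.

```rust
/// Returns a hypothetical `output_as_unicode` policy closure.
fn make_policy(unicode_tlds: &'static [&'static str]) -> impl FnMut(&[char], &[char], bool) -> bool {
    // Cached (TLD, is the TLD on the allowlist?) pair, reused across labels.
    let mut cached: Option<(Vec<char>, bool)> = None;
    move |label: &[char], tld: &[char], is_bidi: bool| {
        // Crude choice for this sketch: bidi domain names always render as Punycode.
        if is_bidi {
            return false;
        }
        // The TLD is fixed within one call, so only recompute when it changes.
        if !matches!(&cached, Some((t, _)) if t.as_slice() == tld) {
            let allowed = unicode_tlds
                .iter()
                .any(|t| t.chars().eq(tld.iter().copied()));
            cached = Some((tld.to_vec(), allowed));
        }
        let tld_allowed = matches!(cached, Some((_, true)));
        // Crude stand-in for a confusability check: Latin-1 alphanumerics and '-'.
        tld_allowed
            && label
                .iter()
                .all(|&c| c == '-' || (c.is_alphanumeric() && (c as u32) < 0x100))
    }
}
```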
pub fn process<W: Write + ?Sized, OutputUnicode: FnMut(&[char], &[char], bool) -> bool>(
    &self,
    domain_name: &[u8],
    ascii_deny_list: AsciiDenyList,
    hyphens: Hyphens,
    error_policy: ErrorPolicy,
    output_as_unicode: OutputUnicode,
    sink: &mut W,
    ascii_sink: Option<&mut W>,
) -> Result<ProcessingSuccess, ProcessingError>
The lower-level function on which Uts46::to_ascii, Uts46::to_unicode, and Uts46::to_user_interface are built. It supports output types other than Cow<'a, str> (e.g. string types in a non-Rust programming language).
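As an example of an output type other than Cow<'a, str>, a caller bridging to a UTF-16 string type could pass a sink like the one below. This assumes that the Write bound on W is core::fmt::Write, as the mention of core::fmt::Error under the return value suggests; `Utf16Sink` and `write_labels` are hypothetical names. The helper mirrors the `W: Write + ?Sized` bound, so a `&mut dyn Write` works as well as a concrete sink.

```rust
use core::fmt::{self, Write};

/// Hypothetical sink that collects output as UTF-16 code units.
#[derive(Default)]
struct Utf16Sink {
    units: Vec<u16>,
}

impl Write for Utf16Sink {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        // Transcode each chunk as it arrives; this sink never fails.
        self.units.extend(s.encode_utf16());
        Ok(())
    }
}

/// Hypothetical helper with the same `W: Write + ?Sized` bound as `process`,
/// writing labels joined by '.' into any sink.
fn write_labels<W: Write + ?Sized>(sink: &mut W, labels: &[&str]) -> fmt::Result {
    for (idx, label) in labels.iter().enumerate() {
        if idx > 0 {
            sink.write_char('.')?;
        }
        sink.write_str(label)?;
    }
    Ok(())
}
```

Because this sink's `write_str` never returns an error, it would never be the cause of `Err(ProcessingError::SinkError)`.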
Arguments
domain_name - The input domain name as UTF-8 bytes. (This method checks that the input is well-formed UTF-8 and treats input that is not as an error. If you already have a &str, call .as_bytes() on it.)
ascii_deny_list - Which ASCII deny list, if any, to apply. The UTS 46 UseSTD3ASCIIRules flag or the WHATWG URL Standard forbidden domain code point processing is handled via this argument. Most callers are probably best off using AsciiDenyList::URL here.
hyphens - The UTS 46 CheckHyphens flag. Most callers are probably best off using Hyphens::Allow here.
error_policy - Whether to fail fast or to produce output that may be rendered for the user to examine in case of errors.
output_as_unicode - A closure for deciding whether a label should be output as Unicode (as opposed to Punycode). The first argument is the label for which a decision is needed (always a non-empty slice). The second argument is the TLD (potentially empty). The third argument is true iff the domain name as a whole is a bidi domain name. Only non-erroneous labels that contain at least one non-ASCII character are passed to the closure as the first argument. The second and third argument values are guaranteed to remain the same during a single call to process, and the closure may cache computations derived from them (hence the FnMut type). To perform the ToASCII operation, |_, _, _| false must be passed as the closure. To perform the ToUnicode operation, |_, _, _| true must be passed as the closure. A more complex closure may be used to prepare a domain name for display in a user interface so that labels are converted to the Unicode form in general but potentially misleading labels are converted to the Punycode form.
sink - The object that receives the output (in the non-passthrough case).
ascii_sink - A second sink that receives the ToASCII form only if there were no errors and sink received at least one character of non-ASCII output. The purpose of this argument is to enable a user interface display form of the domain and the ToASCII form of the domain to be computed efficiently together. This argument is useless when output_as_unicode always returns false, in which case the ToASCII form ends up in sink already. If ascii_sink receives no output and the return value is Ok(ProcessingSuccess::WroteToSink), use the output received by sink also as the ToASCII result.
Return value
Ok(ProcessingSuccess::Passthrough) - The caller must treat unsafe { core::str::from_utf8_unchecked(domain_name) } as the output. (This return value asserts that calling core::str::from_utf8_unchecked(domain_name) is safe.)
Ok(ProcessingSuccess::WroteToSink) - The caller must treat what was written to sink as the output. If another sink was passed as ascii_sink but it did not receive output, the caller must treat what was written to sink also as the ToASCII output. Otherwise, if ascii_sink received output, the caller must treat what was written to ascii_sink as the ToASCII output.
Err(ProcessingError::ValidityError) - The input was in error and must not be used for DNS lookup or otherwise in a network protocol. If error_policy was ErrorPolicy::MarkErrors, the output written to sink may be displayed to the user as an illustration of where the error or errors occurred.
Err(ProcessingError::SinkError) - Either sink or ascii_sink returned core::fmt::Error. The partial output written to sink or ascii_sink must not be used. If W never returns core::fmt::Error, this method never returns Err(ProcessingError::SinkError).
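A caller of process that passes both sinks might resolve the ToASCII result along these lines. The enums are local mirrors declared only so the sketch compiles on its own (only the variant names come from the text above), `ascii_result` is a hypothetical helper, and "received no output" is modeled as an empty `&str`, which assumes each sink started out as an empty `String`.

```rust
/// Local mirrors of the result types, for illustration only.
enum ProcessingSuccess {
    Passthrough,
    WroteToSink,
}
enum ProcessingError {
    ValidityError,
    SinkError,
}

/// Hypothetical helper: picks the ToASCII result according to the rules above.
fn ascii_result<'a>(
    result: Result<ProcessingSuccess, ProcessingError>,
    domain_name: &'a [u8],
    sink: &'a str,
    ascii_sink: &'a str,
) -> Option<&'a str> {
    match result {
        // Input is the output. A checked conversion is used here for simplicity.
        Ok(ProcessingSuccess::Passthrough) => core::str::from_utf8(domain_name).ok(),
        // ascii_sink stayed empty, so sink already holds the ToASCII form.
        Ok(ProcessingSuccess::WroteToSink) if ascii_sink.is_empty() => Some(sink),
        Ok(ProcessingSuccess::WroteToSink) => Some(ascii_sink),
        // Invalid input must not reach DNS; sink may still be shown to the user.
        Err(ProcessingError::ValidityError) => None,
        // A sink failed, so partial output in either sink must be discarded.
        Err(ProcessingError::SinkError) => None,
    }
}
```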
Safety-usable invariant
If the return value is Ok(ProcessingSuccess::Passthrough), domain_name is ASCII and core::str::from_utf8_unchecked(domain_name) is safe. (Note: Other return values do not imply that domain_name wasn’t ASCII!)
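This invariant is what lets a caller skip UTF-8 validation in the passthrough case. A minimal sketch, with `passthrough_str` as a hypothetical helper: it is marked `unsafe` because its correctness rests entirely on the caller having just received Ok(ProcessingSuccess::Passthrough) for the same bytes, and it keeps a `debug_assert!` so debug builds still check the ASCII claim.

```rust
use std::borrow::Cow;

/// Hypothetical helper for the passthrough case.
///
/// # Safety
/// Only call this after `process` returned `Ok(ProcessingSuccess::Passthrough)`
/// for this exact `domain_name`, which guarantees the bytes are ASCII.
unsafe fn passthrough_str(domain_name: &[u8]) -> Cow<'_, str> {
    // Debug builds verify the invariant the caller is relying on.
    debug_assert!(domain_name.is_ascii());
    // SAFETY: ASCII is always valid UTF-8, and the caller guarantees ASCII.
    Cow::Borrowed(unsafe { core::str::from_utf8_unchecked(domain_name) })
}
```

Returning `Cow::Borrowed` avoids both the validation pass and an allocation.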
Security considerations
Showing labels whose Unicode form might mislead the user as Punycode instead is an imperfect security mechanism, because the Punycode form itself may resemble a user-recognizable name. However, since this mechanism is common practice, this API provides support for it.
Punycode processing is quadratic, so to avoid denial of service, this method imposes length limits on Punycode, treating especially long inputs as errors. These limits are well above the DNS length limits and are no more restrictive than the limits imposed by ICU4C.
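To make the quadratic cost concrete, here is a standalone sketch of Punycode decoding as specified in RFC 3492, with a length check performed before any decoding work. It is not this crate's implementation: `decode_punycode` is a hypothetical name and `MAX_INPUT_LEN` is an arbitrary illustrative cap, not one of the crate's actual limits. Each decoded code point is placed with `Vec::insert`, which shifts the rest of the buffer, so decoding n code points can take O(n²) time; the cap bounds that cost.

```rust
// Bootstring parameters for Punycode (RFC 3492, section 5).
const BASE: u32 = 36;
const T_MIN: u32 = 1;
const T_MAX: u32 = 26;
const SKEW: u32 = 38;
const DAMP: u32 = 700;
const INITIAL_BIAS: u32 = 72;
const INITIAL_N: u32 = 128;

/// Hypothetical cap for illustration; not the limit the crate actually uses.
const MAX_INPUT_LEN: usize = 1000;

// Bias adaptation function (RFC 3492, section 6.1).
fn adapt(mut delta: u32, num_points: u32, first_time: bool) -> u32 {
    delta /= if first_time { DAMP } else { 2 };
    delta += delta / num_points;
    let mut k = 0;
    while delta > ((BASE - T_MIN) * T_MAX) / 2 {
        delta /= BASE - T_MIN;
        k += BASE;
    }
    k + (BASE - T_MIN + 1) * delta / (delta + SKEW)
}

// Maps a-z/A-Z to 0..=25 and 0-9 to 26..=35.
fn digit_value(byte: u8) -> Option<u32> {
    match byte {
        b'a'..=b'z' => Some(u32::from(byte - b'a')),
        b'A'..=b'Z' => Some(u32::from(byte - b'A')),
        b'0'..=b'9' => Some(u32::from(byte - b'0') + 26),
        _ => None,
    }
}

/// Decodes one Punycode label (without any "xn--" prefix), refusing
/// over-long input before doing any quadratic work.
fn decode_punycode(input: &str) -> Option<String> {
    if input.len() > MAX_INPUT_LEN || !input.is_ascii() {
        return None;
    }
    // Code points before the last delimiter are copied literally.
    let (basic, extended) = match input.rfind('-') {
        Some(pos) if pos > 0 => (&input[..pos], &input[pos + 1..]),
        _ => ("", input),
    };
    let mut output: Vec<char> = basic.chars().collect();
    let (mut n, mut i, mut bias) = (INITIAL_N, 0u32, INITIAL_BIAS);
    let digits = extended.as_bytes();
    let mut pos = 0;
    while pos < digits.len() {
        let (old_i, mut w, mut k) = (i, 1u32, BASE);
        // Read one generalized variable-length integer.
        loop {
            let digit = digit_value(*digits.get(pos)?)?;
            pos += 1;
            i = i.checked_add(digit.checked_mul(w)?)?;
            let t = if k <= bias {
                T_MIN
            } else if k >= bias + T_MAX {
                T_MAX
            } else {
                k - bias
            };
            if digit < t {
                break;
            }
            w = w.checked_mul(BASE - t)?;
            k += BASE;
        }
        let len = output.len() as u32 + 1;
        bias = adapt(i - old_i, len, old_i == 0);
        n = n.checked_add(i / len)?;
        i %= len;
        // Vec::insert shifts the tail: O(len) per insert, O(len^2) overall.
        output.insert(i as usize, char::from_u32(n)?);
        i += 1;
    }
    Some(output.into_iter().collect())
}
```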
fn process_inner<'a>(
    &self,
    domain_name: &'a [u8],
    ascii_deny_list: AsciiDenyList,
    hyphens: Hyphens,
    fail_fast: bool,
    domain_buffer: &mut SmallVec<[char; 253]>,
    already_punycode: &mut SmallVec<[AlreadyAsciiLabel<'a>; 8]>,
) -> (usize, bool, bool)
The part of process
that doesn’t need to be generic over the sink.
fn process_innermost<'a>(
    &self,
    domain_name: &'a [u8],
    ascii_deny_list: AsciiDenyList,
    hyphens: Hyphens,
    fail_fast: bool,
    domain_buffer: &mut SmallVec<[char; 253]>,
    already_punycode: &mut SmallVec<[AlreadyAsciiLabel<'a>; 8]>,
    tail: &'a [u8],
) -> (usize, bool, bool)
The part of process
that doesn’t need to be generic over the sink and
can avoid monomorphizing in the interest of code size.
Separating this into a different stack frame compared to process_inner
improves performance in the ICU4X case.
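The split into process_inner and process_innermost follows a common Rust pattern for limiting monomorphization: keep the generic wrapper thin and move the bulk of the work into a non-generic function that is compiled once. A minimal sketch of the pattern, with hypothetical names (`normalize`, `normalize_inner`) and ASCII lowercasing standing in for the real processing:

```rust
use core::fmt::{self, Write};

/// Non-generic core (hypothetical): compiled once, however many sink types
/// the wrapper is instantiated with. Fills a caller-provided buffer and
/// reports whether anything changed.
fn normalize_inner(input: &str, buffer: &mut Vec<char>) -> bool {
    buffer.clear();
    buffer.extend(input.chars().map(|c| c.to_ascii_lowercase()));
    buffer.iter().copied().ne(input.chars())
}

/// Generic wrapper (hypothetical): only this thin shell is duplicated per `W`.
fn normalize<W: Write + ?Sized>(input: &str, sink: &mut W) -> Result<bool, fmt::Error> {
    let mut buffer = Vec::new();
    let changed = normalize_inner(input, &mut buffer);
    for &c in &buffer {
        sink.write_char(c)?;
    }
    Ok(changed)
}
```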