Functions for converting between different in-RAM representations of text and for quickly checking if the Unicode Bidirectional Algorithm can be avoided.
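The sketch below shows the kind of conversion these functions perform, using only the standard library. It writes the UTF-16 form of a string into a caller-provided slice, then decodes UTF-16 back and replaces unpaired surrogates with the REPLACEMENT CHARACTER. The names str_to_utf16_into and utf16_to_string_lossy are hypothetical. This is not how convert_str_to_utf16 or convert_utf16_to_str are implemented, and the buffer-size requirement assumed here may differ from theirs.

```rust
/// Hypothetical helper (not part of this module): writes the UTF-16 form of
/// `src` into `dst` and returns the number of code units written.
/// Assumption: `src.len()` code units is always enough, because an n-byte
/// UTF-8 sequence never becomes more than n UTF-16 code units.
fn str_to_utf16_into(src: &str, dst: &mut [u16]) -> usize {
    assert!(dst.len() >= src.len(), "destination too short");
    let mut written = 0;
    for unit in src.encode_utf16() {
        dst[written] = unit;
        written += 1;
    }
    written
}

/// Hypothetical helper: decodes potentially-invalid UTF-16, replacing each
/// unpaired surrogate with U+FFFD REPLACEMENT CHARACTER.
fn utf16_to_string_lossy(src: &[u16]) -> String {
    char::decode_utf16(src.iter().copied())
        .map(|r| r.unwrap_or(char::REPLACEMENT_CHARACTER))
        .collect()
}

fn main() {
    let text = "na\u{ef}ve \u{1F600}";
    let mut buf = vec![0u16; text.len()];
    let n = str_to_utf16_into(text, &mut buf);
    // Round-tripping valid text loses nothing.
    assert_eq!(utf16_to_string_lossy(&buf[..n]), text);
    // Prints "11 UTF-8 bytes -> 8 UTF-16 code units".
    println!("{} UTF-8 bytes -> {} UTF-16 code units", text.len(), n);
}
```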
By using slices for output, the functions here seek to enable by-register (ALU register or SIMD register as available) operations in order to outperform iterator-based conversions available in the Rust standard library.
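The following sketch makes the by-register idea concrete. It is an ASCII check that tests eight bytes with one 64-bit ALU operation instead of examining one byte per iteration. It assumes a plain u64 word loop with no SIMD, and is_ascii_by_word is a hypothetical name. The module's own is_ascii may be organized quite differently.

```rust
/// Hypothetical word-at-a-time ASCII check: loads 8 bytes into a u64 and
/// tests all of their high bits with a single AND.
fn is_ascii_by_word(buffer: &[u8]) -> bool {
    const HIGH_BITS: u64 = 0x8080_8080_8080_8080;
    let mut chunks = buffer.chunks_exact(8);
    for chunk in &mut chunks {
        // chunks_exact yields exactly 8 bytes here, so the copy cannot panic.
        let mut word_bytes = [0u8; 8];
        word_bytes.copy_from_slice(chunk);
        let word = u64::from_ne_bytes(word_bytes);
        if word & HIGH_BITS != 0 {
            return false;
        }
    }
    // The 0 to 7 trailing bytes are checked one at a time.
    chunks.remainder().iter().all(|&b| b < 0x80)
}

fn main() {
    let samples: [&[u8]; 3] = [
        b"plain ASCII text, longer than eight bytes",
        "caf\u{e9} au lait".as_bytes(),
        b"abc",
    ];
    for &s in samples.iter() {
        // The word-at-a-time check must agree with the std byte-wise method.
        assert_eq!(is_ascii_by_word(s), s.is_ascii());
    }
}
```

Only the high bit of each byte is tested, so the result does not depend on byte order. That is why from_ne_bytes is sufficient.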
Note: “Latin1” in this module refers to the Unicode range from U+0000 to U+00FF, inclusive, and does not refer to the windows-1252 range. This in-memory encoding is sometimes used as a storage optimization of text when UTF-16 indexing and length semantics are exposed.
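Latin1 in this sense means "byte value equals code point", and a short std-only sketch can show this directly. The helpers latin1_to_string and string_to_latin1 are hypothetical. In the module, decode_latin1 and encode_latin1_lossy are the actual entry points. Rejecting out-of-range input with None is an assumption made for this illustration only.

```rust
/// Hypothetical helper: maps each byte to the code point with the same value
/// (U+0000 to U+00FF). This is not windows-1252 decoding.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Hypothetical helper: returns None if any code point is above U+00FF.
fn string_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| if (c as u32) <= 0xFF { Some(c as u8) } else { None })
        .collect()
}

fn main() {
    let bytes: [u8; 4] = [0x63, 0x61, 0x66, 0xE9]; // "café" in Latin1
    let text = latin1_to_string(&bytes);
    assert_eq!(text, "caf\u{e9}");
    // One Latin1 byte per code point, so the Latin1 length equals the UTF-16 length,
    assert_eq!(text.encode_utf16().count(), bytes.len());
    // while UTF-8 needs two bytes for each code point from U+0080 to U+00FF.
    assert_eq!(text.len(), 5);
    // Byte 0x80 is U+0080 (a C1 control), not U+20AC as it would be in windows-1252.
    assert_eq!(latin1_to_string(&[0x80]), "\u{80}");
    assert_eq!(string_to_latin1(&text), Some(bytes.to_vec()));
    assert_eq!(string_to_latin1("\u{20ac}"), None);
}
```

The Latin1 length and the UTF-16 length match in this example. That equality is what makes Latin1 usable as a compact store when UTF-16 indexing and length semantics are exposed.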
The FFI bindings for this module are in the encoding_c_mem crate.
Enums
- Latin1Bidi - Classification of text as Latin1 (all code points are below U+0100), left-to-right with some non-Latin1 characters, or as containing at least some right-to-left characters.
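The three-way classification can be imitated in a few lines. In the sketch below, Classification, is_rtl_char and classify are hypothetical names. The set of right-to-left triggers is a deliberately simplified assumption: a few BMP ranges plus the explicit right-to-left controls. It is not the exact set used by check_str_for_latin1_and_bidi and the related functions.

```rust
/// Hypothetical mirror of the three outcomes described for Latin1Bidi.
#[derive(Debug, PartialEq, Eq)]
enum Classification {
    Latin1,
    LeftToRight,
    Bidi,
}

/// ASSUMPTION: simplified right-to-left trigger set, for illustration only.
fn is_rtl_char(c: char) -> bool {
    matches!(
        c,
        '\u{0590}'..='\u{08FF}'   // Hebrew, Arabic, Syriac, Thaana, NKo, etc.
            | '\u{FB1D}'..='\u{FDFF}' // Hebrew and Arabic presentation forms
            | '\u{FE70}'..='\u{FEFE}' // Arabic Presentation Forms-B
            | '\u{200F}'              // RIGHT-TO-LEFT MARK
            | '\u{202B}'              // RIGHT-TO-LEFT EMBEDDING
            | '\u{202E}'              // RIGHT-TO-LEFT OVERRIDE
            | '\u{2067}'              // RIGHT-TO-LEFT ISOLATE
    )
}

fn classify(s: &str) -> Classification {
    let mut result = Classification::Latin1;
    for c in s.chars() {
        if is_rtl_char(c) {
            // One right-to-left character is enough to require bidi processing.
            return Classification::Bidi;
        }
        if (c as u32) > 0xFF {
            result = Classification::LeftToRight;
        }
    }
    result
}

fn main() {
    assert_eq!(classify("caf\u{e9}"), Classification::Latin1);
    assert_eq!(classify("\u{65e5}\u{672c}"), Classification::LeftToRight);
    assert_eq!(classify("abc \u{5e9}\u{5dc}\u{5d5}\u{5dd}"), Classification::Bidi);
}
```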
 
Functions
- check_str_for_latin1_and_bidi - Checks whether a valid UTF-8 buffer contains code points that trigger right-to-left processing or is all-Latin1.
- check_utf8_for_latin1_and_bidi - Checks whether a potentially-invalid UTF-8 buffer contains code points that trigger right-to-left processing or is all-Latin1.
- check_utf16_for_latin1_and_bidi - Checks whether a potentially-invalid UTF-16 buffer contains code points that trigger right-to-left processing or is all-Latin1.
- convert_latin1_to_str - Converts bytes whose unsigned values are interpreted as Unicode code points (i.e. U+0000 to U+00FF, inclusive) to UTF-8 such that the validity of the output is signaled using the Rust type system.
- convert_latin1_to_str_partial - Converts bytes whose unsigned values are interpreted as Unicode code points (i.e. U+0000 to U+00FF, inclusive) to UTF-8 such that the validity of the output is signaled using the Rust type system, when the output space may be insufficient.
- convert_latin1_to_utf8 - Converts bytes whose unsigned values are interpreted as Unicode code points (i.e. U+0000 to U+00FF, inclusive) to UTF-8.
- convert_latin1_to_utf8_partial - Converts bytes whose unsigned values are interpreted as Unicode code points (i.e. U+0000 to U+00FF, inclusive) to UTF-8 when the output space may be insufficient.
- convert_latin1_to_utf16 - Converts bytes whose unsigned values are interpreted as Unicode code points (i.e. U+0000 to U+00FF, inclusive) to UTF-16.
- convert_str_to_utf16 - Converts valid UTF-8 to valid UTF-16.
- convert_utf8_to_latin1_lossy - If the input is valid UTF-8 representing only Unicode code points from U+0000 to U+00FF, inclusive, converts the input into output that represents the value of each code point as the unsigned byte value of each output byte.
- convert_utf8_to_utf16 - Converts potentially-invalid UTF-8 to valid UTF-16 with errors replaced with the REPLACEMENT CHARACTER.
- convert_utf8_to_utf16_without_replacement - Converts potentially-invalid UTF-8 to valid UTF-16, signaling on error.
- convert_utf16_to_latin1_lossy - If the input is valid UTF-16 representing only Unicode code points from U+0000 to U+00FF, inclusive, converts the input into output that represents the value of each code point as the unsigned byte value of each output byte.
- convert_utf16_to_str - Converts potentially-invalid UTF-16 to valid UTF-8 with errors replaced with the REPLACEMENT CHARACTER such that the validity of the output is signaled using the Rust type system.
- convert_utf16_to_str_partial - Converts potentially-invalid UTF-16 to valid UTF-8 with errors replaced with the REPLACEMENT CHARACTER such that the validity of the output is signaled using the Rust type system, when the output space may be insufficient.
- convert_utf16_to_utf8 - Converts potentially-invalid UTF-16 to valid UTF-8 with errors replaced with the REPLACEMENT CHARACTER.
- convert_utf16_to_utf8_partial - Converts potentially-invalid UTF-16 to valid UTF-8 with errors replaced with the REPLACEMENT CHARACTER, when the output space may be insufficient.
- copy_ascii_to_ascii - Copies ASCII from source to destination up to the first non-ASCII byte (or the end of the input if it is ASCII in its entirety).
- copy_ascii_to_basic_latin - Copies ASCII from source to destination, zero-extending it to UTF-16, up to the first non-ASCII byte (or the end of the input if it is ASCII in its entirety).
- copy_basic_latin_to_ascii - Copies Basic Latin from source to destination, narrowing it to ASCII, up to the first non-Basic Latin code unit (or the end of the input if it is Basic Latin in its entirety).
- decode_latin1 - Converts bytes whose unsigned values are interpreted as Unicode code points (i.e. U+0000 to U+00FF, inclusive) to UTF-8.
- encode_latin1_lossy - If the input is valid UTF-8 representing only Unicode code points from U+0000 to U+00FF, inclusive, converts the input into output that represents the value of each code point as the unsigned byte value of each output byte.
- ensure_utf16_validity - Replaces unpaired surrogates in the input with the REPLACEMENT CHARACTER.
- is_ascii - Checks whether the buffer is all-ASCII.
- is_basic_latin - Checks whether the buffer is all-Basic Latin (i.e. UTF-16 representing only ASCII characters).
- is_char_bidi - Checks whether a scalar value triggers right-to-left processing.
- is_str_bidi - Checks whether a valid UTF-8 buffer contains code points that trigger right-to-left processing.
- is_str_latin1 - Checks whether the buffer represents only code points less than or equal to U+00FF.
- is_utf8_bidi - Checks whether a potentially-invalid UTF-8 buffer contains code points that trigger right-to-left processing.
- is_utf8_latin1 - Checks whether the buffer is valid UTF-8 representing only code points less than or equal to U+00FF.
- is_utf16_bidi - Checks whether a UTF-16 buffer contains code points that trigger right-to-left processing.
- is_utf16_code_unit_bidi - Checks whether a UTF-16 code unit triggers right-to-left processing.
- is_utf16_latin1 - Checks whether the buffer represents only code points less than or equal to U+00FF.
- str_latin1_up_to - Returns the index of the first byte that starts a non-Latin1 byte sequence, or the length of the string if there are none.
- utf8_latin1_up_to - Returns the index of the first byte that starts an invalid byte sequence or a non-Latin1 byte sequence, or the length of the string if there is neither.
- utf16_valid_up_to - Returns the index of the first unpaired surrogate or, if the input is valid UTF-16 in its entirety, the length of the input.
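The surrogate-handling contract of utf16_valid_up_to and ensure_utf16_validity can be sketched with the standard library alone. The functions valid_up_to and replace_unpaired_surrogates below are hypothetical stand-ins written for illustration. They scan one code unit at a time and make no attempt at the by-register processing this module aims for.

```rust
/// Hypothetical counterpart of utf16_valid_up_to: index of the first unpaired
/// surrogate, or buffer.len() if there is none.
fn valid_up_to(buffer: &[u16]) -> usize {
    let mut i = 0;
    while i < buffer.len() {
        match buffer[i] {
            // High surrogate: valid only when immediately followed by a low surrogate.
            0xD800..=0xDBFF => match buffer.get(i + 1).copied() {
                Some(0xDC00..=0xDFFF) => i += 2,
                _ => return i,
            },
            // Low surrogate without a preceding high surrogate.
            0xDC00..=0xDFFF => return i,
            _ => i += 1,
        }
    }
    buffer.len()
}

/// Hypothetical counterpart of ensure_utf16_validity: replaces each unpaired
/// surrogate in place with U+FFFD REPLACEMENT CHARACTER.
fn replace_unpaired_surrogates(buffer: &mut [u16]) {
    let mut offset = 0;
    loop {
        offset += valid_up_to(&buffer[offset..]);
        if offset == buffer.len() {
            return;
        }
        buffer[offset] = 0xFFFD;
        offset += 1;
    }
}

fn main() {
    // 'a', lone high surrogate, 'b', valid pair for U+1F600, lone low surrogate.
    let mut units: Vec<u16> = vec![0x0061, 0xD800, 0x0062, 0xD83D, 0xDE00, 0xDC00];
    assert_eq!(valid_up_to(&units), 1);
    replace_unpaired_surrogates(&mut units);
    let expected: Vec<u16> = vec![0x0061, 0xFFFD, 0x0062, 0xD83D, 0xDE00, 0xFFFD];
    assert_eq!(units, expected);
    assert_eq!(valid_up_to(&units), units.len());
    assert_eq!(String::from_utf16(&units).unwrap(), "a\u{FFFD}b\u{1F600}\u{FFFD}");
}
```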