Module regex_automata::util::utf8

Utilities for dealing with UTF-8.

This module provides some UTF-8 related helper routines, including an incremental decoder.
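
As a rough illustration of what incremental decoding means here, the sketch below decodes one codepoint at a time from the front of a byte slice using only the standard library. The names `decode_first_sketch` and `decode_all_sketch`, and their return types, are assumptions made for this example. They are not the signatures of this module's crate-private items.

```rust
/// Hypothetical helper (not this module's API): decodes the first codepoint
/// in `bytes`. Returns `None` for an empty slice, `Some(Ok((ch, len)))` on
/// success, and `Some(Err(first_byte))` if the slice does not begin with a
/// valid UTF-8 sequence.
fn decode_first_sketch(bytes: &[u8]) -> Option<Result<(char, usize), u8>> {
    let first = *bytes.first()?;
    // ASCII fast path: one byte, one codepoint.
    if first < 0x80 {
        return Some(Ok((first as char, 1)));
    }
    // A UTF-8 sequence is at most 4 bytes, so only that window is inspected.
    let end = bytes.len().min(4);
    let valid = match std::str::from_utf8(&bytes[..end]) {
        Ok(s) => s,
        Err(e) => {
            let n = e.valid_up_to();
            if n == 0 {
                // The very first sequence is invalid or truncated.
                return Some(Err(first));
            }
            // The first `n` bytes were already reported as valid.
            std::str::from_utf8(&bytes[..n]).unwrap()
        }
    };
    let ch = valid.chars().next()?;
    Some(Ok((ch, ch.len_utf8())))
}

/// Hypothetical driver: decodes a whole slice incrementally, skipping one
/// byte whenever an invalid sequence is found.
fn decode_all_sketch(mut bytes: &[u8]) -> Vec<Result<char, u8>> {
    let mut out = Vec::new();
    while let Some(step) = decode_first_sketch(bytes) {
        match step {
            Ok((ch, n)) => {
                out.push(Ok(ch));
                bytes = &bytes[n..];
            }
            Err(b) => {
                out.push(Err(b));
                bytes = &bytes[1..];
            }
        }
    }
    out
}
```

Reporting the offending byte on failure lets a caller choose how to recover. This sketch simply advances by one byte and continues.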

Functions

  • decode 🔒
    Decodes the next UTF-8 encoded codepoint from the given byte slice.
  • decode_last 🔒
    Decodes the last UTF-8 encoded codepoint from the given byte slice.
  • is_boundary 🔒
    Returns true if and only if the given offset in the given bytes falls on a valid UTF-8 encoded codepoint boundary.
  • is_leading_or_invalid_byte 🔒
    Returns true if and only if the given byte is either a valid leading UTF-8 byte, or is otherwise an invalid byte that can never appear anywhere in a valid UTF-8 sequence.
  • is_word_byte 🔒
    Returns true if and only if the given byte is considered a word character. This only applies to ASCII.
  • len 🔒
    Given a UTF-8 leading byte, this returns the total number of code units in the following encoded codepoint.
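
The byte-level predicates listed above are crate-private, so they cannot be called from outside the crate. The sketch below shows one way such checks could be written from the bit layout of UTF-8. Every function name ending in `_sketch` is hypothetical, and the `Option<usize>` return type and the exact leading-byte ranges in `len_sketch` are assumptions, not a statement of how this module implements them.

```rust
/// True unless `b` is a continuation byte (bit pattern 10xxxxxx). Leading
/// bytes and bytes that never occur in valid UTF-8 both pass this test.
fn is_leading_or_invalid_sketch(b: u8) -> bool {
    (b & 0b1100_0000) != 0b1000_0000
}

/// True if offset `i` falls on a codepoint boundary, assuming `bytes` is
/// valid UTF-8. The end of the slice counts as a boundary; offsets past the
/// end do not.
fn is_boundary_sketch(bytes: &[u8], i: usize) -> bool {
    match bytes.get(i) {
        None => i == bytes.len(),
        // ASCII bytes and multi-byte leading bytes start a codepoint.
        Some(&b) => b <= 0b0111_1111 || b >= 0b1100_0000,
    }
}

/// Number of code units (bytes) in the codepoint introduced by `lead`, or
/// `None` if `lead` cannot start a valid sequence (assumed behavior).
fn len_sketch(lead: u8) -> Option<usize> {
    match lead {
        0x00..=0x7F => Some(1),
        0xC2..=0xDF => Some(2),
        0xE0..=0xEF => Some(3),
        0xF0..=0xF4 => Some(4),
        _ => None,
    }
}

/// ASCII-only word test: letters, digits, and underscore.
fn is_word_byte_sketch(b: u8) -> bool {
    b == b'_' || b.is_ascii_alphanumeric()
}
```

For the bytes of "aé" (0x61, 0xC3, 0xA9), offsets 0, 1, and 3 are boundaries under this sketch, while offset 2 lands on a continuation byte and is not.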