Module regex_syntax::hir

Defines a high-level intermediate (HIR) representation for regular expressions.

The HIR is represented by the Hir type, and it is principally constructed via translation from an Ast. Alternatively, users may use the smart constructors defined on Hir to build their own by hand. The smart constructors simultaneously simplify and “optimize” the HIR, and are also the same routines used by translation.
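
To make the idea of a smart constructor concrete, here is a minimal, self-contained sketch. It does not use this crate: MiniHir and its constructors are hypothetical stand-ins invented for illustration, and the simplifications shown (dropping empty sub-expressions, flattening nested concatenations, merging adjacent literals) are assumptions about the kind of rewriting a smart constructor might do, not a description of what Hir actually does.

```rust
// A toy stand-in for an HIR. `MiniHir` is hypothetical and is NOT the
// `Hir` type defined by this module.
#[derive(Clone, Debug, PartialEq, Eq)]
enum MiniHir {
    Empty,
    Literal(String),
    Concat(Vec<MiniHir>),
    Alternation(Vec<MiniHir>),
}

impl MiniHir {
    /// Smart constructor: an empty literal collapses to `Empty`.
    fn literal(s: &str) -> MiniHir {
        if s.is_empty() {
            MiniHir::Empty
        } else {
            MiniHir::Literal(s.to_string())
        }
    }

    /// Smart constructor: drops `Empty`, flattens nested concatenations,
    /// merges adjacent literals, and unwraps trivial results.
    fn concat(subs: Vec<MiniHir>) -> MiniHir {
        let mut out = Vec::new();
        for sub in subs {
            flatten_into(&mut out, sub);
        }
        match out.len() {
            0 => MiniHir::Empty,
            1 => out.pop().unwrap(),
            _ => MiniHir::Concat(out),
        }
    }

    /// Smart constructor: an alternation with one branch is that branch.
    fn alternation(mut subs: Vec<MiniHir>) -> MiniHir {
        if subs.len() == 1 {
            subs.pop().unwrap()
        } else {
            MiniHir::Alternation(subs)
        }
    }
}

/// Appends `h` to `out`, simplifying as it goes.
fn flatten_into(out: &mut Vec<MiniHir>, h: MiniHir) {
    match h {
        MiniHir::Empty => {}
        MiniHir::Concat(inner) => {
            for sub in inner {
                flatten_into(out, sub);
            }
        }
        MiniHir::Literal(s) => {
            // Merge with a preceding literal when there is one.
            if let Some(MiniHir::Literal(prev)) = out.last_mut() {
                prev.push_str(&s);
            } else {
                out.push(MiniHir::Literal(s));
            }
        }
        other => out.push(other),
    }
}

/// Builds "a" + "" + ("b" + "c") through the smart constructors.
fn example() -> MiniHir {
    MiniHir::concat(vec![
        MiniHir::literal("a"),
        MiniHir::literal(""),
        MiniHir::concat(vec![MiniHir::literal("b"), MiniHir::literal("c")]),
    ])
}
```

In this sketch, example() yields a single literal rather than a nested concatenation, because every value passes through the simplifying constructors. Sharing those routines between hand-built values and translated ones is what keeps both in the same simplified form.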

Most regex engines only have an HIR like this, and usually construct it directly from the concrete syntax. This crate, however, first parses the concrete syntax into an Ast, and only then creates the HIR from the Ast, as mentioned above. It’s done this way to facilitate better error reporting, and to have a structured representation of a regex that faithfully represents its concrete syntax. Namely, while an Hir value can be converted back to an equivalent regex pattern string, it is unlikely to look like the original due to its simplified structure.

Modules

  • interval 🔒
  • Provides literal extraction from Hir expressions.
  • This module provides a regular expression printer for Hir.
  • Defines a translator that converts an Ast to an Hir.
  • visitor 🔒

Structs

  • The high-level intermediate representation for a capturing group.
  • An error that occurs when Unicode-aware simple case folding fails.
  • A set of characters represented by arbitrary bytes.
  • An iterator over all ranges in a byte character class.
  • A single range of characters represented by arbitrary bytes.
  • A set of characters represented by Unicode scalar values.
  • An iterator over all ranges in a Unicode character class.
  • A single range of characters represented by Unicode scalar values.
  • An error that can occur while translating an Ast to an Hir.
  • A high-level intermediate representation (HIR) for a regular expression.
  • The high-level intermediate representation of a literal.
  • A set of look-around assertions.
  • An iterator over all look-around assertions in a LookSet.
  • A type that collects various properties of an HIR value.
  • The property definition. It is split out so that we can box it, and thereby make Properties use less stack size. This is kind of important because every HIR value has a Properties attached to it.
  • The high-level intermediate representation of a repetition operator.

Enums

  • The high-level intermediate representation of a character class.
  • A type describing the different flavors of "." (the dot metacharacter).
  • The type of an error that occurred while building an Hir.
  • The underlying kind of an arbitrary Hir expression.
  • The high-level intermediate representation for a look-around assertion.

Traits

  • A trait for visiting the high-level IR (HIR) in depth-first order.

Functions

  • Given a sequence of HIR values where each value corresponds to a byte class (or an all-ASCII Unicode class), return a single byte class corresponding to the union of the classes found.
  • Given a sequence of HIR values where each value corresponds to a Unicode class (or an all-ASCII byte class), return a single Unicode class corresponding to the union of the classes found.
  • Looks for a common prefix in the list of alternation branches given. If one is found, then an equivalent but (hopefully) simplified Hir is returned. Otherwise, the original given list of branches is returned unmodified.
  • Given a sequence of HIR values where each value corresponds to a literal that is a single byte, return that sequence of bytes. Otherwise return None. No deduplication is done.
  • Given a sequence of HIR values where each value corresponds to a literal that is a single char, return that sequence of chars. Otherwise return None. No deduplication is done.
  • Executes an implementation of Visitor in constant stack space.
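
The last entry above mentions running a visitor in constant stack space. The following is a minimal sketch of that general technique, assuming a toy tree: Node, MiniVisitor, Frame, mini_visit and DepthTracker are all hypothetical names invented here and are not this module's Hir, Visitor or visit. The idea shown is to replace recursion with an explicit, heap-allocated work list, so that call-stack usage does not grow with the depth of the expression.

```rust
// Hypothetical toy tree; NOT this module's `Hir` type.
enum Node {
    Leaf(char),
    Seq(Vec<Node>),
}

// Hypothetical visitor trait with pre- and post-order callbacks.
trait MiniVisitor {
    fn visit_pre(&mut self, node: &Node);
    fn visit_post(&mut self, node: &Node);
}

// A pending unit of work on the explicit stack.
enum Frame<'a> {
    Enter(&'a Node),
    Exit(&'a Node),
}

/// Depth-first traversal without recursion: pending work lives in a
/// heap-allocated `Vec`, so the call stack stays at a fixed depth.
fn mini_visit<V: MiniVisitor>(root: &Node, visitor: &mut V) {
    let mut stack = vec![Frame::Enter(root)];
    while let Some(frame) = stack.pop() {
        match frame {
            Frame::Enter(node) => {
                visitor.visit_pre(node);
                // Schedule the post-order callback before the children,
                // so it runs only after all of them are finished.
                stack.push(Frame::Exit(node));
                if let Node::Seq(children) = node {
                    // Push in reverse so children pop left to right.
                    for child in children.iter().rev() {
                        stack.push(Frame::Enter(child));
                    }
                }
            }
            Frame::Exit(node) => visitor.visit_post(node),
        }
    }
}

/// Example visitor: records leaves in visit order and the maximum depth.
#[derive(Default)]
struct DepthTracker {
    depth: usize,
    max_depth: usize,
    leaves: String,
}

impl MiniVisitor for DepthTracker {
    fn visit_pre(&mut self, node: &Node) {
        self.depth += 1;
        self.max_depth = self.max_depth.max(self.depth);
        if let Node::Leaf(c) = node {
            self.leaves.push(*c);
        }
    }

    fn visit_post(&mut self, _node: &Node) {
        self.depth -= 1;
    }
}
```

The trade-off in this sketch is that the work list still grows with the size of the tree, but it does so on the heap, where running out of space is far less abrupt than overflowing the call stack on a deeply nested pattern.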