Module regex_syntax::hir
Defines a high-level intermediate (HIR) representation for regular expressions.
The HIR is represented by the Hir type, and it is principally constructed via
translation from an Ast. Alternatively, users may use the smart constructors
defined on Hir to build their own by hand. The smart constructors
simultaneously simplify and “optimize” the HIR, and are also the same routines
used by translation.
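To make the idea of a simplifying smart constructor concrete, here is a minimal, self-contained sketch. It does not use this crate: ToyHir, its variants, and flatten_into are hypothetical names invented for illustration, and the simplifications shown (dropping empty sub-expressions, flattening nested concatenations, merging adjacent literals) are only assumed to resemble the kind of work the real constructors on Hir do.

```rust
// Toy model only: ToyHir and its helpers are hypothetical and are NOT the
// types defined in regex_syntax::hir.
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum ToyHir {
    Empty,
    Literal(Vec<u8>),
    AnyByte,
    Concat(Vec<ToyHir>),
}

impl ToyHir {
    /// Smart constructor: an empty literal collapses to `Empty`.
    pub fn literal(bytes: &[u8]) -> ToyHir {
        if bytes.is_empty() {
            ToyHir::Empty
        } else {
            ToyHir::Literal(bytes.to_vec())
        }
    }

    /// Smart constructor: flattens nested concatenations, drops empty
    /// sub-expressions and merges adjacent literals.
    pub fn concat(subs: Vec<ToyHir>) -> ToyHir {
        let mut out = Vec::new();
        for sub in subs {
            flatten_into(&mut out, sub);
        }
        match out.len() {
            0 => ToyHir::Empty,
            1 => out.pop().unwrap(),
            _ => ToyHir::Concat(out),
        }
    }
}

/// Appends `h` to `out` in simplified form.
fn flatten_into(out: &mut Vec<ToyHir>, h: ToyHir) {
    match h {
        ToyHir::Empty => {}
        ToyHir::Concat(inner) => {
            for sub in inner {
                flatten_into(out, sub);
            }
        }
        ToyHir::Literal(bytes) => {
            if bytes.is_empty() {
                return;
            }
            // Merge into the previous literal when there is one.
            if let Some(ToyHir::Literal(prev)) = out.last_mut() {
                prev.extend_from_slice(&bytes);
            } else {
                out.push(ToyHir::Literal(bytes));
            }
        }
        other => out.push(other),
    }
}
```

In this sketch, concatenating the literals "ab" and "", followed by a nested concatenation of "c" and AnyByte, yields a single two-element Concat holding the literal "abc" and AnyByte. Because the constructors are the only intended way to build values, every value starts out in simplified form.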
Most regex engines only have an HIR like this, and usually construct it
directly from the concrete syntax. This crate however first parses the
concrete syntax into an Ast, and only then creates the HIR from the Ast,
as mentioned above. It’s done this way to facilitate better error reporting,
and to have a structured representation of a regex that faithfully represents
its concrete syntax. Namely, while an Hir value can be converted back to an
equivalent regex pattern string, it is unlikely to look like the original due
to its simplified structure.
Modules§
- interval 🔒
- Provides literal extraction from Hir expressions.
- This module provides a regular expression printer for Hir.
- Defines a translator that converts an Ast to an Hir.
. - visitor 🔒
Structs§
- The high-level intermediate representation for a capturing group.
- An error that occurs when Unicode-aware simple case folding fails.
- A set of characters represented by arbitrary bytes.
- An iterator over all ranges in a byte character class.
- A single range of characters represented by arbitrary bytes.
- A set of characters represented by Unicode scalar values.
- An iterator over all ranges in a Unicode character class.
- A single range of characters represented by Unicode scalar values.
- An error that can occur while translating an Ast to a Hir.
- A high-level intermediate representation (HIR) for a regular expression.
- The high-level intermediate representation of a literal.
- A set of look-around assertions.
- An iterator over all look-around assertions in a LookSet.
- A type that collects various properties of an HIR value.
- The property definition. It is split out so that we can box it, and thereby make Properties use less stack size. This is kind of important because every HIR value has a Properties attached to it.
- The high-level intermediate representation of a repetition operator.
Enums§
- The high-level intermediate representation of a character class.
- A type describing the different flavors of “.”.
- The type of an error that occurred while building an Hir.
- The underlying kind of an arbitrary Hir expression.
- The high-level intermediate representation for a look-around assertion.
Traits§
- A trait for visiting the high-level IR (HIR) in depth first order.
Functions§
- Given a sequence of HIR values where each value corresponds to a byte class (or an all-ASCII Unicode class), return a single byte class corresponding to the union of the classes found.
- Given a sequence of HIR values where each value corresponds to a Unicode class (or an all-ASCII byte class), return a single Unicode class corresponding to the union of the classes found.
- Looks for a common prefix in the list of alternation branches given. If one is found, then an equivalent but (hopefully) simplified Hir is returned. Otherwise, the original given list of branches is returned unmodified.
- Given a sequence of HIR values where each value corresponds to a literal that is a single byte, return that sequence of bytes. Otherwise return None. No deduplication is done.
- Given a sequence of HIR values where each value corresponds to a literal that is a single char, return that sequence of chars. Otherwise return None. No deduplication is done.
- Executes an implementation of Visitor in constant stack space.
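As a rough illustration of what depth first visiting "in constant stack space" can mean, the sketch below walks a tree with an explicit heap-allocated stack instead of recursion, so the call stack does not grow with nesting depth. Node, ToyVisitor, toy_visit and Trace are hypothetical names made up for this example. They are not this crate's Visitor trait or its visit function, whose exact methods and signatures are not shown on this page.

```rust
// Toy model only: none of these names come from regex_syntax.
#[derive(Debug)]
pub enum Node {
    Leaf(char),
    Group(Vec<Node>),
}

/// Hypothetical visitor with callbacks before and after each node.
pub trait ToyVisitor {
    fn visit_pre(&mut self, node: &Node);
    fn visit_post(&mut self, node: &Node);
}

/// Depth first traversal driven by a heap-allocated stack of frames.
pub fn toy_visit<V: ToyVisitor>(root: &Node, visitor: &mut V) {
    // Each frame holds a node and the index of its next unvisited child.
    let mut stack: Vec<(&Node, usize)> = Vec::new();
    visitor.visit_pre(root);
    stack.push((root, 0));
    while let Some((node, next)) = stack.pop() {
        let children: &[Node] = match node {
            Node::Group(children) => children.as_slice(),
            Node::Leaf(_) => &[],
        };
        if next < children.len() {
            // Come back to this node later, then descend into the child.
            stack.push((node, next + 1));
            let child = &children[next];
            visitor.visit_pre(child);
            stack.push((child, 0));
        } else {
            // All children are done, so the node itself is finished.
            visitor.visit_post(node);
        }
    }
}

/// Example visitor that records the order of events.
pub struct Trace(pub Vec<String>);

impl ToyVisitor for Trace {
    fn visit_pre(&mut self, node: &Node) {
        match node {
            Node::Leaf(c) => self.0.push(format!("leaf {}", c)),
            Node::Group(_) => self.0.push("open".to_string()),
        }
    }

    fn visit_post(&mut self, node: &Node) {
        if let Node::Group(_) = node {
            self.0.push("close".to_string());
        }
    }
}
```

The trade-off in this sketch is that deeply nested input costs heap memory proportional to its depth rather than risking a stack overflow in a recursive walk.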