Module regex_automata::meta

Provides a regex matcher that composes several other regex matchers automatically.

This module is home to a meta Regex, which provides a convenient high level API for executing regular expressions in linear time.

Comparison with the regex crate

A meta Regex is the implementation used directly by the regex crate. Indeed, the regex crate API is essentially just a light wrapper over a meta Regex. This means that if you need the full flexibility offered by this API, then you should be able to switch to using this API directly without any changes in match semantics or syntax. However, there are some API level differences:

  • The regex crate API returns match objects that include references to the haystack itself, which in turn makes it easy to access the matching strings without having to slice the haystack yourself. In contrast, a meta Regex returns match objects that only have offsets in them.
  • At the time of writing, a meta Regex doesn't have some of the convenience routines that the regex crate has, such as replacements. Note, though, that Captures::interpolate_string will handle the replacement string interpolation for you.
  • A meta Regex supports the Input abstraction, which provides a way to configure a search in more ways than is supported by the regex crate. For example, Input::anchored can be used to run an anchored search, regardless of whether the pattern is itself anchored with a ^.
  • A meta Regex supports multi-pattern searching everywhere. Indeed, every Match returned by the search APIs includes a PatternID indicating which pattern matched. In the single pattern case, all matches correspond to PatternID::ZERO. In contrast, the regex crate has distinct Regex and RegexSet APIs. The former only supports a single pattern, while the latter supports multiple patterns but cannot report the offsets of a match.
  • A meta Regex provides the explicit capability of bypassing its internal memory pool for automatically acquiring mutable scratch space required by its internal regex engines. Namely, a Cache can be explicitly provided to lower level routines such as Regex::search_with.

Modules

  • error 🔒
  • limited 🔒
    This module defines two bespoke reverse DFA searching routines. (One for the lazy DFA and one for the fully compiled DFA.) These routines differ from the usual ones by permitting the caller to specify a minimum starting position. That is, the search will begin at input.end() and will usually stop at input.start(), unless min_start > input.start(), in which case, the search will stop at min_start.
  • literal 🔒
  • regex 🔒
  • reverse_inner 🔒
    A module dedicated to plucking inner literals out of a regex pattern, and then constructing a prefilter for them. We also include a regex pattern “prefix” that corresponds to the bits of the regex that need to match before the literals do. The reverse inner optimization then proceeds by looking for matches of the inner literal(s), and then doing a reverse search of the prefix from the start of the literal match to find the overall start position of the match.
  • stopat 🔒
    This module defines two bespoke forward DFA search routines. One for the lazy DFA and one for the fully compiled DFA. These routines differ from the normal ones by reporting the position at which the search terminates when a match isn't found.
  • strategy 🔒
  • wrappers 🔒
    This module contains a boat load of wrappers around each of our internal regex engines. They encapsulate a few things:

Structs

  • BuildError
    An error that occurs when construction of a Regex fails.
  • Builder
    A builder for configuring and constructing a Regex.
  • Cache
    Represents mutable scratch space used by regex engines during a search.
  • CapturesMatches
    An iterator over all non-overlapping leftmost matches with their capturing groups.
  • Config
    An object describing the configuration of a Regex.
  • FindMatches
    An iterator over all non-overlapping matches.
  • Regex
    A regex matcher that works by composing several other regex matchers automatically.
  • Split
    Yields all substrings delimited by a regular expression match.
  • SplitN
    Yields at most N spans delimited by a regular expression match.