Struct regex_automata::meta::regex::Builder

pub struct Builder {
    config: Config,
    ast: ParserBuilder,
    hir: TranslatorBuilder,
}

A builder for configuring and constructing a Regex.

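The builder permits configuring two different aspects of a Regex: Builder::configure sets the non-syntax behavior of the Regex as described by a Config, and Builder::syntax sets the syntax options as described by a util::syntax::Config. The syntax options only apply when building a Regex from pattern strings.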

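Once configured, the builder can then be used to construct a Regex from one of 4 different inputs: a single pattern string (Builder::build), many pattern strings (Builder::build_many), a single Hir expression (Builder::build_from_hir), or many Hir expressions (Builder::build_many_from_hir).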

Builder::build_from_hir and Builder::build_many_from_hir in particular provide a way to construct a fully featured regular expression matcher directly from an Hir expression without having to first convert it to a string. (This is in contrast to the top-level regex crate, which intentionally provides no such API in order to avoid making regex-syntax a public dependency.)

As a convenience, this builder may be created via Regex::builder, which may help avoid an extra import.

§Example: change the line terminator

This example shows how to enable multi-line mode by default and change the line terminator to the NUL byte:

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().multi_line(true))
    .configure(Regex::config().line_terminator(b'\x00'))
    .build(r"^foo$")?;
let hay = "\x00foo\x00";
assert_eq!(Some(Match::must(0, 1..4)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

§Example: disable UTF-8 requirement

By default, regex patterns are required to match UTF-8. This includes regex patterns that can produce matches of length zero. In the case of an empty match, by default, matches will not appear between the code units of a UTF-8 encoded codepoint.

However, it can be useful to disable this requirement, particularly if you’re searching things like &[u8] that are not known to be valid UTF-8.

use regex_automata::{meta::Regex, util::syntax, Match};

let mut builder = Regex::builder();
// Disables the requirement that non-empty matches match UTF-8.
builder.syntax(syntax::Config::new().utf8(false));
// Disables the requirement that empty matches match UTF-8 boundaries.
builder.configure(Regex::config().utf8_empty(false));

// We can match raw bytes via \xZZ syntax, but we need to disable
// Unicode mode to do that. We could disable it everywhere, or just
// selectively, as shown here.
let re = builder.build(r"(?-u:\xFF)foo(?-u:\xFF)")?;
let hay = b"\xFFfoo\xFF";
assert_eq!(Some(Match::must(0, 0..5)), re.find(hay));

// We can also match between code units.
let re = builder.build(r"")?;
let hay = "☃";
assert_eq!(re.find_iter(hay).collect::<Vec<Match>>(), vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())

Fields§

§config: Config

§ast: ParserBuilder

§hir: TranslatorBuilder

Implementations§

impl Builder

pub fn new() -> Builder

Creates a new builder for configuring and constructing a Regex.

pub fn build(&self, pattern: &str) -> Result<Regex, BuildError>

Builds a Regex from a single pattern string.

If there was a problem parsing the pattern or a problem turning it into a regex matcher, then an error is returned.

§Example

This example shows how to configure syntax options.

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().crlf(true).multi_line(true))
    .build(r"^foo$")?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

pub fn build_many<P: AsRef<str>>(&self, patterns: &[P]) -> Result<Regex, BuildError>

Builds a Regex from many pattern strings.

If there was a problem parsing any of the patterns or a problem turning them into a regex matcher, then an error is returned.

§Example: finding the pattern that caused an error

When a syntax error occurs, it is possible to ask which pattern caused the syntax error.

use regex_automata::{meta::Regex, PatternID};

let err = Regex::builder()
    .build_many(&["a", "b", r"\p{Foo}", "c"])
    .unwrap_err();
assert_eq!(Some(PatternID::must(2)), err.pattern());
§Example: zero patterns is valid

Building a regex with zero patterns results in a regex that never matches anything. Because this routine is generic, passing an empty slice usually requires a turbo-fish (or something else to help type inference).

use regex_automata::meta::Regex;

let re = Regex::builder()
    .build_many::<&str>(&[])?;
assert_eq!(None, re.find(""));

Ok::<(), Box<dyn std::error::Error>>(())

pub fn build_from_hir(&self, hir: &Hir) -> Result<Regex, BuildError>

Builds a Regex directly from an Hir expression.

This is useful if you needed to parse a pattern string into an Hir for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expression instead of first converting the Hir back to a pattern string.

When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.

If there was a problem building the underlying regex matcher for the given Hir, then an error is returned.

§Example

This example shows how one can hand-construct an Hir expression and build a regex from it without doing any parsing at all.

use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_from_hir(&hir)?;
let hay = "\r\nfoo\r\n";
assert_eq!(Some(Match::must(0, 2..5)), re.find(hay));

Ok::<(), Box<dyn std::error::Error>>(())

pub fn build_many_from_hir<H: Borrow<Hir>>(&self, hirs: &[H]) -> Result<Regex, BuildError>

Builds a Regex directly from many Hir expressions.

This is useful if you needed to parse pattern strings into Hir expressions for other reasons (such as analysis or transformations). This routine permits building a Regex directly from the Hir expressions instead of first converting the Hir expressions back to pattern strings.

When using this method, any options set via Builder::syntax are ignored. Namely, the syntax options only apply when parsing a pattern string, which isn’t relevant here.

If there was a problem building the underlying regex matcher for the given Hir expressions, then an error is returned.

Note that unlike Builder::build_many, this can only fail as a result of building the underlying matcher. In that case, there is no single Hir expression that can be isolated as a reason for the failure. So if this routine fails, it’s not possible to determine which Hir expression caused the failure.

§Example

This example shows how one can hand-construct multiple Hir expressions and build a single regex from them without doing any parsing at all.

use {
    regex_automata::{meta::Regex, Match},
    regex_syntax::hir::{Hir, Look},
};

// (?Rm)^foo$
let hir1 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("foo".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
// (?Rm)^bar$
let hir2 = Hir::concat(vec![
    Hir::look(Look::StartCRLF),
    Hir::literal("bar".as_bytes()),
    Hir::look(Look::EndCRLF),
]);
let re = Regex::builder()
    .build_many_from_hir(&[&hir1, &hir2])?;
let hay = "\r\nfoo\r\nbar";
let got: Vec<Match> = re.find_iter(hay).collect();
let expected = vec![
    Match::must(0, 2..5),
    Match::must(1, 7..10),
];
assert_eq!(expected, got);

Ok::<(), Box<dyn std::error::Error>>(())

pub fn configure(&mut self, config: Config) -> &mut Builder

Configure the behavior of a Regex.

This configuration controls non-syntax options related to the behavior of a Regex. This includes things like whether empty matches can split a codepoint, prefilters, line terminators and a long list of options for configuring which regex engines the meta regex engine will be able to use internally.

§Example

This example shows how to disable UTF-8 empty mode. This permits empty matches to occur between the code units of a UTF-8 encoded codepoint.

use regex_automata::{meta::Regex, Match};

let re = Regex::new("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches only occur at the beginning and end of the snowman.
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 3..3),
]);

let re = Regex::builder()
    .configure(Regex::config().utf8_empty(false))
    .build("")?;
let got: Vec<Match> = re.find_iter("☃").collect();
// Matches now occur at every position!
assert_eq!(got, vec![
    Match::must(0, 0..0),
    Match::must(0, 1..1),
    Match::must(0, 2..2),
    Match::must(0, 3..3),
]);

Ok::<(), Box<dyn std::error::Error>>(())

pub fn syntax(&mut self, config: Config) -> &mut Builder

Configure the syntax options when parsing a pattern string while building a Regex. The Config accepted here is util::syntax::Config, not the Config accepted by Builder::configure.

These options only apply when Builder::build or Builder::build_many are used. The other build methods accept Hir values, which have already been parsed.

§Example

This example shows how to enable case insensitive mode.

use regex_automata::{meta::Regex, util::syntax, Match};

let re = Regex::builder()
    .syntax(syntax::Config::new().case_insensitive(true))
    .build(r"δ")?;
assert_eq!(Some(Match::must(0, 0..2)), re.find(r"Δ"));

Ok::<(), Box<dyn std::error::Error>>(())

Trait Implementations§

impl Clone for Builder

fn clone(&self) -> Builder

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for Builder

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.