Struct regex_syntax::ast::parse::ParserI

struct ParserI<'s, P> {
    parser: P,
    pattern: &'s str,
}

ParserI is the internal parser implementation.

We use this separate type so that we can carry the provided pattern string along with us. In particular, a Parser's internal state is not tied to any one pattern, but a ParserI is.

This type also lets us use ParserI<&Parser> in production code while retaining the convenience of ParserI<Parser> for tests, which sometimes work against the internal interface of the parser.
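
A minimal sketch of why a `P: Borrow<...>` bound allows both the borrowed and the owned form. `Config` and `Cursor` are hypothetical stand-ins for `Parser` and `ParserI`; they are not the crate's types:

```rust
use std::borrow::Borrow;

/// Hypothetical stand-in for the pattern-independent parser state.
#[derive(Debug, Default)]
struct Config {
    ignore_whitespace: bool,
}

/// Hypothetical stand-in for `ParserI`: some configuration (owned or
/// borrowed) paired with the one pattern being parsed.
struct Cursor<'s, P> {
    config: P,
    pattern: &'s str,
}

impl<'s, P: Borrow<Config>> Cursor<'s, P> {
    fn new(config: P, pattern: &'s str) -> Cursor<'s, P> {
        Cursor { config, pattern }
    }

    /// Works for both `P = Config` and `P = &Config`.
    fn config(&self) -> &Config {
        self.config.borrow()
    }

    fn pattern(&self) -> &str {
        self.pattern
    }
}
```

With this shape, `Cursor::new(&shared, "a|b")` borrows one shared configuration across many patterns, while `Cursor::new(Config::default(), "x")` owns its configuration, which is the convenient form in tests.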

Fields§

§parser: P

The parser state/configuration.

§pattern: &'s str

The full regular expression provided by the user.

Implementations§

impl<'s, P: Borrow<Parser>> ParserI<'s, P>

fn new(parser: P, pattern: &'s str) -> ParserI<'s, P>

Build an internal parser from a parser configuration and a pattern.

fn parser(&self) -> &Parser

Return a reference to the parser state.

fn pattern(&self) -> &str

Return a reference to the pattern being parsed.

fn error(&self, span: Span, kind: ErrorKind) -> Error

Create a new error with the given span and error type.

fn offset(&self) -> usize

Return the current offset of the parser.

The offset starts at 0 from the beginning of the regular expression pattern string.

fn line(&self) -> usize

Return the current line number of the parser.

The line number starts at 1.

fn column(&self) -> usize

Return the current column of the parser.

The column number starts at 1 and is reset whenever a \n is seen.
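
The bookkeeping described for the offset, line and column can be sketched as follows. This assumes the offset counts bytes and the column counts characters; `Pos`, `advance` and `position_after` are hypothetical names used only for illustration:

```rust
/// Hypothetical position record: byte offset (0-based), line and column
/// (both 1-based).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Pos {
    offset: usize,
    line: usize,
    column: usize,
}

/// Advance `pos` over one character: a `\n` starts a new line and resets
/// the column to 1, any other character moves one column to the right.
fn advance(pos: Pos, c: char) -> Pos {
    let offset = pos.offset + c.len_utf8();
    if c == '\n' {
        Pos { offset, line: pos.line + 1, column: 1 }
    } else {
        Pos { offset, line: pos.line, column: pos.column + 1 }
    }
}

/// Position reached after consuming every character of `consumed`.
fn position_after(consumed: &str) -> Pos {
    let start = Pos { offset: 0, line: 1, column: 1 };
    consumed.chars().fold(start, advance)
}
```

For example, after consuming `"ab\nc"` the sketch reports offset 4, line 2, column 2.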

fn next_capture_index(&self, span: Span) -> Result<u32, Error>

Return the next capturing index. Each subsequent call increments the internal index.

The span given should correspond to the location of the opening parenthesis.

If the capture limit is exceeded, then an error is returned.

fn add_capture_name(&self, cap: &CaptureName) -> Result<(), Error>

Adds the given capture name to this parser. If this capture name has already been used, then an error is returned.
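
A rough sketch of the capture bookkeeping described by `next_capture_index` and `add_capture_name`. `Captures` is a hypothetical type, errors are plain strings rather than the crate's `Error`, and the spans are omitted; interior mutability is assumed because the methods above take `&self`:

```rust
use std::cell::{Cell, RefCell};

/// Hypothetical capture state: the last index handed out and the names
/// seen so far.
struct Captures {
    last_index: Cell<u32>,
    names: RefCell<Vec<String>>,
}

impl Captures {
    fn new() -> Captures {
        Captures { last_index: Cell::new(0), names: RefCell::new(Vec::new()) }
    }

    /// Hand out the next index, failing once `u32` would overflow.
    fn next_capture_index(&self) -> Result<u32, String> {
        let next = self
            .last_index
            .get()
            .checked_add(1)
            .ok_or_else(|| String::from("capture limit exceeded"))?;
        self.last_index.set(next);
        Ok(next)
    }

    /// Record a capture name, rejecting names that were already used.
    fn add_capture_name(&self, name: &str) -> Result<(), String> {
        let mut names = self.names.borrow_mut();
        if names.iter().any(|n| n.as_str() == name) {
            return Err(format!("duplicate capture name: {}", name));
        }
        names.push(name.to_string());
        Ok(())
    }
}
```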

fn ignore_whitespace(&self) -> bool

Return whether the parser should ignore whitespace or not.

fn char(&self) -> char

Return the character at the current position of the parser.

This panics if the current position does not point to a valid char.

fn char_at(&self, i: usize) -> char

Return the character at the given position.

This panics if the given position does not point to a valid char.

fn bump(&self) -> bool

Bump the parser to the next Unicode scalar value.

If the end of the input has been reached, then false is returned.

fn bump_if(&self, prefix: &str) -> bool

If the substring starting at the current position of the parser has the given prefix, then bump the parser to the character immediately following the prefix and return true. Otherwise, don’t bump the parser and return false.
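
The behavior of `bump_if` can be sketched with a small stand-alone cursor. `Cursor` is hypothetical, tracks only a byte offset (no line or column), and keeps it in a `Cell` on the assumption that this is how a `&self` method can advance the position:

```rust
use std::cell::Cell;

/// Hypothetical cursor over a pattern. The offset lives in a `Cell` so it
/// can be advanced through `&self`.
struct Cursor<'s> {
    pattern: &'s str,
    offset: Cell<usize>,
}

impl<'s> Cursor<'s> {
    fn new(pattern: &'s str) -> Cursor<'s> {
        Cursor { pattern, offset: Cell::new(0) }
    }

    /// If the remaining input starts with `prefix`, skip past it and return
    /// true. Otherwise leave the offset untouched and return false.
    fn bump_if(&self, prefix: &str) -> bool {
        if self.pattern[self.offset.get()..].starts_with(prefix) {
            self.offset.set(self.offset.get() + prefix.len());
            true
        } else {
            false
        }
    }
}
```

A caller can chain such checks to recognize group openers, for example `c.bump_if("(")` followed by `c.bump_if("?P<")`.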

fn is_lookaround_prefix(&self) -> bool

Returns true if and only if the parser is positioned at a look-around prefix. The conditions under which this returns true must always correspond to a regular expression that would otherwise be considered invalid.

This should only be called immediately after parsing the opening of a group or a set of flags.

fn bump_and_bump_space(&self) -> bool

Bump the parser, and if the x flag is enabled, bump through any subsequent spaces. Return true if and only if the parser is not at EOF.

fn bump_space(&self)

If the x flag is enabled (i.e., whitespace insensitivity with comments), then this will advance the parser through all whitespace and comments to the next non-whitespace non-comment byte.

If the x flag is disabled, then this is a no-op.

This should be used selectively throughout the parser where arbitrary whitespace is permitted when the x flag is enabled. For example, { 5 , 6} is equivalent to {5,6}.
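
As an illustration of the skipping rule, here is a sketch that works on a plain byte offset. `skip_space` is a hypothetical free function, and it assumes that a comment starts at `#` and runs to the end of the line:

```rust
/// Return the offset of the first byte at or after `offset` that is neither
/// whitespace nor inside a `#` comment. When `ignore_whitespace` is false
/// this is a no-op and `offset` is returned unchanged.
fn skip_space(pattern: &str, mut offset: usize, ignore_whitespace: bool) -> usize {
    if !ignore_whitespace {
        return offset;
    }
    let mut in_comment = false;
    for c in pattern[offset..].chars() {
        if in_comment {
            // A comment ends at the next newline.
            in_comment = c != '\n';
        } else if c == '#' {
            in_comment = true;
        } else if !c.is_whitespace() {
            break;
        }
        offset += c.len_utf8();
    }
    offset
}
```

Applied to `{ 5 , 6}` just after the opening brace, the sketch moves from offset 1 to offset 2, which is where the digit `5` sits.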

fn peek(&self) -> Option<char>

Peek at the next character in the input without advancing the parser.

If the input has been exhausted, then this returns None.
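
A small sketch of peeking, written as a hypothetical free function over a pattern and a byte offset. It assumes that "next" means the character after the one at the current position:

```rust
/// Return the character that follows the one at `offset`, without changing
/// any state. Returns `None` when there is no current or no following
/// character.
fn peek(pattern: &str, offset: usize) -> Option<char> {
    let mut rest = pattern[offset..].chars();
    rest.next()?; // the character at the current position
    rest.next()
}
```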

fn peek_space(&self) -> Option<char>

Like peek, but will ignore spaces when the parser is in whitespace insensitive mode.

fn is_eof(&self) -> bool

Returns true if the next call to bump would return false.

fn pos(&self) -> Position

Return the current position of the parser, which includes the offset, line and column.

fn span(&self) -> Span

Create a span at the current position of the parser. Both the start and end of the span are set.

fn span_char(&self) -> Span

Create a span that covers the current character.

fn push_alternate(&self, concat: Concat) -> Result<Concat, Error>

Parse and push a single alternation on to the parser’s internal stack. If the top of the stack already has an alternation, then add to that instead of pushing a new one.

The concatenation given corresponds to a single alternation branch. The concatenation returned starts the next branch and is empty.

This assumes the parser is currently positioned at | and will advance the parser to the character following |.

fn push_or_add_alternation(&self, concat: Concat)

Pushes or adds the given branch of an alternation to the parser’s internal stack of state.

fn push_group(&self, concat: Concat) -> Result<Concat, Error>

Parse and push a group AST (and its parent concatenation) on to the parser’s internal stack. Return a fresh concatenation corresponding to the group’s sub-AST.

If a set of flags was found (with no group), then the concatenation is returned with that set of flags added.

This assumes that the parser is currently positioned on the opening parenthesis. It advances the parser to the character at the start of the sub-expression (or adjoining expression).

If there was a problem parsing the start of the group, then an error is returned.

fn pop_group(&self, group_concat: Concat) -> Result<Concat, Error>

Pop a group AST from the parser’s internal stack and set the group’s AST to the given concatenation. Return the concatenation containing the group.

This assumes that the parser is currently positioned on the closing parenthesis and advances the parser to the character following the ).

If no such group could be popped, then an unopened group error is returned.
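
The push/pop discipline for groups can be sketched with an explicit stack of parent concatenations. This is a heavily simplified model (literals and plain parentheses only, no flags, captures or alternations); `Node` and `parse_groups` are hypothetical names:

```rust
/// Hypothetical, heavily simplified AST: literals and groups only.
#[derive(Debug, PartialEq)]
enum Node {
    Literal(char),
    Group(Vec<Node>),
}

/// Parse literals and parentheses using an explicit stack instead of
/// recursion.
fn parse_groups(pattern: &str) -> Result<Vec<Node>, String> {
    let mut stack: Vec<Vec<Node>> = Vec::new();
    let mut concat: Vec<Node> = Vec::new();
    for c in pattern.chars() {
        match c {
            '(' => {
                // Like push_group: save the parent concatenation and start
                // a fresh one for the group's sub-expression.
                stack.push(std::mem::take(&mut concat));
            }
            ')' => {
                // Like pop_group: the current concatenation becomes the
                // group's sub-AST and is appended to the restored parent.
                let mut parent = stack.pop().ok_or("unopened group")?;
                parent.push(Node::Group(std::mem::take(&mut concat)));
                concat = parent;
            }
            _ => concat.push(Node::Literal(c)),
        }
    }
    if stack.is_empty() {
        Ok(concat)
    } else {
        Err("unclosed group".to_string())
    }
}
```

A stray `)` finds an empty stack and yields the "unopened group" error, mirroring the error case described for `pop_group`.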

fn pop_group_end(&self, concat: Concat) -> Result<Ast, Error>

Pop the last state from the parser’s internal stack, if it exists, and add the given concatenation to it. There either must be no state or a single alternation item on the stack. Any other scenario produces an error.

This assumes that the parser has advanced to the end.

fn push_class_open(&self, parent_union: ClassSetUnion) -> Result<ClassSetUnion, Error>

Parse the opening of a character class and push the current class parsing context onto the parser’s stack. This assumes that the parser is positioned at an opening [. The given union should correspond to the union of set items built up before seeing the [.

If there was a problem parsing the opening of the class, then an error is returned. Otherwise, a new union of set items for the class is returned (which may be populated with either a ] or a -).

fn pop_class(&self, nested_union: ClassSetUnion) -> Result<Either<ClassSetUnion, ClassBracketed>, Error>

Parse the end of a character class set and pop the character class parser stack. The union given corresponds to the last union built before seeing the closing ]. The union returned corresponds to the parent character class set with the nested class added to it.

This assumes that the parser is positioned at a ] and will advance the parser to the byte immediately following the ].

If the stack is empty after popping, then this returns the final “top-level” character class AST (where a “top-level” character class is one that is not nested inside any other character class).

If there is no corresponding opening bracket on the parser’s stack, then an error is returned.

fn unclosed_class_error(&self) -> Error

Return an “unclosed class” error whose span points to the most recently opened class.

This should only be called while parsing a character class.

fn push_class_op(&self, next_kind: ClassSetBinaryOpKind, next_union: ClassSetUnion) -> ClassSetUnion

Push the current set of class items on to the class parser’s stack as the left hand side of the given operator.

A fresh set union is returned, which should be used to build the right hand side of this operator.

fn pop_class_op(&self, rhs: ClassSet) -> ClassSet

Pop a character class set from the character class parser stack. If the top of the stack is just an item (not an operation), then return the given set unchanged. If the top of the stack is an operation, then the given set will be used as the rhs of the operation on the top of the stack. In that case, the binary operation is returned as a set.

impl<'s, P: Borrow<Parser>> ParserI<'s, P>

fn parse(&self) -> Result<Ast, Error>

Parse the regular expression into an abstract syntax tree.

fn parse_with_comments(&self) -> Result<WithComments, Error>

Parse the regular expression and return an abstract syntax tree with all of the comments found in the pattern.

fn parse_uncounted_repetition(&self, concat: Concat, kind: RepetitionKind) -> Result<Concat, Error>

Parses an uncounted repetition operation. An uncounted repetition operator includes ?, * and +, but does not include the {m,n} syntax. The given kind should correspond to the operator observed by the caller.

This assumes that the parser is currently positioned at the repetition operator and advances the parser to the first character after the operator. (Note that the operator may include a single additional ?, which makes the operator ungreedy.)

The caller should include the concatenation that is being built. The concatenation returned includes the repetition operator applied to the last expression in the given concatenation.
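
To illustrate how the operator is applied to the last expression of the concatenation, here is a sketch over a toy AST. `Node` and `apply_repetition` are hypothetical; the function reads the operator itself from the input instead of receiving a `RepetitionKind`, and it reports the number of bytes consumed instead of advancing a parser:

```rust
/// Hypothetical, simplified AST node.
#[derive(Debug, PartialEq)]
enum Node {
    Literal(char),
    Repeat { op: char, greedy: bool, inner: Box<Node> },
}

/// Apply the repetition operator at the start of `rest` to the last item of
/// `concat`. Returns the updated concatenation and the bytes consumed.
fn apply_repetition(
    mut concat: Vec<Node>,
    rest: &str,
) -> Result<(Vec<Node>, usize), String> {
    let mut chars = rest.chars();
    let op = match chars.next() {
        Some(c @ ('?' | '*' | '+')) => c,
        _ => return Err("expected ?, * or +".to_string()),
    };
    // The operator needs something to repeat.
    let inner = concat.pop().ok_or("repetition operator missing expression")?;
    // A single trailing `?` makes the operator ungreedy.
    let ungreedy = chars.next() == Some('?');
    concat.push(Node::Repeat { op, greedy: !ungreedy, inner: Box::new(inner) });
    Ok((concat, if ungreedy { 2 } else { 1 }))
}
```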

fn parse_counted_repetition(&self, concat: Concat) -> Result<Concat, Error>

Parses a counted repetition operation. A counted repetition operator corresponds to the {m,n} syntax, and does not include the ?, * or + operators.

This assumes that the parser is currently positioned at the opening { and advances the parser to the first character after the operator. (Note that the operator may include a single additional ?, which makes the operator ungreedy.)

The caller should include the concatenation that is being built. The concatenation returned includes the repetition operator applied to the last expression in the given concatenation.

fn parse_group(&self) -> Result<Either<SetFlags, Group>, Error>

Parse a group (which contains a sub-expression) or a set of flags.

If a group was found, then it is returned with an empty AST. If a set of flags is found, then that set is returned.

The parser should be positioned at the opening parenthesis.

This advances the parser to the character before the start of the sub-expression (in the case of a group) or to the closing parenthesis immediately following the set of flags.

§Errors

If flags are given and incorrectly specified, then a corresponding error is returned.

If a capture name is given and it is incorrectly specified, then a corresponding error is returned.

fn parse_capture_name(&self, capture_index: u32) -> Result<CaptureName, Error>

Parses a capture group name. Assumes that the parser is positioned at the first character in the name following the opening < (and may possibly be EOF). This advances the parser to the first character following the closing >.

The caller must provide the capture index of the group for this name.

fn parse_flags(&self) -> Result<Flags, Error>

Parse a sequence of flags starting at the current character.

This advances the parser to the character immediately following the flags, which is guaranteed to be either : or ).

§Errors

If any flags are duplicated, then an error is returned.

If the negation operator is used more than once, then an error is returned.

If no flags could be found or if the negation operation is not followed by any flags, then an error is returned.

fn parse_flag(&self) -> Result<Flag, Error>

Parse the current character as a flag. Do not advance the parser.

§Errors

If the flag is not recognized, then an error is returned.

fn parse_primitive(&self) -> Result<Primitive, Error>

Parse a primitive AST, e.g., a literal, non-set character class or assertion.

This assumes that the parser expects a primitive at the current location, i.e., all other non-primitive cases have been handled. For example, if the parser’s position is at |, then | will be treated as a literal (e.g., inside a character class).

This advances the parser to the first character immediately following the primitive.

fn parse_escape(&self) -> Result<Primitive, Error>

Parse an escape sequence as a primitive AST.

This assumes the parser is positioned at the start of the escape sequence, i.e., \. It advances the parser to the first position immediately following the escape sequence.

fn maybe_parse_special_word_boundary(&self, wb_start: Position) -> Result<Option<AssertionKind>, Error>

Attempt to parse a specialty word boundary. That is, \b{start}, \b{end}, \b{start-half} or \b{end-half}.

This is similar to maybe_parse_ascii_class in that, in most cases, if it fails it will just return None with no error. This is done because \b{5} is a valid expression and we want to let that be parsed by the existing counted repetition parsing code. (I thought about just invoking the counted repetition code from here, but it seemed a little ham-fisted.)

Unlike maybe_parse_ascii_class though, this can return an error. Namely, if we definitely know it isn’t a counted repetition, then we return an error specific to the specialty word boundaries.

This assumes the parser is positioned at a { immediately following a \b. When None is returned, the parser is returned to the position at which it started: pointing at a {.

The position given should correspond to the start of the \b.

fn parse_octal(&self) -> Literal

Parse an octal representation of a Unicode codepoint up to 3 digits long. This expects the parser to be positioned at the first octal digit and advances the parser to the first character immediately following the octal number. This also assumes that parsing octal escapes is enabled.

Assuming the preconditions are met, this routine can never fail.
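
A sketch of reading an octal escape of up to three digits. `parse_octal` here is a hypothetical free function over a string slice; unlike the method above it does not assume its preconditions, so it returns `None` when the input does not start with an octal digit:

```rust
/// Read up to three octal digits from the start of `input`. Returns the
/// decoded character and the number of bytes consumed.
fn parse_octal(input: &str) -> Option<(char, usize)> {
    let digits: String = input
        .chars()
        .take(3)
        .take_while(|c| ('0'..='7').contains(c))
        .collect();
    if digits.is_empty() {
        return None;
    }
    // At most three octal digits, so the value is at most 0o777 and always
    // a valid Unicode scalar value.
    let codepoint = u32::from_str_radix(&digits, 8).ok()?;
    let c = char::from_u32(codepoint)?;
    Some((c, digits.len()))
}
```

Because three octal digits can encode at most 0o777, the conversion to `char` cannot fail once at least one digit is present, which matches the note that the routine cannot fail when its preconditions hold.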

fn parse_hex(&self) -> Result<Literal, Error>

Parse a hex representation of a Unicode codepoint. This handles both hex notations, i.e., \xFF and \x{FFFF}. This expects the parser to be positioned at the x, u or U prefix. The parser is advanced to the first character immediately following the hexadecimal literal.

fn parse_hex_digits(&self, kind: HexLiteralKind) -> Result<Literal, Error>

Parse an N-digit hex representation of a Unicode codepoint. This expects the parser to be positioned at the first digit and will advance the parser to the first character immediately following the escape sequence.

The kind given determines the number of digits, which must be 2 (for \xNN), 4 (for \uNNNN) or 8 (for \UNNNNNNNN).
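
A sketch of decoding a fixed-width hex escape. This hypothetical free function takes the digit count directly instead of a `HexLiteralKind`, and uses string errors in place of the crate's `Error`:

```rust
/// Decode exactly `digits` hex digits from the start of `input` into a
/// character. Returns the character and the number of bytes consumed.
fn parse_hex_digits(input: &str, digits: usize) -> Result<(char, usize), String> {
    // Fails if the input is too short or would be cut inside a character.
    let hex = input.get(..digits).ok_or("unexpected end of hex escape")?;
    if !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return Err(format!("invalid hex digit in {:?}", hex));
    }
    let codepoint = u32::from_str_radix(hex, 16).map_err(|e| e.to_string())?;
    // Surrogates and values above U+10FFFF are not Unicode scalar values.
    let c = char::from_u32(codepoint).ok_or("not a Unicode scalar value")?;
    Ok((c, digits))
}
```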

fn parse_hex_brace(&self, kind: HexLiteralKind) -> Result<Literal, Error>

Parse a hex representation of any Unicode scalar value. This expects the parser to be positioned at the opening brace { and will advance the parser to the first character following the closing brace }.

fn parse_decimal(&self) -> Result<u32, Error>

Parse a decimal number into a u32 while trimming leading and trailing whitespace.

This expects the parser to be positioned at the first position where a decimal digit could occur. This will advance the parser to the byte immediately following the last contiguous decimal digit.

If no decimal digit could be found or if there was a problem parsing the complete set of digits into a u32, then an error is returned.

fn parse_set_class(&self) -> Result<ClassBracketed, Error>

Parse a standard character class consisting primarily of characters or character ranges, but can also contain nested character classes of any type (sans .).

This assumes the parser is positioned at the opening [. If parsing is successful, then the parser is advanced to the position immediately following the closing ].

fn parse_set_class_range(&self) -> Result<ClassSetItem, Error>

Parse a single primitive item in a character class set. The item to be parsed can either be one of a simple literal character, a range between two simple literal characters or a “primitive” character class like \w or \p{Greek}.

If an invalid escape is found, or if a character class is found where a simple literal is expected (e.g., in a range), then an error is returned.

fn parse_set_class_item(&self) -> Result<Primitive, Error>

Parse a single item in a character class as a primitive, where the primitive either consists of a verbatim literal or a single escape sequence.

This assumes the parser is positioned at the beginning of a primitive, and advances the parser to the first position after the primitive if successful.

Note that it is the caller’s responsibility to report an error if an illegal primitive was parsed.

fn parse_set_class_open(&self) -> Result<(ClassBracketed, ClassSetUnion), Error>

Parses the opening of a character class set. This includes the opening bracket along with ^ if present to indicate negation. This also starts parsing the opening set of unioned items if applicable, since there are special rules applied to certain characters in the opening of a character class. For example, [^]] is the class of all characters not equal to ]. (] would need to be escaped in any other position.) Similarly for -.

In all cases, the op inside the returned ast::ClassBracketed is an empty union. This empty union should be replaced with the actual item when it is popped from the parser’s stack.

This assumes the parser is positioned at the opening [ and advances the parser to the first non-special byte of the character class.

An error is returned if EOF is found.

fn maybe_parse_ascii_class(&self) -> Option<ClassAscii>

Attempt to parse an ASCII character class, e.g., [:alnum:].

This assumes the parser is positioned at the opening [.

If no valid ASCII character class could be found, then this does not advance the parser and None is returned. Otherwise, the parser is advanced to the first byte following the closing ] and the corresponding ASCII class is returned.
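
The "advance only on success" behavior can be sketched as a function that either reports how many bytes to consume or returns `None` and consumes nothing. The function below is a hypothetical stand-in that returns the class name as a string slice rather than a `ClassAscii`, and it assumes a `^` after `[:` marks negation:

```rust
/// Names of the ASCII classes recognized by this sketch.
const NAMES: [&str; 14] = [
    "alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
    "lower", "print", "punct", "space", "upper", "word", "xdigit",
];

/// Try to read `[:name:]` or `[:^name:]` from the start of `input`.
/// On success, returns the name, whether it was negated, and the number of
/// bytes up to and including the closing `]`. On failure, returns `None`
/// so the caller's position stays where it was.
fn maybe_parse_ascii_class(input: &str) -> Option<(&str, bool, usize)> {
    let rest = input.strip_prefix("[:")?;
    let (negated, rest) = match rest.strip_prefix('^') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let end = rest.find(":]")?;
    let name = &rest[..end];
    if !NAMES.iter().any(|n| *n == name) {
        return None;
    }
    let consumed = input.len() - rest.len() + end + 2;
    Some((name, negated, consumed))
}
```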

fn parse_unicode_class(&self) -> Result<ClassUnicode, Error>

Parse a Unicode class in either the single character notation, \pN or the multi-character bracketed notation, \p{Greek}. This assumes the parser is positioned at the p (or P for negation) and will advance the parser to the character immediately following the class.

Note that this does not check whether the class name is valid or not.
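
The two notations can be told apart by looking at the character after `p` or `P`. The sketch below is a hypothetical free function that covers only the plain `\pN` and `\p{Name}` forms and, like the method above, does not validate the class name:

```rust
/// Read a Unicode class from `input`, which must start at `p` or `P`.
/// Returns whether the class is negated, the raw class name, and the number
/// of bytes consumed.
fn parse_unicode_class(input: &str) -> Option<(bool, &str, usize)> {
    let negated = match input.chars().next()? {
        'p' => false,
        'P' => true,
        _ => return None,
    };
    let rest = &input[1..];
    if let Some(body) = rest.strip_prefix('{') {
        // Bracketed notation: everything up to the closing brace.
        let end = body.find('}')?;
        Some((negated, &body[..end], 1 + 1 + end + 1))
    } else {
        // Single character notation: exactly one character after p/P.
        let c = rest.chars().next()?;
        Some((negated, &rest[..c.len_utf8()], 1 + c.len_utf8()))
    }
}
```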

fn parse_perl_class(&self) -> ClassPerl

Parse a Perl character class, e.g., \d or \W. This assumes the parser is currently at a valid character class name and will be advanced to the character immediately following the class.

Trait Implementations§

impl<'s, P: Clone> Clone for ParserI<'s, P>

fn clone(&self) -> ParserI<'s, P>

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl<'s, P: Debug> Debug for ParserI<'s, P>

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

impl<'s, P> Freeze for ParserI<'s, P>
where P: Freeze,

impl<'s, P> RefUnwindSafe for ParserI<'s, P>
where P: RefUnwindSafe,

impl<'s, P> Send for ParserI<'s, P>
where P: Send,

impl<'s, P> Sync for ParserI<'s, P>
where P: Sync,

impl<'s, P> Unpin for ParserI<'s, P>
where P: Unpin,

impl<'s, P> UnwindSafe for ParserI<'s, P>
where P: UnwindSafe,

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T> ToOwned for T
where T: Clone,

type Owned = T

The resulting type after obtaining ownership.

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning.

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.