Struct regex_syntax::ast::parse::ParserI
struct ParserI<'s, P> {
    parser: P,
    pattern: &'s str,
}
ParserI is the internal parser implementation.
We use this separate type so that we can carry the provided pattern string along with us. In particular, a Parser’s internal state is not tied to any one pattern, but a ParserI’s is.
This type also lets us use ParserI<&Parser> in production code while retaining the convenience of ParserI<Parser> for tests, which sometimes work against the internal interface of the parser.
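The owned-or-borrowed design described above can be sketched with the standard `Borrow` trait. This is a minimal illustration, not the crate's implementation: the `Parser` here is a hypothetical stand-in with a single `offset` cell, and `bump` is simplified to only track a byte offset.

```rust
use std::borrow::Borrow;
use std::cell::Cell;

/// Hypothetical stand-in for `Parser`: mutable state that is not tied to
/// any one pattern.
struct Parser {
    offset: Cell<usize>,
}

/// Pairs parser state (owned or borrowed) with exactly one pattern.
struct ParserI<'s, P> {
    parser: P,
    pattern: &'s str,
}

impl<'s, P: Borrow<Parser>> ParserI<'s, P> {
    fn new(parser: P, pattern: &'s str) -> ParserI<'s, P> {
        ParserI { parser, pattern }
    }

    /// Advance by one character. Returns false once the end of the pattern
    /// has been reached.
    fn bump(&self) -> bool {
        let p: &Parser = self.parser.borrow();
        match self.pattern[p.offset.get()..].chars().next() {
            None => false,
            Some(c) => {
                p.offset.set(p.offset.get() + c.len_utf8());
                p.offset.get() < self.pattern.len()
            }
        }
    }
}
```

Because both `Parser` and `&Parser` implement `Borrow<Parser>`, the same `impl` block serves `ParserI<&Parser>` and `ParserI<Parser>`.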
Fields
parser: P
The parser state/configuration.
pattern: &'s str
The full regular expression provided by the user.
Implementations
impl<'s, P: Borrow<Parser>> ParserI<'s, P>
fn new(parser: P, pattern: &'s str) -> ParserI<'s, P>
Build an internal parser from a parser configuration and a pattern.
fn error(&self, span: Span, kind: ErrorKind) -> Error
Create a new error with the given span and error type.
fn offset(&self) -> usize
Return the current offset of the parser.
The offset starts at 0 from the beginning of the regular expression pattern string.
fn line(&self) -> usize
Return the current line number of the parser.
The line number starts at 1.
fn column(&self) -> usize
Return the current column of the parser.
The column number starts at 1 and is reset whenever a \n is seen.
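The offset, line and column conventions above can be illustrated with a small standalone function. This is a sketch only: `Position` here is a hypothetical local type, and `position_after` is an illustrative helper that does not exist in the crate.

```rust
/// Hypothetical position record: byte offset (from 0), line and column
/// (both from 1).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Position {
    offset: usize,
    line: usize,
    column: usize,
}

/// Compute the position reached after consuming `n` characters of `pattern`.
fn position_after(pattern: &str, n: usize) -> Position {
    let mut pos = Position { offset: 0, line: 1, column: 1 };
    for c in pattern.chars().take(n) {
        // The offset counts bytes, so multi-byte characters advance it by
        // more than one.
        pos.offset += c.len_utf8();
        if c == '\n' {
            pos.line += 1;
            pos.column = 1;
        } else {
            pos.column += 1;
        }
    }
    pos
}
```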
fn next_capture_index(&self, span: Span) -> Result<u32, Error>
Return the next capturing index. Each subsequent call increments the internal index.
The span given should correspond to the location of the opening parenthesis.
If the capture limit is exceeded, then an error is returned.
fn add_capture_name(&self, cap: &CaptureName) -> Result<(), Error>
Adds the given capture name to this parser. If this capture name has already been used, then an error is returned.
fn ignore_whitespace(&self) -> bool
Return whether the parser should ignore whitespace or not.
fn char(&self) -> char
Return the character at the current position of the parser.
This panics if the current position does not point to a valid char.
fn char_at(&self, i: usize) -> char
Return the character at the given position.
This panics if the given position does not point to a valid char.
fn bump(&self) -> bool
Bump the parser to the next Unicode scalar value.
If the end of the input has been reached, then false is returned.
fn bump_if(&self, prefix: &str) -> bool
If the substring starting at the current position of the parser has the given prefix, then bump the parser to the character immediately following the prefix and return true. Otherwise, don’t bump the parser and return false.
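A minimal sketch of the conditional bump described above. It assumes a hypothetical `Cursor` type that tracks only a byte offset and takes `&mut self`. The documented method takes `&self` and also maintains line and column information, which this sketch omits.

```rust
/// Hypothetical cursor over a pattern; `offset` is a byte offset.
struct Cursor<'s> {
    pattern: &'s str,
    offset: usize,
}

impl<'s> Cursor<'s> {
    /// If the remaining input starts with `prefix`, move just past it and
    /// return true. Otherwise leave the cursor where it is and return false.
    fn bump_if(&mut self, prefix: &str) -> bool {
        if self.pattern[self.offset..].starts_with(prefix) {
            self.offset += prefix.len();
            true
        } else {
            false
        }
    }
}
```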
fn is_lookaround_prefix(&self) -> bool
Returns true if and only if the parser is positioned at a look-around prefix. The conditions under which this returns true must always correspond to a regular expression that would otherwise be considered invalid.
This should only be called immediately after parsing the opening of a group or a set of flags.
fn bump_and_bump_space(&self) -> bool
Bump the parser, and if the x flag is enabled, bump through any subsequent spaces. Return true if and only if the parser is not at EOF.
fn bump_space(&self)
If the x flag is enabled (i.e., whitespace insensitivity with comments), then this will advance the parser through all whitespace and comments to the next non-whitespace non-comment byte.
If the x flag is disabled, then this is a no-op.
This should be used selectively throughout the parser where arbitrary whitespace is permitted when the x flag is enabled. For example, { 5 , 6} is equivalent to {5,6}.
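The skipping behavior can be sketched as a pure function over a byte offset. This assumes that comments begin with # and run to the end of the line. `skip_space` and its parameters are illustrative names, not part of the crate, and the real method also records the comments it passes over.

```rust
/// Skip whitespace and `#` comments starting at `offset`, returning the byte
/// offset of the next significant character. When `ignore_whitespace` is
/// false this is a no-op.
fn skip_space(pattern: &str, mut offset: usize, ignore_whitespace: bool) -> usize {
    if !ignore_whitespace {
        return offset;
    }
    let mut in_comment = false;
    for c in pattern[offset..].chars() {
        if in_comment {
            // A comment ends at the newline, which is itself skipped.
            if c == '\n' {
                in_comment = false;
            }
        } else if c == '#' {
            in_comment = true;
        } else if !c.is_whitespace() {
            break;
        }
        offset += c.len_utf8();
    }
    offset
}
```

With the flag enabled, scanning `{ 5 , 6}` from just after the opening brace lands directly on the `5`.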
fn peek(&self) -> Option<char>
Peek at the next character in the input without advancing the parser.
If the input has been exhausted, then this returns None.
fn peek_space(&self) -> Option<char>
Like peek, but will ignore spaces when the parser is in whitespace insensitive mode.
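The difference between the two peeks can be sketched as free functions over a byte offset. These are illustrative helpers, not the crate's methods, and the whitespace-insensitive variant here skips only whitespace, not comments.

```rust
/// Return the character after the one at `offset`, without moving.
fn peek(pattern: &str, offset: usize) -> Option<char> {
    pattern[offset..].chars().nth(1)
}

/// Like `peek`, but when `ignore_whitespace` is true, look past any
/// whitespace that follows the current character.
fn peek_space(pattern: &str, offset: usize, ignore_whitespace: bool) -> Option<char> {
    let mut rest = pattern[offset..].chars().skip(1);
    if ignore_whitespace {
        rest.find(|c| !c.is_whitespace())
    } else {
        rest.next()
    }
}
```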
fn pos(&self) -> Position
Return the current position of the parser, which includes the offset, line and column.
fn span(&self) -> Span
Create a span at the current position of the parser. Both the start and end of the span are set.
fn push_alternate(&self, concat: Concat) -> Result<Concat, Error>
Parse and push a single alternation on to the parser’s internal stack. If the top of the stack already has an alternation, then add to that instead of pushing a new one.
The concatenation given corresponds to a single alternation branch. The concatenation returned starts the next branch and is empty.
This assumes the parser is currently positioned at | and will advance the parser to the character following |.
fn push_or_add_alternation(&self, concat: Concat)
Pushes or adds the given branch of an alternation to the parser’s internal stack of state.
fn push_group(&self, concat: Concat) -> Result<Concat, Error>
Parse and push a group AST (and its parent concatenation) on to the parser’s internal stack. Return a fresh concatenation corresponding to the group’s sub-AST.
If a set of flags was found (with no group), then the concatenation is returned with that set of flags added.
This assumes that the parser is currently positioned on the opening parenthesis. It advances the parser to the character at the start of the sub-expression (or adjoining expression).
If there was a problem parsing the start of the group, then an error is returned.
fn pop_group(&self, group_concat: Concat) -> Result<Concat, Error>
Pop a group AST from the parser’s internal stack and set the group’s AST to the given concatenation. Return the concatenation containing the group.
This assumes that the parser is currently positioned on the closing parenthesis and advances the parser to the character following the ).
If no such group could be popped, then an unopened group error is returned.
fn pop_group_end(&self, concat: Concat) -> Result<Ast, Error>
Pop the last state from the parser’s internal stack, if it exists, and add the given concatenation to it. There either must be no state or a single alternation item on the stack. Any other scenario produces an error.
This assumes that the parser has advanced to the end.
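The push/pop discipline behind push_group, pop_group and pop_group_end can be sketched with a plain stack of opening-parenthesis offsets. This is a heavily simplified illustration: `GroupError` and `check_groups` are hypothetical names, no AST is built, and escapes and character classes are ignored.

```rust
/// Hypothetical error type: each variant carries the byte offset of the
/// offending parenthesis.
#[derive(Debug, PartialEq)]
enum GroupError {
    Unopened(usize),
    Unclosed(usize),
}

/// Count the groups in `pattern`, using a stack to pair parentheses.
fn check_groups(pattern: &str) -> Result<usize, GroupError> {
    let mut stack: Vec<usize> = Vec::new();
    let mut groups = 0;
    for (i, c) in pattern.char_indices() {
        match c {
            // Opening a group pushes state.
            '(' => stack.push(i),
            // Closing a group pops it. An empty stack means there was no
            // matching open parenthesis.
            ')' => {
                if stack.pop().is_none() {
                    return Err(GroupError::Unopened(i));
                }
                groups += 1;
            }
            _ => {}
        }
    }
    // At the end of the input, any leftover state is an unclosed group.
    match stack.pop() {
        Some(open) => Err(GroupError::Unclosed(open)),
        None => Ok(groups),
    }
}
```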
fn push_class_open(&self, parent_union: ClassSetUnion) -> Result<ClassSetUnion, Error>
Parse the opening of a character class and push the current class parsing context onto the parser’s stack. This assumes that the parser is positioned at an opening [. The given union should correspond to the union of set items built up before seeing the [.
If there was a problem parsing the opening of the class, then an error is returned. Otherwise, a new union of set items for the class is returned (which may be populated with either a ] or a -).
fn pop_class(&self, nested_union: ClassSetUnion) -> Result<Either<ClassSetUnion, ClassBracketed>, Error>
Parse the end of a character class set and pop the character class parser stack. The union given corresponds to the last union built before seeing the closing ]. The union returned corresponds to the parent character class set with the nested class added to it.
This assumes that the parser is positioned at a ] and will advance the parser to the byte immediately following the ].
If the stack is empty after popping, then this returns the final “top-level” character class AST (where a “top-level” character class is one that is not nested inside any other character class).
If there is no corresponding opening bracket on the parser’s stack, then an error is returned.
fn unclosed_class_error(&self) -> Error
Return an “unclosed class” error whose span points to the most recently opened class.
This should only be called while parsing a character class.
fn push_class_op(&self, next_kind: ClassSetBinaryOpKind, next_union: ClassSetUnion) -> ClassSetUnion
Push the current set of class items on to the class parser’s stack as the left hand side of the given operator.
A fresh set union is returned, which should be used to build the right hand side of this operator.
fn pop_class_op(&self, rhs: ClassSet) -> ClassSet
Pop a character class set from the character class parser stack. If the top of the stack is just an item (not an operation), then return the given set unchanged. If the top of the stack is an operation, then the given set will be used as the rhs of the operation on the top of the stack. In that case, the binary operation is returned as a set.
impl<'s, P: Borrow<Parser>> ParserI<'s, P>
fn parse(&self) -> Result<Ast, Error>
Parse the regular expression into an abstract syntax tree.
fn parse_with_comments(&self) -> Result<WithComments, Error>
Parse the regular expression and return an abstract syntax tree with all of the comments found in the pattern.
fn parse_uncounted_repetition(&self, concat: Concat, kind: RepetitionKind) -> Result<Concat, Error>
Parses an uncounted repetition operation. An uncounted repetition operator includes ?, * and +, but does not include the {m,n} syntax. The given kind should correspond to the operator observed by the caller.
This assumes that the parser is currently positioned at the repetition operator and advances the parser to the first character after the operator. (Note that the operator may include a single additional ?, which makes the operator ungreedy.)
The caller should include the concatenation that is being built. The concatenation returned includes the repetition operator applied to the last expression in the given concatenation.
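Recognizing the operator and its optional ungreedy suffix can be sketched as follows. `RepKind` and `parse_rep_op` are hypothetical names for illustration, and the sketch returns the operator alone rather than applying it to the last expression of a concatenation.

```rust
/// Hypothetical operator kinds for `?`, `*` and `+`.
#[derive(Debug, PartialEq)]
enum RepKind {
    ZeroOrOne,
    ZeroOrMore,
    OneOrMore,
}

/// Parse an uncounted repetition operator at the start of `input`.
/// Returns the kind, whether it is greedy, and the number of bytes consumed.
fn parse_rep_op(input: &str) -> Option<(RepKind, bool, usize)> {
    let mut chars = input.chars();
    let kind = match chars.next()? {
        '?' => RepKind::ZeroOrOne,
        '*' => RepKind::ZeroOrMore,
        '+' => RepKind::OneOrMore,
        _ => return None,
    };
    // A single trailing `?` makes the operator ungreedy.
    if chars.next() == Some('?') {
        Some((kind, false, 2))
    } else {
        Some((kind, true, 1))
    }
}
```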
fn parse_counted_repetition(&self, concat: Concat) -> Result<Concat, Error>
Parses a counted repetition operation. A counted repetition operator corresponds to the {m,n} syntax, and does not include the ?, * or + operators.
This assumes that the parser is currently positioned at the opening { and advances the parser to the first character after the operator. (Note that the operator may include a single additional ?, which makes the operator ungreedy.)
The caller should include the concatenation that is being built. The concatenation returned includes the repetition operator applied to the last expression in the given concatenation.
fn parse_group(&self) -> Result<Either<SetFlags, Group>, Error>
Parse a group (which contains a sub-expression) or a set of flags.
If a group was found, then it is returned with an empty AST. If a set of flags is found, then that set is returned.
The parser should be positioned at the opening parenthesis.
This advances the parser to the character before the start of the sub-expression (in the case of a group) or to the closing parenthesis immediately following the set of flags.
Errors
If flags are given and incorrectly specified, then a corresponding error is returned.
If a capture name is given and it is incorrectly specified, then a corresponding error is returned.
fn parse_capture_name(&self, capture_index: u32) -> Result<CaptureName, Error>
Parses a capture group name. Assumes that the parser is positioned at the first character in the name following the opening < (and may possibly be EOF). This advances the parser to the first character following the closing >.
The caller must provide the capture index of the group for this name.
fn parse_flags(&self) -> Result<Flags, Error>
Parse a sequence of flags starting at the current character.
This advances the parser to the character immediately following the flags, which is guaranteed to be either : or ).
Errors
If any flags are duplicated, then an error is returned.
If the negation operator is used more than once, then an error is returned.
If no flags could be found or if the negation operation is not followed by any flags, then an error is returned.
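The three error conditions above can be sketched in a standalone function. This is an illustration under assumptions: `FlagError` is a hypothetical error type, the accepted flag letters are assumed to be i, m, s, U, u and x, and the result is two plain vectors rather than the crate's `Flags` AST.

```rust
/// Hypothetical error type for the sketch.
#[derive(Debug, PartialEq)]
enum FlagError {
    Duplicate(char),
    RepeatedNegation,
    Missing,
    Unrecognized(char),
    UnexpectedEof,
}

/// Parse flags up to a terminating `:` or `)`. Returns the enabled flags,
/// the disabled flags, and the byte offset of the terminator.
fn parse_flags(input: &str) -> Result<(Vec<char>, Vec<char>, usize), FlagError> {
    let mut on: Vec<char> = Vec::new();
    let mut off: Vec<char> = Vec::new();
    let mut negated = false;
    let mut dangling_negation = false;
    for (i, c) in input.char_indices() {
        match c {
            ':' | ')' => {
                // No flags at all, or a `-` with nothing after it.
                if dangling_negation || (on.is_empty() && off.is_empty()) {
                    return Err(FlagError::Missing);
                }
                return Ok((on, off, i));
            }
            '-' => {
                if negated {
                    return Err(FlagError::RepeatedNegation);
                }
                negated = true;
                dangling_negation = true;
            }
            // Assumed flag alphabet for this sketch.
            c if "imsUux".contains(c) => {
                if on.contains(&c) || off.contains(&c) {
                    return Err(FlagError::Duplicate(c));
                }
                if negated {
                    off.push(c);
                } else {
                    on.push(c);
                }
                dangling_negation = false;
            }
            other => return Err(FlagError::Unrecognized(other)),
        }
    }
    Err(FlagError::UnexpectedEof)
}
```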
fn parse_flag(&self) -> Result<Flag, Error>
Parse the current character as a flag. Do not advance the parser.
Errors
If the flag is not recognized, then an error is returned.
fn parse_primitive(&self) -> Result<Primitive, Error>
Parse a primitive AST, e.g., a literal, non-set character class or assertion.
This assumes that the parser expects a primitive at the current location, i.e., all other non-primitive cases have been handled. For example, if the parser’s position is at |, then | will be treated as a literal (e.g., inside a character class).
This advances the parser to the first character immediately following the primitive.
fn parse_escape(&self) -> Result<Primitive, Error>
Parse an escape sequence as a primitive AST.
This assumes the parser is positioned at the start of the escape sequence, i.e., \. It advances the parser to the first position immediately following the escape sequence.
fn maybe_parse_special_word_boundary(&self, wb_start: Position) -> Result<Option<AssertionKind>, Error>
Attempt to parse a specialty word boundary. That is, \b{start}, \b{end}, \b{start-half} or \b{end-half}.
This is similar to maybe_parse_ascii_class in that, in most cases, if it fails it will just return None with no error. This is done because \b{5} is a valid expression and we want to let that be parsed by the existing counted repetition parsing code. (I thought about just invoking the counted repetition code from here, but it seemed a little ham-fisted.)
Unlike maybe_parse_ascii_class though, this can return an error. Namely, if we definitely know it isn’t a counted repetition, then we return an error specific to the specialty word boundaries.
This assumes the parser is positioned at a { immediately following a \b. When None is returned, the parser is returned to the position at which it started: pointing at a {.
The position given should correspond to the start of the \b.
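The three-way outcome (a boundary, None, or an error) can be sketched as below. This is a guess at the decision rule rather than the crate's code: it assumes that a first character which is not an ASCII letter or a hyphen means "possibly a counted repetition", and it uses a hypothetical `WordBoundary` enum and `String` errors.

```rust
/// Hypothetical kinds for the four specialty word boundaries.
#[derive(Debug, PartialEq)]
enum WordBoundary {
    Start,
    End,
    StartHalf,
    EndHalf,
}

/// `input` begins at the `{` following `\b`. `Ok(None)` means "not a
/// specialty word boundary; the caller should try something else".
fn maybe_parse_special_word_boundary(
    input: &str,
) -> Result<Option<(WordBoundary, usize)>, String> {
    let rest = match input.strip_prefix('{') {
        Some(rest) => rest,
        None => return Ok(None),
    };
    // Something like `{5}` could still be a counted repetition, so bail out
    // without an error.
    match rest.chars().next() {
        Some(c) if c.is_ascii_alphabetic() || c == '-' => {}
        _ => return Ok(None),
    }
    // From here on it cannot be a counted repetition, so failures are errors.
    let end = rest
        .find('}')
        .ok_or_else(|| String::from("special word boundary is unclosed"))?;
    let kind = match &rest[..end] {
        "start" => WordBoundary::Start,
        "end" => WordBoundary::End,
        "start-half" => WordBoundary::StartHalf,
        "end-half" => WordBoundary::EndHalf,
        other => return Err(format!("unrecognized special word boundary: {}", other)),
    };
    // Consumed bytes: `{`, the name, and `}`.
    Ok(Some((kind, end + 2)))
}
```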
fn parse_octal(&self) -> Literal
Parse an octal representation of a Unicode codepoint up to 3 digits long. This expects the parser to be positioned at the first octal digit and advances the parser to the first character immediately following the octal number. This also assumes that parsing octal escapes is enabled.
Assuming the preconditions are met, this routine can never fail.
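A minimal sketch of reading up to three octal digits. Unlike the documented method, this standalone version returns an `Option` so that it can report a missing first digit instead of relying on the precondition. The function shape is illustrative only.

```rust
/// Parse up to three octal digits at the start of `input`. Returns the
/// decoded character and the number of bytes consumed.
fn parse_octal(input: &str) -> Option<(char, usize)> {
    let len = input
        .bytes()
        .take(3)
        .take_while(|b| (b'0'..=b'7').contains(b))
        .count();
    if len == 0 {
        return None;
    }
    // Three octal digits are at most 0o777 = 511, which is always a valid
    // Unicode scalar value.
    let value = u32::from_str_radix(&input[..len], 8).ok()?;
    char::from_u32(value).map(|c| (c, len))
}
```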
fn parse_hex(&self) -> Result<Literal, Error>
Parse a hex representation of a Unicode codepoint. This handles both hex notations, i.e., \xFF and \x{FFFF}. This expects the parser to be positioned at the x, u or U prefix. The parser is advanced to the first character immediately following the hexadecimal literal.
fn parse_hex_digits(&self, kind: HexLiteralKind) -> Result<Literal, Error>
Parse an N-digit hex representation of a Unicode codepoint. This expects the parser to be positioned at the first digit and will advance the parser to the first character immediately following the escape sequence.
The number of digits given must be 2 (for \xNN), 4 (for \uNNNN) or 8 (for \UNNNNNNNN).
fn parse_hex_brace(&self, kind: HexLiteralKind) -> Result<Literal, Error>
Parse a hex representation of any Unicode scalar value. This expects the parser to be positioned at the opening brace { and will advance the parser to the first character following the closing brace }.
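The fixed-width and braced hex notations can be sketched with two small helpers. Both are illustrative: they take the digit count directly instead of a `HexLiteralKind`, and return `Option` rather than the crate's error type.

```rust
/// Parse exactly `digits` hex digits at the start of `input`
/// (2 for \xNN, 4 for \uNNNN, 8 for \UNNNNNNNN).
fn parse_hex_fixed(input: &str, digits: usize) -> Option<(char, usize)> {
    let hex = input.get(..digits)?;
    if !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(hex, 16).ok()?;
    // Rejects surrogates and values above U+10FFFF.
    char::from_u32(value).map(|c| (c, digits))
}

/// Parse a braced hex literal such as `{1F600}`, starting at the `{`.
fn parse_hex_brace(input: &str) -> Option<(char, usize)> {
    let rest = input.strip_prefix('{')?;
    let end = rest.find('}')?;
    let hex = &rest[..end];
    if hex.is_empty() || !hex.bytes().all(|b| b.is_ascii_hexdigit()) {
        return None;
    }
    let value = u32::from_str_radix(hex, 16).ok()?;
    // Consumed bytes: `{`, the digits, and `}`.
    char::from_u32(value).map(|c| (c, end + 2))
}
```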
fn parse_decimal(&self) -> Result<u32, Error>
Parse a decimal number into a u32 while trimming leading and trailing whitespace.
This expects the parser to be positioned at the first position where a decimal digit could occur. This will advance the parser to the byte immediately following the last contiguous decimal digit.
If no decimal digit could be found or if there was a problem parsing the complete set of digits into a u32, then an error is returned.
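A sketch of decimal parsing with whitespace trimming. One assumption should be stated loudly: this version also consumes whitespace after the digits, which is one reading of "trimming trailing whitespace". The `String` error and the returned byte count are illustrative choices, not the crate's API.

```rust
/// Parse a decimal `u32` at the start of `input`, skipping surrounding
/// whitespace. Returns the value and the number of bytes consumed.
fn parse_decimal(input: &str) -> Result<(u32, usize), String> {
    let trimmed = input.trim_start();
    let start = input.len() - trimmed.len();
    let digits = trimmed.bytes().take_while(|b| b.is_ascii_digit()).count();
    if digits == 0 {
        return Err(String::from("expected a decimal digit"));
    }
    // `parse` fails if the digits do not fit into a `u32`.
    let value = trimmed[..digits]
        .parse::<u32>()
        .map_err(|e| e.to_string())?;
    // Assumption of this sketch: trailing whitespace is consumed too.
    let after = &trimmed[digits..];
    let trailing = after.len() - after.trim_start().len();
    Ok((value, start + digits + trailing))
}
```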
fn parse_set_class(&self) -> Result<ClassBracketed, Error>
Parse a standard character class consisting primarily of characters or character ranges, but can also contain nested character classes of any type (sans .).
This assumes the parser is positioned at the opening [. If parsing is successful, then the parser is advanced to the position immediately following the closing ].
fn parse_set_class_range(&self) -> Result<ClassSetItem, Error>
Parse a single primitive item in a character class set. The item to be parsed can either be one of a simple literal character, a range between two simple literal characters or a “primitive” character class like \w or \p{Greek}.
If an invalid escape is found, or if a character class is found where a simple literal is expected (e.g., in a range), then an error is returned.
fn parse_set_class_item(&self) -> Result<Primitive, Error>
Parse a single item in a character class as a primitive, where the primitive either consists of a verbatim literal or a single escape sequence.
This assumes the parser is positioned at the beginning of a primitive, and advances the parser to the first position after the primitive if successful.
Note that it is the caller’s responsibility to report an error if an illegal primitive was parsed.
fn parse_set_class_open(&self) -> Result<(ClassBracketed, ClassSetUnion), Error>
Parses the opening of a character class set. This includes the opening bracket along with ^ if present to indicate negation. This also starts parsing the opening set of unioned items if applicable, since there are special rules applied to certain characters in the opening of a character class. For example, [^]] is the class of all characters not equal to ]. (] would need to be escaped in any other position.) Similarly for -.
In all cases, the op inside the returned ast::ClassBracketed is an empty union. This empty union should be replaced with the actual item when it is popped from the parser’s stack.
This assumes the parser is positioned at the opening [ and advances the parser to the first non-special byte of the character class.
An error is returned if EOF is found.
fn maybe_parse_ascii_class(&self) -> Option<ClassAscii>
Attempt to parse an ASCII character class, e.g., [:alnum:].
This assumes the parser is positioned at the opening [.
If no valid ASCII character class could be found, then this does not advance the parser and None is returned. Otherwise, the parser is advanced to the first byte following the closing ] and the corresponding ASCII class is returned.
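The all-or-nothing behavior can be sketched as a function that either reports how many bytes to consume or returns None and consumes nothing. The function shape and the tuple it returns are illustrative, and the sketch assumes the negated form [:^name:] is accepted.

```rust
/// Try to parse an ASCII class such as `[:alnum:]` at the start of `input`.
/// Returns the class name, whether it is negated, and the bytes consumed.
fn maybe_parse_ascii_class(input: &str) -> Option<(&str, bool, usize)> {
    const NAMES: [&str; 14] = [
        "alnum", "alpha", "ascii", "blank", "cntrl", "digit", "graph",
        "lower", "print", "punct", "space", "upper", "word", "xdigit",
    ];
    let rest = input.strip_prefix("[:")?;
    // Assumption of this sketch: a leading `^` negates the class.
    let (negated, rest) = match rest.strip_prefix('^') {
        Some(r) => (true, r),
        None => (false, rest),
    };
    let end = rest.find(":]")?;
    let name = &rest[..end];
    if !NAMES.contains(&name) {
        return None;
    }
    // Bytes before the name, the name itself, and the closing `:]`.
    let consumed = input.len() - rest.len() + end + 2;
    Some((name, negated, consumed))
}
```

Because nothing is mutated until a full match is confirmed, returning None leaves the caller's position unchanged, mirroring the documented contract.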
fn parse_unicode_class(&self) -> Result<ClassUnicode, Error>
Parse a Unicode class in either the single character notation, \pN or the multi-character bracketed notation, \p{Greek}. This assumes the parser is positioned at the p (or P for negation) and will advance the parser to the character immediately following the class.
Note that this does not check whether the class name is valid or not.
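The two notations can be sketched as follows. This is a simplified illustration: it returns a plain tuple instead of `ClassUnicode` and treats whatever is inside the braces as an opaque name. Like the documented method, it does not validate the class name.

```rust
/// Parse `pN`, `PN`, `p{Name}` or `P{Name}` at the start of `input`.
/// Returns whether the class is negated, its name, and the bytes consumed.
fn parse_unicode_class(input: &str) -> Option<(bool, &str, usize)> {
    let mut chars = input.chars();
    let negated = match chars.next()? {
        'p' => false,
        'P' => true,
        _ => return None,
    };
    let rest = chars.as_str();
    if let Some(inner) = rest.strip_prefix('{') {
        // Bracketed notation: everything up to the closing brace is the name.
        let end = inner.find('}')?;
        Some((negated, &inner[..end], 1 + 1 + end + 1))
    } else {
        // Single character notation: the name is exactly one character.
        let c = rest.chars().next()?;
        Some((negated, &rest[..c.len_utf8()], 1 + c.len_utf8()))
    }
}
```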
fn parse_perl_class(&self) -> ClassPerl
Parse a Perl character class, e.g., \d or \W. This assumes the parser is currently at a valid character class name and will be advanced to the character immediately following the class.