Struct aho_corasick::automaton::StreamChunkIter
struct StreamChunkIter<'a, A, R> {
aut: &'a A,
rdr: R,
buf: Buffer,
start: StateID,
sid: StateID,
absolute_pos: usize,
buffer_pos: usize,
buffer_reported_pos: usize,
}
An iterator that reports matches in a stream.
(This doesn’t actually implement the Iterator trait, because each item it returns borrows from a buffer that the iterator itself owns, and Iterator has no way to express that. It still has a next method and is iterator-like enough to be fine.)
This iterator yields elements of type io::Result<StreamChunk>, where an error is reported if there was a problem reading from the underlying stream. The iterator terminates only when the underlying stream reaches EOF.
The idea here is that each chunk represents either a match or a non-match, and if you concatenated all of the chunks together, you’d reproduce the entire contents of the stream, byte-for-byte.
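To illustrate that invariant, here is a sketch that splits an in-memory haystack into chunks from a list of already-known match ranges. The Chunk enum and the chunks and concat functions are hypothetical simplifications, not the crate's StreamChunk type, and the match ranges are assumed to be sorted and non-overlapping.

```rust
use std::ops::Range;

/// Hypothetical, simplified stand-in for a stream chunk.
#[derive(Debug, PartialEq)]
enum Chunk<'a> {
    NonMatch(&'a [u8]),
    Match(&'a [u8]),
}

/// Splits `haystack` into non-match and match chunks, given sorted,
/// non-overlapping match ranges.
fn chunks<'a>(haystack: &'a [u8], matches: &[Range<usize>]) -> Vec<Chunk<'a>> {
    let mut out = Vec::new();
    // End of the last byte handed back to the caller.
    let mut reported = 0;
    for m in matches {
        // Any gap before this match becomes a non-match chunk.
        if m.start > reported {
            out.push(Chunk::NonMatch(&haystack[reported..m.start]));
        }
        out.push(Chunk::Match(&haystack[m.clone()]));
        reported = m.end;
    }
    // Trailing bytes after the last match are also a non-match.
    if reported < haystack.len() {
        out.push(Chunk::NonMatch(&haystack[reported..]));
    }
    out
}

/// Concatenating every chunk in order reproduces the input byte-for-byte.
fn concat(chunks: &[Chunk<'_>]) -> Vec<u8> {
    let mut out = Vec::new();
    for c in chunks {
        match c {
            Chunk::NonMatch(b) | Chunk::Match(b) => out.extend_from_slice(b),
        }
    }
    out
}
```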
This chunk machinery is a bit complicated and it isn’t strictly required for a stream searcher that just reports matches. But we do need something like this to deal with the “replacement” API, which needs to know which chunks it can copy and which it needs to replace.
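A sketch of how a replacement routine might consume such chunks, assuming a hypothetical Chunk type whose Match variant records the index of the pattern that matched. Non-match bytes are copied through unchanged, and each match is swapped for the replacement at its pattern's index. None of these names come from the crate.

```rust
/// Hypothetical chunk type; a match carries the index of its pattern.
#[allow(dead_code)] // `bytes` in Match is kept for illustration but never copied.
enum Chunk<'a> {
    NonMatch(&'a [u8]),
    Match { bytes: &'a [u8], pattern: usize },
}

/// Copies non-match chunks verbatim and replaces each match with
/// `replacements[pattern]`. Panics if a pattern index is out of range.
fn replace_all(chunks: &[Chunk<'_>], replacements: &[&[u8]]) -> Vec<u8> {
    let mut out = Vec::new();
    for chunk in chunks {
        match chunk {
            Chunk::NonMatch(bytes) => out.extend_from_slice(bytes),
            Chunk::Match { pattern, .. } => out.extend_from_slice(replacements[*pattern]),
        }
    }
    out
}
```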
Fields
aut: &'a A
The underlying automaton to do the search.
rdr: R
The source of bytes we read from.
buf: Buffer
A roll buffer for managing bytes from rdr. Basically, this is used to handle the case of a match that is split across two different calls to rdr.read(). This isn’t strictly needed if all we needed to do was report matches, but because we report chunks of both non-matches and matches, we cannot treat our stream as non-overlapping blocks of bytes. We need to permit some overlap by retaining bytes from a previous read call in memory.
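A simplified roll buffer, written as a hypothetical RollBuffer type rather than the crate's Buffer. It assumes min_len is the number of bytes that must survive a roll (at least the length of the longest pattern, so a match split across two reads is still fully in memory), and that the caller rolls before refilling a full buffer.

```rust
use std::io::{self, Read};

/// Hypothetical roll buffer that retains the tail of the previous read.
struct RollBuffer {
    buf: Vec<u8>,
    len: usize,     // number of valid bytes at the front of `buf`
    min_len: usize, // bytes retained across a roll
}

impl RollBuffer {
    fn new(min_len: usize, capacity: usize) -> RollBuffer {
        assert!(capacity > min_len, "capacity must exceed min_len");
        RollBuffer { buf: vec![0; capacity], len: 0, min_len }
    }

    /// The valid bytes currently in the buffer.
    fn buffer(&self) -> &[u8] {
        &self.buf[..self.len]
    }

    /// Appends one read's worth of bytes after the retained ones.
    /// Returns Ok(false) when the reader reports EOF. If the buffer is
    /// already full this also returns Ok(false), so roll first.
    fn fill<R: Read>(&mut self, rdr: &mut R) -> io::Result<bool> {
        let n = rdr.read(&mut self.buf[self.len..])?;
        self.len += n;
        Ok(n > 0)
    }

    /// Keeps only the last `min_len` valid bytes, moved to the front.
    fn roll(&mut self) {
        let keep_from = self.len.saturating_sub(self.min_len);
        self.buf.copy_within(keep_from..self.len, 0);
        self.len -= keep_from;
    }
}
```

The overlap is what lets a match like "de" in "abcdef" be found even when one read ends after "d" and the next begins with "e".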
start: StateID
The unanchored starting state of this automaton.
sid: StateID
The current state of the automaton.
absolute_pos: usize
The absolute position over the entire stream.
buffer_pos: usize
The position we’re currently at within buf.
buffer_reported_pos: usize
The buffer position of the end of the bytes that we last returned to the caller. Basically, whenever we find a match, we look to see if there is a difference between where the match started and the position of the last byte we returned to the caller. If there’s a difference, then we need to return a ‘NonMatch’ chunk.
Implementations
impl<'a, A: Automaton, R: Read> StreamChunkIter<'a, A, R>
fn new(aut: &'a A, rdr: R) -> Result<StreamChunkIter<'a, A, R>, MatchError>
fn next(&mut self) -> Option<Result<StreamChunk<'_>>>
fn get_match_chunk(&self, mat: Match) -> Range<usize>
Return a match chunk for the given match. It is assumed that the match ends at the current buffer_pos.
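Under that assumption the chunk is plain arithmetic. A sketch as a hypothetical free function, with buffer_pos and the match length passed in explicitly instead of being read from self and a Match:

```rust
use std::ops::Range;

/// Hypothetical helper: a match of `match_len` bytes ending at
/// `buffer_pos` occupies `buffer_pos - match_len..buffer_pos`.
fn match_chunk(buffer_pos: usize, match_len: usize) -> Range<usize> {
    // Assumes the whole match is still in the buffer (the roll buffer
    // retained it), so this subtraction cannot underflow.
    debug_assert!(match_len <= buffer_pos);
    (buffer_pos - match_len)..buffer_pos
}
```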
fn get_non_match_chunk(&self, mat: Match) -> Option<Range<usize>>
Return a non-match chunk, if necessary, just before reporting a match.
This returns None if there is nothing to report. Otherwise, this assumes that the given match ends at the current buffer_pos.
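A sketch of that check as a hypothetical free function, where reported_pos plays the role of buffer_reported_pos: the gap between the last reported byte and the start of the match, if any, is the non-match chunk.

```rust
use std::ops::Range;

/// Hypothetical helper: returns the unreported bytes that precede a match
/// of `match_len` bytes ending at `buffer_pos`, or None if there are none.
fn non_match_chunk(reported_pos: usize, buffer_pos: usize, match_len: usize) -> Option<Range<usize>> {
    let match_start = buffer_pos - match_len;
    if match_start > reported_pos {
        Some(reported_pos..match_start)
    } else {
        // The match begins right where the last reported chunk ended.
        None
    }
}
```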
fn get_pre_roll_non_match_chunk(&self) -> Option<Range<usize>>
Look for any bytes that should be reported as a non-match just before rolling the buffer.
Note that this only reports bytes up to buffer.len() - min_buffer_len, as it’s not possible to know whether the bytes following that will participate in a match or not.
fn get_eof_non_match_chunk(&self) -> Option<Range<usize>>
Return any unreported bytes as a non-match up to the end of the buffer.
This should only be called when the entire contents of the buffer have been searched and EOF has been hit when trying to fill the buffer.
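At EOF no future bytes can extend a match, so every unreported byte up to the end of the buffer is a non-match. A sketch as a hypothetical free function:

```rust
use std::ops::Range;

/// Hypothetical helper: all bytes after the last reported position,
/// or None if everything has already been reported.
fn eof_non_match_chunk(reported_pos: usize, buffer_len: usize) -> Option<Range<usize>> {
    if reported_pos < buffer_len {
        Some(reported_pos..buffer_len)
    } else {
        None
    }
}
```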