Struct aho_corasick::ahocorasick::AhoCorasick
source · pub struct AhoCorasick {
aut: Arc<dyn AcAutomaton>,
kind: AhoCorasickKind,
start_kind: StartKind,
}
Expand description
An automaton for searching multiple strings in linear time.
The AhoCorasick
type supports a few basic ways of constructing an
automaton, with the default being AhoCorasick::new
. However, there
are a fair number of configurable options that can be set by using
AhoCorasickBuilder
instead. Such options include, but are not limited
to, how matches are determined, simple case insensitivity, whether to use a
DFA or not and various knobs for controlling the space-vs-time trade offs
taken when building the automaton.
§Resource usage
Aho-Corasick automatons are always constructed in O(p)
time, where
p
is the combined length of all patterns being searched. With that
said, building an automaton can be fairly costly because of high constant
factors, particularly when enabling the DFA option
with AhoCorasickBuilder::kind
. For this reason, it’s generally a good
idea to build an automaton once and reuse it as much as possible.
Aho-Corasick automatons can also use a fair bit of memory. To get
a concrete idea of how much memory is being used, try using the
AhoCorasick::memory_usage
method.
To give a quick idea of the differences between Aho-Corasick implementations and their resource usage, here’s a sample of construction times and heap memory used after building an automaton from 100,000 randomly selected titles from Wikipedia:
- 99MB for a
noncontiguous::NFA
in 240ms. - 21MB for a
contiguous::NFA
in 275ms. - 1.6GB for a
dfa::DFA
in 1.88s.
(Note that the memory usage above reflects the size of each automaton and not peak memory usage. For example, building a contiguous NFA requires first building a noncontiguous NFA. Once the contiguous NFA is built, the noncontiguous NFA is freed.)
This experiment very strongly argues that a contiguous NFA is often the
best balance in terms of resource usage. It takes a little longer to build,
but its memory usage is quite small. Its search speed (not listed) is
also often faster than a noncontiguous NFA, but a little slower than a
DFA. Indeed, when no specific AhoCorasickKind
is used (which is the
default), a contiguous NFA is used in most cases.
The only “catch” to using a contiguous NFA is that, because of its variety
of compression tricks, it may not be able to support automatons as large as
what the noncontiguous NFA supports. In which case, building a contiguous
NFA will fail and (by default) AhoCorasick
will automatically fall
back to a noncontiguous NFA. (This typically only happens when building
automatons from millions of patterns.) Otherwise, the small additional time
for building a contiguous NFA is almost certainly worth it.
§Cloning
The AhoCorasick
type uses thread safe reference counting internally. It
is guaranteed that it is cheap to clone.
§Search configuration
Most of the search routines accept anything that can be cheaply converted
to an Input
. This includes &[u8]
, &str
and Input
itself.
§Construction failure
It is generally possible for building an Aho-Corasick automaton to fail. Construction can fail in generally one way: when the inputs provided are too big. Whether that’s a pattern that is too long, too many patterns or some combination of both. A first approximation for the scale at which construction can fail is somewhere around “millions of patterns.”
For that reason, if you’re building an Aho-Corasick automaton from untrusted input (or input that doesn’t have any reasonable bounds on its size), then it is strongly recommended to handle the possibility of an error.
If you’re constructing an Aho-Corasick automaton from static or trusted
data, then it is likely acceptable to panic (by calling unwrap()
or
expect()
) if construction fails.
§Fallibility
The AhoCorasick
type provides a number of methods for searching, as one
might expect. Depending on how the Aho-Corasick automaton was built and
depending on the search configuration, it is possible for a search to
return an error. Since an error is never dependent on the actual contents
of the haystack, this type provides both infallible and fallible methods
for searching. The infallible methods panic if an error occurs, and can be
used for convenience and when you know the search will never return an
error.
For example, the AhoCorasick::find_iter
method is the infallible
version of the AhoCorasick::try_find_iter
method.
Examples of errors that can occur:
- Running a search that requires
MatchKind::Standard
semantics (such as a stream or overlapping search) with an automaton that was built withMatchKind::LeftmostFirst
orMatchKind::LeftmostLongest
semantics. - Running an anchored search with an automaton that only supports
unanchored searches. (By default,
AhoCorasick
only supports unanchored searches. But this can be toggled withAhoCorasickBuilder::start_kind
.) - Running an unanchored search with an automaton that only supports anchored searches.
The common thread between the different types of errors is that they are all rooted in the automaton construction and search configurations. If those configurations are a static property of your program, then it is reasonable to call infallible routines since you know an error will never occur. And if one does occur, then it’s a bug in your program.
To re-iterate, if the patterns, build or search configuration come from
user or untrusted data, then you should handle errors at build or search
time. If only the haystack comes from user or untrusted data, then there
should be no need to handle errors anywhere and it is generally encouraged
to unwrap()
(or expect()
) both build and search time calls.
§Examples
This example shows how to search for occurrences of multiple patterns simultaneously in a case insensitive fashion. Each match includes the pattern that matched along with the byte offsets of the match.
use aho_corasick::{AhoCorasick, PatternID};
let patterns = &["apple", "maple", "snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";
let ac = AhoCorasick::builder()
.ascii_case_insensitive(true)
.build(patterns)
.unwrap();
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
(PatternID::must(1), 13, 18),
(PatternID::must(0), 28, 33),
(PatternID::must(2), 43, 50),
]);
This example shows how to replace matches with some other string:
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];
let ac = AhoCorasick::new(patterns).unwrap();
let result = ac.replace_all(haystack, replace_with);
assert_eq!(result, "The slow grey sloth.");
Fields§
§aut: Arc<dyn AcAutomaton>
The underlying Aho-Corasick automaton. It’s one of nfa::noncontiguous::NFA, nfa::contiguous::NFA or dfa::DFA.
kind: AhoCorasickKind
The specific Aho-Corasick kind chosen. This makes it possible to
inspect any AhoCorasick
and know what kind of search strategy it
uses.
start_kind: StartKind
The start kind of this automaton as configured by the caller.
We don’t really need to put this here, since the underlying automaton will correctly return errors if the caller requests an unsupported search type. But we do keep this here for API behavior consistency. Namely, the NFAs in this crate support both unanchored and anchored searches unconditionally. There’s no way to disable one or the other. They always both work. But the DFA in this crate specifically only supports both unanchored and anchored searches if it’s configured to do so. Why? Because for the DFA, supporting both essentially requires two copies of the transition table: one generated by following failure transitions from the original NFA and one generated by not following those failure transitions.
So why record the start kind here? Well, consider what happens when no specific ‘AhoCorasickKind’ is selected by the caller and ‘StartKind::Unanchored’ is used (both are the default). It might result in using a DFA or it might pick an NFA. If it picks an NFA, the caller would then be able to run anchored searches, even though the caller only asked for support for unanchored searches. Maybe that’s fine, but what if the DFA was chosen instead? Oops, the caller would get an error.
Basically, it seems bad to return an error or not based on some internal implementation choice. So we smooth things out and ensure anchored searches always report an error when only unanchored support was asked for (and vice versa), even if the underlying automaton supports it.
Implementations§
source§impl AhoCorasick
impl AhoCorasick
Convenience constructors for an Aho-Corasick searcher. To configure the
searcher, use an AhoCorasickBuilder
instead.
sourcepub fn new<I, P>(patterns: I) -> Result<AhoCorasick, BuildError>
pub fn new<I, P>(patterns: I) -> Result<AhoCorasick, BuildError>
Create a new Aho-Corasick automaton using the default configuration.
The default configuration optimizes for less space usage, but at the
expense of longer search times. To change the configuration, use
AhoCorasickBuilder
.
This uses the default MatchKind::Standard
match semantics, which
reports a match as soon as it is found. This corresponds to the
standard match semantics supported by textbook descriptions of the
Aho-Corasick algorithm.
§Examples
Basic usage:
use aho_corasick::{AhoCorasick, PatternID};
let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
assert_eq!(
Some(PatternID::must(1)),
ac.find("xxx bar xxx").map(|m| m.pattern()),
);
sourcepub fn builder() -> AhoCorasickBuilder
pub fn builder() -> AhoCorasickBuilder
A convenience method for returning a new Aho-Corasick builder.
This usually permits one to just import the AhoCorasick
type.
§Examples
Basic usage:
use aho_corasick::{AhoCorasick, Match, MatchKind};
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(&["samwise", "sam"])
.unwrap();
assert_eq!(Some(Match::must(0, 0..7)), ac.find("samwise"));
source§impl AhoCorasick
impl AhoCorasick
Infallible search routines. These APIs panic when the underlying search
would otherwise fail. Infallible routines are useful because the errors are
a result of both search-time configuration and what configuration is used
to build the Aho-Corasick searcher. Both of these things are not usually
the result of user input, and thus, an error is typically indicative of a
programmer error. In cases where callers want errors instead of panics, use
the corresponding try
method in the section below.
sourcepub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool
pub fn is_match<'h, I: Into<Input<'h>>>(&self, input: I) -> bool
Returns true if and only if this automaton matches the haystack at any position.
input
may be any type that is cheaply convertible to an Input
. This
includes, but is not limited to, &str
and &[u8]
.
Aside from convenience, when AhoCorasick
was built with
leftmost-first or leftmost-longest semantics, this might result in a
search that visits less of the haystack than AhoCorasick::find
would otherwise. (For standard semantics, matches are always
immediately returned once they are seen, so there is no way for this to
do less work in that case.)
Note that there is no corresponding fallible routine for this method.
If you need a fallible version of this, then AhoCorasick::try_find
can be used with Input::earliest
enabled.
§Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&[
"foo", "bar", "quux", "baz",
]).unwrap();
assert!(ac.is_match("xxx bar xxx"));
assert!(!ac.is_match("xxx qux xxx"));
sourcepub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>
pub fn find<'h, I: Into<Input<'h>>>(&self, input: I) -> Option<Match>
Returns the location of the first match according to the match semantics that this automaton was constructed with.
input
may be any type that is cheaply convertible to an Input
. This
includes, but is not limited to, &str
and &[u8]
.
This is the infallible version of AhoCorasick::try_find
.
§Panics
This panics when AhoCorasick::try_find
would return an error.
§Examples
Basic usage, with standard semantics:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::Standard) // default, not necessary
.build(patterns)
.unwrap();
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("b", &haystack[mat.start()..mat.end()]);
Now with leftmost-first semantics:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let mat = ac.find(haystack).expect("should have a match");
assert_eq!("abc", &haystack[mat.start()..mat.end()]);
And finally, leftmost-longest semantics:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostLongest)
.build(patterns)
.unwrap();
let mat = ac.find(haystack).expect("should have a match");
§Example: configuring a search
Because this method accepts anything that can be turned into an
Input
, it’s possible to provide an Input
directly in order to
configure the search. In this example, we show how to use the
earliest
option to force the search to return as soon as it knows
a match has occurred.
use aho_corasick::{AhoCorasick, Input, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostLongest)
.build(patterns)
.unwrap();
let mat = ac.find(Input::new(haystack).earliest(true))
.expect("should have a match");
// The correct leftmost-longest match here is 'abcd', but since we
// told the search to quit as soon as it knows a match has occurred,
// we get a different match back.
assert_eq!("b", &haystack[mat.start()..mat.end()]);
sourcepub fn find_overlapping<'h, I: Into<Input<'h>>>(
&self,
input: I,
state: &mut OverlappingState,
)
pub fn find_overlapping<'h, I: Into<Input<'h>>>( &self, input: I, state: &mut OverlappingState, )
Returns the location of the first overlapping match in the given input with respect to the current state of the underlying searcher.
input
may be any type that is cheaply convertible to an Input
. This
includes, but is not limited to, &str
and &[u8]
.
Overlapping searches do not report matches in their return value.
Instead, matches can be accessed via OverlappingState::get_match
after a search call.
This is the infallible version of
AhoCorasick::try_find_overlapping
.
§Panics
This panics when AhoCorasick::try_find_overlapping
would
return an error. For example, when the Aho-Corasick searcher
doesn’t support overlapping searches. (Only searchers built with
MatchKind::Standard
semantics support overlapping searches.)
§Example
This shows how we can repeatedly call an overlapping search without ever needing to explicitly re-slice the haystack. Overlapping search works this way because searches depend on state saved during the previous search.
use aho_corasick::{
automaton::OverlappingState,
AhoCorasick, Input, Match,
};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let mut state = OverlappingState::start();
ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(2, 0..3)), state.get_match());
ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(0, 0..6)), state.get_match());
ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(2, 11..14)), state.get_match());
ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(2, 22..25)), state.get_match());
ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(0, 22..28)), state.get_match());
ac.find_overlapping(haystack, &mut state);
assert_eq!(Some(Match::must(1, 22..31)), state.get_match());
// No more match matches to be found.
ac.find_overlapping(haystack, &mut state);
assert_eq!(None, state.get_match());
sourcepub fn find_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> FindIter<'a, 'h> ⓘ
pub fn find_iter<'a, 'h, I: Into<Input<'h>>>( &'a self, input: I, ) -> FindIter<'a, 'h> ⓘ
Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.
input
may be any type that is cheaply convertible to an Input
. This
includes, but is not limited to, &str
and &[u8]
.
This is the infallible version of AhoCorasick::try_find_iter
.
§Panics
This panics when AhoCorasick::try_find_iter
would return an error.
§Examples
Basic usage, with standard semantics:
use aho_corasick::{AhoCorasick, MatchKind, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::Standard) // default, not necessary
.build(patterns)
.unwrap();
let matches: Vec<PatternID> = ac
.find_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(2),
PatternID::must(2),
PatternID::must(2),
], matches);
Now with leftmost-first semantics:
use aho_corasick::{AhoCorasick, MatchKind, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let matches: Vec<PatternID> = ac
.find_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(0),
PatternID::must(2),
PatternID::must(0),
], matches);
And finally, leftmost-longest semantics:
use aho_corasick::{AhoCorasick, MatchKind, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostLongest)
.build(patterns)
.unwrap();
let matches: Vec<PatternID> = ac
.find_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(0),
PatternID::must(2),
PatternID::must(1),
], matches);
sourcepub fn find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> FindOverlappingIter<'a, 'h> ⓘ
pub fn find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>( &'a self, input: I, ) -> FindOverlappingIter<'a, 'h> ⓘ
Returns an iterator of overlapping matches. Stated differently, this returns an iterator of all possible matches at every position.
input
may be any type that is cheaply convertible to an Input
. This
includes, but is not limited to, &str
and &[u8]
.
This is the infallible version of
AhoCorasick::try_find_overlapping_iter
.
§Panics
This panics when AhoCorasick::try_find_overlapping_iter
would return
an error. For example, when the Aho-Corasick searcher is built with
either leftmost-first or leftmost-longest match semantics. Stated
differently, overlapping searches require one to build the searcher
with MatchKind::Standard
(it is the default).
§Example: basic usage
use aho_corasick::{AhoCorasick, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let matches: Vec<PatternID> = ac
.find_overlapping_iter(haystack)
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(2),
PatternID::must(0),
PatternID::must(2),
PatternID::must(2),
PatternID::must(0),
PatternID::must(1),
], matches);
sourcepub fn replace_all<B>(&self, haystack: &str, replace_with: &[B]) -> String
pub fn replace_all<B>(&self, haystack: &str, replace_with: &[B]) -> String
Replace all matches with a corresponding value in the replace_with
slice given. Matches correspond to the same matches as reported by
AhoCorasick::find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
This is the infallible version of AhoCorasick::try_replace_all
.
§Panics
This panics when AhoCorasick::try_replace_all
would return an
error.
This also panics when replace_with.len()
does not equal
AhoCorasick::patterns_len
.
§Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let result = ac.replace_all(haystack, &["x", "y", "z"]);
assert_eq!("x the z to the xage", result);
sourcepub fn replace_all_bytes<B>(
&self,
haystack: &[u8],
replace_with: &[B],
) -> Vec<u8>
pub fn replace_all_bytes<B>( &self, haystack: &[u8], replace_with: &[B], ) -> Vec<u8>
Replace all matches using raw bytes with a corresponding value in the
replace_with
slice given. Matches correspond to the same matches as
reported by AhoCorasick::find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
This is the infallible version of
AhoCorasick::try_replace_all_bytes
.
§Panics
This panics when AhoCorasick::try_replace_all_bytes
would return an
error.
This also panics when replace_with.len()
does not equal
AhoCorasick::patterns_len
.
§Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let result = ac.replace_all_bytes(haystack, &["x", "y", "z"]);
assert_eq!(b"x the z to the xage".to_vec(), result);
sourcepub fn replace_all_with<F>(
&self,
haystack: &str,
dst: &mut String,
replace_with: F,
)
pub fn replace_all_with<F>( &self, haystack: &str, dst: &mut String, replace_with: F, )
Replace all matches using a closure called on each match.
Matches correspond to the same matches as reported by
AhoCorasick::find_iter
.
The closure accepts three parameters: the match found, the text of
the match and a string buffer with which to write the replaced text
(if any). If the closure returns true
, then it continues to the next
match. If the closure returns false
, then searching is stopped.
Note that any matches with boundaries that don’t fall on a valid UTF-8 boundary are silently skipped.
This is the infallible version of
AhoCorasick::try_replace_all_with
.
§Panics
This panics when AhoCorasick::try_replace_all_with
would return an
error.
§Examples
Basic usage:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
dst.push_str(&mat.pattern().as_usize().to_string());
true
});
assert_eq!("0 the 2 to the 0age", result);
Stopping the replacement by returning false
(continued from the
example above):
let mut result = String::new();
ac.replace_all_with(haystack, &mut result, |mat, _, dst| {
dst.push_str(&mat.pattern().as_usize().to_string());
mat.pattern() != PatternID::must(2)
});
assert_eq!("0 the 2 to the appendage", result);
sourcepub fn replace_all_with_bytes<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
replace_with: F,
)
pub fn replace_all_with_bytes<F>( &self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F, )
Replace all matches using raw bytes with a closure called on each
match. Matches correspond to the same matches as reported by
AhoCorasick::find_iter
.
The closure accepts three parameters: the match found, the text of
the match and a byte buffer with which to write the replaced text
(if any). If the closure returns true
, then it continues to the next
match. If the closure returns false
, then searching is stopped.
This is the infallible version of
AhoCorasick::try_replace_all_with_bytes
.
§Panics
This panics when AhoCorasick::try_replace_all_with_bytes
would
return an error.
§Examples
Basic usage:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
dst.extend(mat.pattern().as_usize().to_string().bytes());
true
});
assert_eq!(b"0 the 2 to the 0age".to_vec(), result);
Stopping the replacement by returning false
(continued from the
example above):
let mut result = vec![];
ac.replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
dst.extend(mat.pattern().as_usize().to_string().bytes());
mat.pattern() != PatternID::must(2)
});
assert_eq!(b"0 the 2 to the appendage".to_vec(), result);
sourcepub fn stream_find_iter<'a, R: Read>(&'a self, rdr: R) -> StreamFindIter<'a, R> ⓘ
pub fn stream_find_iter<'a, R: Read>(&'a self, rdr: R) -> StreamFindIter<'a, R> ⓘ
Returns an iterator of non-overlapping matches in the given
stream. Matches correspond to the same matches as reported by
AhoCorasick::find_iter
.
The matches yielded by this iterator use absolute position offsets in
the stream given, where the first byte has index 0
. Matches are
yieled until the stream is exhausted.
Each item yielded by the iterator is an Result<Match, std::io::Error>
, where an error is yielded if there was a problem
reading from the reader given.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible.
This is the infallible version of
AhoCorasick::try_stream_find_iter
. Note that both methods return
iterators that produce Result
values. The difference is that this
routine panics if construction of the iterator failed. The Result
values yield by the iterator come from whether the given reader returns
an error or not during the search.
§Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
§Panics
This panics when AhoCorasick::try_stream_find_iter
would return
an error. For example, when the Aho-Corasick searcher doesn’t support
stream searches. (Only searchers built with MatchKind::Standard
semantics support stream searches.)
§Example: basic usage
use aho_corasick::{AhoCorasick, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let mut matches = vec![];
for result in ac.stream_find_iter(haystack.as_bytes()) {
let mat = result?;
matches.push(mat.pattern());
}
assert_eq!(vec![
PatternID::must(2),
PatternID::must(2),
PatternID::must(2),
], matches);
source§impl AhoCorasick
impl AhoCorasick
Fallible search routines. These APIs return an error in cases where the infallible routines would panic.
sourcepub fn try_find<'h, I: Into<Input<'h>>>(
&self,
input: I,
) -> Result<Option<Match>, MatchError>
pub fn try_find<'h, I: Into<Input<'h>>>( &self, input: I, ) -> Result<Option<Match>, MatchError>
Returns the location of the first match according to the match
semantics that this automaton was constructed with, and according
to the given Input
configuration.
This is the fallible version of AhoCorasick::find
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the given Input
configuration.
For example, if the Aho-Corasick searcher only supports anchored
searches or only supports unanchored searches, then providing an
Input
that requests an anchored (or unanchored) search when it isn’t
supported would result in an error.
§Example: leftmost-first searching
Basic usage with leftmost-first semantics:
use aho_corasick::{AhoCorasick, MatchKind, Input};
let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let mat = ac.try_find(haystack)?.expect("should have a match");
assert_eq!("abc", &haystack[mat.span()]);
§Example: anchored leftmost-first searching
This shows how to anchor the search, so that even if the haystack contains a match somewhere, a match won’t be reported unless one can be found that starts at the beginning of the search:
use aho_corasick::{AhoCorasick, Anchored, Input, MatchKind, StartKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.start_kind(StartKind::Anchored)
.build(patterns)
.unwrap();
let input = Input::new(haystack).anchored(Anchored::Yes);
assert_eq!(None, ac.try_find(input)?);
If the beginning of the search is changed to where a match begins, then it will be found:
use aho_corasick::{AhoCorasick, Anchored, Input, MatchKind, StartKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.start_kind(StartKind::Anchored)
.build(patterns)
.unwrap();
let input = Input::new(haystack).range(4..).anchored(Anchored::Yes);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("abc", &haystack[mat.span()]);
§Example: earliest leftmost-first searching
This shows how to run an “earliest” search even when the Aho-Corasick searcher was compiled with leftmost-first match semantics. In this case, the search is stopped as soon as it is known that a match has occurred, even if it doesn’t correspond to the leftmost-first match.
use aho_corasick::{AhoCorasick, Input, MatchKind};
let patterns = &["b", "abc", "abcd"];
let haystack = "foo abcd";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let input = Input::new(haystack).earliest(true);
let mat = ac.try_find(input)?.expect("should have a match");
assert_eq!("b", &haystack[mat.span()]);
sourcepub fn try_find_overlapping<'h, I: Into<Input<'h>>>(
&self,
input: I,
state: &mut OverlappingState,
) -> Result<(), MatchError>
pub fn try_find_overlapping<'h, I: Into<Input<'h>>>( &self, input: I, state: &mut OverlappingState, ) -> Result<(), MatchError>
Returns the location of the first overlapping match in the given input with respect to the current state of the underlying searcher.
Overlapping searches do not report matches in their return value.
Instead, matches can be accessed via OverlappingState::get_match
after a search call.
This is the fallible version of AhoCorasick::find_overlapping
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the given Input
configuration or if overlapping search is not
supported.
One example is that only Aho-Corasicker searchers built with
MatchKind::Standard
semantics support overlapping searches. Using
any other match semantics will result in this returning an error.
§Example: basic usage
This shows how we can repeatedly call an overlapping search without ever needing to explicitly re-slice the haystack. Overlapping search works this way because searches depend on state saved during the previous search.
use aho_corasick::{
automaton::OverlappingState,
AhoCorasick, Input, Match,
};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let mut state = OverlappingState::start();
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(2, 0..3)), state.get_match());
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(0, 0..6)), state.get_match());
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(2, 11..14)), state.get_match());
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(2, 22..25)), state.get_match());
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(0, 22..28)), state.get_match());
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(Some(Match::must(1, 22..31)), state.get_match());
// No more match matches to be found.
ac.try_find_overlapping(haystack, &mut state)?;
assert_eq!(None, state.get_match());
§Example: implementing your own overlapping iteration
The previous example can be easily adapted to implement your own
iteration by repeatedly calling try_find_overlapping
until either
an error occurs or no more matches are reported.
This is effectively equivalent to the iterator returned by
AhoCorasick::try_find_overlapping_iter
, with the only difference
being that the iterator checks for errors before construction and
absolves the caller of needing to check for errors on every search
call. (Indeed, if the first try_find_overlapping
call succeeds and
the same Input
is given to subsequent calls, then all subsequent
calls are guaranteed to succeed.)
use aho_corasick::{
automaton::OverlappingState,
AhoCorasick, Input, Match,
};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let mut state = OverlappingState::start();
let mut matches = vec![];
loop {
ac.try_find_overlapping(haystack, &mut state)?;
let mat = match state.get_match() {
None => break,
Some(mat) => mat,
};
matches.push(mat);
}
let expected = vec![
Match::must(2, 0..3),
Match::must(0, 0..6),
Match::must(2, 11..14),
Match::must(2, 22..25),
Match::must(0, 22..28),
Match::must(1, 22..31),
];
assert_eq!(expected, matches);
§Example: anchored iteration
The previous example can also be adapted to implement
iteration over all anchored matches. In particular,
AhoCorasick::try_find_overlapping_iter
does not support this
because it isn’t totally clear what the match semantics ought to be.
In this example, we will find all overlapping matches that start at the beginning of our search.
use aho_corasick::{
automaton::OverlappingState,
AhoCorasick, Anchored, Input, Match, StartKind,
};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.start_kind(StartKind::Anchored)
.build(patterns)
.unwrap();
let input = Input::new(haystack).anchored(Anchored::Yes);
let mut state = OverlappingState::start();
let mut matches = vec![];
loop {
ac.try_find_overlapping(input.clone(), &mut state)?;
let mat = match state.get_match() {
None => break,
Some(mat) => mat,
};
matches.push(mat);
}
let expected = vec![
Match::must(2, 0..3),
Match::must(0, 0..6),
];
assert_eq!(expected, matches);
sourcepub fn try_find_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> Result<FindIter<'a, 'h>, MatchError>
pub fn try_find_iter<'a, 'h, I: Into<Input<'h>>>( &'a self, input: I, ) -> Result<FindIter<'a, 'h>, MatchError>
Returns an iterator of non-overlapping matches, using the match semantics that this automaton was constructed with.
This is the fallible version of AhoCorasick::find_iter
.
Note that the error returned by this method occurs during construction
of the iterator. The iterator itself yields Match
values. That is,
once the iterator is constructed, the iteration itself will never
report an error.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the given Input
configuration.
For example, if the Aho-Corasick searcher only supports anchored
searches or only supports unanchored searches, then providing an
Input
that requests an anchored (or unanchored) search when it isn’t
supported would result in an error.
§Example: leftmost-first searching
Basic usage with leftmost-first semantics:
use aho_corasick::{AhoCorasick, Input, MatchKind, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let matches: Vec<PatternID> = ac
.try_find_iter(Input::new(haystack))?
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(0),
PatternID::must(2),
PatternID::must(0),
], matches);
§Example: anchored leftmost-first searching
This shows how to anchor the search, such that all matches must begin at the starting location of the search. For an iterator, an anchored search implies that all matches are adjacent.
use aho_corasick::{
AhoCorasick, Anchored, Input, MatchKind, PatternID, StartKind,
};
let patterns = &["foo", "bar", "quux"];
let haystack = "fooquuxbar foo";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.start_kind(StartKind::Anchored)
.build(patterns)
.unwrap();
let matches: Vec<PatternID> = ac
.try_find_iter(Input::new(haystack).anchored(Anchored::Yes))?
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(0),
PatternID::must(2),
PatternID::must(1),
// The final 'foo' is not found because it is not adjacent to the
// 'bar' match. It needs to be adjacent because our search is
// anchored.
], matches);
sourcepub fn try_find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>(
&'a self,
input: I,
) -> Result<FindOverlappingIter<'a, 'h>, MatchError>
pub fn try_find_overlapping_iter<'a, 'h, I: Into<Input<'h>>>( &'a self, input: I, ) -> Result<FindOverlappingIter<'a, 'h>, MatchError>
Returns an iterator of overlapping matches.
This is the fallible version of AhoCorasick::find_overlapping_iter
.
Note that the error returned by this method occurs during construction
of the iterator. The iterator itself yields Match
values. That is,
once the iterator is constructed, the iteration itself will never
report an error.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the given Input
configuration or does not support overlapping
searches.
One example is that only Aho-Corasicker searchers built with
MatchKind::Standard
semantics support overlapping searches. Using
any other match semantics will result in this returning an error.
§Example: basic usage
use aho_corasick::{AhoCorasick, Input, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let matches: Vec<PatternID> = ac
.try_find_overlapping_iter(Input::new(haystack))?
.map(|mat| mat.pattern())
.collect();
assert_eq!(vec![
PatternID::must(2),
PatternID::must(0),
PatternID::must(2),
PatternID::must(2),
PatternID::must(0),
PatternID::must(1),
], matches);
§Example: anchored overlapping search returns an error
It isn’t clear what the match semantics for anchored overlapping
iterators ought to be, so currently an error is returned. Callers
may use AhoCorasick::try_find_overlapping
to implement their own
semantics if desired.
use aho_corasick::{AhoCorasick, Anchored, Input, StartKind};
let patterns = &["append", "appendage", "app"];
let haystack = "appendappendage app";
let ac = AhoCorasick::builder()
.start_kind(StartKind::Anchored)
.build(patterns)
.unwrap();
let input = Input::new(haystack).anchored(Anchored::Yes);
assert!(ac.try_find_overlapping_iter(input).is_err());
sourcepub fn try_replace_all<B>(
&self,
haystack: &str,
replace_with: &[B],
) -> Result<String, MatchError>
pub fn try_replace_all<B>( &self, haystack: &str, replace_with: &[B], ) -> Result<String, MatchError>
Replace all matches with a corresponding value in the replace_with
slice given. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
§Panics
This panics when replace_with.len()
does not equal
AhoCorasick::patterns_len
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this replacement routine always does an unanchored search.
§Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let result = ac.try_replace_all(haystack, &["x", "y", "z"])?;
assert_eq!("x the z to the xage", result);
sourcepub fn try_replace_all_bytes<B>(
&self,
haystack: &[u8],
replace_with: &[B],
) -> Result<Vec<u8>, MatchError>
pub fn try_replace_all_bytes<B>( &self, haystack: &[u8], replace_with: &[B], ) -> Result<Vec<u8>, MatchError>
Replace all matches using raw bytes with a corresponding value in the
replace_with
slice given. Matches correspond to the same matches as
reported by AhoCorasick::try_find_iter
.
Replacements are determined by the index of the matching pattern.
For example, if the pattern with index 2
is found, then it is
replaced by replace_with[2]
.
This is the fallible version of AhoCorasick::replace_all_bytes
.
§Panics
This panics when replace_with.len()
does not equal
AhoCorasick::patterns_len
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this replacement routine always does an unanchored search.
§Example: basic usage
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let result = ac.try_replace_all_bytes(haystack, &["x", "y", "z"])?;
assert_eq!(b"x the z to the xage".to_vec(), result);
sourcepub fn try_replace_all_with<F>(
&self,
haystack: &str,
dst: &mut String,
replace_with: F,
) -> Result<(), MatchError>
pub fn try_replace_all_with<F>( &self, haystack: &str, dst: &mut String, replace_with: F, ) -> Result<(), MatchError>
Replace all matches using a closure called on each match.
Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter
.
The closure accepts three parameters: the match found, the text of
the match and a string buffer with which to write the replaced text
(if any). If the closure returns true
, then it continues to the next
match. If the closure returns false
, then searching is stopped.
Note that any matches with boundaries that don’t fall on a valid UTF-8 boundary are silently skipped.
This is the fallible version of AhoCorasick::replace_all_with
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this replacement routine always does an unanchored search.
§Examples
Basic usage:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let mut result = String::new();
ac.try_replace_all_with(haystack, &mut result, |mat, _, dst| {
dst.push_str(&mat.pattern().as_usize().to_string());
true
})?;
assert_eq!("0 the 2 to the 0age", result);
Stopping the replacement by returning false
(continued from the
example above):
let mut result = String::new();
ac.try_replace_all_with(haystack, &mut result, |mat, _, dst| {
dst.push_str(&mat.pattern().as_usize().to_string());
mat.pattern() != PatternID::must(2)
})?;
assert_eq!("0 the 2 to the appendage", result);
sourcepub fn try_replace_all_with_bytes<F>(
&self,
haystack: &[u8],
dst: &mut Vec<u8>,
replace_with: F,
) -> Result<(), MatchError>
pub fn try_replace_all_with_bytes<F>( &self, haystack: &[u8], dst: &mut Vec<u8>, replace_with: F, ) -> Result<(), MatchError>
Replace all matches using raw bytes with a closure called on each
match. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter
.
The closure accepts three parameters: the match found, the text of
the match and a byte buffer with which to write the replaced text
(if any). If the closure returns true
, then it continues to the next
match. If the closure returns false
, then searching is stopped.
This is the fallible version of
AhoCorasick::replace_all_with_bytes
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this replacement routine always does an unanchored search.
§Examples
Basic usage:
use aho_corasick::{AhoCorasick, MatchKind};
let patterns = &["append", "appendage", "app"];
let haystack = b"append the app to the appendage";
let ac = AhoCorasick::builder()
.match_kind(MatchKind::LeftmostFirst)
.build(patterns)
.unwrap();
let mut result = vec![];
ac.try_replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
dst.extend(mat.pattern().as_usize().to_string().bytes());
true
})?;
assert_eq!(b"0 the 2 to the 0age".to_vec(), result);
Stopping the replacement by returning false
(continued from the
example above):
let mut result = vec![];
ac.try_replace_all_with_bytes(haystack, &mut result, |mat, _, dst| {
dst.extend(mat.pattern().as_usize().to_string().bytes());
mat.pattern() != PatternID::must(2)
})?;
assert_eq!(b"0 the 2 to the appendage".to_vec(), result);
sourcepub fn try_stream_find_iter<'a, R: Read>(
&'a self,
rdr: R,
) -> Result<StreamFindIter<'a, R>, MatchError>
pub fn try_stream_find_iter<'a, R: Read>( &'a self, rdr: R, ) -> Result<StreamFindIter<'a, R>, MatchError>
Returns an iterator of non-overlapping matches in the given
stream. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter
.
The matches yielded by this iterator use absolute position offsets in
the stream given, where the first byte has index 0
. Matches are
yieled until the stream is exhausted.
Each item yielded by the iterator is an Result<Match, std::io::Error>
, where an error is yielded if there was a problem
reading from the reader given.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible.
This is the fallible version of AhoCorasick::stream_find_iter
.
Note that both methods return iterators that produce Result
values.
The difference is that this routine returns an error if construction
of the iterator failed. The Result
values yield by the iterator
come from whether the given reader returns an error or not during the
search.
§Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this stream searching routine always does an unanchored search.
This also returns an error if the searcher does not support stream
searches. Only searchers built with MatchKind::Standard
semantics
support stream searches.
§Example: basic usage
use aho_corasick::{AhoCorasick, PatternID};
let patterns = &["append", "appendage", "app"];
let haystack = "append the app to the appendage";
let ac = AhoCorasick::new(patterns).unwrap();
let mut matches = vec![];
for result in ac.try_stream_find_iter(haystack.as_bytes())? {
let mat = result?;
matches.push(mat.pattern());
}
assert_eq!(vec![
PatternID::must(2),
PatternID::must(2),
PatternID::must(2),
], matches);
sourcepub fn try_stream_replace_all<R, W, B>(
&self,
rdr: R,
wtr: W,
replace_with: &[B],
) -> Result<(), Error>
pub fn try_stream_replace_all<R, W, B>( &self, rdr: R, wtr: W, replace_with: &[B], ) -> Result<(), Error>
Search for and replace all matches of this automaton in
the given reader, and write the replacements to the given
writer. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter
.
Replacements are determined by the index of the matching pattern. For
example, if the pattern with index 2
is found, then it is replaced by
replace_with[2]
.
After all matches are replaced, the writer is not flushed.
If there was a problem reading from the given reader or writing to the
given writer, then the corresponding io::Error
is returned and all
replacement is stopped.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.
Note that there is currently no infallible version of this routine.
§Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
§Panics
This panics when replace_with.len()
does not equal
AhoCorasick::patterns_len
.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this stream searching routine always does an unanchored search.
This also returns an error if the searcher does not support stream
searches. Only searchers built with MatchKind::Standard
semantics
support stream searches.
§Example: basic usage
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let replace_with = &["sloth", "grey", "slow"];
let ac = AhoCorasick::new(patterns).unwrap();
let mut result = vec![];
ac.try_stream_replace_all(
haystack.as_bytes(),
&mut result,
replace_with,
)?;
assert_eq!(b"The slow grey sloth.".to_vec(), result);
sourcepub fn try_stream_replace_all_with<R, W, F>(
&self,
rdr: R,
wtr: W,
replace_with: F,
) -> Result<(), Error>
pub fn try_stream_replace_all_with<R, W, F>( &self, rdr: R, wtr: W, replace_with: F, ) -> Result<(), Error>
Search the given reader and replace all matches of this automaton
using the given closure. The result is written to the given
writer. Matches correspond to the same matches as reported by
AhoCorasick::try_find_iter
.
The closure accepts three parameters: the match found, the text of the match and the writer with which to write the replaced text (if any).
After all matches are replaced, the writer is not flushed.
If there was a problem reading from the given reader or writing to the
given writer, then the corresponding io::Error
is returned and all
replacement is stopped.
When searching a stream, an internal buffer is used. Therefore, callers should avoiding providing a buffered reader, if possible. However, callers may want to provide a buffered writer.
Note that there is currently no infallible version of this routine.
§Memory usage
In general, searching streams will use a constant amount of memory for its internal buffer. The one requirement is that the internal buffer must be at least the size of the longest possible match. In most use cases, the default buffer size will be much larger than any individual match.
§Errors
This returns an error when this Aho-Corasick searcher does not support
the default Input
configuration. More specifically, this occurs only
when the Aho-Corasick searcher does not support unanchored searches
since this stream searching routine always does an unanchored search.
This also returns an error if the searcher does not support stream
searches. Only searchers built with MatchKind::Standard
semantics
support stream searches.
§Example: basic usage
use std::io::Write;
use aho_corasick::AhoCorasick;
let patterns = &["fox", "brown", "quick"];
let haystack = "The quick brown fox.";
let ac = AhoCorasick::new(patterns).unwrap();
let mut result = vec![];
ac.try_stream_replace_all_with(
haystack.as_bytes(),
&mut result,
|mat, _, wtr| {
wtr.write_all(mat.pattern().as_usize().to_string().as_bytes())
},
)?;
assert_eq!(b"The 2 1 0.".to_vec(), result);
source§impl AhoCorasick
impl AhoCorasick
Routines for querying information about the Aho-Corasick automaton.
sourcepub fn kind(&self) -> AhoCorasickKind
pub fn kind(&self) -> AhoCorasickKind
Returns the kind of the Aho-Corasick automaton used by this searcher.
Knowing the Aho-Corasick kind is principally useful for diagnostic
purposes. In particular, if no specific kind was given to
AhoCorasickBuilder::kind
, then one is automatically chosen and
this routine will report which one.
Note that the heuristics used for choosing which AhoCorasickKind
may be changed in a semver compatible release.
§Examples
use aho_corasick::{AhoCorasick, AhoCorasickKind};
let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
// The specific Aho-Corasick kind chosen is not guaranteed!
assert_eq!(AhoCorasickKind::DFA, ac.kind());
sourcepub fn start_kind(&self) -> StartKind
pub fn start_kind(&self) -> StartKind
Returns the type of starting search configuration supported by this Aho-Corasick automaton.
§Examples
use aho_corasick::{AhoCorasick, StartKind};
let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(StartKind::Unanchored, ac.start_kind());
sourcepub fn match_kind(&self) -> MatchKind
pub fn match_kind(&self) -> MatchKind
Returns the match kind used by this automaton.
The match kind is important because it determines what kinds of
matches are returned. Also, some operations (such as overlapping
search and stream searching) are only supported when using the
MatchKind::Standard
match kind.
§Examples
use aho_corasick::{AhoCorasick, MatchKind};
let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(MatchKind::Standard, ac.match_kind());
sourcepub fn min_pattern_len(&self) -> usize
pub fn min_pattern_len(&self) -> usize
Returns the length of the shortest pattern matched by this automaton.
§Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(3, ac.min_pattern_len());
Note that an AhoCorasick
automaton has a minimum length of 0
if
and only if it can match the empty string:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&["foo", "", "quux", "baz"]).unwrap();
assert_eq!(0, ac.min_pattern_len());
sourcepub fn max_pattern_len(&self) -> usize
pub fn max_pattern_len(&self) -> usize
Returns the length of the longest pattern matched by this automaton.
§Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&["foo", "bar", "quux", "baz"]).unwrap();
assert_eq!(4, ac.max_pattern_len());
sourcepub fn patterns_len(&self) -> usize
pub fn patterns_len(&self) -> usize
Return the total number of patterns matched by this automaton.
This includes patterns that may never participate in a match. For
example, if MatchKind::LeftmostFirst
match semantics are used, and
the patterns Sam
and Samwise
were used to build the automaton (in
that order), then Samwise
can never participate in a match because
Sam
will always take priority.
§Examples
Basic usage:
use aho_corasick::AhoCorasick;
let ac = AhoCorasick::new(&["foo", "bar", "baz"]).unwrap();
assert_eq!(3, ac.patterns_len());
sourcepub fn memory_usage(&self) -> usize
pub fn memory_usage(&self) -> usize
Returns the approximate total amount of heap used by this automaton, in units of bytes.
§Examples
This example shows the difference in heap usage between a few configurations:
use aho_corasick::{AhoCorasick, AhoCorasickKind, MatchKind};
let ac = AhoCorasick::builder()
.kind(None) // default
.build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
.unwrap();
assert_eq!(5_632, ac.memory_usage());
let ac = AhoCorasick::builder()
.kind(None) // default
.ascii_case_insensitive(true)
.build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
.unwrap();
assert_eq!(11_136, ac.memory_usage());
let ac = AhoCorasick::builder()
.kind(Some(AhoCorasickKind::NoncontiguousNFA))
.ascii_case_insensitive(true)
.build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
.unwrap();
assert_eq!(10_879, ac.memory_usage());
let ac = AhoCorasick::builder()
.kind(Some(AhoCorasickKind::ContiguousNFA))
.ascii_case_insensitive(true)
.build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
.unwrap();
assert_eq!(2_584, ac.memory_usage());
let ac = AhoCorasick::builder()
.kind(Some(AhoCorasickKind::DFA))
.ascii_case_insensitive(true)
.build(&["foobar", "bruce", "triskaidekaphobia", "springsteen"])
.unwrap();
// While this shows the DFA being the biggest here by a small margin,
// don't let the difference fool you. With such a small number of
// patterns, the difference is small, but a bigger number of patterns
// will reveal that the rate of growth of the DFA is far bigger than
// the NFAs above. For a large number of patterns, it is easy for the
// DFA to take an order of magnitude more heap space (or more!).
assert_eq!(11_136, ac.memory_usage());
Trait Implementations§
source§impl Clone for AhoCorasick
impl Clone for AhoCorasick
source§fn clone(&self) -> AhoCorasick
fn clone(&self) -> AhoCorasick
1.0.0 · source§fn clone_from(&mut self, source: &Self)
fn clone_from(&mut self, source: &Self)
source
. Read more