Struct Set

Source
pub struct Set<D>(Fst<D>);
Expand description

Set is a lexicographically ordered set of byte strings.

A Set is constructed with the SetBuilder type. Alternatively, a Set can be constructed in memory from a lexicographically ordered iterator of byte strings (Set::from_iter).

A key feature of Set is that it can be serialized to disk compactly. Its underlying representation is built such that the Set can be memory mapped and searched without necessarily loading the entire set into memory.

It supports most common operations associated with sets, such as membership, union, intersection, subset/superset, etc. It also supports range queries and automata based searches (e.g. a regular expression).

Sets are represented by a finite state transducer where output values are always zero. As such, sets have the following invariants:

  1. Once constructed, a Set can never be modified.
  2. Sets must be constructed with lexicographically ordered byte sequences.

Tuple Fields§

§0: Fst<D>

Implementations§

Source§

impl Set<Vec<u8>>

Source

pub fn from_iter<T, I>(iter: I) -> Result<Set<Vec<u8>>>
where T: AsRef<[u8]>, I: IntoIterator<Item = T>,

Create a Set from an iterator of lexicographically ordered byte strings.

If the iterator does not yield values in lexicographic order, then an error is returned.

Note that this is a convenience function to build a set in memory. To build a set that streams to an arbitrary io::Write, use SetBuilder.

Source§

impl<D: AsRef<[u8]>> Set<D>

Source

pub fn new(data: D) -> Result<Set<D>>

Creates a set from its representation as a raw byte sequence.

This accepts anything that can be cheaply converted to a &[u8]. The caller is responsible for guaranteeing that the given bytes refer to a valid FST. While memory safety will not be violated by invalid input, a panic could occur while reading the FST at any point.

§Example
use fst::Set;

// File written from a build script using SetBuilder.
static FST: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/set.fst"));

let set = Set::new(FST).unwrap();
Source

pub fn contains<K: AsRef<[u8]>>(&self, key: K) -> bool

Tests the membership of a single key.

§Example
use fst::Set;

let set = Set::from_iter(&["a", "b", "c"]).unwrap();

assert_eq!(set.contains("b"), true);
assert_eq!(set.contains("z"), false);
Source

pub fn stream(&self) -> Stream<'_>

Return a lexicographically ordered stream of all keys in this set.

While this is a stream, it does require heap space proportional to the longest key in the set.

If the set is memory mapped, then no further heap space is needed. Note though that your operating system may fill your page cache (which will cause the resident memory usage of the process to go up correspondingly).

§Example

Since streams are not iterators, the traditional for loop cannot be used. while let is useful instead:

use fst::{IntoStreamer, Streamer, Set};

let set = Set::from_iter(&["a", "b", "c"]).unwrap();
let mut stream = set.stream();

let mut keys = vec![];
while let Some(key) = stream.next() {
    keys.push(key.to_vec());
}
assert_eq!(keys, vec![b"a", b"b", b"c"]);
Source

pub fn range(&self) -> StreamBuilder<'_>

Return a builder for range queries.

A range query returns a subset of keys in this set in a range given in lexicographic order.

Memory requirements are the same as described on Set::stream. Notably, only the keys in the range are read; keys outside the range are not.

§Example

Returns only the keys in the range given.

use fst::{IntoStreamer, Streamer, Set};

let set = Set::from_iter(&["a", "b", "c", "d", "e"]).unwrap();
let mut stream = set.range().ge("b").lt("e").into_stream();

let mut keys = vec![];
while let Some(key) = stream.next() {
    keys.push(key.to_vec());
}
assert_eq!(keys, vec![b"b", b"c", b"d"]);
Source

pub fn search<A: Automaton>(&self, aut: A) -> StreamBuilder<'_, A>

Executes an automaton on the keys of this set.

Note that this returns a StreamBuilder, which can be used to add a range query to the search (see the range method).

Memory requirements are the same as described on Set::stream.

§Example

An implementation of subsequence search for Automaton can be used to search sets:

use fst::automaton::Subsequence;
use fst::{IntoStreamer, Streamer, Set};

fn example() -> Result<(), Box<dyn std::error::Error>> {
    let set = Set::from_iter(&[
        "a foo bar", "foo", "foo1", "foo2", "foo3", "foobar",
    ]).unwrap();

    let matcher = Subsequence::new("for");
    let mut stream = set.search(&matcher).into_stream();

    let mut keys = vec![];
    while let Some(key) = stream.next() {
        keys.push(String::from_utf8(key.to_vec())?);
    }
    assert_eq!(keys, vec![
        "a foo bar", "foobar",
    ]);

    Ok(())
}
Source

pub fn search_with_state<A: Automaton>( &self, aut: A, ) -> StreamWithStateBuilder<'_, A>

Executes an automaton on the values of this set and yields matching values along with the corresponding matching states in the given automaton.

Note that this returns a StreamWithStateBuilder, which can be used to add a range query to the search (see the range method).

Memory requirements are the same as described on Map::stream.

Source

pub fn len(&self) -> usize

Returns the number of elements in this set.

Source

pub fn is_empty(&self) -> bool

Returns true if and only if this set is empty.

Source

pub fn op(&self) -> OpBuilder<'_>

Creates a new set operation with this set added to it.

The OpBuilder type can be used to add additional set streams and perform set operations like union, intersection, difference and symmetric difference.

§Example
use fst::{IntoStreamer, Streamer, Set};

let set1 = Set::from_iter(&["a", "b", "c"]).unwrap();
let set2 = Set::from_iter(&["a", "y", "z"]).unwrap();

let mut union = set1.op().add(&set2).union();

let mut keys = vec![];
while let Some(key) = union.next() {
    keys.push(key.to_vec());
}
assert_eq!(keys, vec![b"a", b"b", b"c", b"y", b"z"]);
Source

pub fn is_disjoint<'f, I, S>(&self, stream: I) -> bool
where I: for<'a> IntoStreamer<'a, Into = S, Item = &'a [u8]>, S: 'f + for<'a> Streamer<'a, Item = &'a [u8]>,

Returns true if and only if the self set is disjoint with the set stream.

stream must be a lexicographically ordered sequence of byte strings.

§Example
use fst::{IntoStreamer, Streamer, Set};

let set1 = Set::from_iter(&["a", "b", "c"]).unwrap();
let set2 = Set::from_iter(&["x", "y", "z"]).unwrap();

assert_eq!(set1.is_disjoint(&set2), true);

let set3 = Set::from_iter(&["a", "c"]).unwrap();

assert_eq!(set1.is_disjoint(&set3), false);
Source

pub fn is_subset<'f, I, S>(&self, stream: I) -> bool
where I: for<'a> IntoStreamer<'a, Into = S, Item = &'a [u8]>, S: 'f + for<'a> Streamer<'a, Item = &'a [u8]>,

Returns true if and only if the self set is a subset of stream.

stream must be a lexicographically ordered sequence of byte strings.

§Example
use fst::Set;

let set1 = Set::from_iter(&["a", "b", "c"]).unwrap();
let set2 = Set::from_iter(&["x", "y", "z"]).unwrap();

assert_eq!(set1.is_subset(&set2), false);

let set3 = Set::from_iter(&["a", "c"]).unwrap();

assert_eq!(set1.is_subset(&set3), false);
assert_eq!(set3.is_subset(&set1), true);
Source

pub fn is_superset<'f, I, S>(&self, stream: I) -> bool
where I: for<'a> IntoStreamer<'a, Into = S, Item = &'a [u8]>, S: 'f + for<'a> Streamer<'a, Item = &'a [u8]>,

Returns true if and only if the self set is a superset of stream.

stream must be a lexicographically ordered sequence of byte strings.

§Example
use fst::Set;

let set1 = Set::from_iter(&["a", "b", "c"]).unwrap();
let set2 = Set::from_iter(&["x", "y", "z"]).unwrap();

assert_eq!(set1.is_superset(&set2), false);

let set3 = Set::from_iter(&["a", "c"]).unwrap();

assert_eq!(set1.is_superset(&set3), true);
assert_eq!(set3.is_superset(&set1), false);
Source

pub fn as_fst(&self) -> &Fst<D>

Returns a reference to the underlying raw finite state transducer.

Source

pub fn into_fst(self) -> Fst<D>

Returns the underlying raw finite state transducer.

Source

pub fn map_data<F, T>(self, f: F) -> Result<Set<T>>
where F: FnMut(D) -> T, T: AsRef<[u8]>,

Maps the underlying data of the fst Set to another data type.

§Example

This example shows that you can map an fst Set based on a Vec<u8> into an fst Set based on a Cow<[u8]>, it can also work with a reference counted type (e.g. Arc, Rc).

use std::borrow::Cow;

use fst::Set;

let set: Set<Vec<u8>> = Set::from_iter(
    &["hello", "world"],
).unwrap();

let set_on_cow: Set<Cow<[u8]>> = set.map_data(Cow::Owned).unwrap();

Trait Implementations§

Source§

impl<D: AsRef<[u8]>> AsRef<Fst<D>> for Set<D>

Returns the underlying finite state transducer.

Source§

fn as_ref(&self) -> &Fst<D>

Converts this type into a shared reference of the (usually inferred) input type.
Source§

impl<D: Clone> Clone for Set<D>

Source§

fn clone(&self) -> Set<D>

Returns a duplicate of the value. Read more
1.0.0 · Source§

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source. Read more
Source§

impl<D: AsRef<[u8]>> Debug for Set<D>

Source§

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter. Read more
Source§

impl Default for Set<Vec<u8>>

Source§

fn default() -> Set<Vec<u8>>

Returns the “default value” for a type. Read more
Source§

impl<D: AsRef<[u8]>> From<Fst<D>> for Set<D>

Source§

fn from(fst: Fst<D>) -> Set<D>

Converts to this type from the input type.
Source§

impl<'s, 'a, D: AsRef<[u8]>> IntoStreamer<'a> for &'s Set<D>

Source§

type Item = &'a [u8]

The type of the item emitted by the stream.
Source§

type Into = Stream<'s>

The type of the stream to be constructed.
Source§

fn into_stream(self) -> Stream<'s>

Construct a stream from Self.

Auto Trait Implementations§

§

impl<D> Freeze for Set<D>
where D: Freeze,

§

impl<D> RefUnwindSafe for Set<D>
where D: RefUnwindSafe,

§

impl<D> Send for Set<D>
where D: Send,

§

impl<D> Sync for Set<D>
where D: Sync,

§

impl<D> Unpin for Set<D>
where D: Unpin,

§

impl<D> UnwindSafe for Set<D>
where D: UnwindSafe,

Blanket Implementations§

Source§

impl<T> Any for T
where T: 'static + ?Sized,

Source§

fn type_id(&self) -> TypeId

Gets the TypeId of self. Read more
Source§

impl<T> Borrow<T> for T
where T: ?Sized,

Source§

fn borrow(&self) -> &T

Immutably borrows from an owned value. Read more
Source§

impl<T> BorrowMut<T> for T
where T: ?Sized,

Source§

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value. Read more
Source§

impl<T> CloneToUninit for T
where T: Clone,

Source§

unsafe fn clone_to_uninit(&self, dest: *mut u8)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dest. Read more
Source§

impl<T> From<T> for T

Source§

fn from(t: T) -> T

Returns the argument unchanged.

Source§

impl<T, U> Into<U> for T
where U: From<T>,

Source§

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

Source§

impl<T> ToOwned for T
where T: Clone,

Source§

type Owned = T

The resulting type after obtaining ownership.
Source§

fn to_owned(&self) -> T

Creates owned data from borrowed data, usually by cloning. Read more
Source§

fn clone_into(&self, target: &mut T)

Uses borrowed data to replace owned data, usually by cloning. Read more
Source§

impl<T, U> TryFrom<U> for T
where U: Into<T>,

Source§

type Error = Infallible

The type returned in the event of a conversion error.
Source§

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.
Source§

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

Source§

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.
Source§

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.