unicode_segmentation::grapheme

Struct GraphemeCursor

pub struct GraphemeCursor {
    offset: usize,
    len: usize,
    is_extended: bool,
    state: GraphemeState,
    cat_before: Option<GraphemeCat>,
    cat_after: Option<GraphemeCat>,
    pre_context_offset: Option<usize>,
    incb_linker_count: Option<usize>,
    ris_count: Option<usize>,
    resuming: bool,
    grapheme_cat_cache: (u32, u32, GraphemeCat),
}

Cursor-based segmenter for grapheme clusters.

This allows working with ropes and other data structures where the string is not contiguous or not fully known at initialization time.

Fields§

§offset: usize

Current cursor position.

§len: usize

Total length of the string.

§is_extended: bool

A config flag indicating whether this cursor computes legacy or extended grapheme cluster boundaries (enables GB9a and GB9b if set).

§state: GraphemeState

Information about the potential boundary at offset.

§cat_before: Option<GraphemeCat>

Category of codepoint immediately preceding cursor, if known.

§cat_after: Option<GraphemeCat>

Category of codepoint immediately after cursor, if known.

§pre_context_offset: Option<usize>

If set, at least one more codepoint immediately preceding this offset is needed to resolve whether there’s a boundary at offset.

§incb_linker_count: Option<usize>

The number of InCB=Linker codepoints preceding offset (potentially intermingled with InCB=Extend).

§ris_count: Option<usize>

The number of Regional Indicator Symbol (RIS) codepoints preceding offset. If pre_context_offset is set, this counts only the RIS between that offset and offset; otherwise it is an accurate count relative to the whole string.
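
The count matters because of the flag-pairing rules (GB12 and GB13 in UAX #29): a position between two RIS codepoints is a boundary only when an even number of RIS precede it. The following is a minimal std-only sketch of that parity check, not the crate's implementation; `trailing_ris` and `breaks_between_ris` are hypothetical names introduced for illustration.

```rust
/// Hypothetical helper: count the run of Regional Indicator Symbols
/// (U+1F1E6..=U+1F1FF) that ends at the end of `before`.
fn trailing_ris(before: &str) -> usize {
    before
        .chars()
        .rev()
        .take_while(|c| ('\u{1F1E6}'..='\u{1F1FF}').contains(c))
        .count()
}

/// Hypothetical helper: given the number of RIS before a position that
/// sits between two RIS codepoints, report whether it is a boundary.
/// An odd count means the preceding RIS still needs a partner.
fn breaks_between_ris(ris_before: usize) -> bool {
    ris_before % 2 == 0
}
```

For the four-RIS string used in the examples on this page, offset 8 has two RIS before it and is a boundary, while offset 12 has three and is not.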

§resuming: bool

Set if a call to prev_boundary or next_boundary was suspended due to needing more input.

§grapheme_cat_cache: (u32, u32, GraphemeCat)

Cached grapheme category and associated scalar value range.
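
A cache of this shape lets consecutive lookups for codepoints in the same range skip the table search. The sketch below shows the general idea with a toy two-entry table; the `Cat` enum, `TABLE`, `search_table`, and `CachedLookup` are all assumptions made for illustration and do not reflect the crate's actual tables or code.

```rust
use std::cmp::Ordering;

#[derive(Clone, Copy, Debug, PartialEq)]
enum Cat { Any, Control, RegionalIndicator }

// Toy table of (first, last, category), sorted by range start.
const TABLE: &[(u32, u32, Cat)] = &[
    (0x00, 0x1F, Cat::Control),
    (0x1F1E6, 0x1F1FF, Cat::RegionalIndicator),
];

/// Binary-search the table; for codepoints in a gap between entries,
/// return the whole gap as a `Cat::Any` range.
fn search_table(cp: u32) -> (u32, u32, Cat) {
    let found = TABLE.binary_search_by(|&(lo, hi, _)| {
        if hi < cp { Ordering::Less }
        else if lo > cp { Ordering::Greater }
        else { Ordering::Equal }
    });
    match found {
        Ok(i) => TABLE[i],
        Err(i) => {
            let lo = if i == 0 { 0 } else { TABLE[i - 1].1 + 1 };
            let hi = if i == TABLE.len() { u32::MAX } else { TABLE[i].0 - 1 };
            (lo, hi, Cat::Any)
        }
    }
}

struct CachedLookup {
    cache: (u32, u32, Cat),
    table_searches: usize, // instrumentation for this sketch only
}

impl CachedLookup {
    fn new() -> Self {
        // Seed with a range that is valid for the toy table: U+0000 is Control.
        CachedLookup { cache: (0, 0, Cat::Control), table_searches: 0 }
    }

    fn category(&mut self, ch: char) -> Cat {
        let cp = ch as u32;
        let (lo, hi, cat) = self.cache;
        if (lo..=hi).contains(&cp) {
            return cat; // cache hit: no table search needed
        }
        self.table_searches += 1;
        self.cache = search_table(cp);
        self.cache.2
    }
}
```

Looking up 'a' and then 'b' with this sketch searches the table once, because both fall in the same cached gap range.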

Implementations§

impl GraphemeCursor

pub fn new(offset: usize, len: usize, is_extended: bool) -> GraphemeCursor

Create a new cursor. The length of the string and the initial offset are given at creation time, but the contents of the string are not. The is_extended parameter controls whether extended grapheme clusters are selected.

The offset parameter must be on a codepoint boundary.

use unicode_segmentation::GraphemeCursor;
let s = "हिन्दी";
let mut legacy = GraphemeCursor::new(0, s.len(), false);
assert_eq!(legacy.next_boundary(s, 0), Ok(Some("ह".len())));
let mut extended = GraphemeCursor::new(0, s.len(), true);
assert_eq!(extended.next_boundary(s, 0), Ok(Some("हि".len())));
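
When the text around the starting offset is available as a contiguous &str, the codepoint-boundary requirement can be checked up front with the standard library's str::is_char_boundary. A small sketch, where `checked_offset` is a hypothetical helper and not part of this crate:

```rust
/// Hypothetical helper: return `offset` only if it lies on a codepoint
/// boundary of `s` (offsets 0 and `s.len()` always qualify).
fn checked_offset(s: &str, offset: usize) -> Option<usize> {
    if s.is_char_boundary(offset) {
        Some(offset)
    } else {
        None
    }
}
```

For "aé", where 'é' occupies bytes 1 and 2, offsets 0, 1, and 3 are accepted and offset 2 is rejected.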

fn grapheme_category(&mut self, ch: char) -> GraphemeCat

pub fn set_cursor(&mut self, offset: usize)

Set the cursor to a new location in the same string.

use unicode_segmentation::GraphemeCursor;
let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.cur_cursor(), 0);
cursor.set_cursor(2);
assert_eq!(cursor.cur_cursor(), 2);

pub fn cur_cursor(&self) -> usize

The current offset of the cursor. Equal to the last value provided to new() or set_cursor(), or returned from next_boundary() or prev_boundary().

use unicode_segmentation::GraphemeCursor;
// Two flags (🇷🇸🇮🇴), each flag is two RIS codepoints, each RIS is 4 bytes.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.cur_cursor(), 4);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.cur_cursor(), 8);

pub fn provide_context(&mut self, chunk: &str, chunk_start: usize)

Provide additional pre-context when it is needed to decide a boundary. The end of the chunk must coincide with the value given in the GraphemeIncomplete::PreContext request.

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
// Not enough pre-context to decide if there's a boundary between the two flags.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(8)));
// Provide one more Regional Indicator Symbol of pre-context
cursor.provide_context(&flags[4..8], 4);
// Still not enough context to decide.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(4)));
// Provide additional requested context.
cursor.provide_context(&flags[0..4], 0);
// That's enough to decide (it always is when context goes to the start of the string)
assert_eq!(cursor.is_boundary(&flags[8..], 8), Ok(true));
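
In a rope-like structure the caller has to locate the chunk whose end coincides with the offset carried by the PreContext request. The sketch below assumes the text is held as a slice of string chunks laid out back to back; `chunk_ending_at` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: find the non-empty chunk whose end coincides with
/// `end`, returning the chunk together with its starting byte offset.
/// Returns `None` if no chunk ends exactly at `end`.
fn chunk_ending_at<'a>(chunks: &[&'a str], end: usize) -> Option<(&'a str, usize)> {
    let mut start = 0;
    for &chunk in chunks {
        let chunk_end = start + chunk.len();
        if chunk_end == end && !chunk.is_empty() {
            return Some((chunk, start));
        }
        start = chunk_end;
    }
    None
}
```

The returned pair has the shape that provide_context expects as its chunk and chunk_start arguments. If the request falls in the middle of a stored chunk, this sketch returns None and the caller would need to slice that chunk instead.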

fn decide(&mut self, is_break: bool)

fn decision(&mut self, is_break: bool) -> Result<bool, GraphemeIncomplete>

fn is_boundary_result(&self) -> Result<bool, GraphemeIncomplete>

fn handle_incb_consonant(&mut self, chunk: &str, chunk_start: usize)

For handling rule GB9c:

There’s an InCB=Consonant after this, and we need to look back to verify whether there should be a break.

Seek backward to find an InCB=Linker preceded by an InCB=Consonant (potentially separated by some number of InCB=Linker or InCB=Extend). If we find the consonant in question, then there’s no break; if we find a consonant with no linker, or a non-linker non-extend non-consonant, or the start of text, there’s a break; otherwise we need more context.
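
The backward scan described above can be sketched over a slice of already-classified codepoints. This is an illustration of the stated rule only, not the crate's code; the `InCb` and `Lookback` enums and the `gb9c_lookback` function are hypothetical names.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum InCb { Consonant, Linker, Extend, Other }

#[derive(Debug, PartialEq)]
enum Lookback { NoBreak, Break, NeedMoreContext }

/// `preceding` holds the categories of the codepoints before the
/// consonant, in text order. `reaches_text_start` says whether the slice
/// begins at the start of the text or only at the start of known context.
fn gb9c_lookback(preceding: &[InCb], reaches_text_start: bool) -> Lookback {
    let mut seen_linker = false;
    for &cat in preceding.iter().rev() {
        match cat {
            InCb::Linker => seen_linker = true,
            InCb::Extend => {} // skipped; may be intermingled with linkers
            InCb::Consonant => {
                // A consonant joins only if at least one linker follows it.
                return if seen_linker { Lookback::NoBreak } else { Lookback::Break };
            }
            InCb::Other => return Lookback::Break,
        }
    }
    // Ran out of codepoints without reaching a deciding category.
    if reaches_text_start { Lookback::Break } else { Lookback::NeedMoreContext }
}
```

The NeedMoreContext outcome corresponds to the case where the cursor would have to request additional pre-context from the caller.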

fn handle_regional(&mut self, chunk: &str, chunk_start: usize)

fn handle_emoji(&mut self, chunk: &str, chunk_start: usize)

pub fn is_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<bool, GraphemeIncomplete>

Determine whether the current cursor location is a grapheme cluster boundary. Only a part of the string need be supplied. If chunk_start is nonzero or the length of chunk is not equal to len on creation, then this method may return GraphemeIncomplete::PreContext. The caller should then call provide_context with the requested chunk, then retry calling this method.

For partial chunks, if the cursor is not at the beginning or end of the string, the chunk should contain at least the codepoint following the cursor. If the string is nonempty, the chunk must be nonempty.

All calls should have consistent chunk contents (i.e., if a chunk provides content for a given slice, all further chunks covering that slice must have the same content for it).

use unicode_segmentation::GraphemeCursor;
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
assert_eq!(cursor.is_boundary(flags, 0), Ok(true));
cursor.set_cursor(12);
assert_eq!(cursor.is_boundary(flags, 0), Ok(false));
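
The chunk expectations stated above can be turned into a pre-flight check that a caller runs before handing a partial chunk to the cursor. This is a literal reading of those expectations and nothing more; `chunk_is_usable` is a hypothetical helper, and the cursor itself performs no such validation in this sketch.

```rust
/// Hypothetical pre-flight check: `chunk` starts at byte `chunk_start`,
/// the cursor sits at `offset`, and the whole string is `len` bytes long.
fn chunk_is_usable(chunk: &str, chunk_start: usize, offset: usize, len: usize) -> bool {
    if len == 0 {
        return true; // empty string: there is nothing to supply
    }
    if chunk.is_empty() {
        return false; // a nonempty string requires a nonempty chunk
    }
    if offset == 0 || offset == len {
        return true; // at either end, no following codepoint is required
    }
    // Otherwise the chunk must hold the codepoint that starts at `offset`.
    match offset.checked_sub(chunk_start) {
        Some(rel) => rel < chunk.len() && chunk.is_char_boundary(rel),
        None => false, // the chunk begins after the cursor
    }
}
```

With "abcd" split into "ab" and "cd", a cursor at offset 2 is covered by the second chunk but not by the first, since the first ends exactly at the cursor and does not contain the following codepoint.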

pub fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>

Find the next boundary after the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::NextChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk following the one given, then retry.

See is_boundary for expectations on the provided chunk.

use unicode_segmentation::GraphemeCursor;
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(16)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(None));

And an example that uses partial strings:

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};
let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.next_boundary(&s[..2], 0), Ok(Some(1)));
assert_eq!(cursor.next_boundary(&s[..2], 0), Err(GraphemeIncomplete::NextChunk));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(2)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(4)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(None));

pub fn prev_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>

Find the previous boundary before the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::PrevChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk preceding the one given, then retry.

See is_boundary for expectations on the provided chunk.

use unicode_segmentation::GraphemeCursor;
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(12, flags.len(), false);
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(None));

And an example that uses partial strings (note the exact return is not guaranteed, and may be PrevChunk or PreContext arbitrarily):

use unicode_segmentation::{GraphemeCursor, GraphemeIncomplete};
let s = "abcd";
let mut cursor = GraphemeCursor::new(4, s.len(), false);
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Err(GraphemeIncomplete::PrevChunk));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(2)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(1)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(None));

Trait Implementations§

impl Clone for GraphemeCursor

fn clone(&self) -> GraphemeCursor

Returns a copy of the value.

fn clone_from(&mut self, source: &Self)

Performs copy-assignment from source.

impl Debug for GraphemeCursor

fn fmt(&self, f: &mut Formatter<'_>) -> Result

Formats the value using the given formatter.

Auto Trait Implementations§

Blanket Implementations§

impl<T> Any for T
where T: 'static + ?Sized,

fn type_id(&self) -> TypeId

Gets the TypeId of self.

impl<T> Borrow<T> for T
where T: ?Sized,

fn borrow(&self) -> &T

Immutably borrows from an owned value.

impl<T> BorrowMut<T> for T
where T: ?Sized,

fn borrow_mut(&mut self) -> &mut T

Mutably borrows from an owned value.

impl<T> CloneToUninit for T
where T: Clone,

unsafe fn clone_to_uninit(&self, dst: *mut T)

🔬This is a nightly-only experimental API. (clone_to_uninit)
Performs copy-assignment from self to dst.

impl<T> From<T> for T

fn from(t: T) -> T

Returns the argument unchanged.

impl<T, U> Into<U> for T
where U: From<T>,

fn into(self) -> U

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

impl<T, U> TryFrom<U> for T
where U: Into<T>,

type Error = Infallible

The type returned in the event of a conversion error.

fn try_from(value: U) -> Result<T, <T as TryFrom<U>>::Error>

Performs the conversion.

impl<T, U> TryInto<U> for T
where U: TryFrom<T>,

type Error = <U as TryFrom<T>>::Error

The type returned in the event of a conversion error.

fn try_into(self) -> Result<U, <U as TryFrom<T>>::Error>

Performs the conversion.