pub struct GraphemeCursor {
offset: usize,
len: usize,
is_extended: bool,
state: GraphemeState,
cat_before: Option<GraphemeCat>,
cat_after: Option<GraphemeCat>,
pre_context_offset: Option<usize>,
incb_linker_count: Option<usize>,
ris_count: Option<usize>,
resuming: bool,
grapheme_cat_cache: (u32, u32, GraphemeCat),
}
Cursor-based segmenter for grapheme clusters.
This allows working with ropes and other data structures where the string is not contiguous or fully known at initialization time.
Fields§
offset: usize
Current cursor position.
len: usize
Total length of the string.
is_extended: bool
A config flag indicating whether this cursor computes legacy or extended grapheme cluster boundaries (enables GB9a and GB9b if set).
state: GraphemeState
Information about the potential boundary at offset.
cat_before: Option<GraphemeCat>
Category of codepoint immediately preceding cursor, if known.
cat_after: Option<GraphemeCat>
Category of codepoint immediately after cursor, if known.
pre_context_offset: Option<usize>
If set, at least one more codepoint immediately preceding this offset is needed to resolve whether there’s a boundary at offset.
incb_linker_count: Option<usize>
The number of InCB=Linker codepoints preceding offset (potentially intermingled with InCB=Extend).
ris_count: Option<usize>
The number of RIS codepoints preceding offset. If pre_context_offset is set, this counts the RIS codepoints between that offset and offset; otherwise it is an accurate count relative to the string.
resuming: bool
Set if a call to prev_boundary or next_boundary was suspended due to needing more input.
grapheme_cat_cache: (u32, u32, GraphemeCat)
Cached grapheme category and associated scalar value range.
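To illustrate what a cache of the form (range start, range end, category) buys, here is a minimal sketch of the idea. Everything in it is an assumption made for illustration: the `Cat` enum, the two-entry `TABLE`, the `lookup` function, and the `misses` counter are hypothetical and do not reflect the crate's real tables or internals.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Cat {
    Control,
    RegionalIndicator,
    Any,
}

// Toy table of sorted, non-overlapping scalar value ranges (NOT real Unicode data).
const TABLE: &[(u32, u32, Cat)] = &[
    (0x00, 0x1F, Cat::Control),
    (0x1F1E6, 0x1F1FF, Cat::RegionalIndicator),
];

/// Slow path: return the whole range containing `cp` together with its category.
/// Gaps between table entries are reported as ranges of `Cat::Any`.
fn lookup(cp: u32) -> (u32, u32, Cat) {
    let mut gap_start = 0;
    for &(start, end, cat) in TABLE {
        if cp < start {
            return (gap_start, start - 1, Cat::Any);
        }
        if cp <= end {
            return (start, end, cat);
        }
        gap_start = end + 1;
    }
    (gap_start, u32::MAX, Cat::Any)
}

struct CatCache {
    cache: (u32, u32, Cat),
    misses: usize, // only here to make the effect of the cache observable
}

impl CatCache {
    fn new() -> Self {
        // Seed with a range that is valid in the toy table: U+0000 is Control.
        CatCache { cache: (0, 0, Cat::Control), misses: 0 }
    }

    /// Fast path: reuse the cached category while `ch` stays inside the cached range.
    fn category(&mut self, ch: char) -> Cat {
        let cp = ch as u32;
        if cp < self.cache.0 || cp > self.cache.1 {
            self.cache = lookup(cp);
            self.misses += 1;
        }
        self.cache.2
    }
}
```

Runs of text in one script tend to fall into the same range, so consecutive lookups often hit the cached entry and skip the table search.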
Implementations§
impl GraphemeCursor
pub fn new(offset: usize, len: usize, is_extended: bool) -> GraphemeCursor
Create a new cursor. The length of the string and the initial offset are given at creation time, but the contents of the string are not. The is_extended parameter controls whether extended grapheme clusters are selected.
The offset parameter must be on a codepoint boundary.
let s = "हिन्दी";
let mut legacy = GraphemeCursor::new(0, s.len(), false);
assert_eq!(legacy.next_boundary(s, 0), Ok(Some("ह".len())));
let mut extended = GraphemeCursor::new(0, s.len(), true);
assert_eq!(extended.next_boundary(s, 0), Ok(Some("हि".len())));
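Because the offset must be on a codepoint boundary, a caller holding an arbitrary byte offset may want to normalize it first. The following is a minimal sketch using only the standard library's `str::is_char_boundary`; the helper name `floor_to_codepoint_boundary` is hypothetical and not part of this crate.

```rust
/// Hypothetical helper (not part of the crate): move `offset` back to the
/// nearest codepoint boundary at or before it, clamping to the string length.
fn floor_to_codepoint_boundary(s: &str, offset: usize) -> usize {
    let mut i = offset.min(s.len());
    // Offset 0 is always a boundary, so this loop terminates.
    while !s.is_char_boundary(i) {
        i -= 1;
    }
    i
}
```

The result can then be passed as the offset argument of new() or set_cursor().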
fn grapheme_category(&mut self, ch: char) -> GraphemeCat
pub fn set_cursor(&mut self, offset: usize)
Set the cursor to a new location in the same string.
let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.cur_cursor(), 0);
cursor.set_cursor(2);
assert_eq!(cursor.cur_cursor(), 2);
pub fn cur_cursor(&self) -> usize
The current offset of the cursor. Equal to the last value provided to new() or set_cursor(), or returned from next_boundary() or prev_boundary().
// Two flags (🇷🇸🇮🇴), each flag is two RIS codepoints, each RIS is 4 bytes.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.cur_cursor(), 4);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.cur_cursor(), 8);
pub fn provide_context(&mut self, chunk: &str, chunk_start: usize)
Provide additional pre-context when it is needed to decide a boundary.
The end of the chunk must coincide with the value given in the GraphemeIncomplete::PreContext request.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
// Not enough pre-context to decide if there's a boundary between the two flags.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(8)));
// Provide one more Regional Indicator Symbol of pre-context
cursor.provide_context(&flags[4..8], 4);
// Still not enough context to decide.
assert_eq!(cursor.is_boundary(&flags[8..], 8), Err(GraphemeIncomplete::PreContext(4)));
// Provide additional requested context.
cursor.provide_context(&flags[0..4], 0);
// That's enough to decide (it always is when context goes to the start of the string)
assert_eq!(cursor.is_boundary(&flags[8..], 8), Ok(true));
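The reason the cursor above keeps asking for context is that Regional Indicator Symbols pair up from the start of their run, so the answer depends on how many RIS codepoints precede the cursor. The sketch below models only that parity rule with the standard library; the function names are hypothetical, the real cursor tracks this incrementally in ris_count, and the slice `s[..offset]` assumes `offset` is on a codepoint boundary.

```rust
/// True if `c` is a Regional Indicator Symbol (U+1F1E6..=U+1F1FF).
fn is_regional_indicator(c: char) -> bool {
    ('\u{1F1E6}'..='\u{1F1FF}').contains(&c)
}

/// Hypothetical helper: length of the unbroken run of RIS codepoints
/// that ends at byte `offset` (which must be a codepoint boundary).
fn ris_run_before(s: &str, offset: usize) -> usize {
    s[..offset]
        .chars()
        .rev()
        .take_while(|&c| is_regional_indicator(c))
        .count()
}

/// Between two RIS codepoints there is a boundary only when an even number
/// of RIS codepoints precede the position (all earlier ones are already paired).
fn boundary_between_ris(ris_before: usize) -> bool {
    ris_before % 2 == 0
}
```

With the two-flag string from the example, two RIS codepoints precede offset 8 (boundary) and three precede offset 12 (no boundary), which matches the results shown for is_boundary.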
fn decide(&mut self, is_break: bool)
fn decision(&mut self, is_break: bool) -> Result<bool, GraphemeIncomplete>
fn is_boundary_result(&self) -> Result<bool, GraphemeIncomplete>
fn handle_incb_consonant(&mut self, chunk: &str, chunk_start: usize)
For handling rule GB9c:
There’s an InCB=Consonant after this, and we need to look back to verify whether there should be a break.
Seek backward to find an InCB=Linker preceded by an InCB=Consonant (potentially separated by some number of InCB=Linker or InCB=Extend). If we find the consonant in question, then there’s no break; if we find a consonant with no linker, or a non-linker non-extend non-consonant, or the start of text, there’s a break; otherwise we need more context.
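The backward scan described above can be modelled as a small state machine. This is a simplified sketch under stated assumptions, not the crate's implementation: the `InCb` and `Lookback` enums and the `gb9c_lookback` function are hypothetical, and the input is a slice of already-classified codepoints (the ones preceding the cursor) rather than a string chunk.

```rust
/// Hypothetical classification of a codepoint by its Indic_Conjunct_Break value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum InCb {
    Consonant,
    Linker,
    Extend,
    Other,
}

/// Hypothetical outcome of the lookback.
#[derive(Debug, PartialEq, Eq)]
enum Lookback {
    NoBreak,
    Break,
    NeedMoreContext,
}

/// `preceding` holds the classes of the codepoints before the cursor, in text
/// order; `at_start_of_text` says whether `preceding` reaches the start of text.
fn gb9c_lookback(preceding: &[InCb], at_start_of_text: bool) -> Lookback {
    let mut seen_linker = false;
    for &class in preceding.iter().rev() {
        match class {
            InCb::Linker => seen_linker = true,
            InCb::Extend => {}
            // A consonant ends the scan: it joins only if a linker was seen.
            InCb::Consonant => {
                return if seen_linker { Lookback::NoBreak } else { Lookback::Break };
            }
            // Anything else interrupts the conjunct sequence.
            InCb::Other => return Lookback::Break,
        }
    }
    // Ran out of input without deciding.
    if at_start_of_text {
        Lookback::Break
    } else {
        Lookback::NeedMoreContext
    }
}
```

The `NeedMoreContext` outcome corresponds to the case where the caller is asked for more pre-context before the boundary can be decided.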
fn handle_regional(&mut self, chunk: &str, chunk_start: usize)
fn handle_emoji(&mut self, chunk: &str, chunk_start: usize)
pub fn is_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<bool, GraphemeIncomplete>
Determine whether the current cursor location is a grapheme cluster boundary.
Only a part of the string need be supplied. If chunk_start is nonzero or the length of chunk is not equal to len on creation, then this method may return GraphemeIncomplete::PreContext. The caller should then call provide_context with the requested chunk, then retry calling this method.
For partial chunks, if the cursor is not at the beginning or end of the string, the chunk should contain at least the codepoint following the cursor. If the string is nonempty, the chunk must be nonempty.
All calls should have consistent chunk contents (i.e., if a chunk provides content for a given slice, all further chunks covering that slice must have the same content for it).
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(8, flags.len(), false);
assert_eq!(cursor.is_boundary(flags, 0), Ok(true));
cursor.set_cursor(12);
assert_eq!(cursor.is_boundary(flags, 0), Ok(false));
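When the text lives in several pieces, the caller has to pick a chunk that contains the codepoint following the cursor and report where that chunk starts. A minimal sketch of that bookkeeping follows, assuming the pieces are available as a slice of `&str` in text order; the helper name `chunk_for_offset` is hypothetical and not part of this crate.

```rust
/// Hypothetical helper: find the chunk containing the byte at `offset`
/// (and therefore the codepoint following a cursor placed there).
/// Returns the chunk together with its starting byte offset in the whole text.
fn chunk_for_offset<'a>(chunks: &[&'a str], offset: usize) -> Option<(&'a str, usize)> {
    let mut start = 0;
    for &chunk in chunks {
        let end = start + chunk.len();
        if offset < end {
            return Some((chunk, start));
        }
        start = end;
    }
    // `offset` is at or past the end of the text.
    None
}
```

The returned pair maps directly onto the chunk and chunk_start arguments. For a cursor at the very end of the text the helper returns `None`, and the caller would pass the last nonempty chunk instead.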
pub fn next_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>
Find the next boundary after the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::NextChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk following the one given, then retry.
See is_boundary for expectations on the provided chunk.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(4, flags.len(), false);
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(Some(16)));
assert_eq!(cursor.next_boundary(flags, 0), Ok(None));
And an example that uses partial strings:
let s = "abcd";
let mut cursor = GraphemeCursor::new(0, s.len(), false);
assert_eq!(cursor.next_boundary(&s[..2], 0), Ok(Some(1)));
assert_eq!(cursor.next_boundary(&s[..2], 0), Err(GraphemeIncomplete::NextChunk));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(2)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(Some(4)));
assert_eq!(cursor.next_boundary(&s[2..4], 2), Ok(None));
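In the partial-string example, the retry after NextChunk passes the chunk that starts where the previous one ended, that is, at chunk_start plus the previous chunk's length. A small sketch of that step, assuming the text is stored as a slice of `&str` pieces in order; `following_chunk` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: given the chunk that was just supplied (identified by
/// its start offset and byte length), return the next nonempty chunk and its
/// start offset, or `None` if the supplied chunk was the last one.
fn following_chunk<'a>(
    chunks: &[&'a str],
    chunk_start: usize,
    chunk_len: usize,
) -> Option<(&'a str, usize)> {
    let target = chunk_start + chunk_len;
    let mut start = 0;
    for &chunk in chunks {
        // Skip empty pieces: a nonempty string needs a nonempty chunk.
        if start == target && !chunk.is_empty() {
            return Some((chunk, start));
        }
        start += chunk.len();
    }
    None
}
```

For the pieces "ab" and "cd", the chunk following the one at offset 0 is "cd" at offset 2, which is what the example passes on retry.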
pub fn prev_boundary(&mut self, chunk: &str, chunk_start: usize) -> Result<Option<usize>, GraphemeIncomplete>
Find the previous boundary before the current cursor position. Only a part of the string need be supplied. If the chunk is incomplete, then this method might return GraphemeIncomplete::PreContext or GraphemeIncomplete::PrevChunk. In the former case, the caller should call provide_context with the requested chunk, then retry. In the latter case, the caller should provide the chunk preceding the one given, then retry.
See is_boundary for expectations on the provided chunk.
let flags = "\u{1F1F7}\u{1F1F8}\u{1F1EE}\u{1F1F4}";
let mut cursor = GraphemeCursor::new(12, flags.len(), false);
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(8)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(flags, 0), Ok(None));
And an example that uses partial strings (note the exact return is not guaranteed, and may be PrevChunk or PreContext arbitrarily):
let s = "abcd";
let mut cursor = GraphemeCursor::new(4, s.len(), false);
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Ok(Some(3)));
assert_eq!(cursor.prev_boundary(&s[2..4], 2), Err(GraphemeIncomplete::PrevChunk));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(2)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(1)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(Some(0)));
assert_eq!(cursor.prev_boundary(&s[0..2], 0), Ok(None));
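Both PrevChunk and PreContext ask for text that ends at a known offset: the start of the chunk just given, or the offset carried by the PreContext request. The sketch below shows one way a caller might locate such a slice, assuming the text is stored as a slice of `&str` pieces in order; `context_ending_at` is a hypothetical helper, not part of this crate.

```rust
/// Hypothetical helper: return a slice of the stored text that ends exactly at
/// byte offset `end`, together with the slice's starting offset.
/// Returns `None` if `end` is 0, past the end of the text, or not on a
/// codepoint boundary.
fn context_ending_at<'a>(chunks: &[&'a str], end: usize) -> Option<(&'a str, usize)> {
    let mut start = 0;
    for &chunk in chunks {
        let chunk_end = start + chunk.len();
        if start < end && end <= chunk_end {
            // Trim anything past `end` so the slice ends exactly there.
            // `str::get` yields `None` instead of panicking on a bad boundary.
            return chunk.get(..end - start).map(|ctx| (ctx, start));
        }
        start = chunk_end;
    }
    None
}
```

The returned pair can be passed as the chunk and chunk_start arguments of provide_context, or of the retried prev_boundary call.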
Trait Implementations§
impl Clone for GraphemeCursor
fn clone(&self) -> GraphemeCursor
fn clone_from(&mut self, source: &Self)
Performs copy-assignment from source.
Auto Trait Implementations§
impl Freeze for GraphemeCursor
impl RefUnwindSafe for GraphemeCursor
impl Send for GraphemeCursor
impl Sync for GraphemeCursor
impl Unpin for GraphemeCursor
impl UnwindSafe for GraphemeCursor
Blanket Implementations§
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> CloneToUninit for T where T: Clone
unsafe fn clone_to_uninit(&self, dst: *mut T)
This is a nightly-only experimental API (clone_to_uninit).