Struct icu_capi::segmenter_word::ffi::ICU4XWordSegmenter
pub struct ICU4XWordSegmenter(WordSegmenter);
An ICU4X word-break segmenter, capable of finding word breakpoints in strings.
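A breakpoint-based API reports offsets into the input rather than substrings. As a minimal sketch of how a caller might turn such offsets into segments, the helper below slices a string between consecutive breakpoints. The function name `segments_from_breakpoints` and the offsets used with it are hypothetical and hand-written for illustration; they are not produced by ICU4X here.

```rust
/// Splits `text` into the pieces delimited by consecutive breakpoints.
///
/// `breakpoints` is assumed to be ascending byte offsets that fall on
/// `char` boundaries of `text`; slicing panics otherwise.
fn segments_from_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        // Each adjacent pair (start, end) delimits one segment.
        .windows(2)
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}
```

For example, calling it with `"Hello World"` and the hand-written offsets `[0, 5, 6, 11]` yields three segments: the two words and the space between them.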
Tuple Fields
0: WordSegmenter
Implementations
impl ICU4XWordSegmenter
pub fn create_auto(provider: &ICU4XDataProvider) -> Result<Box<ICU4XWordSegmenter>, ICU4XError>
Constructs an ICU4XWordSegmenter, automatically selecting the best available LSTM or dictionary payload data.
Note: currently, it uses the dictionary model for Chinese and Japanese, and the LSTM model for Burmese, Khmer, Lao, and Thai.
pub fn create_lstm(provider: &ICU4XDataProvider) -> Result<Box<ICU4XWordSegmenter>, ICU4XError>
Constructs an ICU4XWordSegmenter with LSTM payload data for Burmese, Khmer, Lao, and Thai.
Warning: an ICU4XWordSegmenter created by this function doesn’t handle Chinese or Japanese.
pub fn create_dictionary(provider: &ICU4XDataProvider) -> Result<Box<ICU4XWordSegmenter>, ICU4XError>
Constructs an ICU4XWordSegmenter with dictionary payload data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.
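The language coverage of the three constructors can be summarized as a lookup table. The sketch below only restates the descriptions on this page in code form; the `Constructor`, `Language`, and `Model` enums and the `model_for` function are hypothetical names introduced for illustration and are not part of `icu_capi`.

```rust
/// Hypothetical label for which constructor built the segmenter.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Constructor {
    Auto,
    Lstm,
    Dictionary,
}

/// The complex-script languages named in the constructor docs.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Language {
    Chinese,
    Japanese,
    Burmese,
    Khmer,
    Lao,
    Thai,
}

/// Hypothetical label for the kind of payload data in use.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Model {
    Lstm,
    Dictionary,
}

/// Returns the model a segmenter would use for `language`, or `None`
/// if the constructor's data does not handle that language.
fn model_for(constructor: Constructor, language: Language) -> Option<Model> {
    use Language::*;
    match (constructor, language) {
        // create_auto: dictionary for Chinese/Japanese, LSTM otherwise.
        (Constructor::Auto, Chinese | Japanese) => Some(Model::Dictionary),
        (Constructor::Auto, _) => Some(Model::Lstm),
        // create_lstm: no Chinese or Japanese handling.
        (Constructor::Lstm, Chinese | Japanese) => None,
        (Constructor::Lstm, _) => Some(Model::Lstm),
        // create_dictionary: dictionary data for all six languages.
        (Constructor::Dictionary, _) => Some(Model::Dictionary),
    }
}
```

Read this way, `create_auto` and `create_dictionary` differ only in how Burmese, Khmer, Lao, and Thai text is handled, while `create_lstm` is the only constructor that leaves a documented gap.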
pub fn segment_utf8<'a>(&'a self, input: &'a DiplomatStr) -> Box<ICU4XWordBreakIteratorUtf8<'a>>
Segments a UTF-8 string.
Ill-formed input is treated as if errors had been replaced with REPLACEMENT CHARACTERs according to the WHATWG Encoding Standard.
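To see what "as if errors had been replaced with REPLACEMENT CHARACTERs" means for UTF-8 input, the sketch below uses the standard library's lossy conversion as a stand-in. This is an assumption made for illustration: `as_segmenter_sees_utf8` is a hypothetical helper, the segmenter is not required to materialize such a string, and `String::from_utf8_lossy` is used only because it also substitutes U+FFFD for ill-formed sequences.

```rust
use std::borrow::Cow;

/// Returns the text that ill-formed UTF-8 input is treated as being,
/// with each ill-formed sequence replaced by U+FFFD.
/// Well-formed input is borrowed unchanged.
fn as_segmenter_sees_utf8(input: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(input)
}
```

With the input `b"Hello \xF0\x90\x80World"`, where a four-byte sequence is cut short after three bytes, the result is `"Hello \u{FFFD}World"`.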
pub fn segment_utf16<'a>(&'a self, input: &'a DiplomatStr16) -> Box<ICU4XWordBreakIteratorUtf16<'a>>
Segments a UTF-16 string.
Ill-formed input is treated as if errors had been replaced with REPLACEMENT CHARACTERs according to the WHATWG Encoding Standard.
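For UTF-16, ill-formed input means unpaired surrogate code units. A minimal sketch of the equivalent replacement, again using the standard library as a stand-in rather than anything from `icu_capi` (the helper name `as_segmenter_sees_utf16` is hypothetical):

```rust
/// Returns the text that ill-formed UTF-16 input is treated as being:
/// every unpaired surrogate becomes U+FFFD, while valid surrogate
/// pairs decode to their supplementary-plane character.
fn as_segmenter_sees_utf16(input: &[u16]) -> String {
    String::from_utf16_lossy(input)
}
```

For instance, the code units `[0x0061, 0xD800, 0x0062]` contain a lone high surrogate and decode to `"a\u{FFFD}b"`.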
pub fn segment_latin1<'a>(&'a self, input: &'a [u8]) -> Box<ICU4XWordBreakIteratorLatin1<'a>>
Segments a Latin-1 string.
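Unlike the UTF-8 and UTF-16 variants, Latin-1 input cannot be ill-formed: every byte maps directly to the Unicode code point with the same value (U+0000 through U+00FF). The hypothetical helper below sketches that mapping with the standard library only, to show how a Latin-1 byte slice relates to the text being segmented.

```rust
/// Decodes Latin-1 bytes into a `String` by mapping each byte to the
/// code point with the same numeric value.
fn latin1_to_string(input: &[u8]) -> String {
    input.iter().map(|&byte| char::from(byte)).collect()
}
```

Note that offsets into the Latin-1 bytes and offsets into the resulting UTF-8 `String` differ once a byte above 0x7F appears, since such characters take two bytes in UTF-8.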
Auto Trait Implementations
impl Freeze for ICU4XWordSegmenter
impl RefUnwindSafe for ICU4XWordSegmenter
impl !Send for ICU4XWordSegmenter
impl !Sync for ICU4XWordSegmenter
impl Unpin for ICU4XWordSegmenter
impl UnwindSafe for ICU4XWordSegmenter
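Because the type is neither `Send` nor `Sync`, an instance cannot be moved to or shared with another thread. One common way to live with that constraint is to construct a separate instance inside each thread. The sketch below illustrates the pattern with a hypothetical stand-in type, `LocalSegmenter`, which is made `!Send` and `!Sync` by a `PhantomData<Rc<()>>` marker and which counts whitespace-separated words as a toy substitute for real segmentation; this page does not state why `ICU4XWordSegmenter` itself is `!Send`.

```rust
use std::marker::PhantomData;
use std::rc::Rc;
use std::thread;

/// Hypothetical stand-in for a segmenter that is neither `Send` nor `Sync`.
struct LocalSegmenter {
    // `Rc` is `!Send` and `!Sync`, so this marker makes the struct so too.
    _not_send: PhantomData<Rc<()>>,
}

impl LocalSegmenter {
    fn new() -> Self {
        LocalSegmenter { _not_send: PhantomData }
    }

    /// Toy segmentation: counts whitespace-separated words.
    fn count_words(&self, text: &str) -> usize {
        text.split_whitespace().count()
    }
}

/// Processes each input on its own thread, in input order.
fn count_words_in_parallel(inputs: Vec<String>) -> Vec<usize> {
    let handles: Vec<_> = inputs
        .into_iter()
        .map(|text| {
            thread::spawn(move || {
                // The segmenter is created inside the thread, so it never
                // crosses a thread boundary; only the `String` does.
                let segmenter = LocalSegmenter::new();
                segmenter.count_words(&text)
            })
        })
        .collect();

    handles
        .into_iter()
        .map(|handle| handle.join().expect("worker thread panicked"))
        .collect()
}
```

Moving the `LocalSegmenter::new()` call outside the closure would make the program fail to compile, which is the same kind of error a caller would hit when trying to send a `!Send` value across threads.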
Blanket Implementations
impl<T> BorrowMut<T> for T where T: ?Sized
fn borrow_mut(&mut self) -> &mut T
impl<T> Filterable for T
fn filterable(self, filter_name: &'static str) -> RequestFilterDataProvider<T, fn(_: DataRequest<'_>) -> bool>
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.
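The behavior of the two `IntoEither` methods can be sketched without the external crate. The `Either` enum and the free functions below are local stand-ins written for illustration, assuming only the semantics described above; they are not the real trait methods.

```rust
/// Local stand-in for an `Either` type with two variants.
#[derive(Debug, PartialEq)]
enum Either<L, R> {
    Left(L),
    Right(R),
}

/// Wraps `value` in `Left` if `into_left` is true, otherwise in `Right`.
fn into_either<T>(value: T, into_left: bool) -> Either<T, T> {
    if into_left {
        Either::Left(value)
    } else {
        Either::Right(value)
    }
}

/// Like `into_either`, but decides by calling `into_left` on a
/// reference to the value before wrapping it.
fn into_either_with<T, F>(value: T, into_left: F) -> Either<T, T>
where
    F: FnOnce(&T) -> bool,
{
    let choose_left = into_left(&value);
    into_either(value, choose_left)
}
```

In both functions the value itself is unchanged; only the variant that wraps it depends on the flag or predicate.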