Struct icu::segmenter::WordSegmenter
pub struct WordSegmenter { /* private fields */ }
Supports loading word break data, and creating word break iterators for different string encodings.
Examples
Segment a string:
use icu_segmenter::WordSegmenter;

let segmenter =
    WordSegmenter::try_new_unstable(&icu_testdata::unstable())
        .expect("Data exists");
let breakpoints: Vec<usize> =
    segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);
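The breakpoints collected above are offsets into the input, so adjacent pairs delimit the individual segments. The following is a minimal std-only sketch of that idea; `segments_from_breakpoints` is a hypothetical helper, not part of `icu_segmenter`, and it assumes the offsets are sorted byte offsets that fall on `char` boundaries (as they do in the example above).

```rust
/// Splits `text` into segments given a sorted list of byte offsets,
/// such as the `breakpoints` vector collected from `segment_str`.
/// Each adjacent pair of offsets delimits one segment.
/// Panics if an offset is out of range or not on a `char` boundary.
fn segments_from_breakpoints<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        .windows(2) // overlapping pairs: [0, 5], [5, 6], [6, 11]
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}
```

Applied to "Hello World" with the breakpoints `[0, 5, 6, 11]`, this gives the three slices "Hello", " " and "World". Note that the space between the words is a segment of its own.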
Segment a Latin1 byte string:
use icu_segmenter::WordSegmenter;

let segmenter =
    WordSegmenter::try_new_unstable(&icu_testdata::unstable())
        .expect("Data exists");
let breakpoints: Vec<usize> =
    segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);
Implementations
impl WordSegmenter
pub fn try_new_unstable<D>(
    provider: &D
) -> Result<WordSegmenter, SegmenterError>
where
    D: DataProvider<WordBreakDataV1Marker>
        + DataProvider<UCharDictionaryBreakDataV1Marker>
        + DataProvider<LstmDataV1Marker>
        + DataProvider<GraphemeClusterBreakDataV1Marker>
        + ?Sized,
Construct a WordSegmenter.
pub fn try_new_with_any_provider(
provider: &impl AnyProvider
) -> Result<WordSegmenter, SegmenterError>
Creates a new instance using an AnyProvider.
For details on the behavior of this function, see: Self::try_new_unstable
pub fn try_new_with_buffer_provider(
provider: &impl BufferProvider
) -> Result<WordSegmenter, SegmenterError>
✨ Enabled with the "serde" feature.
Creates a new instance using a BufferProvider.
For details on the behavior of this function, see: Self::try_new_unstable
pub fn segment_str(
    &'l self,
    input: &'s str
) -> RuleBreakIterator<'l, 's, WordBreakTypeUtf8>
Create a word break iterator for a str (a UTF-8 string). The iterator yields break positions as usize values.
pub fn segment_utf8(
    &'l self,
    input: &'s [u8]
) -> RuleBreakIterator<'l, 's, WordBreakTypePotentiallyIllFormedUtf8>
Create a word break iterator for a potentially ill-formed UTF-8 string.
Invalid characters are treated as REPLACEMENT CHARACTER (U+FFFD).
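The replacement behavior described here resembles what the Rust standard library does in `String::from_utf8_lossy`. The sketch below uses only std, not the segmenter itself, to illustrate how an invalid byte maps to U+FFFD. `decode_lossy` is a hypothetical helper name, and the segmenter's internal handling of ill-formed input may differ in detail.

```rust
use std::borrow::Cow;

/// Decodes possibly ill-formed UTF-8 the way the standard library does:
/// invalid byte sequences are replaced with U+FFFD REPLACEMENT CHARACTER.
/// Well-formed input is borrowed without copying.
fn decode_lossy(input: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(input)
}
```

For instance, the lone byte 0xFF can never appear in well-formed UTF-8, so `decode_lossy(b"Hello\xFFWorld")` contains a single U+FFFD between the two words.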
pub fn segment_latin1(
    &'l self,
    input: &'s [u8]
) -> RuleBreakIterator<'l, 's, RuleBreakTypeLatin1>
Create a word break iterator for a Latin-1 (8-bit) string.
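In Latin-1, every byte is exactly one character, and its value equals the Unicode code point of that character. The std-only sketch below illustrates this mapping; `latin1_to_string` is a hypothetical helper and not part of `icu_segmenter`.

```rust
/// Converts Latin-1 bytes to a `String`. Each Latin-1 byte value is the
/// Unicode code point with the same number, so `u8 as char` is exact.
fn latin1_to_string(input: &[u8]) -> String {
    input.iter().map(|&b| b as char).collect()
}
```

One consequence is that a Latin-1 byte string and its UTF-8 counterpart can have different lengths: the single byte 0xE9 ("é") becomes two bytes in UTF-8. Offsets computed over one encoding therefore should not be assumed to index the other.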
pub fn segment_utf16(
    &'l self,
    input: &'s [u16]
) -> RuleBreakIterator<'l, 's, WordBreakTypeUtf16>
Create a word break iterator for a UTF-16 string.
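Since `segment_utf16` takes a `&[u16]`, a Rust `&str` has to be re-encoded first. A minimal std-only sketch of that preparation step follows; `to_utf16` is a hypothetical helper name, and how the resulting buffer is passed on is left to the caller.

```rust
/// Encodes a `&str` as UTF-16 code units, the `&[u16]` form that
/// `segment_utf16` accepts as input.
fn to_utf16(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}
```

The number of code units generally differs from the UTF-8 byte length: "é" is one code unit in UTF-16 but two bytes in UTF-8, while a character outside the Basic Multilingual Plane takes two code units (a surrogate pair).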
Auto Trait Implementations
impl RefUnwindSafe for WordSegmenter
impl Send for WordSegmenter
impl Sync for WordSegmenter
impl Unpin for WordSegmenter
impl UnwindSafe for WordSegmenter
Blanket Implementations
impl<T> BorrowMut<T> for T where
T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.