Struct icu::segmenter::SentenceSegmenter
pub struct SentenceSegmenter { /* private fields */ }
Supports loading sentence break data, and creating sentence break iterators for different string encodings.
Examples
Segment a string:
use icu_segmenter::SentenceSegmenter;

let segmenter =
    SentenceSegmenter::try_new_unstable(&icu_testdata::unstable())
        .expect("Data exists");

let breakpoints: Vec<usize> =
    segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
Segment a Latin1 byte string:
use icu_segmenter::SentenceSegmenter;

let segmenter =
    SentenceSegmenter::try_new_unstable(&icu_testdata::unstable())
        .expect("Data exists");

let breakpoints: Vec<usize> =
    segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
Implementations
impl SentenceSegmenter
pub fn try_new_unstable<D>(
    provider: &D
) -> Result<SentenceSegmenter, SegmenterError>
where
    D: DataProvider<SentenceBreakDataV1Marker> + ?Sized,
Constructs a SentenceSegmenter from the given data provider.
pub fn try_new_with_any_provider(
    provider: &impl AnyProvider
) -> Result<SentenceSegmenter, SegmenterError>
Creates a new instance using an AnyProvider.
For details on the behavior of this function, see Self::try_new_unstable.
pub fn try_new_with_buffer_provider(
    provider: &impl BufferProvider
) -> Result<SentenceSegmenter, SegmenterError>
✨ Enabled with the "serde" feature.
Creates a new instance using a BufferProvider.
For details on the behavior of this function, see Self::try_new_unstable.
pub fn segment_str(
    &'l self,
    input: &'s str
) -> RuleBreakIterator<'l, 's, RuleBreakTypeUtf8>
Creates a sentence break iterator for a str (a UTF-8 string).
The returned RuleBreakIterator implements Iterator with Item = usize.
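Since the iterator yields plain usize breakpoints, a common follow-up step is to turn them back into sentence slices. The sketch below uses only the standard library and assumes the breakpoints are sorted byte offsets on UTF-8 character boundaries, as the `[0, 11]` result in the examples suggests. The helper name `slices_between` is hypothetical and not part of icu_segmenter.

```rust
/// Splits `text` into the slices delimited by consecutive breakpoints.
///
/// Assumption: `breakpoints` is sorted and every offset is a byte index on a
/// UTF-8 character boundary of `text`. Slicing panics if that does not hold.
fn slices_between<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        .windows(2) // each adjacent pair (start, end) delimits one segment
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}
```

With the breakpoints from the example above, `slices_between("Hello World", &[0, 11])` returns a single slice covering the whole input. Fewer than two breakpoints produce an empty result.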
pub fn segment_utf8(
    &'l self,
    input: &'s [u8]
) -> RuleBreakIterator<'l, 's, RuleBreakTypePotentiallyIllFormedUtf8>
Creates a sentence break iterator for a potentially ill-formed UTF-8 string.
Invalid characters are treated as REPLACEMENT CHARACTER (U+FFFD).
The returned RuleBreakIterator implements Iterator with Item = usize.
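To see what treating invalid input as REPLACEMENT CHARACTER means in practice, the standard library's lossy decoder offers a rough analogy. This is only an illustration: it uses `String::from_utf8_lossy` from std, not the segmenter's own decoding path, and the two may group runs of invalid bytes differently. The helper name `lossy_view` is hypothetical.

```rust
/// Decodes `bytes` as UTF-8, substituting U+FFFD for ill-formed sequences.
///
/// Assumption: this mirrors the *idea* of the replacement behavior described
/// above; it is std's decoder, not icu_segmenter's.
fn lossy_view(bytes: &[u8]) -> String {
    String::from_utf8_lossy(bytes).into_owned()
}
```

Note that the segmenter takes the raw byte slice directly, so no such conversion is needed before calling segment_utf8.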
pub fn segment_latin1(
    &'l self,
    input: &'s [u8]
) -> RuleBreakIterator<'l, 's, RuleBreakTypeLatin1>
Creates a sentence break iterator for a Latin-1 (8-bit) string.
The returned RuleBreakIterator implements Iterator with Item = usize.
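Latin-1 input is one byte per character, so offsets into it generally do not match offsets into the same text encoded as UTF-8. The following sketch, using only the standard library, converts a Latin-1 byte slice to a String for display. The helper name `latin1_to_string` is hypothetical. It is an assumption here that breakpoints from segment_latin1 index into the original byte slice, so any slicing would be done on the bytes before converting.

```rust
/// Converts Latin-1 (ISO-8859-1) bytes to a `String`.
///
/// Each Latin-1 byte maps directly to the Unicode code point with the same
/// value (U+0000..=U+00FF), so `char::from(u8)` is an exact conversion.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}
```

For example, the four Latin-1 bytes `b"caf\xE9"` become the string "café", which occupies five bytes in UTF-8.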
pub fn segment_utf16(
    &'l self,
    input: &'s [u16]
) -> RuleBreakIterator<'l, 's, RuleBreakTypeUtf16>
Creates a sentence break iterator for a UTF-16 string.
The returned RuleBreakIterator implements Iterator with Item = usize.
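Because segment_utf16 takes a `&[u16]`, a Rust str has to be re-encoded first. The sketch below uses only the standard library to build that input and to map breakpoints back to text. It assumes the breakpoints are sorted indices into the u16 slice (code units, not bytes or chars). The helper names `to_utf16` and `utf16_slices` are hypothetical.

```rust
/// Encodes `text` as a vector of UTF-16 code units.
fn to_utf16(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}

/// Decodes the segments of `units` delimited by consecutive breakpoints.
///
/// Assumption: `breakpoints` is sorted and holds code-unit indices within
/// `units`. Unpaired surrogates decode to U+FFFD.
fn utf16_slices(units: &[u16], breakpoints: &[usize]) -> Vec<String> {
    breakpoints
        .windows(2) // each adjacent pair (start, end) delimits one segment
        .map(|pair| String::from_utf16_lossy(&units[pair[0]..pair[1]]))
        .collect()
}
```

Characters outside the Basic Multilingual Plane take two code units, so UTF-16 offsets can differ from both byte and char counts of the original str.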
Auto Trait Implementations
impl RefUnwindSafe for SentenceSegmenter
impl Send for SentenceSegmenter
impl Sync for SentenceSegmenter
impl Unpin for SentenceSegmenter
impl UnwindSafe for SentenceSegmenter
Blanket Implementations
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
Mutably borrows from an owned value.