pub struct WordSegmenter { /* private fields */ }
Supports loading word break data, and creating word break iterators for different string encodings.

🚧 This code is experimental; it may change at any time, in breaking or non-breaking ways, including in SemVer minor releases. It can be enabled with the "experimental" Cargo feature of the icu meta-crate. Use with caution. #2259

Examples

Segment a string:

use icu_segmenter::WordSegmenter;
let segmenter =
    WordSegmenter::try_new_unstable(&icu_testdata::unstable())
        .expect("Data exists");

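// Breakpoints are byte offsets into the input, including the start (0)
// and the end (11) of the string.
let breakpoints: Vec<usize> =
    segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);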
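
The breakpoints in the example above are offsets, not substrings. A minimal sketch of turning them into segments, using only the standard library: the helper names `segments` and `words` are hypothetical and not part of `icu_segmenter`, and the "contains an alphanumeric character" test is just one simple assumption about what counts as a word.

```rust
/// Splits `text` into the segments delimited by consecutive breakpoints.
/// Assumes `breakpoints` is sorted and every offset lies on a char boundary,
/// as in the `segment_str` example above.
fn segments<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        .windows(2) // adjacent pairs: [0, 5], [5, 6], [6, 11]
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}

/// Keeps only the segments that contain at least one alphanumeric character.
/// This filter is an illustrative assumption, not a rule from the segmenter.
fn words<'a>(text: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    segments(text, breakpoints)
        .into_iter()
        .filter(|segment| segment.chars().any(char::is_alphanumeric))
        .collect()
}
```

For "Hello World" with breakpoints `[0, 5, 6, 11]`, `segments` yields "Hello", " ", and "World", and `words` drops the space.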

Segment a Latin1 byte string:

use icu_segmenter::WordSegmenter;
let segmenter =
    WordSegmenter::try_new_unstable(&icu_testdata::unstable())
        .expect("Data exists");

// The input is a byte string; each Latin-1 character occupies one byte.
let breakpoints: Vec<usize> =
    segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 5, 6, 11]);

Implementations

Construct a WordSegmenter.

Creates a new instance using an AnyProvider.

For details on the behavior of this function, see: Self::try_new_unstable

📚 Help choosing a constructor

Enabled with the "serde" feature.

Creates a new instance using a BufferProvider.

For details on the behavior of this function, see: Self::try_new_unstable

📚 Help choosing a constructor

Create a word break iterator for an str (a UTF-8 string).

Create a word break iterator for a potentially ill-formed UTF-8 string.

Invalid characters are treated as REPLACEMENT CHARACTER (U+FFFD).
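
To illustrate what replacing invalid input with U+FFFD looks like, here is a small standard-library sketch. It is an analogy only: `decode_lossy` is a hypothetical helper, and nothing here implies that the segmenter calls `String::from_utf8_lossy` or groups invalid bytes in exactly the same way.

```rust
use std::borrow::Cow;

/// Decodes possibly ill-formed UTF-8, substituting U+FFFD REPLACEMENT
/// CHARACTER for invalid byte sequences. Valid input is borrowed unchanged.
fn decode_lossy(bytes: &[u8]) -> Cow<'_, str> {
    String::from_utf8_lossy(bytes)
}
```

For example, the byte 0xFF can never appear in well-formed UTF-8, so `decode_lossy(b"Hello\xFFWorld")` produces "Hello\u{FFFD}World".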

Create a word break iterator for a Latin-1 (8-bit) string.
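
In Latin-1, every byte is the Unicode code point with the same numeric value, so a byte offset and a character offset coincide. The sketch below (with a hypothetical helper, `latin1_to_string`) shows why such offsets should not be reused against a UTF-8 copy of the same text; that the Latin-1 iterator reports offsets in input bytes is an assumption based on the Latin-1 example above.

```rust
/// Converts Latin-1 bytes to a `String`. Each byte maps to the code point
/// with the same value (0x00..=0xFF), so this conversion cannot fail.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&byte| char::from(byte)).collect()
}
```

The four Latin-1 bytes `b"caf\xE9"` decode to "café", which is five bytes long in UTF-8 because "é" takes two bytes there.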

Create a word break iterator for a UTF-16 string.
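
A UTF-16 string in Rust is typically held as a slice of `u16` code units. As a rough sketch, a `&str` can be converted with the standard library as shown below; `to_utf16` is a hypothetical helper, and the assumption is that the UTF-16 iterator accepts such a slice and reports offsets in code units rather than bytes.

```rust
/// Encodes a `&str` as UTF-16 code units. Characters outside the Basic
/// Multilingual Plane become a surrogate pair (two `u16` values).
fn to_utf16(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}
```

"Hello World" encodes to 11 code units, matching its byte length in UTF-8, but "a\u{1F600}" is 5 bytes in UTF-8 and only 3 code units in UTF-16, so offsets from the two encodings are not interchangeable.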

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.