pub struct GraphemeClusterSegmenter { /* private fields */ }

Segments a string into grapheme clusters.

Supports loading grapheme cluster break data and creating grapheme cluster break iterators for different string encodings.

🚧 This code is experimental; it may change at any time, in breaking or non-breaking ways, including in SemVer minor releases. It can be enabled with the "experimental" Cargo feature of the icu meta-crate. Use with caution. #2259

Examples

Segment a string:

use icu_segmenter::GraphemeClusterSegmenter;
let segmenter = GraphemeClusterSegmenter::try_new_unstable(
    &icu_testdata::unstable(),
)
.expect("Data exists");

let breakpoints: Vec<usize> = segmenter.segment_str("Hello 🗺").collect();
// World Map (U+1F5FA) is encoded in four bytes in UTF-8.
assert_eq!(&breakpoints, &[0, 1, 2, 3, 4, 5, 6, 10]);
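The breakpoints returned by segment_str are byte offsets into the string. That is why the last one is 10 even though "Hello 🗺" has only seven characters. The std-only sketch below does not call icu_segmenter. It uses a hypothetical helper, `segments` (not part of the crate), to turn consecutive breakpoints into string slices. The breakpoints are hard-coded from the example above rather than computed.

```rust
/// Hypothetical helper: splits `s` into the substrings between consecutive
/// breakpoints. Assumes `breakpoints` is sorted, starts at 0, ends at
/// `s.len()`, and that every entry falls on a UTF-8 character boundary.
fn segments<'a>(s: &'a str, breakpoints: &[usize]) -> Vec<&'a str> {
    breakpoints
        .windows(2)
        .map(|pair| &s[pair[0]..pair[1]])
        .collect()
}

fn main() {
    let text = "Hello \u{1F5FA}";
    // The World Map emoji (U+1F5FA) occupies four UTF-8 bytes.
    assert_eq!('\u{1F5FA}'.len_utf8(), 4);
    assert_eq!(text.len(), 10);

    // Breakpoints copied from the segment_str example, not computed here.
    let breakpoints = [0, 1, 2, 3, 4, 5, 6, 10];
    let parts = segments(text, &breakpoints);
    assert_eq!(parts, ["H", "e", "l", "l", "o", " ", "\u{1F5FA}"]);
    println!("{:?}", parts);
}
```

Slicing with `&s[a..b]` panics if an offset is not a character boundary. Grapheme cluster breakpoints always fall on character boundaries, so this is safe for well-formed segmenter output.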

Segment a Latin-1 byte string:

use icu_segmenter::GraphemeClusterSegmenter;
let segmenter = GraphemeClusterSegmenter::try_new_unstable(
    &icu_testdata::unstable(),
)
.expect("Data exists");

let breakpoints: Vec<usize> =
    segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]);

Implementations

Creates a new instance using an AnyProvider.

For details on the behavior of this function, see Self::try_new_unstable.

📚 Help choosing a constructor

Enabled with the "serde" feature.

Creates a new instance using a BufferProvider.

For details on the behavior of this function, see Self::try_new_unstable.

📚 Help choosing a constructor

Create a grapheme cluster break iterator for a str (a UTF-8 string).

Create a grapheme cluster break iterator for a potentially ill-formed UTF-8 string.

Invalid characters are treated as U+FFFD REPLACEMENT CHARACTER.
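This substitution resembles what the standard library's String::from_utf8_lossy does when decoding. The std-only sketch below does not call icu_segmenter. It only shows the effect of replacing an ill-formed byte with U+FFFD. Because U+FFFD takes three bytes in UTF-8, a lossily decoded copy is longer than the input. As a result, byte offsets measured on that copy need not match offsets into the original slice. Whether a given segmenter method reports offsets into the original bytes is an assumption to confirm against its own documentation.

```rust
use std::borrow::Cow;

fn main() {
    // 0xFF can never appear in well-formed UTF-8.
    let bytes: &[u8] = b"Hi\xFFthere";
    assert!(std::str::from_utf8(bytes).is_err());

    // from_utf8_lossy substitutes U+FFFD for the ill-formed sequence.
    let decoded: Cow<str> = String::from_utf8_lossy(bytes);
    assert_eq!(decoded, "Hi\u{FFFD}there");

    // One invalid byte became three bytes of U+FFFD, so lengths differ.
    assert_eq!(bytes.len(), 8);
    assert_eq!(decoded.len(), 10);
    println!("{decoded}");
}
```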

Create a grapheme cluster break iterator for a Latin-1 (8-bit) string.
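In Latin-1 (ISO-8859-1), every byte is one character, and the byte value equals the Unicode code point (U+0000 to U+00FF). Breakpoints over a Latin-1 slice are therefore also character indices, as the "Hello World" example shows. Converting the text to a Rust String changes the offsets for bytes at or above 0x80, which need two bytes in UTF-8. The helper `decode_latin1` below is hypothetical, not an icu_segmenter API. It relies on std's `From<u8> for char`, which performs exactly this Latin-1 mapping.

```rust
/// Hypothetical helper: decodes Latin-1 bytes by mapping each byte to the
/// Unicode code point with the same value.
fn decode_latin1(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

fn main() {
    // "café" in Latin-1: 'é' (U+00E9) is the single byte 0xE9.
    let latin1: &[u8] = b"caf\xE9";
    let utf8 = decode_latin1(latin1);
    assert_eq!(utf8, "caf\u{E9}");

    // One byte per character in Latin-1 ...
    assert_eq!(latin1.len(), 4);
    // ... but U+00E9 needs two bytes in UTF-8, so offsets shift after decoding.
    assert_eq!(utf8.len(), 5);
    println!("{utf8}");
}
```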

Create a grapheme cluster break iterator for a UTF-16 string.
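In UTF-16, the unit of measure is the 16-bit code unit. Characters outside the Basic Multilingual Plane, such as U+1F5FA, occupy a surrogate pair of two units. The std-only sketch below lists the code-unit offset at which each char of "Hello 🗺" starts. It assumes the UTF-16 iterator reports breakpoints as indices into the `&[u16]` slice; check the method documentation to confirm. Under that assumption, one would expect the same boundaries as the UTF-8 example, ending at 8 instead of 10. `utf16_char_offsets` is a hypothetical helper.

```rust
/// Hypothetical helper: returns the UTF-16 code-unit offset at which each
/// `char` of `s` starts, followed by the total length in code units.
fn utf16_char_offsets(s: &str) -> Vec<usize> {
    let mut offsets = Vec::new();
    let mut pos = 0;
    for c in s.chars() {
        offsets.push(pos);
        pos += c.len_utf16(); // 1 for BMP chars, 2 for a surrogate pair
    }
    offsets.push(pos);
    offsets
}

fn main() {
    let text = "Hello \u{1F5FA}";
    let units: Vec<u16> = text.encode_utf16().collect();

    // U+1F5FA lies outside the BMP, so it is encoded as two code units.
    assert_eq!('\u{1F5FA}'.len_utf16(), 2);
    assert_eq!(units.len(), 8);
    assert_eq!(utf16_char_offsets(text), vec![0, 1, 2, 3, 4, 5, 6, 8]);
    println!("{:?}", units);
}
```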

Auto Trait Implementations

Blanket Implementations

Gets the TypeId of self.

Immutably borrows from an owned value.

Mutably borrows from an owned value.

Returns the argument unchanged.

Calls U::from(self).

That is, this conversion is whatever the implementation of From<T> for U chooses to do.
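These blanket implementations come from the Rust standard library and apply to every type, including GraphemeClusterSegmenter. The std-only sketch below shows the From/Into relationship: implementing `From<T> for U` is enough for `T::into()` to produce a `U`. The `Breakpoints` newtype is hypothetical and exists only for illustration.

```rust
/// Hypothetical newtype used only for illustration; not part of icu_segmenter.
#[derive(Debug, PartialEq)]
struct Breakpoints(Vec<usize>);

// Only `From` is implemented by hand ...
impl From<Vec<usize>> for Breakpoints {
    fn from(offsets: Vec<usize>) -> Self {
        Breakpoints(offsets)
    }
}

fn main() {
    // ... and the blanket `impl<T, U> Into<U> for T where U: From<T>`
    // supplies `into`, which just calls `Breakpoints::from`.
    let b: Breakpoints = vec![0_usize, 1, 2, 3].into();
    assert_eq!(b, Breakpoints(vec![0, 1, 2, 3]));

    // The reflexive `impl<T> From<T> for T` returns its argument unchanged.
    let same = Breakpoints::from(Breakpoints(vec![0, 5]));
    assert_eq!(same, Breakpoints(vec![0, 5]));
    println!("{:?}", b);
}
```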

The type returned in the event of a conversion error.

Performs the conversion.

The type returned in the event of a conversion error.

Performs the conversion.
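The TryFrom/TryInto pair mirrors From/Into for fallible conversions. `TryInto<U> for T` exists whenever `U: TryFrom<T>`, and its `Error` type is the error type of that `TryFrom` impl. The std-only sketch below uses the standard library's integer narrowing conversions rather than anything from icu_segmenter.

```rust
use std::convert::{TryFrom, TryInto}; // already in the 2021 prelude
use std::num::TryFromIntError;

fn main() {
    // std implements TryFrom<i32> for u8, failing when the value does not fit.
    let fits: Result<u8, TryFromIntError> = u8::try_from(200_i32);
    assert_eq!(fits, Ok(200));

    // The blanket `impl<T, U> TryInto<U> for T where U: TryFrom<T>` forwards
    // to that impl, so its Error type is the same TryFromIntError.
    let too_big: Result<u8, TryFromIntError> = 300_i32.try_into();
    assert!(too_big.is_err());
    println!("{:?}", too_big);
}
```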