Segments a string.
Ill-formed input is treated as if errors had been replaced with REPLACEMENT CHARACTERs according to the WHATWG Encoding Standard.
See the Rust documentation for segment_utf16 for more information.
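The handling of ill-formed input described above can be pictured with a small sketch. It assumes the input is a UTF-16 string in which each unpaired surrogate counts as one error and is replaced by a single U+FFFD. The helper name `replaceLoneSurrogates` is hypothetical and is not part of the ICU4X API; the segmenter behaves as if this substitution had happened and does not require callers to perform it.

```typescript
// Hypothetical helper, not an ICU4X function: returns a copy of `input`
// in which every unpaired UTF-16 surrogate is replaced with U+FFFD.
function replaceLoneSurrogates(input: string): string {
  let out = "";
  for (let i = 0; i < input.length; i++) {
    const unit = input.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = unit >= 0xdc00 && unit <= 0xdfff;
    if (isHigh) {
      const next = i + 1 < input.length ? input.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        // Well-formed surrogate pair: copy both code units unchanged.
        out += input.slice(i, i + 2);
        i++;
      } else {
        // High surrogate with no low surrogate after it.
        out += "\ufffd";
      }
    } else if (isLow) {
      // Low surrogate that was not preceded by a high surrogate.
      out += "\ufffd";
    } else {
      out += String.fromCharCode(unit);
    }
  }
  return out;
}
```

Because each unpaired surrogate becomes exactly one U+FFFD, the length of the string in code units is unchanged under this sketch.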
Static create
Constructs a [WordSegmenter] that automatically selects the best available LSTM
or dictionary payload data, using compiled data. This does not assume any content locale.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
See the Rust documentation for new_auto for more information.
Static create
Constructs a [WordSegmenter] that automatically selects the best available LSTM
or dictionary payload data, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
See the Rust documentation for try_new_auto for more information.
Static create
Constructs a [WordSegmenter] that automatically selects the best available LSTM
or dictionary payload data, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
See the Rust documentation for try_new_auto for more information.
Static create
Constructs a [WordSegmenter] with dictionary payload data for Chinese, Japanese,
Burmese, Khmer, Lao, and Thai, using compiled data. This does not assume any content locale.
Note: currently, it uses dictionary data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.
See the Rust documentation for new_dictionary for more information.
Static create
Constructs a [WordSegmenter] with dictionary payload data for Chinese, Japanese,
Burmese, Khmer, Lao, and Thai, using compiled data.
Note: currently, it uses dictionary data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.
See the Rust documentation for try_new_dictionary for more information.
Static create
Constructs a [WordSegmenter] with dictionary payload data for Chinese, Japanese,
Burmese, Khmer, Lao, and Thai, using a particular data source.
Note: currently, it uses dictionary data for Chinese, Japanese, Burmese, Khmer, Lao, and Thai.
See the Rust documentation for try_new_dictionary for more information.
Static create
Constructs a [WordSegmenter] with LSTM payload data for Burmese, Khmer, Lao, and
Thai, using compiled data. This does not assume any content locale.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
See the Rust documentation for new_lstm for more information.
Static create
Constructs a [WordSegmenter] with LSTM payload data for Burmese, Khmer, Lao, and
Thai, using compiled data.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
See the Rust documentation for try_new_lstm for more information.
Static create
Constructs a [WordSegmenter] with LSTM payload data for Burmese, Khmer, Lao, and
Thai, using a particular data source.
Note: currently, it uses dictionary data for Chinese and Japanese, and LSTM data for Burmese, Khmer, Lao, and Thai.
See the Rust documentation for try_new_lstm for more information.
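The notes on the three constructor families (auto, dictionary, LSTM) can be summarized as a lookup. This is only a restatement of those notes as they currently stand; the type names and the function `payloadFor` are hypothetical, are not part of the ICU4X API, and do not query any real data. Languages the notes do not mention are reported as `undefined` here.

```typescript
// Hypothetical names for illustration only; none of these exist in ICU4X.
type ConstructorFamily = "auto" | "dictionary" | "lstm";
type PayloadKind = "dictionary" | "lstm";

// Languages named in the notes above.
const CHINESE_JAPANESE = new Set(["Chinese", "Japanese"]);
const SOUTHEAST_ASIAN = new Set(["Burmese", "Khmer", "Lao", "Thai"]);

// Returns the payload kind the notes say is currently used for a language,
// given which constructor family built the segmenter.
function payloadFor(
  family: ConstructorFamily,
  language: string,
): PayloadKind | undefined {
  if (CHINESE_JAPANESE.has(language)) {
    // All three families currently use dictionary data for these languages.
    return "dictionary";
  }
  if (SOUTHEAST_ASIAN.has(language)) {
    // Only the dictionary family uses dictionary data here;
    // the auto and LSTM families currently use LSTM data.
    return family === "dictionary" ? "dictionary" : "lstm";
  }
  return undefined; // not covered by the notes
}
```

Under the notes as written, the auto and LSTM families currently resolve to the same payload kind for every listed language; the word "currently" in the notes suggests that this mapping may change.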
An ICU4X word-break segmenter, capable of finding word breakpoints in strings.
See the Rust documentation for WordSegmenter for more information.
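To make "word breakpoints" concrete, the sketch below turns a list of breakpoints into the substrings between them. It assumes the breakpoints are ascending UTF-16 code unit offsets that include the start and the end of the string. The helper `sliceAtBreakpoints` is hypothetical, and the breakpoint values in the usage line are illustrative rather than captured segmenter output.

```typescript
// Hypothetical helper, not an ICU4X function: splits `text` into the
// pieces that lie between consecutive breakpoints.
function sliceAtBreakpoints(text: string, breakpoints: number[]): string[] {
  const segments: string[] = [];
  for (let i = 1; i < breakpoints.length; i++) {
    // Each segment spans from the previous breakpoint to the current one.
    segments.push(text.slice(breakpoints[i - 1], breakpoints[i]));
  }
  return segments;
}

// Illustrative breakpoints for "Hello World": before "Hello",
// after "Hello", after the space, and at the end of the string.
const pieces = sliceAtBreakpoints("Hello World", [0, 5, 6, 11]);
// pieces is ["Hello", " ", "World"]
```

Note that the space is its own segment in this sketch: breakpoints mark boundaries, so the stretches between words are returned alongside the words themselves.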