pub struct SentenceSegmenter { /* private fields */ }
Supports loading sentence break data, and creating sentence break iterators for different string encodings.
Most segmentation methods live on SentenceSegmenterBorrowed, which can be obtained via
SentenceSegmenter::new() or SentenceSegmenter::as_borrowed().
§Examples
Segment a string:
use icu::segmenter::{
    options::SentenceBreakInvariantOptions, SentenceSegmenter,
};
let segmenter =
    SentenceSegmenter::new(SentenceBreakInvariantOptions::default());
let breakpoints: Vec<usize> =
    segmenter.segment_str("Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
Segment a Latin1 byte string:
use icu::segmenter::{
    options::SentenceBreakInvariantOptions, SentenceSegmenter,
};
let segmenter =
    SentenceSegmenter::new(SentenceBreakInvariantOptions::default());
let breakpoints: Vec<usize> =
    segmenter.segment_latin1(b"Hello World").collect();
assert_eq!(&breakpoints, &[0, 11]);
Successive boundaries can be used to retrieve the sentences. In particular, the first boundary is always 0, and the last one is the length of the segmented text in code units.
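As a dependency-free sketch of this pairing, consecutive boundaries can be walked with `windows(2)` from the standard library. The helper name `slices_between` is hypothetical, and the boundary list is hard-coded as an assumption standing in for real segmenter output, so the snippet runs without ICU4X data. The offsets are byte offsets into the UTF-8 string.

```rust
/// Returns the substrings of `text` delimited by each pair of consecutive
/// boundaries. `boundaries` is assumed to be sorted and to fall on UTF-8
/// character boundaries, as a segmenter's output would.
fn slices_between<'a>(text: &'a str, boundaries: &[usize]) -> Vec<&'a str> {
    boundaries
        .windows(2)
        .map(|pair| &text[pair[0]..pair[1]])
        .collect()
}

fn main() {
    let text = "Ceci tuera cela. Le livre tuera l’édifice.";

    // Hypothetical boundary list: 0, the end of the first sentence
    // (including its trailing space), and the length of the text.
    let first_end = "Ceci tuera cela. ".len();
    let boundaries = [0, first_end, text.len()];

    let sentences = slices_between(text, &boundaries);
    assert_eq!(
        sentences,
        ["Ceci tuera cela. ", "Le livre tuera l’édifice."]
    );
}
```

A boundary list with fewer than two entries yields no slices, since there is no pair to delimit a sentence.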
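use icu::segmenter::{
    options::SentenceBreakInvariantOptions, SentenceSegmenter,
};
use itertools::Itertools;
let segmenter =
    SentenceSegmenter::new(SentenceBreakInvariantOptions::default());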
let text = "Ceci tuera cela. Le livre tuera l’édifice.";
let sentences: Vec<&str> = segmenter
    .segment_str(text)
    .tuple_windows()
    .map(|(i, j)| &text[i..j])
    .collect();
assert_eq!(
    &sentences,
    &["Ceci tuera cela. ", "Le livre tuera l’édifice."]
);
Implementations§
impl SentenceSegmenter
pub const fn new(
    _options: SentenceBreakInvariantOptions,
) -> SentenceSegmenterBorrowed<'static>
Constructs a SentenceSegmenterBorrowed with an invariant locale and compiled data.
✨ Enabled with the compiled_data Cargo feature.
pub fn try_new(
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
Constructs a SentenceSegmenter for the given options, using compiled data.
✨ Enabled with the compiled_data Cargo feature.
pub fn try_new_with_buffer_provider(
    provider: &(impl BufferProvider + ?Sized),
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
A version of Self::try_new that uses custom data provided by a BufferProvider.
✨ Enabled with the serde feature.
pub fn try_new_unstable<D>(
    provider: &D,
    options: SentenceBreakOptions<'_>,
) -> Result<SentenceSegmenter, DataError>
A version of Self::try_new that uses custom data provided by a DataProvider.
pub fn as_borrowed(&self) -> SentenceSegmenterBorrowed<'_>
Constructs a borrowed version of this type for more efficient querying.
Most useful methods for segmentation are on this type.
Trait Implementations§
Auto Trait Implementations§
impl Freeze for SentenceSegmenter
impl RefUnwindSafe for SentenceSegmenter
impl !Send for SentenceSegmenter
impl !Sync for SentenceSegmenter
impl Unpin for SentenceSegmenter
impl UnwindSafe for SentenceSegmenter
Blanket Implementations§
impl<T> BorrowMut<T> for T
where
    T: ?Sized,
fn borrow_mut(&mut self) -> &mut T
impl<T> IntoEither for T
fn into_either(self, into_left: bool) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left is true. Converts self into a Right variant of Either<Self, Self> otherwise.
fn into_either_with<F>(self, into_left: F) -> Either<Self, Self>
Converts self into a Left variant of Either<Self, Self> if into_left(&self) returns true. Converts self into a Right variant of Either<Self, Self> otherwise.