stem_snowball

Snowball Stemmer


Description

Stem a set of terms using one of the algorithms provided by the Snowball stemming library.

Usage

stem_snowball(x, algorithm = "en")

Arguments

x

character vector of terms to stem.

algorithm

stemming algorithm; see ‘Details’ for the valid choices.

Details

Apply a Snowball stemming algorithm to a vector of input terms, x, returning the result in a character vector of the same length with the same names.
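As a minimal sketch of the length and names guarantee described above (assuming the corpus package is installed; the element names "a", "b", and "c" are arbitrary labels chosen for illustration):

```r
library(corpus)

# Named input vector; the names are illustrative only.
x <- c(a = "running", b = "runs", c = "ran")
stems <- stem_snowball(x)

# Per the description, the result mirrors the input's shape.
same_length <- length(stems) == length(x)
same_names <- identical(names(stems), names(x))
print(c(same_length = same_length, same_names = same_names))
```

Because the names carry over, the result can be indexed by the same keys as the input, for example stems["a"].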

The algorithm argument specifies the stemming algorithm. Valid choices include the following: "ar" ("arabic"), "da" ("danish"), "de" ("german"), "en" ("english"), "es" ("spanish"), "fi" ("finnish"), "fr" ("french"), "hu" ("hungarian"), "it" ("italian"), "nl" ("dutch"), "no" ("norwegian"), "pt" ("portuguese"), "ro" ("romanian"), "ru" ("russian"), "sv" ("swedish"), "ta" ("tamil"), "tr" ("turkish"), and "porter". Setting algorithm = NULL gives a stemmer that returns its input unchanged.
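The following sketch illustrates the algorithm argument. It assumes that the long names shown in parentheses above are accepted as synonyms for the two-letter codes, and that the corpus package is installed; the sample terms are arbitrary.

```r
library(corpus)

terms <- c("winning", "winner")

# Short code and long name (assumed equivalent) for the English stemmer.
en_short <- stem_snowball(terms, algorithm = "en")
en_long <- stem_snowball(terms, algorithm = "english")
print(identical(en_short, en_long))

# algorithm = NULL gives a stemmer that returns its input unchanged.
unstemmed <- stem_snowball(terms, algorithm = NULL)
print(identical(unname(unstemmed), terms))
```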

The function only stems single-word terms of kind "letter"; it leaves other inputs (multi-word terms, and terms of kind "number", "punct", and "symbol") unchanged.
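A short sketch of this rule, assuming the corpus package is installed. The sample terms are illustrative: one single-word letter term, one multi-word term, one number, and one punctuation mark.

```r
library(corpus)

terms <- c("winning", "winning streak", "2019", "!")
stems <- stem_snowball(terms)

# Only the first term is a single-word term of kind "letter", so the
# remaining entries should come back exactly as they went in.
untouched <- identical(unname(stems[-1]), terms[-1])
print(untouched)
```

To stem each word of a multi-word term, split the term into single words first and pass those to stem_snowball.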

The Snowball stemming library provides the underlying implementation. The wordStem function from the SnowballC package provides a similar interface, but that function applies the algorithm to all input terms, regardless of the kind of the term.

Value

A character vector with the same length and names as the input, x, whose entries are the corresponding stems.

Examples

# apply english stemming algorithm; don't stem non-letter terms
stem_snowball(c("win", "winning", "winner", "#winning"))

# compare with SnowballC, which stems all kinds, not just letter
## Not run: SnowballC::wordStem(c("win", "winning", "winner", "#winning"), "en")

corpus

Text Corpus Analysis

v0.10.2
Apache License (== 2.0) | file LICENSE
Authors
Leslie Huang [cre, ctb], Patrick O. Perry [aut, cph], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), The Regents of the University of California [ctb, cph] (Strtod Library Procedure), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database)
Initial release