Stem the terms in an object
tokens_wordstem(x, language = quanteda_options("language_stemmer"))
char_wordstem(x, language = quanteda_options("language_stemmer"))
dfm_wordstem(x, language = quanteda_options("language_stemmer"))
x
a character, tokens, or dfm object whose words are to be stemmed. If tokenized texts, the tokenization must be word-based.
language
the name of a recognized language, as returned by getStemLanguages, or a two- or three-letter ISO-639 code corresponding to one of these languages (see references for the list of codes)
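As a minimal sketch of the language argument, the snippet below stems the same words once with a full language name and once with a two-letter ISO-639 code. It assumes the quanteda package is installed and that "english" and "en" both resolve to the English stemmer; the variable names are illustrative only.

```r
library(quanteda)

words <- c("win", "winning", "wins", "won", "winner")

# Select the stemmer by full language name
by_name <- char_wordstem(words, language = "english")

# Select the stemmer by ISO-639 code (assumed to map to the same stemmer)
by_code <- char_wordstem(words, language = "en")

# Compare the two results
identical(by_name, by_code)
```

Omitting language falls back to the value of quanteda_options("language_stemmer"), as shown in the usage above.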
tokens_wordstem
returns a tokens object whose word types have been stemmed.
char_wordstem
returns a character object whose word types have been stemmed.
dfm_wordstem
returns a dfm object whose word types (features) have been stemmed and recombined to consolidate features made equivalent because of stemming.
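The consolidation performed by dfm_wordstem can be checked by comparing feature counts before and after stemming. This is a rough sketch, assuming the quanteda package is installed and an English stemmer is used; the object names are illustrative.

```r
library(quanteda)

txt <- c(one = "eating eater eaters eats ate",
         two = "taxing taxes taxed my tax return")

# Build a dfm from word tokens, then stem its features
origdfm <- dfm(tokens(txt))
stemdfm <- dfm_wordstem(origdfm, language = "english")

# Features that share a stem are merged, so the stemmed dfm
# should have fewer columns than the original
nfeat(origdfm)
nfeat(stemdfm)

# Inspect the consolidated feature names
featnames(stemdfm)
```

Because counts for merged features are summed, the total number of tokens recorded in the dfm should be unchanged by stemming.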
http://www.iso.org/iso/home/standards/language_codes.htm for the ISO-639 language codes
# example applied to tokens
txt <- c(one = "eating eater eaters eats ate",
         two = "taxing taxes taxed my tax return")
th <- tokens(txt)
tokens_wordstem(th)

# simple example
char_wordstem(c("win", "winning", "wins", "won", "winner"))

# example applied to a dfm
(origdfm <- dfm(tokens(txt)))
dfm_wordstem(origdfm)