Replace tokens in a tokens object
Substitute token types based on vectorized one-to-one matching. This function is designed for lemmatization or user-defined stemming. It supports substitution of multi-word features by multi-word features, but substitution is fastest when pattern and replacement are character vectors and valuetype = "fixed", as the function then only substitutes types of tokens. To replace dictionary values, use tokens_lookup() with exclusive = FALSE.
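The tokens_lookup() alternative mentioned above can be sketched as follows; the dictionary key and the sample text here are illustrative, not taken from the package documentation:

```r
library(quanteda)

# Hypothetical dictionary mapping several tax-related values to one key
dict <- dictionary(list(TAX = c("tax", "taxing", "taxed", "taxes")))

toks <- tokens("Taxing taxes is taxing")

# exclusive = FALSE keeps unmatched tokens in place and replaces
# matched values with their dictionary key
tokens_lookup(toks, dict, exclusive = FALSE)
```

With exclusive = TRUE (the default), only the matched keys would be kept, which is why exclusive = FALSE is needed for replacement-style behavior.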
tokens_replace(
  x,
  pattern,
  replacement,
  valuetype = "glob",
  case_insensitive = TRUE,
  verbose = quanteda_options("verbose")
)
x: tokens object whose token elements will be replaced

pattern: a character vector or list of character vectors. See pattern for more details.

replacement: a character vector or (if pattern is a list) list of character vectors equal in length to pattern

valuetype: the type of pattern matching: "glob" for "glob"-style wildcard expressions; "regex" for regular expressions; or "fixed" for exact matching

case_insensitive: logical; if TRUE, ignore case when matching a pattern or dictionary values

verbose: print status messages if TRUE
See also: tokens_lookup()
toks1 <- tokens(data_corpus_inaugural, remove_punct = TRUE)

# lemmatization
taxwords <- c("tax", "taxing", "taxed", "taxes", "taxation")
lemma <- rep("TAX", length(taxwords))
toks2 <- tokens_replace(toks1, taxwords, lemma, valuetype = "fixed")
kwic(toks2, "TAX") %>% tail(10)

# stemming
type <- types(toks1)
stem <- char_wordstem(type, "porter")
toks3 <- tokens_replace(toks1, type, stem, valuetype = "fixed", case_insensitive = FALSE)
identical(toks3, tokens_wordstem(toks1, "porter"))

# multi-multi substitution
toks4 <- tokens_replace(toks1, phrase(c("Supreme Court")),
                        phrase(c("Supreme Court of the United States")))
kwic(toks4, phrase(c("Supreme Court of the United States")))