
replace_tokens

Replace Tokens


Description

Replace tokens with a single substring. This is much faster than mgsub when you want to replace fixed tokens with a single value or remove them altogether. It is useful for quickly replacing tokens, such as names, in a string with a single value in order to reduce noise.

Usage

replace_tokens(x, tokens, replacement = NULL, ignore.case = FALSE, ...)

Arguments

x

A character vector.

tokens

A vector of tokens to be replaced.

replacement

A single character string to replace the tokens with. The default, NULL, replaces the tokens with nothing.

ignore.case

Logical. If TRUE, the case of the tokens will be ignored.

...

Ignored.

Value

Returns a vector of strings with tokens replaced.

Note

The function splits each string apart into tokens for speed optimization. After the replacement occurs, the tokens are pasted back together, so the resulting strings are not guaranteed to retain the exact spacing of the original.
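
The split, replace, and paste steps described in this note can be illustrated with a minimal base R sketch. The helper sketch_replace_tokens below is hypothetical and is not the textclean implementation: it assumes tokens are separated by whitespace only and does not detach punctuation from words. It is meant only to show why the original spacing can be lost.

```r
# Hypothetical helper (not part of textclean): split on whitespace,
# swap matching tokens by set membership, then paste back together.
sketch_replace_tokens <- function(x, tokens, replacement = "",
                                  ignore.case = FALSE) {
    pieces <- strsplit(x, "\\s+")
    if (ignore.case) tokens <- tolower(tokens)
    vapply(pieces, function(p) {
        key <- if (ignore.case) tolower(p) else p
        p[key %in% tokens] <- replacement
        # Drop empty pieces and rejoin with a single space;
        # any runs of spaces in the input are collapsed here.
        paste(p[p != ""], collapse = " ")
    }, character(1))
}

sketch_replace_tokens("No  it's   not what I said", c("No", "what"), "<<TOKEN>>")
## [1] "<<TOKEN>> it's not <<TOKEN>> I said"
```

Because the pieces are rejoined with a single space, the double and triple spaces in the input do not survive, which mirrors the spacing caveat above.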

Examples

library(textclean)

## Remove the tokens (default replacement = NULL)
replace_tokens(DATA$state, c('No', 'what', "it's"))
replace_tokens(DATA$state, c('No', 'what', "it's"), "<<TOKEN>>")
replace_tokens(
    DATA$state, 
    c('No', 'what', "it's"), 
    "<<TOKEN>>", 
    ignore.case = TRUE
)

## Not run: 
## Now let's see the speed
## Set up data
library(textshape)
data(hamlet)
set.seed(11)
tokens <- sample(unique(unlist(split_token(hamlet$dialogue))), 2000)

tic <- Sys.time()
head(replace_tokens(hamlet$dialogue, tokens))
(toc <- Sys.time() - tic)


tic <- Sys.time()
head(mgsub(hamlet$dialogue, tokens, ""))
(toc <- Sys.time() - tic)


## Amp it up 20x more data
tic <- Sys.time()
head(replace_tokens(rep(hamlet$dialogue, 20), tokens))
(toc <- Sys.time() - tic)

## Replace names example

library(lexicon)
library(textshape)
nms <- gsub("(^.)(.*)", "\\U\\1\\L\\2", common_names, perl = TRUE)
x <- split_portion(
    sample(c(sample(grady_augmented, 5000), sample(nms, 10000, TRUE))), 
    n.words = 12
)
x$text.var <- paste0(
    x$text.var, 
    sample(c('.', '!', '?'), length(x$text.var), TRUE)
)
replace_tokens(x$text.var, nms, 'NAME')

## End(Not run)

textclean

Text Cleaning Tools

v0.9.3
GPL-2
Authors
Tyler Rinker [aut, cre], ctwheels StackOverflow [ctb]
Initial release
