Split tokens by a separator pattern
Replaces each token with the elements obtained by splitting it on a
separator pattern, with the option of retaining the separator. This function
effectively reverses the operation of tokens_compound().
tokens_split(x, separator = " ", valuetype = c("fixed", "regex"), remove_separator = TRUE)
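x | a tokens object
separator | a single-character pattern match by which tokens are separated
valuetype | the type of pattern matching: "fixed" for exact matching or "regex" for regular expressions
remove_separator | if TRUE, remove the separator from the resulting tokens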
# undo tokens_compound()
toks1 <- tokens("pork barrel is an idiomatic multi-word expression")
tokens_compound(toks1, phrase("pork barrel"))
tokens_compound(toks1, phrase("pork barrel")) %>%
    tokens_split(separator = "_")

# similar to tokens(x, remove_hyphen = TRUE) but post-tokenization
toks2 <- tokens("UK-EU negotiation is not going anywhere as of 2018-12-24.")
tokens_split(toks2, separator = "-", remove_separator = FALSE)
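The examples above use the default fixed matching. As a minimal sketch of the valuetype = "regex" option, the following splits on either of two separators at once. The input tokens and the character class "[-/]" are illustrative assumptions, not part of the original examples; the tokens object is built with as.tokens() so that the result does not depend on how tokens() would segment the text.

```r
library(quanteda)

# Hypothetical input: build a tokens object directly from a list so the
# starting tokens are exactly the ones shown here.
toks3 <- as.tokens(list(d1 = c("2018-12-24", "a/b")))

# Treat the separator as a regular expression matching "-" or "/".
# With the default remove_separator = TRUE the separators are dropped.
split3 <- tokens_split(toks3, separator = "[-/]", valuetype = "regex")
print(split3)
```

With valuetype = "fixed", the same separator would be matched literally as the five-character string "[-/]", so neither token would be split.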