Extract a subset of a tokens object
Returns the documents of a tokens object that meet certain conditions, including
direct logical operations on docvars (document-level variables).
tokens_subset() functions identically to subset.data.frame(), using
non-standard evaluation to evaluate conditions based on the docvars in the
tokens object.
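The non-standard evaluation mentioned above can be sketched in base R. This is a minimal illustration, not quanteda's source: the helper `select_docs()` is hypothetical, and it assumes the docvars are held in a data frame with one row per document. It follows the pattern subset.data.frame() uses: capture the unevaluated condition with substitute(), evaluate it with the docvars as the environment, and treat missing values as FALSE.

```r
# Hypothetical helper (not part of quanteda): evaluates `subset` with the
# docvars data frame as the evaluation environment, as subset.data.frame() does.
select_docs <- function(docvars, subset) {
  e <- substitute(subset)                 # capture the unevaluated condition
  r <- eval(e, docvars, parent.frame())   # look up names in docvars first
  if (!is.logical(r)) stop("'subset' must be logical")
  keep <- r & !is.na(r)                   # missing values count as FALSE
  rownames(docvars)[keep]                 # names of the documents to keep
}

# Docvars mirroring the example corpus: one row per document
dv <- data.frame(grp = c(1, 1, 2, 3), row.names = c("d1", "d2", "d3", "d4"))

select_docs(dv, grp > 1)
#> [1] "d3" "d4"
select_docs(dv, c(TRUE, FALSE, TRUE, FALSE))
#> [1] "d1" "d3"
```

Because a literal logical vector evaluates to itself, the same mechanism handles both a docvars condition and a supplied vector.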
tokens_subset(x, subset, drop_docid = TRUE, ...)
x | tokens object to be subsetted
subset | logical expression indicating the documents to keep: missing values are taken as false
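The rule that missing values are taken as false can be seen with base R's subset.data.frame(), which the description above says tokens_subset() mirrors. This sketch uses only base R; the data frame `dv` is a stand-in for a docvars table and assumes document d2 has a missing `grp` value.

```r
# Base R only: a docvar with a missing value for document d2
dv <- data.frame(grp = c(1, NA, 2, 3), row.names = c("d1", "d2", "d3", "d4"))

cond <- dv$grp > 1
cond
#> [1] FALSE    NA  TRUE  TRUE

# subset.data.frame() treats the NA as FALSE, so d2 is dropped
rownames(subset(dv, grp > 1))
#> [1] "d3" "d4"

# Plain `[` indexing instead returns an all-NA row for the NA condition
nrow(dv[cond, , drop = FALSE])
#> [1] 3
```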
drop_docid | if TRUE, docid for documents are removed as the result of subsetting
... | not used
A tokens object containing the subset of documents (and their docvars) selected according to the arguments.
library(quanteda)

corp <- corpus(c(d1 = "a b c d", d2 = "a a b e", d3 = "b b c e", d4 = "e e f a b"),
               docvars = data.frame(grp = c(1, 1, 2, 3)))
toks <- tokens(corp)

# selecting on a docvars condition
tokens_subset(toks, grp > 1)

# selecting on a supplied vector
tokens_subset(toks, c(TRUE, FALSE, TRUE, FALSE))