Randomly sample documents from a tokens object
Take a random sample of documents of the specified size from a tokens object, with or without replacement, optionally by grouping variables or with probability weights.
tokens_sample(x, size = NULL, replace = FALSE, prob = NULL, by = NULL)
x |
a tokens object whose documents will be sampled |
size |
a positive number, the number of documents to select; when used
with by, the number to select from each group |
replace |
if TRUE, sample with replacement |
prob |
a vector of probability weights for obtaining the elements of the
vector being sampled. May not be applied when by is used. |
by |
optional grouping variable for sampling. This will be evaluated in
the docvars data.frame, so that docvars may be referred to by name without
quoting. This also changes previous behaviours for by. |
a tokens object (re)sampled on the documents, containing the document variables for the documents sampled.
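Grouped sampling can be pictured with a small base-R sketch. This illustrates the idea only and is not quanteda's implementation: the data frame `docs` and the helper `sample_by_group()` are hypothetical names invented for this example. It assumes that `size` documents are drawn independently within each level of the grouping variable.

```r
# Hypothetical stand-in for a docvars data.frame (values are made up).
docs <- data.frame(
  doc_id = c("d1", "d2", "d3", "d4", "d5", "d6"),
  Party  = c("none", "Federalist", "Federalist", "DemRep", "DemRep", "DemRep"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: draw `size` row indices within each level of `group`.
sample_by_group <- function(group, size, replace = FALSE) {
  idx_by_group <- split(seq_along(group), group)
  picked <- lapply(idx_by_group, function(idx) {
    # Index into idx instead of calling sample(idx, ...) directly, because
    # sample() on a single number n would draw from 1:n.
    idx[sample.int(length(idx), size = size, replace = replace)]
  })
  unlist(picked, use.names = FALSE)
}

set.seed(123)
picked <- sample_by_group(docs$Party, size = 2, replace = TRUE)
docs[picked, ]
```

With `replace = TRUE`, a group with fewer than `size` members is oversampled by repetition; with `replace = FALSE`, the same call would fail for the single-member group.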
set.seed(123)
toks <- tokens(data_corpus_inaugural[1:6])
toks
tokens_sample(toks)
tokens_sample(toks, replace = TRUE) %>% docnames()
tokens_sample(toks, size = 3, replace = TRUE) %>% docnames()

# sampling using by
docvars(toks)
tokens_sample(toks, size = 2, replace = TRUE, by = Party) %>% docnames()
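The examples above do not exercise the prob argument. A minimal sketch follows, assuming the weights are supplied as one numeric value per document. Weighting by token count is an arbitrary choice made for illustration, and `w` and `sampled` are names introduced here.

```r
library(quanteda)

toks <- tokens(data_corpus_inaugural[1:6])

# Illustrative weights only: each document's share of the total token count.
w <- as.numeric(ntoken(toks)) / sum(ntoken(toks))

set.seed(123)
sampled <- tokens_sample(toks, size = 3, prob = w)
docnames(sampled)
```

Longer documents are more likely to be drawn under these weights. The by argument is left unset here.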