STM Corpus Coercion
Convert a set of document term counts and associated metadata to the form required for processing by the stm function.
asSTMCorpus(documents, vocab, data = NULL, ...)
documents |
A documents-by-term matrix of counts, or a set of
counts in the format returned by |
vocab |
Character vector specifying the words in the corpus in the
order of the vocab indices in documents. Each term in the vocabulary index
must appear at least once in the documents. See |
data |
An optional data frame containing the prevalence and/or content covariates. If unspecified, the variables are taken from the active environment. |
... |
Additional arguments passed to or from other methods. |
A list with components "documents", "vocab", and "data" in the form needed for further processing by the stm function.
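To make the shape of that list concrete, the following base R sketch builds one by hand from a tiny dense count matrix. It assumes the usual stm list layout, in which each document is a two-row integer matrix of 1-based vocabulary indices and their counts. The toy data, the helper to_stm_documents, and the treatment covariate are hypothetical and are not part of the package; in practice asSTMCorpus performs the conversion.

```r
# Toy documents-by-term count matrix (hypothetical data, for illustration only).
counts <- matrix(
  c(2L, 0L, 1L,
    0L, 3L, 1L),
  nrow = 2, byrow = TRUE,
  dimnames = list(c("doc1", "doc2"), c("apple", "banana", "cherry"))
)

# Hypothetical helper (not part of stm): build one two-row integer matrix per
# document, with 1-based vocabulary indices in row 1 and counts in row 2.
to_stm_documents <- function(m) {
  lapply(seq_len(nrow(m)), function(i) {
    idx <- which(m[i, ] > 0)
    rbind(as.integer(idx), as.integer(m[i, idx]))
  })
}

# Assemble the three components named above; the covariate is made up.
corpus_list <- list(
  documents = to_stm_documents(counts),
  vocab = colnames(counts),
  data = data.frame(treatment = c(0, 1))
)

str(corpus_list$documents)
```

Every term in vocab occurs in at least one document here, which matches the requirement stated for the vocab argument.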
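library(stm)       # provides asSTMCorpus() and the gadarian data set
library(quanteda)

# Build a quanteda corpus from the open-ended survey responses.
gadarian_corpus <- corpus(gadarian, text_field = "open.ended.response")
# Document-feature matrix with English stopwords removed and terms stemmed.
gadarian_dfm <- dfm(gadarian_corpus, remove = stopwords("english"), stem = TRUE)
# Convert the dfm to the documents/vocab/data list that stm expects.
asSTMCorpus(gadarian_dfm)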