stm: asSTMCorpus – R documentation

asSTMCorpus

STM Corpus Coercion

Description

Convert a set of document term counts and associated metadata to the form required for processing by the stm function.

Usage

asSTMCorpus(documents, vocab, data = NULL, ...)

Arguments

`documents`	A documents-by-term matrix of counts, or a set of counts in the format returned by `prepDocuments`. Supported matrix formats include quanteda dfm and Matrix sparse matrix objects in `"dgCMatrix"` or `"dgTMatrix"` format.
`vocab`	Character vector specifying the words in the corpus, in the order of the vocab indices in `documents`. Each term in the vocabulary must appear at least once in the documents. See `prepDocuments` for dropping unused items from the vocabulary. If `documents` is a sparse matrix or quanteda dfm object, `vocab` must not be supplied, because the vocabulary is already contained in the column names of the matrix.
`data`	An optional data frame containing the prevalence and/or content covariates. If unspecified, the variables are taken from the active environment.
`...`	Additional arguments passed to or from other methods.
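
To make the list form of `documents` concrete, here is a minimal sketch in base R. It assumes the layout used by stm's document lists: one two-row integer matrix per document, with 1-based vocabulary indices in the first row and counts in the second. The toy counts, the vocabulary, and the `meta` data frame are invented for illustration, and the call to `asSTMCorpus` runs only if stm is installed.

```r
# Toy documents-by-term counts (3 documents, 4 terms); all values are made up.
vocab <- c("border", "crime", "economy", "jobs")
counts <- matrix(
  c(2L, 1L, 0L, 0L,
    0L, 1L, 3L, 0L,
    0L, 0L, 1L, 1L),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("doc", 1:3), vocab)
)

# Every vocabulary term must occur at least once across the documents.
stopifnot(all(colSums(counts) > 0))

# Build one two-row integer matrix per document:
# row 1 holds 1-based vocab indices, row 2 holds the matching counts.
documents <- lapply(seq_len(nrow(counts)), function(i) {
  idx <- unname(which(counts[i, ] > 0))
  matrix(as.integer(c(idx, counts[i, idx])), nrow = 2, byrow = TRUE)
})

# Hypothetical document-level covariate, one row per document.
meta <- data.frame(treatment = c(0, 1, 0))

# With list input, vocab is supplied explicitly.
if (requireNamespace("stm", quietly = TRUE)) {
  out <- stm::asSTMCorpus(documents, vocab = vocab, data = meta)
  str(out, max.level = 1)
}
```

With a sparse matrix or dfm, the same information travels in the column names, which is why `vocab` is omitted in that case.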

Value

A list with components "documents", "vocab", and "data" in the form needed for further processing by the stm function.
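
As a rough sketch of that further processing, the three components can be handed to `stm` by name. The values `K = 3` and `max.em.its = 5` and the prevalence formula on `treatment` are illustrative choices, not recommendations, and `data` is passed explicitly here as an assumption to keep the covariates visible.

```r
library(stm)       # asSTMCorpus(), stm(), and the gadarian data
library(quanteda)

# Build a dfm from the open-ended responses in the gadarian data.
gadarian_corpus <- corpus(gadarian, text_field = "open.ended.response")
gadarian_dfm <- dfm(tokens(gadarian_corpus))

# Coerce to the list form; covariates come from the dfm's document variables.
out <- asSTMCorpus(gadarian_dfm, data = docvars(gadarian_dfm))
names(out)

# Pass each component to the matching stm() argument.
# K, the formula, and max.em.its are small illustrative settings.
fit <- stm(documents = out$documents,
           vocab = out$vocab,
           K = 3,
           prevalence = ~ treatment,
           data = out$data,
           max.em.its = 5,
           verbose = FALSE)
```

Keeping the three pieces in one list makes it harder to pair a set of documents with the wrong vocabulary or covariates.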

Examples

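library(stm)       # provides asSTMCorpus() and the gadarian data
library(quanteda)
gadarian_corpus <- corpus(gadarian, text_field = "open.ended.response")
# Tokenize, drop English stopwords, and stem before building the dfm;
# newer quanteda releases deprecate passing `remove`/`stem` to dfm().
gadarian_tokens <- tokens(gadarian_corpus)
gadarian_tokens <- tokens_remove(gadarian_tokens, stopwords("english"))
gadarian_tokens <- tokens_wordstem(gadarian_tokens)
gadarian_dfm <- dfm(gadarian_tokens)
asSTMCorpus(gadarian_dfm)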

stm

Estimation of the Structural Topic Model

v1.3.6

MIT + file LICENSE

Authors

Margaret Roberts [aut], Brandon Stewart [aut, cre], Dustin Tingley [aut], Kenneth Benoit [ctb]