convertCorpus

Convert stm formatted documents to another format


Description

Takes stm-formatted documents and vocab objects and returns them in formats usable by other packages.

Usage

convertCorpus(documents, vocab, type = c("slam", "lda", "Matrix"))

Arguments

documents

the documents object in stm format

vocab

the vocab object in stm format

type

the desired output type: one of "slam", "lda", or "Matrix". See Details.

Details

We also recommend the quanteda and tm packages for text preparation. The convertCorpus function is provided as a utility for converting between formats, but if you intend to do text processing with a variety of output formats, you will likely want to start with quanteda or tm.

The various type conversions are described below:

type = "slam"

Converts to the simple triplet matrix representation used by the slam package. This is the format used internally by tm.

type = "lda"

Converts to the format used by the lda package. This is a very minor change, as the format in stm is based on lda's data representation. The difference, as noted in the stm documentation, is in how the vocabulary indices are numbered: stm indexes from one (following R convention), whereas lda indexes from zero. Accordingly, this type returns a list containing the new documents object and the unchanged vocab object.
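To make the indexing difference concrete, here is a minimal base-R sketch of the shift on a hand-built toy corpus. It assumes the only change is moving the term indices from 1-based to 0-based; `to_zero_indexed` is a hypothetical helper written for illustration, not a function from stm or lda, and the toy `documents` and `vocab` objects are made up.

```r
# Toy corpus in stm's layout: one 2-row integer matrix per document.
# Row 1 holds 1-based vocab indices, row 2 holds the matching counts.
vocab <- c("budget", "senate", "vote")
documents <- list(
  matrix(c(1L, 2L,    # "budget" occurs twice
           3L, 1L),   # "vote" occurs once
         nrow = 2),
  matrix(c(2L, 4L), nrow = 2)  # "senate" occurs four times
)

# Hypothetical helper (not part of stm): shift term indices to 0-based
# and leave the counts in row 2 untouched.
to_zero_indexed <- function(docs) {
  lapply(docs, function(d) {
    d[1, ] <- d[1, ] - 1L
    d
  })
}

lda_docs <- to_zero_indexed(documents)

# Term indices now start at zero; counts and vocab are unchanged.
lda_docs[[1]]
```

In practice you would call convertCorpus with type = "lda" rather than shifting indices by hand; the sketch only illustrates what "a very minor change" means here.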

type = "Matrix"

Converts to the sparse matrix representation used by the Matrix package. This is the format used internally by numerous other text analysis packages.
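Both the "slam" and "Matrix" types are sparse document-term representations. As a rough illustration of the idea, the base-R sketch below flattens a toy stm-style documents list into (row, column, value) triplets and rebuilds a small dense matrix from them. It assumes documents map to rows and vocab terms to columns; the toy data and variable names are made up, and this is not the code convertCorpus itself runs.

```r
# Toy corpus in stm's layout: row 1 = 1-based vocab indices, row 2 = counts.
documents <- list(
  matrix(c(1L, 2L, 3L, 1L), nrow = 2),  # doc 1: term 1 x2, term 3 x1
  matrix(c(2L, 4L), nrow = 2)           # doc 2: term 2 x4
)
n_terms <- 3L  # assumed vocabulary size for this toy example

# Row index: repeat each document number once per distinct term it contains.
i <- rep(seq_along(documents), times = vapply(documents, ncol, integer(1)))
# Column index: the vocab indices stored in row 1 of each document.
j <- unlist(lapply(documents, function(d) d[1, ]))
# Value: the counts stored in row 2 of each document.
v <- unlist(lapply(documents, function(d) d[2, ]))

triplets <- data.frame(i = i, j = j, v = v)

# Rebuild a dense document-term matrix from the triplets as a sanity check.
dtm <- matrix(0L, nrow = length(documents), ncol = n_terms)
dtm[cbind(i, j)] <- v
dtm
```

A sparse class stores only the non-zero triplets rather than the full dense matrix, which matters for a real corpus where most document-term cells are zero.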

If you want to write out a file in the sparse matrix format popularized by David Blei's C code (ldac), see the function writeLdac.

See Also

Examples

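# Convert the poliblog5k data to each supported output format
library(stm)  # provides convertCorpus() and the poliblog5k data

# slam package format (simple triplet matrix)
poliSlam <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="slam")
class(poliSlam)
# Matrix package format (sparse matrix)
poliMatrix <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="Matrix")
class(poliMatrix)
# lda package format (list holding the documents and vocab objects)
poliLDA <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="lda")
str(poliLDA)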

stm

Estimation of the Structural Topic Model

v1.3.6
MIT + file LICENSE
Authors
Margaret Roberts [aut], Brandon Stewart [aut, cre], Dustin Tingley [aut], Kenneth Benoit [ctb]
Initial release
