Convert stm formatted documents to another format
Takes stm-formatted documents and vocab objects and returns them in formats usable by other packages.
convertCorpus(documents, vocab, type = c("slam", "lda", "Matrix"))
documents | the documents object in stm format
vocab | the vocab object in stm format
type | the output type desired. See Details.
We also recommend the quanteda and tm packages for text preparation and related tasks. The convertCorpus function is provided as a utility for moving between formats, but if you intend to do text processing with a variety of output formats, you will likely want to start with quanteda or tm.
The various type conversions are described below:
type = "slam"
Converts to the simple triplet matrix representation used by the slam package. This is the format used internally by tm.
type = "lda"
Converts to the format used by the lda package. This is a very minor change, as the format in stm is based on lda's data representation. The difference, as noted in the stm documentation, is in how the word indices are numbered: stm indexes the vocabulary from one, whereas lda indexes it from zero. Accordingly, this type returns a list containing the re-indexed documents object and the unchanged vocab object.
type = "Matrix"
Converts to the sparse matrix representation used by Matrix. This is the format used internally by numerous other text analysis packages.
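To make the "lda" and "Matrix" conversions more concrete, here is a minimal hand-rolled sketch on a toy corpus. It is not the code convertCorpus runs. The toy data and the names docs, vocab, to_lda_index, and dtm are illustrative assumptions, and the documents-by-terms orientation of the sparse matrix is assumed as well.

```r
library(Matrix)  # recommended package that ships with standard R distributions

# Toy corpus in stm's documents format (illustrative data, not from the package):
# one 2-row integer matrix per document; row 1 = 1-based vocab index, row 2 = count.
docs <- list(
  rbind(c(1L, 3L), c(2L, 1L)),  # "apple" x2, "cherry" x1
  rbind(2L, 4L)                 # "banana" x4
)
vocab <- c("apple", "banana", "cherry")

# lda-style re-indexing (hypothetical helper): shift word indices from 1-based to 0-based.
to_lda_index <- function(documents) {
  lapply(documents, function(d) {
    d[1, ] <- d[1, ] - 1L
    d
  })
}
lda_docs <- to_lda_index(docs)

# Sparse documents-by-terms matrix built from (row, column, value) triplets.
doc_id  <- rep(seq_along(docs), times = vapply(docs, ncol, integer(1)))
term_id <- unlist(lapply(docs, function(d) d[1, ]))
counts  <- as.numeric(unlist(lapply(docs, function(d) d[2, ])))
dtm <- sparseMatrix(i = doc_id, j = term_id, x = counts,
                    dims = c(length(docs), length(vocab)),
                    dimnames = list(NULL, vocab))
```

In practice convertCorpus should be preferred. The sketch is only meant to show how little separates the stm representation from the other formats.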
If you want to write out a file containing the sparse matrix representation popularized by David Blei's C code ldac, see the function writeLdac.
# convert the poliblog5k data to slam package format
poliSlam <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="slam")
class(poliSlam)
# convert to the Matrix package's sparse matrix format
poliMatrix <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="Matrix")
class(poliMatrix)
# convert to lda package format
poliLDA <- convertCorpus(poliblog5k.docs, poliblog5k.voc, type="lda")
str(poliLDA)