mallet: mallet.subset.topic.words – R documentation


mallet.subset.topic.words

Estimate topic-word distributions from a sub-corpus

Description

This function returns a topic-by-word matrix like the one returned by `mallet.topic.words`, but estimated from a subset of the documents in the corpus. With `normalized=TRUE` the rows are word probabilities for each topic; otherwise they are counts. The model assumes that a topic has the same word distribution wherever it is used, but in practice this is often not the case. Comparing the subset estimates with those for the whole corpus lets us test whether some words are used more or less often than expected in a particular set of documents.

Usage

mallet.subset.topic.words(topic.model, subset.docs, normalized=FALSE, smoothed=FALSE)

Arguments

`topic.model`	The model returned by `MalletLDA`
`subset.docs`	A logical vector of TRUE/FALSE values, one per document in the corpus, specifying which documents should be used (TRUE) and which should be ignored (FALSE).
`normalized`	If true, normalize the rows so that each topic sums to one. If false, values will be counts (plus the smoothing constant if `smoothed` is true) giving the number of tokens of each word type assigned to each topic.
`smoothed`	If true, add the smoothing parameter for the model (initial value specified as `beta` in `MalletLDA`). If false, many values will be zero.
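
As a rough illustration of how `subset.docs`, `normalized` and `smoothed` interact, the following base R sketch recomputes a topic-by-word matrix from toy token-level data. Everything in it is hypothetical: the `tokens` data frame, the value of `beta`, and the helper `subset.topic.words.sketch` are made up for illustration and are not part of the mallet package, whose actual computation happens inside MALLET.

```r
# Toy token-level data (hypothetical): one row per token, with the document
# it occurs in, its word type, and the topic it is assigned to.
tokens <- data.frame(
  doc   = c(1, 1, 1, 2, 2, 3, 3, 3),
  word  = c("model", "data", "model", "data", "neuron", "neuron", "model", "neuron"),
  topic = c(1, 1, 2, 1, 2, 2, 2, 1)
)
vocab    <- c("data", "model", "neuron")
n.topics <- 2
beta     <- 0.01  # assumed smoothing constant, chosen only for this example

# Hypothetical helper that mimics the documented options in base R.
subset.topic.words.sketch <- function(tokens, subset.docs,
                                      normalized = FALSE, smoothed = FALSE) {
  # Keep only tokens whose document is flagged TRUE in the mask
  keep <- tokens[subset.docs[tokens$doc], ]
  # Count tokens per (topic, word); factor levels keep empty cells as zeros
  counts <- table(factor(keep$topic, levels = seq_len(n.topics)),
                  factor(keep$word, levels = vocab))
  m <- matrix(as.numeric(counts), nrow = n.topics, dimnames = list(NULL, vocab))
  if (smoothed) m <- m + beta            # add the smoothing constant to every cell
  if (normalized) m <- m / rowSums(m)    # rescale each topic (row) to sum to one
  m
}

# Use documents 1 and 3, ignore document 2
mask <- c(TRUE, FALSE, TRUE)
subset.topic.words.sketch(tokens, mask)                                   # raw counts
subset.topic.words.sketch(tokens, mask, normalized = TRUE, smoothed = TRUE)  # probabilities
```

In this sketch, a topic with no tokens in the subset gives a row of zeros, so normalizing without smoothing would divide by zero; that is one reason to combine `normalized=TRUE` with `smoothed=TRUE`.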

Examples

## Not run: 
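# 'documents' is assumed to hold one row per document in the corpus,
# with a 'class' column identifying its source.
nips.topic.words <- mallet.subset.topic.words(topic.model, documents$class == "NIPS",
                                              smoothed=TRUE, normalized=TRUE)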

## End(Not run)
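
One possible way to act on the comparison mentioned in the Description is to take the log ratio of the subset probabilities to the whole-corpus probabilities. The sketch below uses two small made-up matrices as stand-ins for normalized, smoothed output (rows are topics, columns are words); the names `all.topic.words`, `sub.topic.words` and `log.ratio` and all values are hypothetical, and this is only one of several reasonable ways to compare the two estimates.

```r
# Made-up matrices standing in for normalized, smoothed topic-word output.
vocab <- c("data", "model", "neuron")
all.topic.words <- matrix(c(0.50, 0.30, 0.20,
                            0.10, 0.40, 0.50),
                          nrow = 2, byrow = TRUE, dimnames = list(NULL, vocab))
sub.topic.words <- matrix(c(0.25, 0.50, 0.25,
                            0.10, 0.20, 0.70),
                          nrow = 2, byrow = TRUE, dimnames = list(NULL, vocab))

# Positive values: the word is more probable in that topic within the subset
# than in the corpus as a whole; negative values: less probable.
log.ratio <- log(sub.topic.words / all.topic.words)

# Words in topic 1, ordered from most over-used to most under-used in the subset
sort(log.ratio[1, ], decreasing = TRUE)
```

Smoothed values are used here so that no probability is exactly zero, which keeps the ratio and its logarithm finite.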

mallet

A wrapper around the Java machine learning tool MALLET

v1.0

MIT + file LICENSE

Authors

David Mimno

Initial release

2013-07-18