topicmodels: perplexity – R documentation

Methods for Function perplexity

Description

Determine the perplexity of a fitted model.

Usage

perplexity(object, newdata, ...)

## S4 method for signature 'VEM,simple_triplet_matrix'
perplexity(object, newdata, control, ...)

## S4 method for signature 'Gibbs,simple_triplet_matrix'
perplexity(object, newdata, control, use_theta = TRUE,
  estimate_theta = TRUE, ...)

## S4 method for signature 'Gibbs_list,simple_triplet_matrix'
perplexity(object, newdata, control, use_theta = TRUE,
  estimate_theta = TRUE, ...)

Arguments

`object`	Object of class `"TopicModel"` or `"Gibbs_list"`.
`newdata`	If missing, the perplexity is determined for the data to which the model was fitted. For objects fitted using Gibbs sampling, `newdata` must be specified.
`control`	If missing, the `control` of the fitted model is used with suitable changes of the relevant parameters (see Details).
`use_theta`	Object of class `"logical"`. If `TRUE` the estimated topic distributions for the documents are used. Otherwise equal weights are assigned to the topics for each document.
`estimate_theta`	Object of class `"logical"`. If `FALSE`, the data provided is assumed to be the same as the data used for fitting the model. The topic distributions therefore do not need to be estimated, and the data in `newdata` is used for weighting the term-document occurrences.
`...`	Further arguments passed to the different methods.
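
As an illustration of the arguments above, here is a minimal sketch of calling `perplexity` on a VEM fit and on a Gibbs fit. The train/test split, the number of topics `k = 5`, the seed, and the iteration count are arbitrary choices made for a quick demonstration, not recommendations; `AssociatedPress` is the document-term matrix shipped with the package.

```r
library(topicmodels)

# Example document-term matrix shipped with topicmodels.
data("AssociatedPress", package = "topicmodels")

# Illustrative split: sizes are arbitrary and kept small so the demo runs fast.
train <- AssociatedPress[1:20, ]
test  <- AssociatedPress[21:30, ]

# VEM fit (the default estimation method of LDA()).
lda_vem <- LDA(train, k = 5, control = list(seed = 1))

# newdata omitted: perplexity on the data the model was fitted to.
pp_vem_train <- perplexity(lda_vem)

# newdata supplied: perplexity on held-out documents.
pp_vem_test <- perplexity(lda_vem, newdata = test)

# Gibbs fit: newdata has to be given explicitly.
lda_gibbs <- LDA(train, k = 5, method = "Gibbs",
                 control = list(seed = 1, iter = 200))
pp_gibbs <- perplexity(lda_gibbs, newdata = test)

# use_theta = FALSE weights all topics equally for each document.
pp_gibbs_flat <- perplexity(lda_gibbs, newdata = test, use_theta = FALSE)
```

Each call returns a single number; lower values indicate that the model assigns higher probability to the documents being scored.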

Details

The specified control is modified to ensure that (1) `estimate.beta = FALSE` and (2) `nstart = 1`.

For `"Gibbs_list"` objects, the control is further modified to set (1) `iter = thin` and (2) `best = TRUE`, and the model is fitted to the new data with this control for each available iteration. The perplexity is then determined by averaging over the same number of iterations.

If a list is supplied as `object`, it is assumed to consist of several models that were fitted using different starting configurations.

Value

A numeric value.
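
To give a sense of what this number represents, the sketch below computes perplexity under its conventional definition (as in Blei et al., 2003): the exponential of the negative log-likelihood per token. This page does not spell out the formula the package uses internally, so treat the definition here as an assumption; `perplexity_from_loglik` is a hypothetical helper written in base R, not a function of the package.

```r
# Hypothetical helper (not part of topicmodels): perplexity from
# per-document log-likelihoods and per-document token counts,
# assuming the conventional definition exp(-sum(loglik) / sum(n_tokens)).
perplexity_from_loglik <- function(loglik, n_tokens) {
  stopifnot(length(loglik) == length(n_tokens), all(n_tokens > 0))
  exp(-sum(loglik) / sum(n_tokens))
}

# Toy check: a model that gives every token probability 1 / V
# should have a perplexity equal to the vocabulary size V.
V <- 50
n_tokens <- c(10, 20, 5)          # tokens per document (made-up values)
loglik <- n_tokens * log(1 / V)   # log-likelihood of each document
toy_perplexity <- perplexity_from_loglik(loglik, n_tokens)
```

Under this definition, perplexity can be read as the effective number of equally likely terms the model is choosing among for each token.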

Author(s)

Bettina Gruen

References

Blei D.M., Ng A.Y., Jordan M.I. (2003). Latent Dirichlet Allocation. Journal of Machine Learning Research, 3, 993–1022.

Griffiths T.L., Steyvers M. (2004). Finding Scientific Topics. Proceedings of the National Academy of Sciences of the United States of America, 101, Suppl. 1, 5228–5235.

Newman D., Asuncion A., Smyth P., Welling M. (2009). Distributed Algorithms for Topic Models. Journal of Machine Learning Research, 10, 1801–1828.

topicmodels

Topic Models

v0.2-12

GPL-2

Authors

Bettina Grün [aut, cre] (<https://orcid.org/0000-0001-7265-4773>), Kurt Hornik [aut] (<https://orcid.org/0000-0003-4198-9911>), David M Blei [ctb, cph] (VEM estimation of LDA and CTM), John D Lafferty [ctb, cph] (VEM estimation of CTM), Xuan-Hieu Phan [ctb, cph] (MCMC estimation of LDA), Makoto Matsumoto [ctb, cph] (Mersenne Twister RNG), Takuji Nishimura [ctb, cph] (Mersenne Twister RNG), Shawn Cokus [ctb] (Mersenne Twister RNG)
