Compute predictive distributions for fitted LDA-type models.
This function takes a fitted LDA-type model and computes a predictive distribution for new words in a document. This is useful for making predictions about held-out words.
predictive.distribution(document_sums, topics, alpha, eta)
document_sums |
A K × D matrix where each entry is a numeric proportional
to the probability of seeing a topic (row) conditioned on document
(column) (this entry is sometimes denoted θ_{d,k} in the
literature, see details). Either the document_sums field or
the document_expects field from the output of
|
topics |
A K × V matrix where each entry is a numeric proportional
to the probability of seeing the word (column) conditioned on topic
(row) (this entry is sometimes denoted β_{w,k} in the
literature, see details). The column names should correspond to the
words in the vocabulary. The topics field from the output of
|
alpha |
The scalar value of the Dirichlet hyperparameter for topic proportions. See references for details. |
eta |
The scalar value of the Dirichlet hyperparameter for topic multinomials. See references for details. |
The formula used to compute predictive probability is p_d(w) = ∑_k (θ_{d, k} + α) (β_{w, k} + η).
A V × D matrix of the probability of seeing a word (row) in a document (column). The row names of the matrix are set to the column names of topics.
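The formula and return shape described above can be illustrated with a minimal base-R sketch. The helper name sketch.predictive and the toy inputs are hypothetical and not part of the package. The sketch also assumes that each smoothed topic row and each smoothed document column is normalized to sum to 1 before the sum over topics; that is one plausible reading of "proportional to the probability" and may differ from what the packaged function does internally.

```r
## Hypothetical helper (not the package implementation): a base-R sketch of
## p_d(w) = sum_k (theta_{d,k} + alpha) * (beta_{w,k} + eta).
## Assumption: smoothed topics and smoothed document proportions are each
## normalized, so every column of the result sums to 1.
sketch.predictive <- function(document_sums, topics, alpha, eta) {
  ## Smooth each topic (row) and normalize it over the vocabulary.
  smoothed.topics <- (topics + eta) / rowSums(topics + eta)
  ## Smooth each document (column) and normalize it over the topics.
  smoothed.docs <- sweep(document_sums + alpha, 2,
                         colSums(document_sums + alpha), "/")
  ## (V x K) %*% (K x D) yields a V x D matrix; row names come from the
  ## column names of topics.
  t(smoothed.topics) %*% smoothed.docs
}

## Toy inputs: K = 2 topics, V = 3 words, D = 2 documents.
topics <- matrix(c(4, 0, 1, 3, 0, 2), nrow = 2,
                 dimnames = list(NULL, c("apple", "banana", "cherry")))
document_sums <- matrix(c(3, 1, 0, 5), nrow = 2)

p <- sketch.predictive(document_sums, topics, alpha = 0.1, eta = 0.1)
dim(p)      ## V rows, D columns
colSums(p)  ## each document's distribution sums to 1
```

Writing the sum over topics as a matrix product keeps the sketch short and makes the V × D shape of the result explicit.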
Jonathan Chang (slycoder@gmail.com)
Blei, David M. and Ng, Andrew and Jordan, Michael. Latent Dirichlet allocation. Journal of Machine Learning Research, 2003.
lda.collapsed.gibbs.sampler for the format of topics and document_sums and details of the model.
top.topic.words demonstrates another use for a fitted topic matrix.
## Fit a model (from demo(lda)).
data(cora.documents)
data(cora.vocab)

K <- 10 ## Num clusters
result <- lda.collapsed.gibbs.sampler(cora.documents,
                                      K,  ## Num clusters
                                      cora.vocab,
                                      25,  ## Num iterations
                                      0.1,
                                      0.1)

## Predict new words for the first two documents
predictions <- predictive.distribution(result$document_sums[,1:2],
                                       result$topics,
                                       0.1, 0.1)

## Use top.topic.words to show the top 5 predictions in each document.
top.topic.words(t(predictions), 5)

##      [,1]         [,2]
## [1,] "learning"   "learning"
## [2,] "algorithm"  "paper"
## [3,] "model"      "problem"
## [4,] "paper"      "results"
## [5,] "algorithms" "system"
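Because the description mentions held-out words, it may help to show one possible way to score them against a predictive matrix. The sketch below uses a small hand-made V × D matrix in place of real predictive.distribution output. The names held.out and heldout.loglik are hypothetical, and the per-document log-likelihood is only one of several reasonable evaluation choices.

```r
## Toy V x D predictive matrix (each column sums to 1). In practice this
## would be a matrix shaped like the output of predictive.distribution.
predictions <- matrix(c(0.5, 0.3, 0.2,
                        0.1, 0.6, 0.3),
                      nrow = 3,
                      dimnames = list(c("apple", "banana", "cherry"), NULL))

## Hypothetical held-out words, one character vector per document.
held.out <- list(c("apple", "apple", "cherry"),
                 c("banana"))

## Held-out log-likelihood per document: the sum of log p_d(w) over that
## document's held-out words, looked up by row name.
heldout.loglik <- sapply(seq_along(held.out), function(d) {
  sum(log(predictions[held.out[[d]], d]))
})
heldout.loglik
```

Higher (less negative) values indicate that the predictive distribution assigns more probability to the words that were held out.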