Perplexity of a topic model
Given a document-term matrix, a topic-word distribution, and a document-topic distribution, calculates the perplexity of a topic model.

Usage:

perplexity(X, topic_word_distribution, doc_topic_distribution)
Arguments:

X: sparse document-term matrix which contains term counts.

topic_word_distribution: dense matrix of the topic-word distribution. Number of rows = number of topics; number of columns = vocabulary size. Each row is a distribution of words for a topic and should sum to 1.

doc_topic_distribution: dense matrix of the document-topic distribution. Number of rows = number of documents; number of columns = number of topics. Each row is a distribution of topics for a document and should sum to 1.
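To make the role of each argument concrete, here is a minimal sketch of how a perplexity can be computed from these three inputs, assuming the standard definition exp(-log-likelihood / total token count). It is an illustration only, not text2vec's internal implementation, and the function name manual_perplexity is hypothetical.

library(Matrix)

manual_perplexity = function(X, topic_word_distribution, doc_topic_distribution) {
  # per-document word probabilities implied by the two distributions:
  # an n_documents x vocabulary_size dense matrix
  log_p = log(doc_topic_distribution %*% topic_word_distribution)
  # triplet view of the sparse counts, so zero cells are skipped
  X = as(X, "TsparseMatrix")
  # log-likelihood: sum over observed (doc, term) cells of count * log(prob)
  ll = sum(X@x * log_p[cbind(X@i + 1L, X@j + 1L)])
  # perplexity = exp(-log-likelihood per token)
  exp(-ll / sum(X@x))
}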
Examples:

library(text2vec)
data("movie_review")

# tokenize the first 200 reviews, keeping document ids
n_iter = 10
train_ind = 1:200
ids = movie_review$id[train_ind]
txt = tolower(movie_review[['review']][train_ind])
names(txt) = ids
tokens = word_tokenizer(txt)
it = itoken(tokens, progressbar = FALSE, ids = ids)

# build a pruned vocabulary and the document-term matrix
vocab = create_vocabulary(it)
vocab = prune_vocabulary(vocab, term_count_min = 5, doc_proportion_min = 0.02)
dtm = create_dtm(it, vectorizer = vocab_vectorizer(vocab))

# fit a 10-topic LDA model
n_topic = 10
model = LDA$new(n_topic, doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr = model$fit_transform(dtm, n_iter = n_iter,
                                      n_check_convergence = 1,
                                      convergence_tol = -1,
                                      progressbar = FALSE)
topic_word_distr_10 = model$topic_word_distribution

# perplexity of the fitted model on the training data
perplexity(dtm, topic_word_distr_10, doc_topic_distr)
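Perplexity is most often used comparatively, e.g. to pick the number of topics or to check that training has converged; lower is better. As a sketch continuing with the objects created in the example above:

# fit the same model for more iterations and compare perplexities;
# training perplexity should typically be lower than after 10 iterations
model_100 = LDA$new(n_topic, doc_topic_prior = 0.1, topic_word_prior = 0.01)
doc_topic_distr_100 = model_100$fit_transform(dtm, n_iter = 100,
                                              n_check_convergence = 10,
                                              convergence_tol = -1,
                                              progressbar = FALSE)
perplexity(dtm, model_100$topic_word_distribution, doc_topic_distr_100)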