Heldout Likelihood by Document Completion
Tools for making and evaluating heldout datasets.
make.heldout( documents, vocab, N = floor(0.1 * length(documents)), proportion = 0.5, seed = NULL )
documents | the documents to be modeled (see …)
vocab | the vocabulary corresponding to the documents
N | number of documents to be partially held out
proportion | proportion of the words in each selected document to be held out
seed | the random seed, set for replicability
These functions are used to create heldout datasets and to evaluate heldout likelihood using the document completion method. The basic idea is to hold out some fraction of the words in a set of documents, train the model on the remaining words, and then use the document-level latent variables to evaluate the probability of the heldout portion. See the example for the basic workflow.
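The holdout step described above can be illustrated with a minimal base R sketch. It assumes each document is stored as a two-row matrix (vocabulary indices in row 1, counts in row 2). The helper `split_document` is hypothetical, written only for illustration; it is not part of the package and is not necessarily how make.heldout is implemented.

```r
# Hypothetical helper (not part of stm): split one document into a training
# part and a heldout part by hiding a random fraction of its tokens.
split_document <- function(doc, proportion = 0.5) {
  # Expand the counts into one entry per token
  tokens <- rep(doc[1, ], times = doc[2, ])
  # Number of tokens to hide, rounded down
  n_held <- floor(proportion * length(tokens))
  # Logical mask marking the randomly chosen heldout positions
  is_held <- seq_along(tokens) %in% sample.int(length(tokens), n_held)
  # Collapse a token vector back into the two-row index/count format
  to_matrix <- function(tok) {
    tab <- table(tok)
    rbind(as.integer(names(tab)), as.integer(tab))
  }
  list(train = to_matrix(tokens[!is_held]),
       heldout = to_matrix(tokens[is_held]))
}

set.seed(1)
doc <- rbind(c(2L, 5L, 9L), c(4L, 3L, 3L))  # toy document with 10 tokens
parts <- split_document(doc, proportion = 0.5)
sum(parts$train[2, ])    # 5 tokens kept for training
sum(parts$heldout[2, ])  # 5 tokens held out
```

The model is then fitted on the training parts only, and the heldout parts are kept aside for scoring.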
prep <- prepDocuments(poliblog5k.docs, poliblog5k.voc, poliblog5k.meta,
                      subsample = 500, lower.thresh = 20, upper.thresh = 200)
heldout <- make.heldout(prep$documents, prep$vocab)
documents <- heldout$documents
vocab <- heldout$vocab
meta <- prep$meta
stm1 <- stm(documents, vocab, 5, prevalence = ~ rating + s(day),
            init.type = "Random", data = meta, max.em.its = 5)
eval.heldout(stm1, heldout$missing)
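To make the evaluation step concrete, the following base R sketch scores the heldout words of a single document given its topic proportions and the topic-word probabilities. The names `heldout_loglik`, `theta`, `beta`, and `held` are assumptions made for illustration, and the toy numbers are invented; this shows the general idea of document completion and may differ in detail from what eval.heldout computes.

```r
# Hypothetical helper (not part of stm): mean log-probability per heldout token.
# theta:   length-K vector of topic proportions for the document
# beta:    K x V matrix of topic-word probabilities (each row sums to 1)
# heldout: two-row matrix, vocabulary indices in row 1 and counts in row 2
heldout_loglik <- function(theta, beta, heldout) {
  # Probability of each heldout word: mixture over topics weighted by theta
  word_prob <- as.vector(theta %*% beta[, heldout[1, ], drop = FALSE])
  # Count-weighted average of the log probabilities
  sum(heldout[2, ] * log(word_prob)) / sum(heldout[2, ])
}

# Toy inputs: 2 topics, 3 vocabulary terms
theta <- c(0.5, 0.5)
beta <- rbind(c(0.50, 0.25, 0.25),
              c(0.25, 0.25, 0.50))
held <- rbind(c(1L, 2L), c(1L, 1L))  # one token each of terms 1 and 2
score <- heldout_loglik(theta, beta, held)
```

Values closer to zero indicate that the fitted model assigns higher probability to the words it did not see during training.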