Semantic Coherence
Calculate semantic coherence (Mimno et al. 2011) for an STM model.
semanticCoherence(model, documents, M = 10)
model | the STM object
documents | the STM formatted documents (see
M | the number of top words to consider per topic
Semantic coherence is a metric related to pointwise mutual information that was introduced in a paper by David Mimno, Hanna Wallach and colleagues (see references). The paper details a series of manual evaluations which show that the metric is a reasonable surrogate for human judgment. The core idea is that in semantically coherent models, the words that are most probable under a topic should co-occur within the same documents.
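For reference, the metric as defined in Mimno et al. (2011) can be written as follows. The notation is theirs rather than this package's: D(v) is the number of documents containing word v at least once, D(v, v') is the number of documents containing both v and v', and v_1, ..., v_M are the M most probable words in topic t, ordered from most to least probable.

```latex
C\left(t; V^{(t)}\right) = \sum_{m=2}^{M} \sum_{l=1}^{m-1}
  \log \frac{D\left(v_m^{(t)}, v_l^{(t)}\right) + 1}{D\left(v_l^{(t)}\right)}
```

Each term is at most slightly above zero and becomes strongly negative when two top words rarely share a document, so values closer to zero indicate more coherent topics.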
One of our observations in Roberts et al. (2014) was that semantic coherence alone is relatively easy to achieve by having only a couple of topics, all of which are dominated by the most common words. We therefore suggest that users also consider exclusivity, which provides a natural counterpoint.
This function is currently marked with the keyword internal because it does not have much error checking.
a numeric vector containing semantic coherence for each topic
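To make the returned values concrete, here is a minimal base R sketch of how a single topic's score could be computed from a small document-term matrix. It follows the definition in Mimno et al. (2011) and is not necessarily how semanticCoherence is implemented internally. The toy matrix, the word names and the helper toy_coherence are hypothetical and not part of stm.

```r
# Toy corpus: rows are documents, columns are words, entries are counts.
dtm <- matrix(
  c(1, 1, 0,
    2, 1, 0,
    1, 0, 1,
    0, 0, 3),
  nrow = 4, byrow = TRUE,
  dimnames = list(NULL, c("tax", "budget", "border"))
)

# Hypothetical helper (not part of stm): coherence of one topic, given its
# top words ordered from most to least probable.
toy_coherence <- function(dtm, top_words) {
  # 1 if the word appears in the document at least once, 0 otherwise
  present <- (dtm[, top_words, drop = FALSE] > 0) * 1
  # codoc[i, j] = number of documents containing both word i and word j;
  # the diagonal holds each word's document frequency
  codoc <- crossprod(present)
  score <- 0
  for (m in seq_along(top_words)[-1]) {
    for (l in seq_len(m - 1)) {
      score <- score + log((codoc[m, l] + 1) / codoc[l, l])
    }
  }
  score
}

coherent   <- toy_coherence(dtm, c("tax", "budget"))
incoherent <- toy_coherence(dtm, c("tax", "budget", "border"))
```

In this toy corpus "tax" and "budget" share two documents while "border" rarely appears with either, so adding "border" to the top words lowers the score.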
Mimno, D., Wallach, H. M., Talley, E., Leenders, M., & McCallum, A. (2011, July). "Optimizing semantic coherence in topic models." In Proceedings of the Conference on Empirical Methods in Natural Language Processing (pp. 262-272). Association for Computational Linguistics.
Roberts, M., Stewart, B., Tingley, D., Lucas, C., Leder-Luis, J., Gadarian, S., Albertson, B., et al. (2014). "Structural topic models for open ended survey responses." American Journal of Political Science, 58(4), 1064-1082.
temp <- textProcessor(documents = gadarian$open.ended.response, metadata = gadarian)
meta <- temp$meta
vocab <- temp$vocab
docs <- temp$documents
out <- prepDocuments(docs, vocab, meta)
docs <- out$documents
vocab <- out$vocab
meta <- out$meta
set.seed(02138)
# maximum EM iterations set very low so example will run quickly.
# Run your models to convergence!
mod.out <- stm(docs, vocab, 3, prevalence = ~treatment + s(pid_rep), data = meta, max.em.its = 5)
semanticCoherence(mod.out, docs)