A collection of newsgroup messages with classes.
The 20 Newsgroups data set is a collection of approximately 20,000 newsgroup documents, partitioned (nearly) evenly across 20 different newsgroups.
data(newsgroup.train.documents)
data(newsgroup.test.documents)
data(newsgroup.train.labels)
data(newsgroup.test.labels)
data(newsgroup.vocab)
data(newsgroup.label.map)
newsgroup.train.documents
and newsgroup.test.documents
comprise a corpus of approximately 20,000 newsgroup documents conforming to the LDA format,
partitioned into 11269 training and 7505 test cases distributed nearly evenly
across 20 classes.
newsgroup.train.labels
is a numeric vector of length 11269 which gives
a class label from 1 to 20 for each training document in the corpus.
newsgroup.test.labels
is a numeric vector of length 7505 which gives
a class label from 1 to 20 for each test document in the corpus.
newsgroup.vocab
is the vocabulary of the corpus.
newsgroup.label.map
maps the numeric class labels to actual class names.
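As a rough illustration of how numeric class labels can be translated into class names, the sketch below uses small made-up stand-ins rather than the packaged data. The objects toy.labels and toy.label.map are hypothetical, and the sketch assumes the label map can be treated as a character vector indexed by label; the actual structure of newsgroup.label.map may differ, so inspect it with str() before adapting this.

```r
# Hypothetical stand-ins for illustration only (not the packaged objects).
# Assumption: the label map behaves like a character vector indexed by label.
toy.label.map <- c("alt.atheism", "comp.graphics", "sci.space")
toy.labels <- c(1, 3, 3, 2)   # one numeric class label per document

# Look up the class name for each document by indexing with its label.
toy.class.names <- toy.label.map[toy.labels]

# Count how many documents fall into each class.
toy.class.counts <- table(factor(toy.class.names, levels = toy.label.map))
print(toy.class.counts)
```

The same indexing idea applies to the training and test labels once the real structure of the label map has been confirmed.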
http://qwone.com/~jason/20Newsgroups/
See lda.collapsed.gibbs.sampler
for the format of the
corpus.
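To make the corpus format concrete, here is a minimal sketch of a single document in the layout described in the lda.collapsed.gibbs.sampler documentation: an integer matrix with two rows, where the first row holds 0-indexed positions into the vocabulary and the second row holds the count of each word. The objects toy.vocab and toy.doc are hypothetical examples, not taken from the data set.

```r
# Hypothetical toy vocabulary and document (not the packaged data).
toy.vocab <- c("space", "orbit", "hockey", "goal")

# Columns are (word index, count) pairs; word indices are 0-based.
toy.doc <- matrix(as.integer(c(0, 2,    # "space" occurs twice
                               1, 1)),  # "orbit" occurs once
                  nrow = 2)

# Add 1 to the 0-based indices before indexing into the R vector.
toy.words <- toy.vocab[toy.doc[1, ] + 1]
toy.counts <- toy.doc[2, ]

# Named vector of word counts for this document.
print(setNames(toy.counts, toy.words))

# Total number of tokens in the document.
toy.total <- sum(toy.counts)
```

With the real data, the same decoding would apply to each element of newsgroup.train.documents together with newsgroup.vocab, assuming they follow this layout as the description states.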
data(newsgroup.train.documents)
data(newsgroup.test.documents)
data(newsgroup.train.labels)
data(newsgroup.test.labels)
data(newsgroup.vocab)
data(newsgroup.label.map)