Identify the most frequent features in a dfm
List the most (or least) frequently occurring features in a dfm, either as a whole or separated by document.
topfeatures(x, n = 10, decreasing = TRUE, scheme = c("count", "docfreq"), groups = NULL)
x | the object whose features will be returned
n | how many top features should be returned
decreasing | If TRUE, return the most frequent features; if FALSE, return the least frequent
scheme | one of "count" for total feature frequency, or "docfreq" for document frequency
groups | grouping variable for sampling, equal in length to the number of documents. This will be evaluated in the docvars data.frame, so that docvars may be referred to by name without quoting. This also changes previous behaviours for
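The difference between the two scheme values can be sketched in base R on a small hand-made matrix. This is only an illustration of the idea, not the package's implementation: the matrix m, its feature names, and the variable names below are all made up for the example, and no quanteda functions are used.

```r
# Toy document-feature matrix: rows are documents, columns are features.
# All values are invented for illustration.
m <- matrix(
  c(3, 0, 1,
    2, 0, 0,
    0, 4, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("tax", "peace", "jobs"))
)

# Analogue of scheme = "count": total occurrences of each feature
counts <- colSums(m)

# Analogue of scheme = "docfreq": number of documents containing each feature
docfreq <- colSums(m > 0)

# Analogue of n = 2 with decreasing = TRUE: keep the two largest values
n <- 2
top_by_count <- head(sort(counts, decreasing = TRUE), n)
top_by_docfreq <- head(sort(docfreq, decreasing = TRUE), n)

# Analogue of decreasing = FALSE: keep the two smallest values
bottom_by_count <- head(sort(counts, decreasing = FALSE), n)
```

In this toy data "peace" ranks second by count but last by document frequency, because all of its occurrences fall in a single document.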
A named numeric vector of feature counts, where the names are the
feature labels, or a list of these if groups
is given.
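The shape of the grouped return value can be sketched in base R as follows. The sketch assumes that documents sharing a group value are pooled by summing their counts before ranking; that is an assumption about the behaviour rather than something stated above. The matrix m and the grouping vector president are hypothetical.

```r
# Toy document-feature matrix (invented values)
m <- matrix(
  c(3, 0, 1,
    2, 0, 0,
    0, 4, 1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("doc1", "doc2", "doc3"), c("tax", "peace", "jobs"))
)

# Hypothetical grouping variable, one value per document
president <- c("A", "A", "B")

# Assumption: pool documents within a group by summing their counts
grouped <- rowsum(m, group = president)

# One named numeric vector per group, collected in a named list
n <- 2
top_by_group <- lapply(
  setNames(rownames(grouped), rownames(grouped)),
  function(g) head(sort(grouped[g, ], decreasing = TRUE), n)
)

str(top_by_group)
```

The result is a list with one element per group, and each element has the same form as the ungrouped result: a named numeric vector whose names are feature labels.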
dfmat1 <- corpus_subset(data_corpus_inaugural, Year > 1980) %>%
    tokens(remove_punct = TRUE) %>%
    dfm()
dfmat2 <- dfm_remove(dfmat1, stopwords("en"))

# most frequent features
topfeatures(dfmat1)
topfeatures(dfmat2)

# least frequent features
topfeatures(dfmat2, decreasing = FALSE)

# top features of individual documents
topfeatures(dfmat2, n = 5, groups = docnames(dfmat2))

# grouping by president last name
topfeatures(dfmat2, n = 5, groups = President)

# features by document frequencies
tail(topfeatures(dfmat1, scheme = "docfreq", n = 200))