Count the number of tokens or types
Get the count of tokens (total features) or types (unique tokens).
ntoken(x, ...)

ntype(x, ...)
For dfm objects, ntype will only return the count of features that occur more than zero times in the dfm.
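To make the point about dfm objects concrete, here is a minimal sketch. It assumes the quanteda package is attached and uses two made-up toy documents (`d1`, `d2`) that are not part of the original examples. It compares the per-document ntype counts with the total number of feature columns as reported by nfeat().

```r
library(quanteda)

# hypothetical toy documents, for illustration only
toy <- c(d1 = "a b c", d2 = "a a d")
toy_dfm <- dfm(tokens(toy))

# total number of feature columns in the dfm, including features
# that are zero for some documents
nfeat(toy_dfm)

# per-document count of features occurring more than zero times
ntype(toy_dfm)

# per-document total token count, for comparison
ntoken(toy_dfm)
```

Each document's ntype value should be no larger than the nfeat() value, since a feature that never occurs in a document is not counted as a type for that document.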
A named integer vector of the counts of total tokens or types.
Due to differences between raw text tokens and features that have been defined for a dfm, the counts may be different for dfm objects and the texts from which the dfm was generated. Because the method tokenizes the text in order to count the tokens, your results will depend on the options passed through to tokens().
# simple example
txt <- c(text1 = "This is a sentence, this.", text2 = "A word. Repeated repeated.")
ntoken(txt)
ntype(txt)
ntoken(char_tolower(txt))  # same
ntype(char_tolower(txt))   # fewer types
ntoken(char_tolower(txt), remove_punct = TRUE)
ntype(char_tolower(txt), remove_punct = TRUE)

# with some real texts
ntoken(corpus_subset(data_corpus_inaugural, Year < 1806), remove_punct = TRUE)
ntype(corpus_subset(data_corpus_inaugural, Year < 1806), remove_punct = TRUE)
ntoken(dfm(tokens(corpus_subset(data_corpus_inaugural, Year < 1800))))
ntype(dfm(tokens(corpus_subset(data_corpus_inaugural, Year < 1800))))