Extract words and phrases from a corpus
Extract words and phrases from a corpus of documents.
getvocab(corpus, mincount = 5, minphrasecount = NULL, ngram = 1, lang = "en", stopwords = lang, ...)
corpus | The corpus of documents (a character vector).
mincount | Minimum number of occurrences for a word to be considered frequent.
minphrasecount | Minimum number of occurrences for a collocation of words (a phrase) to be considered frequent.
ngram | Maximum size of the n-grams.
lang | The language of the documents, used for stemming (NULL for no stemming).
stopwords | The stop words, or the language of the documents. NULL if stop words should not be removed.
... | Other parameters.
The vocabulary used in the corpus of documents.
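To make the roles of mincount and stopwords more concrete, here is a minimal base R sketch of frequency-based vocabulary filtering. It is not the implementation used by getvocab: the toy corpus `docs`, the hand-picked stop word list `stops`, and the whitespace tokenization are all assumptions made for illustration, and stemming and phrase detection are left out.

```r
# Hypothetical toy corpus (not taken from the package documentation)
docs <- c("the cat sat on the mat",
          "the dog sat on the log",
          "a cat and a dog")

# Lowercase the documents and split on whitespace to get a flat token vector
tokens <- unlist(strsplit(tolower(docs), "\\s+"))

# Remove a small, hand-picked stop word list (an assumption for this sketch)
stops <- c("the", "a", "on", "and")
tokens <- tokens[!tokens %in% stops]

# Count occurrences and keep only the words seen at least `mincount` times
mincount <- 2
counts <- table(tokens)
vocab <- names(counts)[counts >= mincount]
print(vocab)
```

Raising `mincount` shrinks the resulting vocabulary, which is the same trade-off the mincount argument controls on a real corpus.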
## Not run:
text = loadtext("http://mattmahoney.net/dc/text8.zip")
vocab1 = getvocab(text) # With stemming
nrow(vocab1)
vocab2 = getvocab(text, lang = NULL) # Without stemming
nrow(vocab2)
## End(Not run)