getvocab

Extract words and phrases from a corpus


Description

Extract words and phrases from a corpus of documents.

Usage

getvocab(
  corpus,
  mincount = 5,
  minphrasecount = NULL,
  ngram = 1,
  lang = "en",
  stopwords = lang,
  ...
)

Arguments

corpus

The corpus of documents (a character vector).

mincount

Minimum number of occurrences for a word to be considered frequent.

minphrasecount

Minimum number of occurrences for a collocation of words (a phrase) to be considered frequent.

ngram

Maximum size of n-grams.

lang

The language of the documents, used for stemming (NULL for no stemming).

stopwords

Stopwords, or the language of the documents. NULL if stop words should not be removed.

...

Other parameters.

Value

The vocabulary used in the corpus of documents.

Examples

## Not run: 
text = loadtext ("http://mattmahoney.net/dc/text8.zip")
vocab1 = getvocab (text) # With stemming
nrow (vocab1)
vocab2 = getvocab (text, lang = NULL) # Without stemming
nrow (vocab2)

## End(Not run)
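
The example above downloads a large corpus. To make the role of mincount concrete on a small scale, here is a minimal base R sketch of frequency-based vocabulary filtering. It is an illustration only and is not how getvocab is implemented: the toy corpus, the whitespace tokenization, and the names tokens, counts and vocab are assumptions made for this example, and it does no stemming or stop word removal.

```r
# Toy corpus: a character vector with one element per document (hypothetical data).
corpus <- c("the cat sat on the mat",
            "the dog sat on the log",
            "a cat and a dog")

# Lower-case the text and split on whitespace to get all tokens in the corpus.
tokens <- unlist(strsplit(tolower(corpus), "\\s+"))

# Count the occurrences of each distinct word.
counts <- table(tokens)

# Keep only the words that occur at least `mincount` times.
mincount <- 2
vocab <- data.frame(word = names(counts),
                    count = as.integer(counts),
                    stringsAsFactors = FALSE)
vocab <- vocab[vocab$count >= mincount, ]

# Sort by decreasing count, then alphabetically.
vocab <- vocab[order(-vocab$count, vocab$word), ]
print(vocab)
```

Raising mincount shrinks the vocabulary, which is why the default of 5 would leave nothing in a corpus this small.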

fdm2id

Data Mining and R Programming for Beginners

v0.9.5
GPL-3
Authors
Alexandre Blansché [aut, cre]
Initial release