vectorize.words

Word vectorization


Description

Vectorize words from a corpus of documents.

Usage

vectorize.words(
  corpus = NULL,
  ndim = 50,
  maxwords = NULL,
  mincount = 5,
  minphrasecount = NULL,
  window = 5,
  maxcooc = 10,
  maxiter = 10,
  epsilon = 0.01,
  lang = "en",
  stopwords = lang,
  ...
)

Arguments

corpus

The corpus of documents (a character vector).

ndim

The number of dimensions of the vector space.

maxwords

The maximum number of words.

mincount

Minimum count for a word to be considered frequent.

minphrasecount

Minimum count for a word collocation (phrase) to be considered frequent.

window

Window size used to build the term co-occurrence matrix.

maxcooc

Maximum number of co-occurrences to use in the weighting function.

maxiter

The maximum number of iterations used to fit the GloVe model.

epsilon

Defines the early stopping strategy when fitting the GloVe model.

lang

The language of the documents (NULL to disable stemming).

stopwords

Stopwords, or the language of the documents. NULL if stop words should not be removed.

...

Other parameters.
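
To make the roles of window and maxcooc more concrete, here is a minimal base R sketch of the two ideas they refer to in the GloVe method: counting co-occurrences inside a symmetric window, and down-weighting rare pairs with the GloVe weighting function. It is a toy illustration, not the fdm2id implementation. The names tokens, cooc, glove_weight, x_max and alpha are hypothetical, the counts are plain (not scaled by distance), and the mapping of maxcooc onto x_max is an assumption.

```r
# Toy illustration only; this is not how fdm2id builds its matrix internally.
tokens <- c("the", "cat", "sat", "on", "the", "mat")
window <- 2  # plays the role of the `window` argument

# Count symmetric co-occurrences of words at most `window` positions apart
vocab <- unique(tokens)
cooc <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
for (i in seq_along(tokens)) {
  lo <- max(1, i - window)
  hi <- min(length(tokens), i + window)
  for (j in setdiff(lo:hi, i)) {
    cooc[tokens[i], tokens[j]] <- cooc[tokens[i], tokens[j]] + 1
  }
}

# GloVe weighting function: f(x) = (x / x_max)^alpha if x < x_max, else 1.
# x_max is assumed here to correspond to `maxcooc`; alpha = 0.75 is the
# value commonly used with GloVe.
glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}

weights <- glove_weight(cooc, x_max = 10)
```

With this weighting, pairs that co-occur at least x_max times all receive weight 1, so very frequent pairs cannot dominate the fit, while pairs that never co-occur receive weight 0.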

Value

The vectorized words.

Examples

## Not run: 
text = loadtext ("http://mattmahoney.net/dc/text8.zip")
words = vectorize.words (text, minphrasecount = 50)
query.words (words, origin = "paris", sub = "france", add = "germany")
query.words (words, origin = "berlin", sub = "germany", add = "france")
query.words (words, origin = "new_zealand")

## End(Not run)
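
The origin / sub / add queries above follow the usual word-analogy pattern (origin - sub + add, then nearest neighbours). The base R sketch below shows that arithmetic on a tiny hand-made embedding matrix. It is only an illustration under assumptions: emb, cosine and analogy are hypothetical names, the vectors are invented rather than GloVe output, and query.words may rank or filter its results differently.

```r
# Hand-made 3-D "embeddings" (axes: France, Germany, capital); illustrative only
emb <- rbind(
  paris   = c(1.0, 0.0, 1.0),
  france  = c(1.0, 0.0, 0.0),
  berlin  = c(0.0, 1.0, 1.0),
  germany = c(0.0, 1.0, 0.0),
  munich  = c(0.0, 1.0, 0.3),
  madrid  = c(0.1, 0.1, 1.0)
)

# Cosine similarity between each row of m and the vector v
cosine <- function(m, v) {
  sims <- drop(m %*% v) / (sqrt(rowSums(m^2)) * sqrt(sum(v^2)))
  names(sims) <- rownames(m)
  sims
}

# Rank words by similarity to origin - sub + add, excluding the query words
analogy <- function(emb, origin, sub, add) {
  target <- emb[origin, ] - emb[sub, ] + emb[add, ]
  sims <- cosine(emb, target)
  sims <- sims[setdiff(names(sims), c(origin, sub, add))]
  sort(sims, decreasing = TRUE)
}

res <- analogy(emb, origin = "paris", sub = "france", add = "germany")
names(res)[1]  # "berlin"
```

Excluding the query words from the candidates is a common convention, since the origin word is otherwise often its own nearest neighbour.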

fdm2id

Data Mining and R Programming for Beginners

v0.9.5
GPL-3
Authors
Alexandre Blansché [aut, cre]
Initial release