fdm2id: vectorize.words – R documentation

vectorize.words

Word vectorization

Description

Vectorize words from a corpus of documents.

Usage

vectorize.words(
  corpus = NULL,
  ndim = 50,
  maxwords = NULL,
  mincount = 5,
  minphrasecount = NULL,
  window = 5,
  maxcooc = 10,
  maxiter = 10,
  epsilon = 0.01,
  lang = "en",
  stopwords = lang,
  ...
)

Arguments

`corpus`	The corpus of documents (a character vector).
`ndim`	The number of dimensions of the vector space.
`maxwords`	The maximum number of words.
`mincount`	Minimum number of occurrences for a word to be considered frequent.
`minphrasecount`	Minimum number of occurrences for a collocation of words to be considered frequent.
`window`	Window size for term co-occurrence matrix construction.
`maxcooc`	Maximum number of co-occurrences to use in the weighting function.
`maxiter`	The maximum number of iterations to fit the GloVe model.
`epsilon`	Defines the early stopping strategy when fitting the GloVe model.
`lang`	The language of the documents (NULL to disable stemming).
`stopwords`	Stopwords, or the language of the documents. NULL if stop words should not be removed.
`...`	Other parameters.
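
To make `window` and `maxcooc` more concrete, here is a minimal base-R sketch of the two ideas they refer to: counting co-occurrences inside a sliding window, and the GloVe-style weighting that is capped once a count reaches a maximum. This is an illustration only, not the fdm2id implementation. The helper names `cooccurrence` and `glove_weight` are hypothetical, and `alpha = 0.75` is the value from the GloVe paper, assumed here rather than taken from this page.

```r
# Toy sketch in base R; NOT the code fdm2id runs internally.

# Count how often two tokens appear within `window` positions of each other.
cooccurrence <- function(tokens, window = 5) {
  vocab <- sort(unique(tokens))
  m <- matrix(0, length(vocab), length(vocab), dimnames = list(vocab, vocab))
  n <- length(tokens)
  for (i in seq_len(n)) {
    last <- min(n, i + window)
    if (last > i) {
      # Look only to the right and update both cells, so the matrix stays symmetric.
      for (j in (i + 1):last) {
        m[tokens[i], tokens[j]] <- m[tokens[i], tokens[j]] + 1
        m[tokens[j], tokens[i]] <- m[tokens[j], tokens[i]] + 1
      }
    }
  }
  m
}

# GloVe-style weighting: grows with the count x and is capped at 1 once x >= x_max.
# alpha = 0.75 is an assumption borrowed from the GloVe paper.
glove_weight <- function(x, x_max = 10, alpha = 0.75) {
  ifelse(x < x_max, (x / x_max)^alpha, 1)
}

tokens <- c("the", "cat", "sat", "on", "the", "mat")
m <- cooccurrence(tokens, window = 2)  # toy counterpart of `window`
w <- glove_weight(m, x_max = 10)       # toy counterpart of `maxcooc`
```

In this sketch, a larger `window` links words that are further apart, and a smaller `x_max` makes frequent pairs reach the full weight of 1 sooner.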

Value

The vectorized words.

Examples

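## Not run: 
library (fdm2id)
# Download and load the text8 corpus
text = loadtext ("http://mattmahoney.net/dc/text8.zip")
# Fit word vectors; collocations must occur at least 50 times to count as frequent
words = vectorize.words (text, minphrasecount = 50)
# Queries combining an origin word with words to subtract and add
query.words (words, origin = "paris", sub = "france", add = "germany")
query.words (words, origin = "berlin", sub = "germany", add = "france")
# Query on a collocation (note the underscore)
query.words (words, origin = "new_zealand")

## End(Not run)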
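
The `origin`, `sub` and `add` arguments in the queries above suggest the usual word-analogy arithmetic on vectors. The sketch below shows that arithmetic on hand-made toy vectors, under the assumption that the answer is the word closest (by cosine similarity) to `origin - sub + add`. Whether `query.words` computes it exactly this way is not stated on this page. The matrix `emb` and the helpers `cosine` and `analogy` are hypothetical and exist only for illustration.

```r
# Toy embeddings invented for illustration; real vectors come from vectorize.words.
# Columns loosely stand for: "France", "Germany", "capital city".
emb <- rbind(
  france  = c(1.0, 0.0, 0.0),
  germany = c(0.0, 1.0, 0.0),
  paris   = c(1.0, 0.0, 1.0),
  berlin  = c(0.0, 1.0, 1.0),
  madrid  = c(0.5, 0.0, 1.0)
)

# Cosine similarity between two numeric vectors.
cosine <- function(a, b) sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))

# Return the word whose vector is closest to origin - sub + add,
# ignoring the three query words themselves.
analogy <- function(emb, origin, sub, add) {
  target <- emb[origin, ] - emb[sub, ] + emb[add, ]
  candidates <- setdiff(rownames(emb), c(origin, sub, add))
  sims <- sapply(candidates, function(w) cosine(emb[w, ], target))
  names(sims)[which.max(sims)]
}

res1 <- analogy(emb, origin = "paris", sub = "france", add = "germany")
res2 <- analogy(emb, origin = "berlin", sub = "germany", add = "france")
```

With these toy vectors, `res1` is `"berlin"` and `res2` is `"paris"`, because the toy space was built so that subtracting a country and adding another lands exactly on the other capital.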

fdm2id

Data Mining and R Programming for Beginners

v0.9.5

GPL-3

Authors

Alexandre Blansché [aut, cre]
