Word vectorization
Vectorize words from a corpus of documents.
vectorize.words(
  corpus = NULL,
  ndim = 50,
  maxwords = NULL,
  mincount = 5,
  minphrasecount = NULL,
  window = 5,
  maxcooc = 10,
  maxiter = 10,
  epsilon = 0.01,
  lang = "en",
  stopwords = lang,
  ...
)
corpus |
The corpus of documents (a character vector). |
ndim |
The number of dimensions of the vector space. |
maxwords |
The maximum number of words. |
mincount |
Minimum count for a word to be considered frequent. |
minphrasecount |
Minimum count for a collocation of words (a phrase) to be considered frequent. |
window |
Window size for constructing the term co-occurrence matrix. |
maxcooc |
Maximum number of co-occurrences to use in the weighting function. |
maxiter |
The maximum number of iterations to fit the GloVe model. |
epsilon |
Defines the early-stopping strategy when fitting the GloVe model. |
lang |
The language of the documents (NULL to disable stemming). |
stopwords |
The stopwords to remove, or the language of the documents. NULL if stopwords should not be removed. |
... |
Other parameters. |
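To illustrate what maxcooc controls: in the standard GloVe formulation, each co-occurrence count x is weighted by f(x) = min(1, (x / x_max)^alpha), so every count at or above x_max receives the full weight of 1. The sketch below assumes that maxcooc plays the role of x_max and that alpha = 0.75 (the value suggested in the original GloVe paper); whether this function uses exactly these settings internally is an assumption. The helper glove.weight is hypothetical and not part of the package.

```r
# Hypothetical helper, not part of the package: the GloVe weighting function.
# Assumes maxcooc corresponds to x_max and alpha = 0.75 (GloVe paper default).
glove.weight = function (x, maxcooc = 10, alpha = 0.75)
  pmin (1, (x / maxcooc)^alpha)

# Co-occurrence counts below, at, and above the cap
counts = c (1, 5, 10, 50)
weights = glove.weight (counts)

# Rare pairs are down-weighted; pairs seen maxcooc times or more get weight 1
print (round (weights, 3))
```

Under these assumptions, raising maxcooc lowers the weight of moderately frequent pairs relative to very frequent ones, while lowering it treats more pairs as equally informative.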
The vectorized words.
## Not run:
text = loadtext ("http://mattmahoney.net/dc/text8.zip")
words = vectorize.words (text, minphrasecount = 50)
query.words (words, origin = "paris", sub = "france", add = "germany")
query.words (words, origin = "berlin", sub = "germany", add = "france")
query.words (words, origin = "new_zealand")
## End(Not run)
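The origin / sub / add queries in the example correspond to the usual word-analogy arithmetic on embeddings: compute origin - sub + add and look for the nearest words. The following is a minimal sketch of that idea on a made-up toy embedding, assuming cosine similarity as the ranking measure; the helper analogy and the vector values are hypothetical, and query.words may rank or filter results differently.

```r
# Toy embedding matrix (rows = words); the values are made up for illustration.
emb = rbind (
  paris   = c (1.0, 1.0, 0.0),
  france  = c (1.0, 0.0, 0.0),
  berlin  = c (0.0, 1.0, 1.0),
  germany = c (0.0, 0.0, 1.0),
  london  = c (0.5, 1.0, 0.0)
)

# Hypothetical helper: rank words by cosine similarity to origin - sub + add.
analogy = function (emb, origin, sub, add) {
  target = emb [origin, ] - emb [sub, ] + emb [add, ]
  sims = as.vector (emb %*% target) /
    (sqrt (rowSums (emb^2)) * sqrt (sum (target^2)))
  names (sims) = rownames (emb)
  # Drop the query words themselves, as is common for analogy queries
  sims = sims [setdiff (names (sims), c (origin, sub, add))]
  sort (sims, decreasing = TRUE)
}

result = analogy (emb, origin = "paris", sub = "france", add = "germany")
print (result)
```

With real embeddings the same arithmetic is applied to the rows of the fitted word-vector matrix instead of the toy matrix used here.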