word2vec: predict.word2vec – R documentation


word2vec

predict.word2vec

Predict functionalities for a word2vec model

Description

Get either the embedding of words, or the nearest words which are similar to either a word or a word vector.

Usage

## S3 method for class 'word2vec'
predict(
  object,
  newdata,
  type = c("nearest", "embedding"),
  top_n = 10L,
  encoding = "UTF-8",
  ...
)

Arguments

`object`	a word2vec model as returned by `word2vec` or `read.word2vec`
`newdata`	for type 'embedding', `newdata` should be a character vector of words; for type 'nearest', `newdata` should be a character vector of words or a matrix in the embedding space
`type`	either 'embedding' or 'nearest'. Defaults to 'nearest'.
`top_n`	show only the top n nearest neighbours. Defaults to 10.
`encoding`	set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'.
`...`	not used

Value

Depending on the type, you get a different result back:

for type 'nearest': a list of data.frames with columns term, similarity and rank, indicating which words are closest to the provided newdata words or word vectors. If newdata is a single vector instead of a matrix, a data.frame is returned
for type 'embedding': a matrix of word vectors of the words provided in newdata
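
To make the shape of the 'nearest' result concrete, here is a minimal base R sketch that ranks the rows of a toy embedding matrix against a query vector. The helper `nearest_sketch` and the numbers in `emb` are hypothetical, and cosine similarity is an assumption; the measure the package uses internally may differ, and unlike this sketch the package may leave the query word itself out of the result.

```r
# Toy embedding matrix: one row per word (values are made up for illustration)
emb <- rbind(
  bus   = c(1.0, 0.0, 0.2),
  tram  = c(0.9, 0.1, 0.3),
  hotel = c(0.0, 1.0, 0.1),
  room  = c(0.1, 0.9, 0.0)
)

# Hypothetical helper, not part of the word2vec package:
# rank all rows of 'emb' by cosine similarity to 'query' (an assumption)
nearest_sketch <- function(emb, query, top_n = 10L) {
  sims <- as.vector(emb %*% query) /
    (sqrt(rowSums(emb^2)) * sqrt(sum(query^2)))
  ord <- head(order(sims, decreasing = TRUE), top_n)
  data.frame(
    term       = rownames(emb)[ord],
    similarity = sims[ord],
    rank       = seq_along(ord),
    stringsAsFactors = FALSE
  )
}

# Same columns as described above: term, similarity, rank
res <- nearest_sketch(emb, emb["bus", ], top_n = 2)
res
```

The query row itself comes back at rank 1 here because nothing filters it out.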

Examples

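library(word2vec)
# Load the example model shipped with the package
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
# Embeddings for a set of words
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
# The 5 nearest terms for each word
nn  <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn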

# Do some calculations with the vectors and find similar terms to these
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)

vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)

vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)
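
Because type 'nearest' returns a list of data.frames for several queries but a plain data.frame for a single vector, downstream code may want one uniform shape. The following is a sketch only: `as_nn_list` and `stack_nn` are hypothetical helpers, the input data are made-up stand-ins for real `predict()` output, and it assumes that list elements are named after their query.

```r
# Hypothetical helper: wrap a lone data.frame so the result is always a list
as_nn_list <- function(x) {
  if (is.data.frame(x)) list(x) else x
}

# Hypothetical helper: stack all results into one data.frame with a 'query' column
stack_nn <- function(x) {
  x <- as_nn_list(x)
  # Fall back to positional labels when the list carries no names
  if (is.null(names(x))) names(x) <- as.character(seq_along(x))
  parts <- Map(
    function(df, q) data.frame(query = q, df, stringsAsFactors = FALSE),
    x, names(x)
  )
  out <- do.call(rbind, parts)
  rownames(out) <- NULL
  out
}

# Made-up stand-ins with the documented columns term, similarity and rank
one  <- data.frame(term = c("tram", "metro"), similarity = c(0.9, 0.8), rank = 1:2)
many <- list(
  bus    = one,
  toilet = data.frame(term = "douche", similarity = 0.7, rank = 1L)
)

stacked_one  <- stack_nn(one)
stacked_many <- stack_nn(many)
stacked_many
```

A single long data.frame is convenient for filtering on similarity or joining with other tables; keep the list form when each query is processed separately.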

word2vec

Distributed Representations of Words

v0.3.3

Apache License (>= 2.0)

Authors

Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), Max Fomichev [ctb, cph] (Code in src/word2vec)
