predict.word2vec

Predict functionalities for a word2vec model


Description

Get either

  • the embedding of words

  • the nearest words which are similar to either a word or a word vector

Usage

## S3 method for class 'word2vec'
predict(
  object,
  newdata,
  type = c("nearest", "embedding"),
  top_n = 10L,
  encoding = "UTF-8",
  ...
)

Arguments

object

a word2vec model as returned by word2vec or read.word2vec

newdata

For type 'embedding', newdata should be a character vector of words.
For type 'nearest', newdata should be a character vector of words, or a numeric vector or matrix in the embedding space.

type

either 'embedding' or 'nearest'. Defaults to 'nearest'.

top_n

show only the top n nearest neighbours. Defaults to 10.

encoding

set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'.

...

not used

Value

depending on the type, you get a different result back:

  • for type nearest: a list of data.frames with columns term, similarity and rank, indicating which words are closest to the provided newdata words or word vectors. If newdata is a single vector instead of a matrix, a single data.frame is returned

  • for type embedding: a matrix of word vectors of the words provided in newdata
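
To make the shape of the 'nearest' result concrete, here is a minimal base R sketch of a nearest-neighbour lookup on a toy embedding matrix. It is an illustration only, not the package's implementation: the helper nearest_sketch and the matrix values are made up, and cosine similarity is an assumption about how closeness is measured.

```r
# Toy embedding matrix: rows are terms, columns are dimensions (made-up values)
emb <- rbind(
  bus    = c(1.0, 0.0, 0.2),
  tram   = c(0.9, 0.1, 0.3),
  toilet = c(0.0, 1.0, 0.1),
  shower = c(0.1, 0.9, 0.2)
)

# Hypothetical helper (not part of word2vec): rank all terms by cosine
# similarity to a query vector and keep the top_n best matches
nearest_sketch <- function(emb, query, top_n = 10L) {
  sims <- as.vector(emb %*% query) / (sqrt(rowSums(emb^2)) * sqrt(sum(query^2)))
  ord  <- head(order(sims, decreasing = TRUE), top_n)
  data.frame(term = rownames(emb)[ord],
             similarity = sims[ord],
             rank = seq_along(ord),
             stringsAsFactors = FALSE)
}

# Query with a single vector: the result is one data.frame
nn <- nearest_sketch(emb, emb["bus", ], top_n = 2)
nn
```

Because the query here is the vector of "bus" itself, that term comes back at rank 1 with similarity 1. Applying the helper to each row of a matrix with lapply would give a list of data.frames, mirroring the list result described above.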

Examples

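library(word2vec)

# Load the example model shipped with the package
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)

# Embeddings for a few terms, including one that is presumably not in the vocabulary
emb <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb

# The 5 nearest terms for each word
nn  <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn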

# Do some calculations with the vectors and find similar terms to these
emb <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)

vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)

vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)

word2vec

Distributed Representations of Words

v0.3.3
Apache License (>= 2.0)
Authors
Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), Max Fomichev [ctb, cph] (Code in src/word2vec)
Initial release
