Predict functionalities for a word2vec model
Get either
the embedding of words, or
the nearest words which are similar to either a word or a word vector
## S3 method for class 'word2vec'
predict(object, newdata, type = c("nearest", "embedding"), top_n = 10L, encoding = "UTF-8", ...)
object | a word2vec model as returned by
newdata | for type 'embedding',
type | either 'embedding' or 'nearest'. Defaults to 'nearest'.
top_n | show only the top n nearest neighbours. Defaults to 10.
encoding | set the encoding of the text elements to the specified encoding. Defaults to 'UTF-8'.
... | not used
Depending on the type, you get a different result back:
for type 'nearest': a list of data.frames with columns term, similarity and rank, indicating which words are closest to the provided newdata words or word vectors. If newdata is just one vector instead of a matrix, a data.frame is returned.
for type 'embedding': a matrix of word vectors of the words provided in newdata
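Because type 'nearest' can return either a list of data.frames or a single data.frame, downstream code may want to normalise both shapes into one table. The following is a minimal base R sketch of that idea. The helper name stack_neighbours and the toy terms and similarity values are hypothetical and used for illustration only; the columns simply mirror the ones described above.

```r
# Toy stand-in for the result of predict(..., type = "nearest") on two terms.
# Terms and similarity values are made up; they are not model output.
nn <- list(
  bus    = data.frame(term = c("tram", "metro"),
                      similarity = c(0.91, 0.88), rank = 1:2),
  toilet = data.frame(term = c("douche", "badkamer"),
                      similarity = c(0.93, 0.90), rank = 1:2)
)

# Hypothetical helper: stack the per-query data.frames into one data.frame
# and record which query each row belongs to.
stack_neighbours <- function(x) {
  # Single-vector case: the result is already a data.frame.
  if (is.data.frame(x)) {
    return(x)
  }
  parts <- Map(function(df, query) {
    data.frame(query = query, df, stringsAsFactors = FALSE)
  }, x, names(x))
  out <- do.call(rbind, unname(parts))
  rownames(out) <- NULL
  out
}

flat <- stack_neighbours(nn)
flat
```

Passing a single data.frame through the helper leaves it unchanged, so the same code path can be used whether newdata was a matrix or one vector.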
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
emb   <- predict(model, c("bus", "toilet", "unknownword"), type = "embedding")
emb
nn    <- predict(model, c("bus", "toilet"), type = "nearest", top_n = 5)
nn

# Do some calculations with the vectors and find similar terms to these
emb    <- as.matrix(model)
vector <- emb["buurt", ] - emb["rustige", ] + emb["restaurants", ]
predict(model, vector, type = "nearest", top_n = 10)

vector <- emb["gastvrouw", ] - emb["gastvrij", ]
predict(model, vector, type = "nearest", top_n = 5)

vectors <- emb[c("gastheer", "gastvrouw"), ]
vectors <- rbind(vectors, avg = colMeans(vectors))
predict(model, vectors, type = "nearest", top_n = 10)
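To make the vector arithmetic in the examples more concrete, here is a conceptual base R sketch of a nearest-neighbour lookup over an embedding matrix. It assumes cosine similarity as the ranking measure, which may differ from what the package computes internally. The function name nearest_cosine and the toy 3-dimensional vectors are hypothetical, and the output columns are chosen to resemble the ones described for type 'nearest'.

```r
# Toy embedding matrix: one row per term, made-up 3-dimensional vectors.
emb <- rbind(
  king  = c(0.9, 0.1, 0.0),
  queen = c(0.8, 0.3, 0.1),
  apple = c(0.0, 0.2, 0.9)
)

# Hypothetical helper: rank every term in `emb` by cosine similarity
# to the query vector `v` and keep the top_n best matches.
nearest_cosine <- function(emb, v, top_n = 10L) {
  sims <- as.vector(emb %*% v) / (sqrt(rowSums(emb^2)) * sqrt(sum(v^2)))
  keep <- seq_len(min(top_n, nrow(emb)))
  ord  <- order(sims, decreasing = TRUE)[keep]
  data.frame(term = rownames(emb)[ord],
             similarity = sims[ord],
             rank = seq_along(ord),
             stringsAsFactors = FALSE)
}

# Combine word vectors, then look up the terms closest to the result.
query <- emb["king", ] - emb["apple", ]
res   <- nearest_cosine(emb, query, top_n = 2)
res
```

With a real model, the matrix from as.matrix(model) would take the place of the toy emb, although predict(model, vector, type = "nearest") remains the supported way to do this lookup.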