word2vec: word2vec_similarity – R documentation


word2vec_similarity

Similarity between word vectors as used in word2vec

Description

The similarity between word vectors is defined as the square root of the average inner product of the vector elements, sqrt(sum(x . y) / ncol(x)), capped to zero.
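
As a rough illustration of the formula above, the computation could be sketched in base R as follows. This is a sketch under stated assumptions, not the package's source: the helper name `similarity_sketch` is hypothetical, and it assumes that "capped to zero" means negative average inner products are set to zero before the square root is taken.

```r
# Hypothetical helper for illustration; not part of the word2vec package.
similarity_sketch <- function(x, y) {
  # Inner products between every row of x and every row of y
  s <- tcrossprod(x, y)
  # Average over the embedding dimensions
  s <- s / ncol(x)
  # Assumption: negative values are capped to zero before the square root
  s[s < 0] <- 0
  sqrt(s)
}

# Small hand-made embeddings (3 dimensions) so the result is easy to check
x <- matrix(c(1, 2, 2,
              0, 1, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("word1", "word2"), NULL))
y <- matrix(c( 1,  2,  2,
              -1, -1, -1), nrow = 2, byrow = TRUE,
            dimnames = list(c("term1", "term2"), NULL))
sim <- similarity_sketch(x, y)
```

Here `sim["word1", "term1"]` equals sqrt(9 / 3), while both pairs involving `term2` have a negative inner product and therefore end up at 0 under the stated assumption.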

Usage

word2vec_similarity(x, y, top_n = +Inf)

Arguments

`x`	a matrix with embeddings where the rownames of the matrix provide the label of the term
`y`	a matrix with embeddings where the rownames of the matrix provide the label of the term
`top_n`	integer; return only the top n most similar terms from y for each row of x. If `top_n` is supplied, a data.frame with only the highest similarities between x and y is returned instead of the matrix of all pairwise similarities

Value

By default, the function returns a similarity matrix between the rows of x and the rows of y. The similarity between row i of x and row j of y is found in cell [i, j] of the returned similarity matrix.
If top_n is provided, the return value is a data.frame with columns term1, term2, similarity and rank. It gives the similarity between the provided terms in x and y, ordered from high to low similarity, and keeps only the top_n most similar records.
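
To illustrate how the long-format result relates to the similarity matrix, the following base R sketch reshapes a small, made-up similarity matrix into a data.frame with the columns term1, term2, similarity and rank. The helper name `top_n_sketch` and the example values are hypothetical, and the package's own implementation, ordering and tie handling may differ.

```r
# Hypothetical helper for illustration; not part of the word2vec package.
top_n_sketch <- function(sim, top_n = 1) {
  # One row per (row name, column name) pair of the similarity matrix
  long <- as.data.frame.table(sim, stringsAsFactors = FALSE)
  colnames(long) <- c("term1", "term2", "similarity")
  # Within each term1, order from high to low similarity
  long <- long[order(long$term1, -long$similarity), ]
  # Rank within each term1 (1 = most similar)
  long$rank <- ave(long$similarity, long$term1, FUN = seq_along)
  # Keep only the top_n most similar terms per term1
  long <- long[long$rank <= top_n, ]
  rownames(long) <- NULL
  long
}

# Made-up similarity values, for illustration only
sim <- matrix(c(0.9, 0.1, 0.5,
                0.2, 0.8, 0.3),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("word1", "word2"),
                              c("term1", "term2", "term3")))
res <- top_n_sketch(sim, top_n = 2)
```

With these values, `res` has four rows: the two highest-scoring terms for `word1` followed by the two highest-scoring terms for `word2`.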

Examples

library(word2vec)

## Example with random embeddings: 2 terms in x, 5 terms in y, 3 dimensions
x <- matrix(rnorm(6), nrow = 2, ncol = 3)
rownames(x) <- c("word1", "word2")
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
rownames(y) <- c("term1", "term2", "term3", "term4", "term5")

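## Without top_n: similarity matrix between the rows of x and the rows of y
word2vec_similarity(x, y)
## With top_n supplied: data.frame with the top_n most similar terms from y for each row of x
word2vec_similarity(x, y, top_n = 1)
word2vec_similarity(x, y, top_n = 2)
word2vec_similarity(x, y, top_n = +Inf)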

## Example with a word2vec model
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
emb <- as.matrix(model)

x <- emb[c("gastheer", "gastvrouw", "kamer"), ]
y <- emb
word2vec_similarity(x, x)
word2vec_similarity(x, y, top_n = 3)
predict(model, x, type = "nearest", top_n = 3)

word2vec

Distributed Representations of Words

v0.3.3

Apache License (>= 2.0)

Authors

Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), Max Fomichev [ctb, cph] (Code in src/word2vec)
