word2vec_similarity

Similarity between word vectors as used in word2vec


Description

The similarity between word vectors is defined as the square root of the average inner product of the vector elements, sqrt(sum(x . y) / ncol(x)), where a negative average is capped at zero before the square root is taken.
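The formula above can be read as the following base R sketch. It is an illustration only: `similarity_sketch` is a hypothetical helper, not a function of the word2vec package, and it assumes the cap is applied to the averaged inner product before taking the square root.

```r
# Hypothetical helper, not part of the word2vec package:
# one reading of sqrt(sum(x . y) / ncol(x)) with negative values set to zero.
similarity_sketch <- function(x, y) {
  stopifnot(is.matrix(x), is.matrix(y), ncol(x) == ncol(y))
  s <- tcrossprod(x, y) / ncol(x)  # average inner product for every row pair
  s[s < 0] <- 0                    # cap negative averages at zero
  sqrt(s)
}

x <- matrix(c( 1, 2, 2,
              -1, 0, 1), nrow = 2, byrow = TRUE,
            dimnames = list(c("word1", "word2"), NULL))
y <- matrix(c(2, 1,  2,
              1, 0, -1), nrow = 2, byrow = TRUE,
            dimnames = list(c("term1", "term2"), NULL))

sims <- similarity_sketch(x, y)
print(sims)
```

In this toy input only the pair word1/term1 has a positive inner product (8, so the similarity is sqrt(8 / 3)); every other pair is capped to 0.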

Usage

word2vec_similarity(x, y, top_n = +Inf)

Arguments

x

a matrix with embeddings where the rownames of the matrix provide the label of the term

y

a matrix with embeddings where the rownames of the matrix provide the label of the term

top_n

integer giving the number of most similar terms from y to return for each row of x. If top_n is supplied, a data.frame is returned containing only the highest similarities between x and y instead of all pairwise similarities

Value

By default, the function returns a similarity matrix between the rows of x and the rows of y. The similarity between row i of x and row j of y is found in cell [i, j] of the returned similarity matrix.
If top_n is provided, the return value is a data.frame with columns term1, term2, similarity and rank indicating the similarity between the provided terms in x and y ordered from high to low similarity and keeping only the top_n most similar records.
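To make the shape of the top_n result concrete, the sketch below reshapes a small similarity matrix into a data.frame with the columns term1, term2, similarity and rank. `top_n_sketch` and the toy matrix are hypothetical, and the row ordering used here is an assumption; the package may order its output differently.

```r
# Hypothetical helper, not part of the word2vec package:
# turn a similarity matrix into a long data.frame and keep the top_n rows per term1.
top_n_sketch <- function(sims, top_n) {
  long <- as.data.frame.table(sims, stringsAsFactors = FALSE)
  colnames(long) <- c("term1", "term2", "similarity")
  # sort by term1, then from high to low similarity within each term1
  long <- long[order(long$term1, -long$similarity), ]
  # rank 1 is the most similar term2 for a given term1
  long$rank <- ave(long$similarity, long$term1, FUN = seq_along)
  long <- long[long$rank <= top_n, ]
  rownames(long) <- NULL
  long
}

# Toy similarity matrix: rows come from x, columns from y
sims <- matrix(c(0.9, 0.1, 0.5,
                 0.2, 0.8, 0.4), nrow = 2, byrow = TRUE,
               dimnames = list(c("word1", "word2"),
                               c("term1", "term2", "term3")))

top2 <- top_n_sketch(sims, top_n = 2)
print(top2)
```

Each term in term1 contributes at most top_n rows, so the result has nrow(x) * top_n rows when y has at least top_n rows.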

See Also

Examples

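library(word2vec)

## Example with random embeddings: 2 terms in x, 5 terms in y, 3 dimensions
x <- matrix(rnorm(6), nrow = 2, ncol = 3)
rownames(x) <- c("word1", "word2")
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
rownames(y) <- c("term1", "term2", "term3", "term4", "term5")

## Without top_n: similarity matrix with one row per term in x
word2vec_similarity(x, y)
## With top_n: data.frame with the top_n most similar terms of y per term in x
word2vec_similarity(x, y, top_n = 1)
word2vec_similarity(x, y, top_n = 2)
word2vec_similarity(x, y, top_n = +Inf)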

## Example with a word2vec model
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
emb <- as.matrix(model)

x <- emb[c("gastheer", "gastvrouw", "kamer"), ]
y <- emb
word2vec_similarity(x, x)
word2vec_similarity(x, y, top_n = 3)
predict(model, x, type = "nearest", top_n = 3)

word2vec

Distributed Representations of Words

v0.3.3
Apache License (>= 2.0)
Authors
Jan Wijffels [aut, cre, cph] (R wrapper), BNOSAC [cph] (R wrapper), Max Fomichev [ctb, cph] (Code in src/word2vec)
Initial release