Similarity between word vectors as used in word2vec
The similarity between word vectors is defined as the square root of the average inner product of the vector elements, sqrt(sum(x . y) / ncol(x)), capped to zero.
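The definition above can be sketched in base R. This is only an illustration of the formula as stated, not the package's own source. The helper name `dot_similarity_sketch` is hypothetical, and the sketch assumes the cap to zero is applied before the square root, since the square root of a negative average would otherwise be NaN.

```r
# Hypothetical helper illustrating the stated definition; not part of word2vec.
dot_similarity_sketch <- function(x, y) {
  stopifnot(ncol(x) == ncol(y))
  # Inner product of every row of x with every row of y,
  # divided by the embedding dimension (the "average inner product")
  s <- tcrossprod(x, y) / ncol(x)
  # Assumption: negative averages are capped to zero before the square root
  s[s < 0] <- 0
  sqrt(s)
}

# Small hand-made embeddings so the result can be checked by hand
x <- matrix(c(1, 0, 2,
              0, 1, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("word1", "word2"), NULL))
y <- matrix(c( 1, 0, 2,
              -1, 0, 0), nrow = 2, byrow = TRUE,
            dimnames = list(c("term1", "term2"), NULL))
sim <- dot_similarity_sketch(x, y)
sim
```

The result has one row per row of x and one column per row of y. The pair word1/term2 has a negative inner product, so its similarity is capped to 0 under the assumption above.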
word2vec_similarity(x, y, top_n = +Inf)
x | a matrix with embeddings where the rownames of the matrix provide the label of the term
y | a matrix with embeddings where the rownames of the matrix provide the label of the term
top_n | integer indicating to return only the top n most similar terms from y for each row of x.
If top_n is provided, a data.frame is returned instead of a similarity matrix.
By default, the function returns a similarity matrix between the rows of x and the rows of y. The similarity between row i of x and row j of y is found in cell [i, j] of the returned similarity matrix.
If top_n is provided, the return value is a data.frame with columns term1, term2, similarity and rank indicating the similarity between the provided terms in x and y, ordered from high to low similarity and keeping only the top_n most similar records.
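To make the shape of the top_n return value concrete, the following base R sketch turns a similarity matrix into the long format described above. The helper name `top_n_sketch` is hypothetical. Grouping the output rows by term1 and ranking within each term of x are assumptions about the layout, so the package's actual row order may differ.

```r
# Hypothetical helper: reshape a similarity matrix into a long data.frame
# with columns term1, term2, similarity and rank. Not part of word2vec.
top_n_sketch <- function(sim, top_n = 1) {
  long <- data.frame(
    term1 = rep(rownames(sim), times = ncol(sim)),
    term2 = rep(colnames(sim), each = nrow(sim)),
    similarity = as.vector(sim),  # column-major, matching the rep() calls above
    stringsAsFactors = FALSE
  )
  # Assumption: group by term1, then order from high to low similarity
  long <- long[order(long$term1, -long$similarity), ]
  # Rank within each term1 (1 = most similar)
  long$rank <- as.integer(ave(long$similarity, long$term1, FUN = seq_along))
  # Keep only the top_n most similar records per term1
  long <- long[long$rank <= top_n, ]
  rownames(long) <- NULL
  long
}

sim <- matrix(c(0.9, 0.1, 0.5,
                0.2, 0.8, 0.3), nrow = 2, byrow = TRUE,
              dimnames = list(c("word1", "word2"),
                              c("term1", "term2", "term3")))
res <- top_n_sketch(sim, top_n = 2)
res
```

With top_n = 2, each term of x keeps its two most similar terms from y, so the result has four rows.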
x <- matrix(rnorm(6), nrow = 2, ncol = 3)
rownames(x) <- c("word1", "word2")
y <- matrix(rnorm(15), nrow = 5, ncol = 3)
rownames(y) <- c("term1", "term2", "term3", "term4", "term5")

word2vec_similarity(x, y)
word2vec_similarity(x, y, top_n = 1)
word2vec_similarity(x, y, top_n = 2)
word2vec_similarity(x, y, top_n = +Inf)

## Example with a word2vec model
path  <- system.file(package = "word2vec", "models", "example.bin")
model <- read.word2vec(path)
emb   <- as.matrix(model)

x <- emb[c("gastheer", "gastvrouw", "kamer"), ]
y <- emb

word2vec_similarity(x, x)
word2vec_similarity(x, y, top_n = 3)
predict(model, x, type = "nearest", top_n = 3)