bm_25

BM25 Matching


Description

BM25 stands for Best Matching 25. It is widely used for ranking documents and is often preferred over TF*IDF scores. Given a new document, it finds the most similar documents in a corpus, which makes it popular in information retrieval systems. This implementation is based on C++ functions, so it is reasonably well optimised.

Usage

bm_25(document, corpus, top_n)

Arguments

document

a string for which to find similar documents

corpus

a vector of strings against which document is to be matched

top_n

the number of most similar documents to return

Value

a vector containing similar documents and their scores

Examples

library(superml)

# small corpus to search
docs <- c("chimpanzees are found in jungle",
          "chimps are jungle animals",
          "Mercedes automobiles are best",
          "merc is made in germany",
          "chimps are intelligent animals")

# query document to match against the corpus
sentence <- "automobiles are"
s <- bm_25(document = sentence, corpus = docs, top_n = 2)
print(s)
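
For readers curious about what a BM25 ranking computes, the following is a minimal base R sketch of the standard Okapi BM25 formula. It is not the package's C++ implementation: the function name bm25_sketch, the simple lowercase tokenizer, and the parameter values k1 = 1.2 and b = 0.75 are all assumptions made for illustration, so its scores may differ from those returned by bm_25.

```r
# Illustrative Okapi BM25 scorer in base R; NOT superml's internal code.
# k1 and b are conventional defaults assumed here, not values from the package.
bm25_sketch <- function(document, corpus, top_n = 2, k1 = 1.2, b = 0.75) {
  # Lowercase and split on anything that is not a letter or digit
  tokenize <- function(x) {
    tokens <- strsplit(tolower(x), "[^a-z0-9]+")[[1]]
    tokens[nzchar(tokens)]
  }

  query_terms <- unique(tokenize(document))
  doc_tokens  <- lapply(corpus, tokenize)
  doc_len     <- lengths(doc_tokens)
  avg_len     <- mean(doc_len)
  n_docs      <- length(corpus)

  scores <- numeric(n_docs)
  for (term in query_terms) {
    # Term frequency of this query term in every corpus document
    tf <- vapply(doc_tokens, function(tokens) sum(tokens == term), numeric(1))
    # Number of documents containing the term
    df <- sum(tf > 0)
    # Smoothed inverse document frequency (always non-negative)
    idf <- log((n_docs - df + 0.5) / (df + 0.5) + 1)
    # Saturating term frequency, normalised by relative document length
    scores <- scores +
      idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * doc_len / avg_len))
  }

  names(scores) <- corpus
  head(sort(scores, decreasing = TRUE), top_n)
}

docs <- c("chimpanzees are found in jungle",
          "chimps are jungle animals",
          "Mercedes automobiles are best",
          "merc is made in germany",
          "chimps are intelligent animals")

res <- bm25_sketch("automobiles are", docs, top_n = 2)
print(res)
```

In this sketch the rare term "automobiles" receives a much larger weight than the common term "are", so the document containing both ranks first. The length normalisation controlled by b means that, among documents matching only "are", shorter ones score slightly higher.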

superml

Build Machine Learning Models Like Using Python's Scikit-Learn Library in R

v0.5.3
GPL-3 | file LICENSE
Authors
Manish Saraswat [aut, cre]
Initial release