udpipe: document_term_frequencies_statistics – R documentation

document_term_frequencies_statistics

Add Term Frequency, Inverse Document Frequency and Okapi BM25 statistics to the output of document_term_frequencies

Description

Term Frequency-Inverse Document Frequency (tfidf) is calculated as the product of:

Term Frequency (tf): the number of times the term occurs in the document / the total number of words in the document
Inverse Document Frequency (idf): log(number of documents / number of documents in which the term appears)
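
The two definitions above can be reproduced by hand on a small table. The following is a minimal base R sketch, not the package's implementation: the toy data frame `x` and the helper variables `doc_len`, `n_docs` and `doc_freq` are made up for illustration, and the natural logarithm (R's default for `log`) is assumed.

```r
# Hypothetical toy counts: one row per (doc_id, term) pair with its frequency.
x <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3"),
  term   = c("beer", "good", "beer", "hotel", "hotel"),
  freq   = c(2, 1, 1, 3, 1),
  stringsAsFactors = FALSE
)

# tf: term count divided by the total number of words in the document.
doc_len <- tapply(x$freq, x$doc_id, sum)
x$tf <- x$freq / as.numeric(doc_len[x$doc_id])

# idf: log(number of documents / number of documents containing the term).
# Assumption: natural logarithm.
n_docs <- length(unique(x$doc_id))
doc_freq <- tapply(x$doc_id, x$term, function(d) length(unique(d)))
x$idf <- log(n_docs / as.numeric(doc_freq[x$term]))

# tfidf is the product of the two.
x$tfidf <- x$tf * x$idf
print(x)
```

A term that occurs in every document gets an idf of log(1) = 0, so its tfidf is 0 regardless of how often it occurs.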

The Okapi BM25 statistic is calculated as the product of the inverse document frequency and the weighted term frequency, as defined at https://en.wikipedia.org/wiki/Okapi_BM25.
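
As a rough illustration of the weighted term frequency, the base R sketch below applies the term-frequency part of the BM25 formula from the Wikipedia page to a toy table and multiplies it by the idf. It is a sketch under assumptions, not the package's code: the toy data, the variable names, and the choice of the plain log(N / n) idf (rather than the smoothed idf variant shown on Wikipedia) are assumptions made here, and the package may differ in these details.

```r
# Hypothetical toy counts: one row per (doc_id, term) pair with its frequency.
x <- data.frame(
  doc_id = c("d1", "d1", "d2", "d2", "d3"),
  term   = c("beer", "good", "beer", "hotel", "hotel"),
  freq   = c(2, 1, 1, 3, 1),
  stringsAsFactors = FALSE
)
k <- 1.2   # BM25 parameter k1
b <- 0.75  # BM25 parameter b

# Document length (in words) per row, and the average document length.
len_by_doc <- tapply(x$freq, x$doc_id, sum)
doc_len <- as.numeric(len_by_doc[x$doc_id])
avg_len <- mean(len_by_doc)

# Weighted term frequency: saturates as freq grows and penalises long documents.
x$tf_bm25 <- (x$freq * (k + 1)) /
  (x$freq + k * (1 - b + b * doc_len / avg_len))

# Assumption: plain idf, log(number of documents / documents containing the term).
n_docs <- length(unique(x$doc_id))
doc_freq <- tapply(x$doc_id, x$term, function(d) length(unique(d)))
x$idf <- log(n_docs / as.numeric(doc_freq[x$term]))

x$bm25 <- x$tf_bm25 * x$idf
print(x)
```

Setting `b = 0` removes the document-length normalisation, while a larger `k` lets repeated occurrences of a term keep adding to the score for longer before it levels off.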

Usage

document_term_frequencies_statistics(x, k = 1.2, b = 0.75)

Arguments

`x`	a data.table as returned by `document_term_frequencies` containing the columns doc_id, term and freq.
`k`	parameter k1 of the Okapi BM25 ranking function as defined at https://en.wikipedia.org/wiki/Okapi_BM25. Defaults to 1.2.
`b`	parameter b of the Okapi BM25 ranking function as defined at https://en.wikipedia.org/wiki/Okapi_BM25. Defaults to 0.75.

Value

A data.table with the columns doc_id, term and freq, plus the computed statistics tf, idf, tfidf, tf_bm25 and bm25.

Examples

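library(udpipe)
data(brussels_reviews_anno)
# Count how often each token occurs in each document
x <- document_term_frequencies(brussels_reviews_anno[, c("doc_id", "token")])
# Add the tf, idf, tfidf, tf_bm25 and bm25 columns
x <- document_term_frequencies_statistics(x)
head(x)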

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.5

MPL-2.0

Authors

Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
