Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

bind_tf_idf

Bind the term frequency and inverse document frequency of a tidy text dataset to the dataset


Description

Calculate and bind the term frequency and inverse document frequency of a tidy text dataset, along with the product, tf-idf, to the dataset. Each of these values are added as columns. This function supports non-standard evaluation through the tidyeval framework.

Usage

bind_tf_idf(tbl, term, document, n)

Arguments

tbl

A tidy text dataset with one-row-per-term-per-document

term

Column containing terms as string or symbol

document

Column containing document IDs as string or symbol

n

Column containing document-term counts as string or symbol

Details

The arguments term, document, and n are passed by expression and support quasiquotation; you can unquote strings and symbols.

If the dataset is grouped, the groups are ignored but are retained.

The dataset must have exactly one row per document-term combination for this to work.

Examples

library(dplyr)
library(janeaustenr)

book_words <- austen_books() %>%
  unnest_tokens(word, text) %>%
  count(book, word, sort = TRUE)

book_words

# find the words most distinctive to each document
book_words %>%
  bind_tf_idf(word, book, n) %>%
  arrange(desc(tf_idf))

tidytext

Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

v0.3.1
MIT + file LICENSE
Authors
Gabriela De Queiroz [ctb], Colin Fay [ctb] (<https://orcid.org/0000-0001-7343-1846>), Emil Hvitfeldt [ctb], Os Keyes [ctb] (<https://orcid.org/0000-0001-5196-609X>), Kanishka Misra [ctb], Tim Mastny [ctb], Jeff Erickson [ctb], David Robinson [aut], Julia Silge [aut, cre] (<https://orcid.org/0000-0002-3671-836X>)
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.