corpus_tidiers

Tidiers for a corpus object from the quanteda package


Description

Tidy a corpus object from the quanteda package. tidy returns a tbl_df with one row per document, with a text column containing the document's text and one column for each document-level metadata field. glance returns a one-row tbl_df of corpus-level metadata, such as source and created. For Corpus objects from the tm package, see tidy.Corpus.

Usage

## S3 method for class 'corpus'
tidy(x, ...)

## S3 method for class 'corpus'
glance(x, ...)

Arguments

x

A corpus object from the quanteda package. (VCorpus and PCorpus objects come from the tm package; for those, see tidy.Corpus.)

...

Extra arguments, not used

Details

For the most part, the tidy output is equivalent to the "documents" data frame in the corpus object, except that it is converted to a tbl_df and the texts column is renamed to text, for consistency with other uses in tidytext.
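The renaming step described above can be illustrated with a minimal base-R sketch. It does not call quanteda or tidytext: the `documents` data frame and its `Year` column are hypothetical stand-ins for the data frame stored inside a corpus object, and the real method additionally converts the result to a tbl_df.

```r
# Hypothetical stand-in for the "documents" data frame inside a corpus object.
documents <- data.frame(
  texts = c("First document.", "Second document."),
  Year = c(1789L, 1793L),
  stringsAsFactors = FALSE
)

# Rename the texts column to text, leaving the metadata columns untouched.
tidied <- documents
names(tidied)[names(tidied) == "texts"] <- "text"

# One row per document: a text column plus one column per metadata field.
print(tidied)
```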

Similarly, the glance output is simply the "metadata" object with its NULL fields removed, converted to a one-row tbl_df.
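The glance behavior can be sketched the same way in base R. The `metadata` list below is a hypothetical example with made-up field values, not the actual contents of any quanteda corpus, and a plain data frame stands in for the tbl_df that the real method returns.

```r
# Hypothetical corpus-level metadata; the notes field is NULL.
metadata <- list(
  source = "example source",
  created = "2020-01-01",
  notes = NULL
)

# Drop the NULL fields, then turn what remains into a one-row data frame.
non_null <- Filter(Negate(is.null), metadata)
glanced <- as.data.frame(non_null, stringsAsFactors = FALSE)

print(glanced)
```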

Examples

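if (requireNamespace("quanteda", quietly = TRUE)) {
  # Load the example corpus that ships with quanteda
  data("data_corpus_inaugural", package = "quanteda")

  # Print explicitly: a bare object name is not auto-printed mid-block
  print(data_corpus_inaugural)

  # One row per document: a text column plus document-level metadata
  tidy(data_corpus_inaugural)
}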

tidytext

Text Mining using 'dplyr', 'ggplot2', and Other Tidy Tools

v0.3.1
MIT + file LICENSE
Authors
Gabriela De Queiroz [ctb], Colin Fay [ctb] (<https://orcid.org/0000-0001-7343-1846>), Emil Hvitfeldt [ctb], Os Keyes [ctb] (<https://orcid.org/0000-0001-5196-609X>), Kanishka Misra [ctb], Tim Mastny [ctb], Jeff Erickson [ctb], David Robinson [aut], Julia Silge [aut, cre] (<https://orcid.org/0000-0002-3671-836X>)
Initial release
