corpus: gutenberg_corpus – R documentation

Project Gutenberg Corpora

Description

Get a corpus of texts from Project Gutenberg.

Usage

gutenberg_corpus(ids, filter = NULL, mirror = NULL, verbose = TRUE, ...)

Arguments

`ids`	an integer vector of requested Gutenberg text IDs.
`filter`	a text filter to set on the corpus.
`mirror`	a character string URL for the Gutenberg mirror to use, or NULL to determine automatically.
`verbose`	a logical scalar indicating whether to print progress updates to the console.
`...`	additional arguments passed to `as_corpus`.

Details

`gutenberg_corpus` downloads a set of texts from Project Gutenberg and returns a corpus with one text per row. Specify the texts to include by passing their Project Gutenberg IDs in the `ids` argument.

You can search for Project Gutenberg texts and get their IDs using the `gutenberg_works` function from the `gutenbergr` package.
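The lookup-then-download workflow described above can be sketched as follows. This is a minimal illustration, not part of the package's own examples: it assumes the `gutenbergr` package is installed, that its bundled metadata lists the author as "Eliot, George" (the "Last, First" form is an assumption about that metadata), and the `download` flag is a hypothetical guard added here so the sketch runs without network access.

```r
library(gutenbergr)  # provides gutenberg_works() and bundled metadata
library(corpus)      # provides gutenberg_corpus()

# Search the gutenbergr metadata for works by one author.
# The author string is an assumption about how the metadata spells the name.
works <- gutenberg_works(author == "Eliot, George")

# Inspect the candidate IDs and titles before downloading anything.
print(works[, c("gutenberg_id", "title")])

# Hypothetical guard: downloading needs network access, so it is off
# by default. Set `download` to TRUE to actually fetch the texts.
download <- FALSE
if (download) {
  eliot <- gutenberg_corpus(works$gutenberg_id, verbose = FALSE)
  # Per the Value section, the result has columns
  # "title", "author", and "text".
  print(names(eliot))
}
```

Checking the printed titles first is worthwhile because an author search can return editions or translations you do not want in the corpus; subsetting `works` before the download keeps the request small.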

Value

A corpus (data frame) with three columns: "title", "author", and "text".

Examples

# get the texts of George Eliot's novels
## Not run: 
eliot <- gutenberg_corpus(c(145, 550, 6688))
## End(Not run)

corpus

Text Corpus Analysis

v0.10.2

Apache License (== 2.0) | file LICENSE

Authors

Leslie Huang [cre, ctb], Patrick O. Perry [aut, cph], Finn Årup Nielsen [cph, dtc] (AFINN Sentiment Lexicon), Martin Porter and Richard Boulton [ctb, cph, dtc] (Snowball Stemmer and Stopword Lists), The Regents of the University of California [ctb, cph] (Strtod Library Procedure), Carlo Strapparava and Alessandro Valitutti [cph, dtc] (WordNet-Affect Lexicon), Unicode, Inc. [cph, dtc] (Unicode Character Database)
