
TextDocument

Text Documents


Description

Representing and computing on text documents.

Details

Text documents are documents containing (natural language) text. Packages that build on the infrastructure provided by package NLP represent such documents via the virtual S3 class "TextDocument" and provide their own S3 text document classes extending this virtual base class (for example, the AnnotatedPlainTextDocument objects provided by package NLP itself).

All extension classes must provide an as.character() method which extracts the natural language text in documents of the respective classes in a “suitable” (not necessarily structured) form, as well as content() and meta() methods for accessing the (possibly raw) document content and metadata.
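
As a rough illustration of these requirements, the sketch below defines a small extension class with the three required methods. The class name SimpleTextDocument, its constructor, and its internal list layout are hypothetical and not part of package NLP; only the content() and meta() generics come from the package, and as.character() is the base R generic.

```r
library(NLP)  # provides the content() and meta() generics

# Hypothetical constructor: a list holding the raw text and the metadata,
# classed so that it extends the virtual "TextDocument" class.
SimpleTextDocument <- function(text, meta = list()) {
  structure(list(content = text, meta = meta),
            class = c("SimpleTextDocument", "TextDocument"))
}

# as.character(): the natural language text in a "suitable" form,
# here simply the stored lines joined by newlines.
as.character.SimpleTextDocument <- function(x, ...) {
  paste(x$content, collapse = "\n")
}

# content(): the (possibly raw) document content.
content.SimpleTextDocument <- function(x) x$content

# meta(): all metadata, or a single entry when a tag is given.
meta.SimpleTextDocument <- function(x, tag = NULL, ...) {
  if (is.null(tag)) x$meta else x$meta[[tag]]
}

doc <- SimpleTextDocument(c("A first sentence.", "A second one."),
                          meta = list(language = "en"))
as.character(doc)
content(doc)
meta(doc, "language")
```

Keeping the raw lines in content() and the collapsed string in as.character() is only one possible choice; a class is free to return the same value from both.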

In addition, the infrastructure features the generic functions words(), sents(), etc., for which extension classes can provide methods giving a structured view of the text contained in documents of these classes (returning, e.g., a character vector with the word tokens in these documents for words(), and a list of such character vectors, one per sentence, for sents()).
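
The following sketch shows what such structured-view methods might look like. The class SentenceListDocument is hypothetical, and the whitespace-based tokenization is a deliberately naive assumption rather than how any class in package NLP tokenizes text; the as.character(), content() and meta() methods a complete extension class needs are omitted to keep the example short.

```r
library(NLP)  # provides the words() and sents() generics

# Hypothetical class storing one sentence per element of a character vector.
SentenceListDocument <- function(sentences) {
  structure(list(content = sentences),
            class = c("SentenceListDocument", "TextDocument"))
}

# sents(): a list with one character vector of word tokens per sentence.
sents.SentenceListDocument <- function(x, ...) {
  lapply(strsplit(x$content, "[[:space:]]+"), function(tokens) {
    # Naive tokenization (assumption): strip leading and trailing
    # punctuation from each whitespace-separated token, drop empty tokens.
    tokens <- gsub("^[[:punct:]]+|[[:punct:]]+$", "", tokens)
    tokens[nzchar(tokens)]
  })
}

# words(): the same tokens flattened into a single character vector.
words.SentenceListDocument <- function(x, ...) unlist(sents(x))

doc <- SentenceListDocument(c("Text documents contain text.",
                              "Methods give structured views."))
sents(doc)
words(doc)
```

Deriving words() from sents() keeps the two views consistent with each other, at the cost of tokenizing again on every call.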

See Also

AnnotatedPlainTextDocument, CoNLLTextDocument, CoNLLUTextDocument, TaggedTextDocument, and WordListDocument for the text document classes provided by package NLP.


NLP

Natural Language Processing Infrastructure

v0.2-1
GPL-3
Authors
Kurt Hornik [aut, cre] (<https://orcid.org/0000-0003-4198-9911>)
Initial release
