PCorpus

Permanent Corpora


Description

Create permanent corpora.

Usage

PCorpus(x,
        readerControl = list(reader = reader(x), language = "en"),
        dbControl = list(dbName = "", dbType = "DB1"))

Arguments

x

A Source object.

readerControl

a named list of control parameters for reading in content from x.

reader

a function capable of reading in and processing the format delivered by x.

language

a character giving the language (preferably as IETF language tags, see language in package NLP). The default language is assumed to be English ("en").

dbControl

a named list of control parameters for the underlying database storage provided by package filehash.

dbName

a character giving the filename for the database.

dbType

a character giving the database format (see filehashOption for possible database formats).
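As a rough illustration of how the two control lists fit together, the sketch below reads the sample texts shipped with tm and sets both readerControl and dbControl explicitly. The temporary database file and the language tag "la" (the sample texts are Latin) are assumptions chosen for this illustration, not defaults of PCorpus.

```r
library(tm)

# Sample texts shipped with tm (the same directory the Examples section uses).
txt <- system.file("texts", "txt", package = "tm")

# Assumption for this sketch: keep the database in a temporary file.
db_file <- tempfile(fileext = ".db")

p <- PCorpus(DirSource(txt),
             # readPlain is the plain-text reader; "la" is an illustrative tag.
             readerControl = list(reader = readPlain, language = "la"),
             # dbName is the file backing the corpus, dbType the filehash format.
             dbControl = list(dbName = db_file, dbType = "DB1"))

n_docs <- length(p)               # number of documents read from the source
lang <- meta(p[[1]], "language")  # language recorded on the first document

# Remove the temporary database once it is no longer needed.
unlink(db_file)
```

Passing only one of reader or language in readerControl should also work, since the missing entry falls back to its default.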

Details

A permanent corpus stores its documents outside of R, in a database. Because several PCorpus objects backed by the same database can exist in memory at the same time, a change made through one of them is visible through all the others (in contrast to R's default copy-on-modify semantics).
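The shared-database behaviour can be sketched as follows. This is a minimal illustration, assuming that assigning a document with [[<- writes it to the underlying database; the temporary file name and the sample strings are made up for the example.

```r
library(tm)

# Assumption for this sketch: a throwaway database in a temporary file.
db_file <- tempfile(fileext = ".db")

p1 <- PCorpus(VectorSource(c("first text", "second text")),
              dbControl = list(dbName = db_file, dbType = "DB1"))

# An ordinary R copy. Both objects refer to the same database file.
p2 <- p1

before <- content(p2[[1]])

# Replace the first document through p1 only.
p1[[1]] <- PlainTextDocument("changed text")

# Read the first document through p2, which was never modified directly.
after <- content(p2[[1]])

unlink(db_file)
```

If the propagation described above holds, before and after differ even though p2 was never assigned to. With a VCorpus the copy would be independent and the two values would be identical.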

Value

An object inheriting from PCorpus and Corpus.

See Also

Corpus for basic information on the corpus infrastructure employed by package tm.

VCorpus provides an implementation with volatile storage semantics.

Examples

txt <- system.file("texts", "txt", package = "tm")
## Not run: 
PCorpus(DirSource(txt),
        dbControl = list(dbName = "pcorpus.db", dbType = "DB1"))
## End(Not run)

tm

Text Mining Package

v0.7-8
GPL-3
Authors
Ingo Feinerer [aut, cre] (<https://orcid.org/0000-0001-7656-8338>), Kurt Hornik [aut] (<https://orcid.org/0000-0003-4198-9911>), Artifex Software, Inc. [ctb, cph] (pdf_info.ps taken from GPL Ghostscript)
Initial release
2020-11-17
