Base method extensions for corpus objects
Extensions of base R functions for corpus objects.
## S3 method for class 'corpus'
c1 + c2

## S3 method for class 'corpus'
c(..., recursive = FALSE)

## S3 method for class 'corpus'
x[i, drop_docid = TRUE]

## S3 method for class 'summary.corpus'
print(x, ...)
c1 | corpus one to be added
c2 | corpus two to be added
recursive | logical used by the c() method
x | a corpus object
i | document names or indices for documents to extract.
drop_docid | if
The + operator for a corpus object will combine two corpus objects, resolving any non-matching docvars() by making them into NA values for the corpus lacking that field. Corpus-level metadata is concatenated, except for source and notes, which are stamped with information pertaining to the creation of the new joined corpus.
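As a minimal sketch of the NA filling described above, the example below combines two small corpora whose document variables do not overlap. The texts, document names, and the `year` and `party` variables are hypothetical and chosen only for illustration. The document names are kept distinct because combining corpora with duplicated names is expected to fail.

```r
library(quanteda)

# Two small corpora with different document variables (hypothetical data)
corp_a <- corpus(c(a1 = "First text.", a2 = "Second text."),
                 docvars = data.frame(year = c(2001, 2002)))
corp_b <- corpus(c(b1 = "Third text."),
                 docvars = data.frame(party = "Green"))

# `+` joins the two corpora; fields missing from one corpus are filled with NA
combined <- corp_a + corp_b

ndoc(combined)      # number of documents in the joined corpus
docvars(combined)   # `year` is NA for b1, `party` is NA for a1 and a2
```

Inspecting `docvars(combined)` after a join is a quick way to see which fields were absent from one of the inputs.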
The c() operator is also defined for corpus class objects, and provides an easy way to combine multiple corpus objects.
There are some issues that need to be addressed in future revisions of quanteda concerning the use of factors to store document variables and metadata. Currently most or all of these are not recorded as factors, because the data.frame() calls that create and store the document-level information use stringsAsFactors = FALSE. This setting ensures that the texts are always stored as character vectors and never as factors.
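The effect of that setting can be seen in base R alone. The sketch below is not quanteda's internal code; it only shows how the same text column is stored under each value of stringsAsFactors, using hypothetical texts and object names.

```r
# Hypothetical texts, used only to compare the two storage modes
texts <- c("A first text.", "A second text.")

# With stringsAsFactors = FALSE the column stays a character vector
df_chr <- data.frame(text = texts, stringsAsFactors = FALSE)

# With stringsAsFactors = TRUE the column is converted to a factor
df_fct <- data.frame(text = texts, stringsAsFactors = TRUE)

class(df_chr$text)
class(df_fct$text)
```

A factor stores integer codes plus a set of levels, which is why it is a poor fit for texts that are meant to be tokenized or otherwise processed as strings.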
The + and c() operators return a corpus object.
Indexing a corpus works in three ways, as of v2.x.x:
[ returns a subsetted corpus
[[ returns the textual contents of a subsetted corpus (similar to as.character())
$ returns a vector containing the single named docvar
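The three access patterns can be compared side by side on the data_corpus_inaugural corpus that ships with quanteda. This is a small sketch: it uses as.character() for the text step, which the list above describes as similar to [[, and it assumes the built-in corpus has a Year document variable. The object names `sub`, `txt`, and `years` are illustrative.

```r
library(quanteda)

# `[` selects documents by name or index and keeps the corpus class
sub <- data_corpus_inaugural[c("1789-Washington", "1793-Washington")]
class(sub)
ndoc(sub)

# as.character() gives the plain text, comparable to what `[[` returns
txt <- as.character(data_corpus_inaugural[2])
is.character(txt)

# `$` extracts a single document variable as a vector
years <- data_corpus_inaugural$Year[1:2]
years
```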
# concatenate corpus objects
corp1 <- corpus(data_char_ukimmig2010[1:2])
corp2 <- corpus(data_char_ukimmig2010[3:4])
corp3 <- corpus(data_char_ukimmig2010[5:6])
summary(c(corp1, corp2, corp3))

# two ways to index corpus elements
data_corpus_inaugural["1793-Washington"]
data_corpus_inaugural[2]

# return the text itself
data_corpus_inaugural[["1793-Washington"]]