quanteda: dfm-class – R documentation


dfm-class

Virtual class "dfm" for a document-feature matrix

Description

The dfm class is a type of Matrix-class object with additional slots, described below. quanteda uses two subclasses of the dfm class: a dfm object when the matrix can be represented sparsely, and a dfmDense object when it is dense. See Details.

Usage

## S4 method for signature 'dfm'
t(x)

## S4 method for signature 'dfm'
colSums(x, na.rm = FALSE, dims = 1, ...)

## S4 method for signature 'dfm'
rowSums(x, na.rm = FALSE, dims = 1, ...)

## S4 method for signature 'dfm'
colMeans(x, na.rm = FALSE, dims = 1, ...)

## S4 method for signature 'dfm'
rowMeans(x, na.rm = FALSE, dims = 1, ...)

## S4 method for signature 'dfm,numeric'
Arith(e1, e2)

## S4 method for signature 'numeric,dfm'
Arith(e1, e2)

## S4 method for signature 'dfm,index,index,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,index,index,logical'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,missing,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,missing,missing,logical'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,index,missing,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,index,missing,logical'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,missing,index,missing'
x[i, j, ..., drop = TRUE]

## S4 method for signature 'dfm,missing,index,logical'
x[i, j, ..., drop = TRUE]

Arguments

`x`	the dfm object
`na.rm`	if `TRUE`, omit missing values (including `NaN`) from the calculations
`dims`	ignored
`...`	additional arguments not used here
`e1`	first operand of the arithmetic operation (e.g. "+") for dfm
`e2`	second operand of the arithmetic operation (e.g. "+") for dfm
`i`	document names or indices for documents to extract.
`j`	feature names or indices for features to extract.
`drop_docid`	if `TRUE`, the `docid` of the documents is removed as a result of extraction.
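The methods listed under Usage can be tried on a small dfm. The following is a minimal sketch, assuming quanteda is installed; the toy texts and the object names (`toks`, `doc_totals`, `feat_totals`, `dfmat_t`, `dfmat_doubled`) are illustrative and not part of the package.

```r
library("quanteda")

# Build a small dfm from two toy documents (texts are illustrative only)
toks <- tokens(c(d1 = "a b b c", d2 = "b c c c d"))
dfmat <- dfm(toks)

# Marginal sums: total count per document and total count per feature
doc_totals  <- rowSums(dfmat)
feat_totals <- colSums(dfmat)

# Marginal means work the same way
doc_means <- rowMeans(dfmat)

# Transpose: features become rows, documents become columns
dfmat_t <- t(dfmat)

# Arithmetic with a numeric scalar (dispatches on the 'dfm,numeric' signature)
dfmat_doubled <- dfmat * 2
```

Here `doc_totals` is indexed by document name and `feat_totals` by feature name, so individual values can be looked up with, for example, `feat_totals[["c"]]`.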

Details

The dfm class is a virtual class that will contain dgCMatrix-class.
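The class relationship described above can be inspected from R. This is a sketch under the assumption that the installed quanteda version defines dfm as extending dgCMatrix, as stated here; the toy texts are illustrative.

```r
library("quanteda")

dfmat <- dfm(tokens(c(d1 = "a b b c", d2 = "b c c c d")))

# is.dfm() is quanteda's own class check
is_dfm <- is.dfm(dfmat)

# methods::is() follows S4 inheritance, so it also reports parent classes
is_sparse <- methods::is(dfmat, "dgCMatrix")
is_matrix <- methods::is(dfmat, "Matrix")
```

Because a dfm inherits from the Matrix package classes, generic Matrix functionality such as `dim()` and `nnzero()` should also be available on it.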

Slots

weightTf: the type of term frequency weighting applied to the dfm. Default is "frequency", indicating that the values in the cells of the dfm are simple feature counts. To change this, use the dfm_weight() method.
weightDf: the type of document frequency weighting applied to the dfm. See docfreq().
smooth: a smoothing parameter, defaults to zero. Can be changed using the dfm_smooth() method.
Dimnames: These are inherited from Matrix-class but are named docs and features respectively.
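The functions named in the slot descriptions can be sketched as follows. This assumes a quanteda version in which `dfm_weight()` takes a `scheme` argument and `dfm_smooth()` takes a `smoothing` argument; the slots themselves are not accessed directly, since their storage may differ between versions. The toy texts and object names are illustrative.

```r
library("quanteda")

dfmat <- dfm(tokens(c(d1 = "a b b c", d2 = "b c c c d")))

# The two dimensions are named "docs" and "features"
dim_labels <- names(dimnames(dfmat))
doc_ids  <- docnames(dfmat)
features <- featnames(dfmat)

# Term frequency weighting: convert counts to within-document proportions
dfmat_prop <- dfm_weight(dfmat, scheme = "prop")

# Smoothing: add a constant to every cell, including the zeros
dfmat_smooth <- dfm_smooth(dfmat, smoothing = 1)

# Document frequency: number of documents in which each feature occurs
doc_freq <- docfreq(dfmat)
```

Note that smoothing fills every zero cell, so the result is no longer sparse in practice even though it is still returned as a dfm.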

Examples

library("quanteda")

# dfm subsetting
dfmat <- dfm(tokens(c("this contains lots of stopwords",
                      "no if, and, or but about it: lots",
                      "and a third document is it"),
                    remove_punct = TRUE))
dfmat[1:2, ]     # first two documents, all features
dfmat[1:2, 1:5]  # first two documents, first five features
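Since `i` and `j` accept names as well as indices, subsetting by document and feature name should also work. A minimal sketch, using hypothetical document names `d1` and `d2` that are set explicitly here rather than taken from the example above:

```r
library("quanteda")

# Named character vector, so the documents are called "d1" and "d2"
dfmat_named <- dfm(tokens(c(d1 = "a b b c", d2 = "b c c c d")))

# Select one document by name, keeping all features
first_doc <- dfmat_named["d1", ]

# Select one document and two features by name
sub <- dfmat_named["d2", c("b", "c")]
```

Name-based selection is less fragile than positional indices when the feature order is not known in advance.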

quanteda

Quantitative Analysis of Textual Data

v3.0.0

GPL-3

Authors

Kenneth Benoit [cre, aut, cph] (<https://orcid.org/0000-0002-0797-564X>), Kohei Watanabe [aut] (<https://orcid.org/0000-0001-6519-5265>), Haiyan Wang [aut] (<https://orcid.org/0000-0003-4992-4311>), Paul Nulty [aut] (<https://orcid.org/0000-0002-7214-4666>), Adam Obeng [aut] (<https://orcid.org/0000-0002-2906-4775>), Stefan Müller [aut] (<https://orcid.org/0000-0002-6315-4125>), Akitaka Matsuo [aut] (<https://orcid.org/0000-0002-3323-6330>), William Lowe [aut] (<https://orcid.org/0000-0002-1549-6163>), Christian Müller [ctb], European Research Council [fnd] (ERC-2011-StG 283794-QUANTESS)

Initial release