quanteda: dfm – R documentation

quanteda

dfm

Create a document-feature matrix

Description

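Construct a sparse document-feature matrix from a tokens object or from another dfm object. Character and corpus inputs were formerly accepted directly but are deprecated as of version 3 (see "Changes in version 3").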

Usage

dfm(
  x,
  tolower = TRUE,
  remove_padding = FALSE,
  verbose = quanteda_options("verbose"),
  ...
)

Arguments

`x`	a tokens or dfm object
`tolower`	convert all features to lowercase
`remove_padding`	logical; if `TRUE`, remove the "pads" left as empty tokens after calling `tokens()` or `tokens_remove()` with `padding = TRUE`
`verbose`	display messages if `TRUE`
`...`	not used directly
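
The effect of `tolower` and `remove_padding` is easiest to see in the feature names. The following is a minimal sketch, assuming quanteda v3 or later is installed; the object names (`toks_padded`, `feats_with_pad`, `feats_no_pad`) are illustrative only.

```r
library("quanteda")

toks <- tokens(c(doc1 = "a b c", doc2 = "A B C D"))

# tolower = TRUE (the default) merges "a" and "A" into a single feature
featnames(dfm(toks))

# tolower = FALSE keeps upper- and lowercase forms as separate features
featnames(dfm(toks, tolower = FALSE))

# removing "b" with padding = TRUE leaves an empty-string pad in each document
toks_padded <- tokens_remove(toks, "b", padding = TRUE)

# by default the pad is counted as a feature named ""
feats_with_pad <- featnames(dfm(toks_padded))

# remove_padding = TRUE drops that empty feature
feats_no_pad <- featnames(dfm(toks_padded, remove_padding = TRUE))
```

Comparing `feats_with_pad` with `feats_no_pad` shows whether the empty-string feature was kept.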

Value

a dfm object
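
As a rough illustration of what the returned object holds, the sketch below builds a small dfm and inspects it with standard quanteda accessors. The two toy documents are made up for the example.

```r
library("quanteda")

# a tiny dfm built from two toy documents
dfmat <- dfm(tokens(c(doc1 = "a b c", doc2 = "A B C D")))

ndoc(dfmat)       # number of documents (rows)
nfeat(dfmat)      # number of features (columns)
docnames(dfmat)   # row labels
featnames(dfmat)  # column labels

# the dfm is stored sparsely; convert to a dense matrix only for small objects
mat <- as.matrix(dfmat)
```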

Changes in version 3

In quanteda v3, many convenience functions formerly available in `dfm()` were deprecated. Formerly, `dfm()` could be called directly on a character or corpus object, but we now steer users to tokenise their inputs first using `tokens()`. Other convenience arguments to `dfm()` were also removed, such as `select`, `dictionary`, `thesaurus`, and `groups`. The same functionality is available through dedicated functions, e.g. `dfm_select()`, `dfm_lookup()`, and `dfm_group()`. See `news(Version >= "2.9", package = "quanteda")` for details.
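
One way to express the version 3 workflow is sketched below: tokenise, build the dfm, then apply grouping and feature removal as separate steps. This is an illustrative sketch, not the package's own migration guide; the toy corpus and the `party` document variable are hypothetical.

```r
library("quanteda")

# toy corpus with a hypothetical grouping variable, `party`
corp <- corpus(
  c(d1 = "Taxes should fall.",
    d2 = "Taxes should rise.",
    d3 = "Spending should rise."),
  docvars = data.frame(party = c("A", "B", "B"))
)

# step 1: tokenise (formerly done implicitly by dfm() on a corpus)
toks <- tokens(corp, remove_punct = TRUE)

# step 2: build the document-feature matrix from the tokens
dfmat <- dfm(toks)

# step 3: group documents (formerly the `groups` argument of dfm())
dfmat_grouped <- dfm_group(dfmat, groups = party)

# step 4: drop features (formerly the `remove` argument of dfm())
dfmat_trimmed <- dfm_remove(dfmat_grouped, pattern = "should")
```

Keeping each step separate makes it explicit which transformation is applied at the token level and which at the matrix level.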

Examples

library("quanteda")

# for a corpus: tokenise first, then build the dfm
toks <- data_corpus_inaugural %>%
  corpus_subset(Year > 1980) %>%
  tokens()
dfm(toks)

# removal options
toks <- tokens(c("a b c", "A B C D")) %>%
    tokens_remove("b", padding = TRUE)
toks
dfm(toks)
dfm(toks, remove_padding = TRUE) # remove "pads"

# preserving case
dfm(toks, tolower = FALSE)

quanteda

Quantitative Analysis of Textual Data

v3.0.0

GPL-3

Authors

Kenneth Benoit [cre, aut, cph] (<https://orcid.org/0000-0002-0797-564X>), Kohei Watanabe [aut] (<https://orcid.org/0000-0001-6519-5265>), Haiyan Wang [aut] (<https://orcid.org/0000-0003-4992-4311>), Paul Nulty [aut] (<https://orcid.org/0000-0002-7214-4666>), Adam Obeng [aut] (<https://orcid.org/0000-0002-2906-4775>), Stefan Müller [aut] (<https://orcid.org/0000-0002-6315-4125>), Akitaka Matsuo [aut] (<https://orcid.org/0000-0002-3323-6330>), William Lowe [aut] (<https://orcid.org/0000-0002-1549-6163>), Christian Müller [ctb], European Research Council [fnd] (ERC-2011-StG 283794-QUANTESS)
