
dtm_colsums

Column sums and Row sums for document term matrices


Description

Compute column sums and row sums of a document term matrix, optionally summing over groups of columns or rows.

Usage

dtm_colsums(dtm, groups)

dtm_rowsums(dtm, groups)

Arguments

dtm

an object returned by document_term_matrix

groups

optionally, a list in which each element holds the column/row names or column/row indexes of the dtm that should be combined by summing over those columns or rows. See the examples.

Value

Returns either a vector, if argument groups is not provided, or a sparse matrix of class dgCMatrix, if argument groups is provided:

  • if groups is not provided: a vector of row/column sums with corresponding names

  • if groups is provided: a sparse matrix containing the sums over the groups of rows/columns
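
To make the two return shapes concrete, here is a minimal sketch that uses only the Matrix package, not udpipe itself. The toy matrix, the group names and the sapply-based summation are assumptions made for illustration. This is not how dtm_colsums is implemented, and the exact layout of its grouped result may differ (it returns a sparse dgCMatrix, whereas this sketch ends with a plain dense matrix).

```r
library(Matrix)

# Toy document-term matrix: 2 documents x 3 terms (hypothetical data).
m <- sparseMatrix(
  i = c(1, 1, 1, 2),
  j = c(1, 2, 3, 3),
  x = c(2, 1, 1, 2),
  dimnames = list(c("doc1", "doc2"), c("aa", "b", "bb"))
)

# Ungrouped idea: one total per term, returned as a named numeric vector.
term_totals <- colSums(m)

# Grouped idea: collapse the columns named in each group into one summed column.
# The list follows the shape of the 'groups' argument described above.
groups <- list(combinedB = c("b", "bb"), combinedA = "aa")
grouped <- sapply(groups, function(terms) rowSums(m[, terms, drop = FALSE]))

term_totals
grouped
```

Grouped row sums follow the same pattern, with the roles of rows and columns swapped.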

Examples

library(udpipe)
x <- data.frame(
 doc_id = c(1, 1, 2, 3, 4), 
 term = c("A", "C", "Z", "X", "G"), 
 freq = c(1, 5, 7, 10, 0))
dtm <- document_term_matrix(x)
x <- dtm_colsums(dtm)
x
x <- dtm_rowsums(dtm)
head(x)

## 
## Grouped column summation
## 
x <- list(doc1 = c("aa", "bb", "aa", "b"), doc2 = c("bb", "bb", "BB"))
dtm <- document_term_matrix(x)
dtm
dtm_colsums(dtm, groups = list(combinedB = c("b", "bb"), combinedA = c("aa", "A")))
dtm_colsums(dtm, groups = list(combinedA = c("aa", "A")))
dtm_colsums(dtm, groups = list(
  combinedB = grep(pattern = "b", colnames(dtm), ignore.case = TRUE, value = TRUE), 
  combinedA = c("aa", "A", "ZZZ"),
  test      = character()))
dtm_colsums(dtm, groups = list())

## 
## Grouped row summation
## 
x <- list(doc1 = c("aa", "bb", "aa", "b"), 
          doc2 = c("bb", "bb", "BB"),
          doc3 = c("bb", "bb", "BB"),
          doc4 = c("bb", "bb", "BB", "b"))
dtm <- document_term_matrix(x)
dtm
dtm_rowsums(dtm, groups = list(doc1 = "doc1", combi = c("doc2", "doc3", "doc4")))
dtm_rowsums(dtm, groups = list(unknown = "docUnknown", combi = c("doc2", "doc3", "doc4")))
dtm_rowsums(dtm, groups = list())

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.5
MPL-2.0
Authors
Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
Initial release
