Become an expert in R — Interactive courses, Cheat Sheets, certificates and more!
Get Started for Free

dtm_conform

Make sure a document term matrix has exactly the specified rows and columns


Description

Makes sure the document term matrix has exactly the rows and columns which you specify. If missing rows or columns are occurring, the function fills these up either with empty cells or with the value that you provide. See the examples.

Usage

dtm_conform(dtm, rows, columns, fill)

Arguments

dtm

a document term matrix: an object returned by document_term_matrix

rows

a character vector of row names which dtm should have

columns

a character vector of column names which dtm should have

fill

a value to use to fill up missing rows / columns. Defaults to using an empty cell.

Value

the sparse matrix dtm with exactly the specified rows and columns

See Also

Examples

x <- data.frame(doc_id = c("doc_1", "doc_1", "doc_1", "doc_2"), 
                text = c("a", "a", "b", "c"), 
                stringsAsFactors = FALSE)
dtm <- document_term_frequencies(x)
dtm <- document_term_matrix(dtm)
dtm
dtm_conform(dtm, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(dtm, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"), 
            fill = 1)
dtm_conform(dtm, rows = c("doc_1", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(dtm, columns = c("a", "b", "Z"))
dtm_conform(dtm, rows = c("doc_1"))
dtm_conform(dtm, rows = character())
dtm_conform(dtm, columns = character())
dtm_conform(dtm, rows = character(), columns = character())

##
## Some examples on border line cases
##
special1 <- dtm[, character()]
special2 <- dtm[character(), character()]
special3 <- dtm[character(), ]

dtm_conform(special1, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(special1, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"), 
            fill = 1)
dtm_conform(special1, rows = c("doc_1", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(special1, columns = c("a", "b", "Z"))
dtm_conform(special1, rows = c("doc_1"))
dtm_conform(special1, rows = character())
dtm_conform(special1, columns = character())
dtm_conform(special1, rows = character(), columns = character())

dtm_conform(special2, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(special2, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"), 
            fill = 1)
dtm_conform(special2, rows = c("doc_1", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(special2, columns = c("a", "b", "Z"))
dtm_conform(special2, rows = c("doc_1"))
dtm_conform(special2, rows = character())
dtm_conform(special2, columns = character())
dtm_conform(special2, rows = character(), columns = character())

dtm_conform(special3, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(special3, 
            rows = c("doc_1", "doc_2", "doc_3"), columns = c("a", "b", "c", "Z", "Y"), 
            fill = 1)
dtm_conform(special3, rows = c("doc_1", "doc_3"), columns = c("a", "b", "c", "Z", "Y"))
dtm_conform(special3, columns = c("a", "b", "Z"))
dtm_conform(special3, rows = c("doc_1"))
dtm_conform(special3, rows = character())
dtm_conform(special3, columns = character())
dtm_conform(special3, rows = character(), columns = character())

udpipe

Tokenization, Parts of Speech Tagging, Lemmatization and Dependency Parsing with the 'UDPipe' 'NLP' Toolkit

v0.8.5
MPL-2.0
Authors
Jan Wijffels [aut, cre, cph], BNOSAC [cph], Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University in Prague, Czech Republic [cph], Milan Straka [ctb, cph], Jana Straková [ctb, cph]
Initial release

We don't support your browser anymore

Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.