Recombine a dfm or fcm by combining identical dimension elements
"Compresses" or groups a dfm or fcm whose dimension names are the same, for either documents or features. This may happen, for instance, if features are made equivalent through application of a thesaurus. It could also be needed after a cbind.dfm() or rbind.dfm() operation. In most cases, you will not need to call dfm_compress, since it is called automatically by functions that change the dimensions of the dfm, e.g. dfm_tolower().
dfm_compress(x, margin = c("both", "documents", "features"))
fcm_compress(x)
fcm_compress returns an fcm in which identical features have been recombined by summing their counts.
fcm_compress works only when the fcm was created with a document context.
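To make the idea of combining identical dimension elements concrete, the following is a minimal base R sketch on an ordinary dense matrix. It is only an illustration of the summing behaviour described above, not how dfm_compress or fcm_compress are implemented; the object names `counts` and `compressed` are made up for this example.

```r
# Toy document-feature counts in which the feature name "a" appears twice.
# (Hypothetical data; a plain base R matrix, not a dfm.)
counts <- matrix(c(1, 2, 0,
                   1, 1, 2),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(c("text1", "text2"), c("a", "b", "a")))

# rowsum() sums the rows of a matrix within groups. Transposing first lets us
# group by feature name; transposing back restores documents as rows.
compressed <- t(rowsum(t(counts), group = colnames(counts)))

compressed
```

The two "a" columns collapse into one whose counts are the sums of the originals. Applying rowsum() directly to `counts` with `group = rownames(counts)` would do the same for duplicated document names.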
# dfm_compress examples
dfmat <- rbind(dfm(tokens(c("b A A", "C C a b B")), tolower = FALSE),
               dfm(tokens("A C C C C C"), tolower = FALSE))
colnames(dfmat) <- char_tolower(featnames(dfmat))
dfmat
dfm_compress(dfmat, margin = "documents")
dfm_compress(dfmat, margin = "features")
dfm_compress(dfmat)

# no effect if no compression needed
dfmatsubset <- dfm(tokens(data_corpus_inaugural[1:5]))
dim(dfmatsubset)
dim(dfm_compress(dfmatsubset))

# compress an fcm
fcmat1 <- fcm(tokens("A D a C E a d F e B A C E D"),
              context = "window", window = 3)
## this will produce an error:
# fcm_compress(fcmat1)

txt <- c("The fox JUMPED over the dog.",
         "The dog jumped over the fox.")
toks <- tokens(txt, remove_punct = TRUE)
fcmat2 <- fcm(toks, context = "document")
colnames(fcmat2) <- rownames(fcmat2) <- tolower(colnames(fcmat2))
colnames(fcmat2)[5] <- rownames(fcmat2)[5] <- "fox"
fcmat2
fcm_compress(fcmat2)