diceR: dice – R documentation

diceR

dice

Diverse Clustering Ensemble

Description

Runs consensus clustering across subsamples, algorithms, and numbers of clusters (k).
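One way to picture the consensus step is a consensus matrix: for every pair of samples, the share of subsamples (among those containing both) in which the pair lands in the same cluster. The base-R sketch below builds one for a single algorithm and a single k under assumed settings (80% subsampling without replacement, average-linkage `hclust()` on Euclidean distances). `consensus_matrix()` is a hypothetical helper for illustration, not diceR's internal code; `dice()` repeats this kind of resampling across several algorithms and values of k.

```r
# Hypothetical helper, not part of diceR.
# Assumptions: 80% subsampling without replacement, average-linkage
# hierarchical clustering on Euclidean distances, a single k.
consensus_matrix <- function(x, k, reps = 10, p = 0.8, seed = 1) {
  set.seed(seed)
  n <- nrow(x)
  together <- matrix(0, n, n)  # runs in which samples i and j share a cluster
  sampled  <- matrix(0, n, n)  # runs in which samples i and j are both drawn
  for (r in seq_len(reps)) {
    idx <- sort(sample.int(n, floor(p * n)))
    cl <- cutree(hclust(dist(x[idx, , drop = FALSE]), method = "average"), k = k)
    together[idx, idx] <- together[idx, idx] + outer(cl, cl, "==")
    sampled[idx, idx]  <- sampled[idx, idx] + 1
  }
  # Share of co-sampled runs in which each pair was co-clustered (NA if never co-sampled)
  ifelse(sampled > 0, together / sampled, NA)
}

set.seed(42)
x <- rbind(matrix(rnorm(40, mean = 0), nrow = 20),   # group 1: 20 samples
           matrix(rnorm(40, mean = 6), nrow = 20))   # group 2: 20 samples
cm <- consensus_matrix(x, k = 2, reps = 10)
mean(cm[1:20, 1:20], na.rm = TRUE)    # within-group consensus, expected near 1
mean(cm[1:20, 21:40], na.rm = TRUE)   # between-group consensus, expected near 0
```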

Usage

dice(
  data,
  nk,
  reps = 10,
  algorithms = NULL,
  k.method = NULL,
  nmf.method = c("brunet", "lee"),
  hc.method = "average",
  distance = "euclidean",
  cons.funs = c("kmodes", "majority", "CSPA", "LCE", "LCA"),
  sim.mat = c("cts", "srs", "asrs"),
  prep.data = c("none", "full", "sampled"),
  min.var = 1,
  seed = 1,
  trim = FALSE,
  reweigh = FALSE,
  n = 5,
  evaluate = TRUE,
  plot = FALSE,
  ref.cl = NULL,
  progress = TRUE
)

Arguments

`data`	data matrix with rows as samples and columns as variables
`nk`	number of clusters (k) requested; can specify a single integer or a range of integers to compute multiple k
`reps`	number of subsamples
`algorithms`	vector of clustering algorithms for performing consensus clustering. Must be any number of the following: "nmf", "hc", "diana", "km", "pam", "ap", "sc", "gmm", "som", "cmeans", "hdbscan". A custom clustering algorithm can be used.
`k.method`	method used to choose k when no reference class is given. When `ref.cl` is not `NULL`, k is the number of distinct classes in `ref.cl`; otherwise `k.method` determines k. By default (`NULL`), PAC (proportion of ambiguous clustering) is used to choose the best k(s). Specifying an integer overrides the k chosen by PAC with a user-specified k. Specifying "all" produces consensus results for every k in `nk`; this is implicitly the behaviour when only one k is used.
`nmf.method`	specify NMF-based algorithms to run. By default the "brunet" and "lee" algorithms are called. See `NMF::nmf()` for details.
`hc.method`	agglomeration method for hierarchical clustering. The "average" method is used by default. See `stats::hclust()` for details.
`distance`	a vector of distance functions. Defaults to "euclidean". Other options are given in `stats::dist()`. A custom distance function can be used.
`cons.funs`	consensus functions to use. Current options are "kmodes" (k-modes), "majority" (majority voting), "CSPA" (Cluster-based Similarity Partitioning Algorithm), "LCE" (linkage clustering ensemble), and "LCA" (latent class analysis).
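`sim.mat`	similarity matrix used by the "LCE" consensus function; choices are "cts" (connected triple-based similarity), "srs" (SimRank-based similarity), and "asrs" (approximated SimRank-based similarity).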
`prep.data`	Prepare the data on the "full" dataset, the "sampled" dataset, or "none" (default).
`min.var`	minimum variability threshold used to filter the feature space down to highly variable features: only features whose variability measure across all samples exceeds `min.var` are used. The measure is the standard deviation when `type = "conventional"` and the MAD when `type = "robust"`; note that `type` does not appear among the `dice()` arguments in Usage.
`seed`	random seed for knn imputation reproducibility
`trim`	logical; if `TRUE`, algorithms that score low on internal indices will be trimmed out
`reweigh`	logical; if `TRUE`, after poorly performing algorithms are trimmed out, each remaining algorithm is reweighted according to its internal validity indices.
`n`	an integer giving the number of top-ranked algorithms to keep when poorly performing ones are trimmed using rank aggregation. If there are fewer than `n` algorithms in total, no trimming is done.
`evaluate`	logical; if `TRUE` (default), validity indices are returned. Internal validity indices are always computed. If `ref.cl` is not `NULL`, then external validity indices will also be computed.
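`plot`	logical; if `TRUE`, `graph_all()` is called, and a summary evaluation heatmap of ranked algorithms versus internal validity indices is also plotted.
`ref.cl`	reference class labels for the samples. When supplied, k is set to the number of distinct classes (see `k.method`), and external validity indices can be computed (see `evaluate`).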
`progress`	logical; should a progress bar be displayed?
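The `algorithms` and `distance` entries note that custom functions can be supplied. This page does not spell out the interface diceR expects, so the sketch below assumes one: a distance function that maps a data matrix (rows as samples) to a `dist` object, and a clustering function that takes a `dist` object and k and returns integer labels. Both `spearman_dist()` and `ward_hc()` are hypothetical names, and the required signatures should be confirmed in diceR's documentation before passing such functions to `dice()`.

```r
# Hypothetical custom distance (name and interface are assumptions):
# 1 - Spearman correlation between samples (rows), returned as a dist object.
spearman_dist <- function(x) {
  as.dist(1 - cor(t(x), method = "spearman"))
}

# Hypothetical custom clustering algorithm (name and interface are assumptions):
# Ward hierarchical clustering on a dist object, cut into k clusters.
ward_hc <- function(d, k) {
  as.integer(cutree(hclust(d, method = "ward.D2"), k = k))
}

set.seed(1)
x <- matrix(rnorm(60), nrow = 10)           # 10 samples, 6 variables
d <- spearman_dist(x)
labels <- ward_hc(d, k = 3)
table(labels)                               # sizes of the 3 clusters
```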
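The default `k.method` relies on PAC (proportion of ambiguous clustering): the share of sample pairs whose consensus index is neither near 0 nor near 1, with lower values indicating a cleaner choice of k. The base-R sketch below computes it from a consensus matrix and picks the k with the smallest PAC. The 0.1 and 0.9 cut-offs and the helpers `pac()` and `best_k()` are illustrative assumptions, not diceR's own implementation.

```r
# Hypothetical helper: PAC of a consensus matrix.
# Assumption: a pair is ambiguous if its consensus lies strictly between
# lower and upper; the 0.1 / 0.9 cut-offs are illustrative.
pac <- function(cm, lower = 0.1, upper = 0.9) {
  vals <- cm[upper.tri(cm)]          # each unordered pair once, no diagonal
  vals <- vals[!is.na(vals)]
  mean(vals > lower & vals < upper)
}

# Hypothetical helper: pick the k (list name) whose consensus matrix has the lowest PAC
best_k <- function(cms) {
  scores <- vapply(cms, pac, numeric(1))
  as.integer(names(which.min(scores)))
}

crisp <- matrix(c(1, 1, 0,
                  1, 1, 0,
                  0, 0, 1), nrow = 3, byrow = TRUE)   # all pairs unambiguous
fuzzy <- matrix(c(1,   0.5, 0.4,
                  0.5, 1,   0.6,
                  0.4, 0.6, 1), nrow = 3, byrow = TRUE)
pac(crisp)                                   # 0
pac(fuzzy)                                   # 1
best_k(list("2" = fuzzy, "3" = crisp))       # 3
```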
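The `trim`, `reweigh`, and `n` arguments work together: algorithms are ranked on internal validity indices, the ranks are aggregated, and only the top `n` are kept. The sketch below uses plain mean-rank aggregation and inverse-rank weights as stand-ins chosen for illustration (diceR's own rank aggregation and weighting scheme may differ); `trim_algorithms()`, the index names, and the scores are all made up.

```r
# Hypothetical helper, not diceR's implementation.
# Assumptions: every index is "higher is better"; mean rank stands in for
# rank aggregation; weights are proportional to 1 / mean rank.
trim_algorithms <- function(scores, n = 5) {
  ranks <- apply(-scores, 2, rank)                   # rank 1 = best on each index
  agg <- sort(rowMeans(ranks))                       # mean rank, best first
  keep <- names(agg)[seq_len(min(n, length(agg)))]   # fewer than n: keep all
  w <- 1 / agg[keep]                                 # illustrative reweighting
  list(keep = keep, weights = w / sum(w))
}

scores <- rbind(hc  = c(0.75, 0.7),
                km  = c(0.8,  0.9),
                pam = c(0.7,  0.5),
                ap  = c(0.2,  0.3))
colnames(scores) <- c("index1", "index2")   # hypothetical internal indices
res <- trim_algorithms(scores, n = 2)
res$keep                                    # "km" "hc"
res$weights                                 # km 2/3, hc 1/3
```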

Details

The argument `prep.data` offers three ways to handle the input data before clustering. The default, "none", uses the raw data as is. Alternatively, `prepare_data()` can be applied to the full dataset ("full") or to each subsampled dataset ("sampled").
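To make the difference between "full" and "sampled" concrete, the sketch below mimics a `prepare_data()`-style step with a hypothetical `prep()` helper (drop features whose standard deviation or MAD does not exceed `min.var`, then scale) and applies it either once to the whole dataset or separately to a subsample. It is a base-R illustration, not diceR's `prepare_data()`; in particular, the filter-then-scale order is an assumption.

```r
# Hypothetical prepare_data()-style helper, not diceR's implementation.
# Assumptions: "conventional" filters on the standard deviation, "robust" on the
# MAD (as described for min.var); filtering happens before scaling.
prep <- function(x, min.var = 1, type = c("conventional", "robust")) {
  type <- match.arg(type)
  spread <- apply(x, 2, if (type == "robust") mad else sd)
  x <- x[, spread > min.var, drop = FALSE]  # keep only highly variable features
  scale(x)                                  # centre and scale the remaining columns
}

set.seed(2)
x <- cbind(a = rnorm(50, sd = 3), b = rnorm(50, sd = 0.2), c = rnorm(50, sd = 2))
idx <- sample.int(nrow(x), 40)              # one subsample of 40 rows

# "full": prepare the whole dataset once, then take the subsample
sub_full <- prep(x)[idx, ]

# "sampled": take the subsample first, then prepare it on its own
sub_sampled <- prep(x[idx, ])

colnames(sub_sampled)                       # "a" "c" (b falls below min.var = 1)
round(colMeans(sub_sampled), 10)            # centred within the subsample
round(colMeans(sub_full), 2)                # generally not exactly 0
```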

Value

A list with the following elements:

`E`	raw clustering ensemble object
`Eknn`	clustering ensemble object with knn imputation used on `E`
`Ecomp`	flattened ensemble object with remaining missing entries imputed by majority voting
`clusters`	final clustering assignment from the diverse clustering ensemble method
`indices`	if `evaluate = TRUE`, shows cluster evaluation indices; otherwise `NULL`
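The `E`, `Eknn`, and `Ecomp` elements reflect successive gap-filling steps: a sample left out of a subsample has no label in that run, so labels are first imputed from nearest neighbours, and any remaining gaps are filled by majority voting. The base-R sketch below illustrates that idea on a small samples-by-runs label matrix. It assumes labels are already aligned across runs and uses an assumed strict-majority rule for the neighbour vote; `mode_label()`, `impute_knn_sketch()`, and `impute_majority_sketch()` are hypothetical, and diceR's own ensemble object also spans algorithms and values of k. The last line applies majority voting as a consensus function, in the spirit of `cons.funs = "majority"`.

```r
# Hypothetical helper: most frequent non-missing label (ties -> smallest label).
mode_label <- function(v) as.integer(names(which.max(table(v))))

# Step 1, knn-style (like Eknn): fill a missing label from the k nearest
# labelled samples in data space, only when a strict majority of them agree.
impute_knn_sketch <- function(E, x, k = 2) {
  d <- as.matrix(dist(x))
  for (r in seq_len(ncol(E))) {
    labelled <- which(!is.na(E[, r]))
    if (length(labelled) == 0) next
    for (i in which(is.na(E[, r]))) {
      nn <- labelled[order(d[i, labelled])][seq_len(min(k, length(labelled)))]
      votes <- table(E[nn, r])
      if (max(votes) > length(nn) / 2) E[i, r] <- as.integer(names(which.max(votes)))
    }
  }
  E
}

# Step 2, majority voting (like Ecomp): fill what is still missing with the
# sample's most frequent label across runs.
impute_majority_sketch <- function(E) {
  for (i in seq_len(nrow(E))) {
    miss <- is.na(E[i, ])
    if (any(miss) && !all(miss)) E[i, miss] <- mode_label(E[i, !miss])
  }
  E
}

x <- matrix(c(0, 0.1, 0.2, 5, 5.1, 5.2), ncol = 1)   # 6 samples, 1 variable
E <- cbind(run1 = c(1, 1, NA, 2, 2, 2),
           run2 = c(1, NA, 1, 2, NA, 2),
           run3 = c(NA, 2, 1, 2, 2, 2))
Eknn  <- impute_knn_sketch(E, x, k = 2)   # sample 1 in run3 stays NA (tied vote)
Ecomp <- impute_majority_sketch(Eknn)     # ...and is then filled by majority vote
apply(Ecomp, 1, mode_label)               # 1 1 1 2 2 2
```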

Author(s)

Aline Talhouk, Derek Chiu

Examples

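library(diceR)
library(dplyr)  # provides the %>% pipe
data(hgsc)
# Use the first 100 rows (samples) and 50 columns (variables) to keep the example fast
dat <- hgsc[1:100, 1:50]
# The reference class is the second underscore-separated field of each row name
ref.cl <- strsplit(rownames(dat), "_") %>%
  purrr::map_chr(2) %>%
  factor() %>%
  as.integer()
dice.obj <- dice(dat, nk = 4, reps = 5, algorithms = "hc",
                 cons.funs = "kmodes", ref.cl = ref.cl, progress = FALSE)
str(dice.obj, max.level = 2)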

diceR

Diverse Cluster Ensemble in R

v1.0.3

MIT + file LICENSE

Authors

Derek Chiu [aut, cre], Aline Talhouk [aut], Johnson Liu [ctb, com]

