suggestK

Visually suggest appropriate k value


Description

Use this to select an appropriate value of k for factorizing a particular dataset. It plots the median (across cells in all datasets) K-L divergence from uniform for the cell factor loadings as a function of k. The median should increase as k increases, but is expected to level off once the number of factors (k) is sufficiently high, because cells should have factor loadings that are not uniformly distributed once an appropriate number of factors is reached.

Depending on the number of cores used, this process can take 10-20 minutes.
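
To make the plotted quantity concrete, here is a minimal base R sketch of the K-L divergence from uniform for one cell's factor loadings. The helper `kl_from_uniform` and the toy matrix `H` are hypothetical and are not part of rliger; the package's own preprocessing of the loadings may differ. The sketch assumes each cell's loadings are non-negative, have a positive sum, and are normalized to sum to 1.

```r
# Hypothetical helper (not part of rliger): K-L divergence from the uniform
# distribution for each row (cell) of a non-negative loading matrix H.
kl_from_uniform <- function(H) {
  k <- ncol(H)
  apply(H, 1, function(x) {
    p <- x / sum(x)   # normalize loadings to a probability vector
    nz <- p > 0       # treat 0 * log2(0) as 0
    log2(k) + sum(p[nz] * log2(p[nz]))
  })
}

# Toy loadings for three cells and k = 4 factors (made-up values)
H <- rbind(
  uniform = c(1, 1, 1, 1),
  mixed   = c(4, 2, 1, 1),
  one_hot = c(0, 0, 3, 0)
)
divs <- kl_from_uniform(H)
median_kl <- median(divs)  # the kind of per-k summary the plot tracks
```

In this toy case a cell loaded evenly on all factors scores 0, and a cell loaded on a single factor scores log2(k), which is why log2 serves as an upper bound on the curve.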

Usage

suggestK(
  object,
  k.test = seq(5, 50, 5),
  lambda = 5,
  thresh = 1e-04,
  max.iters = 100,
  num.cores = 1,
  rand.seed = 1,
  gen.new = FALSE,
  nrep = 1,
  plot.log2 = TRUE,
  return.data = FALSE,
  return.raw = FALSE,
  verbose = TRUE
)

Arguments

object

liger object. Normalize, select genes, and scale the data before calling.

k.test

Set of factor numbers to test (default seq(5, 50, 5)).

lambda

Lambda to use for all factorizations (default 5).

thresh

Convergence threshold. Convergence occurs when |obj0-obj|/(mean(obj0,obj)) < thresh

max.iters

Maximum number of block coordinate descent iterations to perform

num.cores

Number of cores to use for optimizing factorizations in parallel (default 1)

rand.seed

Random seed for reproducibility (default 1).

gen.new

Do not use optimizeNewK in factorizations. Results in slower factorizations. (default FALSE).

nrep

Number of restarts to perform at each k value tested (increase to produce a smoother curve if results are unclear) (default 1).

plot.log2

Plot log2 curve for reference on K-L plot (log2 is the upper bound and can sometimes help in identifying the "elbow" of the plot). (default TRUE)

return.data

Whether to return list of data matrices (raw) or dataframe (processed) instead of ggplot object (default FALSE).

return.raw

If return.data is TRUE, whether to return raw data (in the format described below) or the dataframe used to produce the ggplot object. Raw data is a list of matrices of K-L divergences (length(k.test) by n_cells). The length of the list corresponds to nrep. (default FALSE)

verbose

Print progress bar/messages (TRUE by default)
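
The convergence test described for thresh can be sketched in base R as follows. The function name `has_converged` is hypothetical, and the sketch assumes the denominator is the average of the two objective values, which in R is written `mean(c(obj0, obj))` (calling `mean(obj0, obj)` would pass `obj` as the `trim` argument).

```r
# Hypothetical helper (not part of rliger): relative change in the objective
# between two successive iterations, compared against thresh.
has_converged <- function(obj0, obj, thresh = 1e-4) {
  abs(obj0 - obj) / mean(c(obj0, obj)) < thresh
}

small_change <- has_converged(100, 99.999)  # relative change of about 1e-5
large_change <- has_converged(100, 90)      # relative change of about 0.105
```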

Value

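ggplot object by default. If return.data is TRUE, returns the dataframe used to produce the plot or, if return.raw is also TRUE, the list of raw K-L divergence matrices. Also plots K-L divergence vs. k.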
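
As a rough illustration of working with the raw output format (a list of nrep matrices, each length(k.test) by n_cells), the base R sketch below builds a simulated stand-in and reduces it to one median per k. The random values, the object names, and the choice to average the medians across restarts are assumptions made for illustration only; this is not necessarily how the package builds its plotting dataframe.

```r
set.seed(1)
k.test  <- c(5, 10, 15)
n_cells <- 20
nrep    <- 2

# Simulated stand-in for the raw output: one length(k.test) x n_cells
# matrix of divergences per restart (values are random, not real results)
raw <- lapply(seq_len(nrep), function(r) {
  matrix(runif(length(k.test) * n_cells),
         nrow = length(k.test), ncol = n_cells)
})

# Median across cells for each k within each restart;
# vapply always returns a length(k.test) x nrep matrix, even when nrep is 1
medians_by_rep <- vapply(raw, function(m) apply(m, 1, median),
                         numeric(length(k.test)))

# Assumption: average the per-restart medians to get one value per k
summary_df <- data.frame(k = k.test, median_kl = rowMeans(medians_by_rep))
```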

Examples

## Not run: 
# Requires preprocessed liger object
# examine plot for most appropriate k, use multiple cores for faster results
suggestK(ligerex, num.cores = 4)

## End(Not run)

rliger

Linked Inference of Genomic Experimental Relationships

v1.0.0
GPL-3
Authors
Joshua Welch [aut, ctb], Chao Gao [aut, ctb, cre], Jialin Liu [aut, ctb], Joshua Sodicoff [aut, ctb], Velina Kozareva [aut, ctb], Evan Macosko [aut, ctb], Paul Hoffman [ctb], Ilya Korsunsky [ctb], Robert Lee [ctb]
Initial release
2021-04-18
