
selectGenes

Select a subset of informative genes


Description

This function identifies highly variable genes from each dataset and combines these gene sets (either by union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, this function identifies genes with gene expression variance above a given variance threshold (relative to mean gene expression). It also provides a log plot of gene variance vs gene expression (with a line indicating expected expression across genes and cells). Selected genes are plotted in green.
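The variance-versus-mean idea can be illustrated with a small base R sketch. This is a simplified illustration under stated assumptions, not the package's internal procedure: the simulated counts, the negative binomial overdispersion, and the names `counts`, `gene_rates`, `var_thresh` and `selected` are all hypothetical, and the selection rule here compares log10 variance to log10 mean directly, without the normalization and alpha.thresh correction that selectGenes applies.

```r
# Simplified illustration only; not rliger's internal code.
set.seed(1)
n_genes <- 200
n_cells <- 2000

# Simulate Poisson counts (genes in rows, cells in columns).
# Under a Poisson model each gene's variance is close to its mean.
gene_rates <- runif(n_genes, min = 1, max = 5)
counts <- matrix(rpois(n_genes * n_cells, lambda = gene_rates),
                 nrow = n_genes)

# Make the first 20 genes overdispersed (variance well above the mean).
for (g in 1:20) {
  counts[g, ] <- rnbinom(n_cells, size = 0.5, mu = gene_rates[g])
}

gene_mean <- rowMeans(counts)
gene_var  <- apply(counts, 1, var)

# Keep genes whose log10 variance exceeds log10 mean by more than the threshold.
var_thresh <- 0.1
selected <- which(log10(gene_var) > log10(gene_mean) + var_thresh)
```

Raising `var_thresh` in this sketch shrinks `selected`, which mirrors the documented behaviour of var.thresh (higher threshold, fewer selected genes).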

Usage

selectGenes(
  object,
  var.thresh = 0.1,
  alpha.thresh = 0.99,
  num.genes = NULL,
  tol = 1e-04,
  datasets.use = 1:length(object@raw.data),
  combine = "union",
  capitalize = FALSE,
  do.plot = FALSE,
  cex.use = 0.3,
  chunk = 1000
)

Arguments

object

liger object on which normalize has already been called.

var.thresh

Variance threshold. Main threshold used to identify variable genes. Genes with expression variance greater than threshold (relative to mean) are selected. (higher threshold -> fewer selected genes). Accepts single value or vector with separate var.thresh for each dataset. (default 0.1)

alpha.thresh

Alpha threshold. Controls upper bound for expected mean gene expression (lower threshold -> higher upper bound). (default 0.99)

num.genes

Number of genes to find for each dataset. Optimises the value of var.thresh for each dataset to get this number of genes. Accepts single value or vector with same length as number of datasets (optional, default=NULL).

tol

Tolerance to use for optimization if num.genes values passed in (default 0.0001).
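One way a threshold search of this kind could work is sketched below with a plain bisection. It is an assumption-laden illustration: the package's actual optimizer and its exact use of tol may differ, and `log_ratio`, `target`, `thresh` and `n_selected` are hypothetical names, with simulated values standing in for per-gene variance statistics.

```r
# Hypothetical sketch; the optimizer used by selectGenes may differ.
set.seed(42)

# Stand-in for per-gene log10(variance / mean) values.
log_ratio <- rnorm(1000, mean = 0, sd = 0.3)

target <- 100   # desired number of genes (plays the role of num.genes)
tol    <- 1e-4  # stop once the search interval is this narrow (assumed meaning)

# Bisection: lo always selects more than target genes, hi never does.
lo <- min(log_ratio)
hi <- max(log_ratio)
while (hi - lo > tol) {
  mid <- (lo + hi) / 2
  if (sum(log_ratio > mid) > target) {
    lo <- mid
  } else {
    hi <- mid
  }
}

thresh     <- hi
n_selected <- sum(log_ratio > thresh)
```

The resulting `n_selected` never exceeds `target`; a smaller tolerance narrows the final interval and so brings the count closer to the requested number.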

datasets.use

List of datasets to include for discovery of highly variable genes. (default 1:length(object@raw.data))

combine

How to combine variable genes across experiments. Either "union" or "intersection". (default "union")
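The difference between the two options can be seen with base R set operations on two hypothetical per-dataset gene sets. The gene names and the vectors `hvg_y` and `hvg_z` are made up for illustration and are not produced by the package.

```r
# Hypothetical variable-gene sets found in two datasets.
hvg_y <- c("GeneA", "GeneB", "GeneC")
hvg_z <- c("GeneB", "GeneC", "GeneD")

# "union": genes that are variable in at least one dataset.
genes_union <- union(hvg_y, hvg_z)

# "intersection": genes that are variable in every dataset.
genes_intersection <- intersect(hvg_y, hvg_z)
```

The union is never smaller than the intersection, so "intersection" is the more conservative choice when datasets differ substantially.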

capitalize

Capitalize gene names to match homologous genes (i.e., across species). (default FALSE)

do.plot

Display log plot of gene variance vs. gene expression for each dataset. Selected genes are plotted in green. (default FALSE)

cex.use

Point size for plot. (default 0.3)

chunk

Size of chunks in HDF5 file. (default 1000)

Value

liger object with var.genes slot set.

Examples

## Not run: 
# Given datasets Y and Z
ligerex <- createLiger(list(y_set = Y, z_set = Z))
ligerex <- normalize(ligerex)
# use default selectGenes settings (var.thresh = 0.1)
ligerex <- selectGenes(ligerex)
# select a smaller subset of genes
ligerex <- selectGenes(ligerex, var.thresh = 0.3)

## End(Not run)

rliger

Linked Inference of Genomic Experimental Relationships

v1.0.0
GPL-3
Authors
Joshua Welch [aut, ctb], Chao Gao [aut, ctb, cre], Jialin Liu [aut, ctb], Joshua Sodicoff [aut, ctb], Velina Kozareva [aut, ctb], Evan Macosko [aut, ctb], Paul Hoffman [ctb], Ilya Korsunsky [ctb], Robert Lee [ctb]
Initial release
2021-04-18
