Select a subset of informative genes
This function identifies highly variable genes in each dataset and combines these gene sets (by either union or intersection) for use in downstream analysis. Assuming that gene expression approximately follows a Poisson distribution, it selects genes whose expression variance exceeds a given threshold relative to mean gene expression. It can also produce a log plot of gene variance vs. gene expression, with a line indicating the expected (Poisson) variance across genes and cells; selected genes are plotted in green.
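To make the Poisson reasoning concrete, here is a minimal base-R sketch of variance-threshold selection for a single dataset. It is illustrative only, not the package's implementation. It assumes a dense genes x cells count matrix `raw` with gene rownames and no empty cells, normalizes each cell by its total counts, and compares each gene's observed variance with the variance a Poisson model would predict. The function name `select_variable_genes` and all intermediate names are hypothetical.

```r
# Hypothetical helper (not the package API): Poisson-based variable-gene
# selection for one dataset. 'raw' is a genes x cells count matrix with
# gene rownames and no all-zero cells.
select_variable_genes <- function(raw, var.thresh = 0.1, alpha.thresh = 0.99) {
  trx_per_cell <- colSums(raw)
  # Library-size normalization: each cell's counts now sum to 1
  norm <- sweep(raw, 2, trx_per_cell, "/")
  gene_mean <- rowMeans(norm)
  gene_var <- apply(norm, 1, var)  # per-gene variance across cells
  # Under Poisson counts, Var(x / N) = mean / N; average 1/N over cells
  poisson_const <- mean(1 / trx_per_cell)
  # Correct alpha for the number of genes tested
  alpha_corrected <- alpha.thresh / nrow(raw)
  # Upper bound on the mean implied by sampling noise
  mean_upper <- gene_mean + qnorm(1 - alpha_corrected / 2) *
    sqrt(gene_mean * poisson_const / ncol(raw))
  # log10 of the Poisson-expected variance for each gene
  expected_log_var <- log10(gene_mean * poisson_const)
  keep <- (gene_var / poisson_const > mean_upper) &
    (log10(gene_var) > expected_log_var + var.thresh)
  rownames(raw)[which(keep)]
}
```

Because the second condition is `log10(observed) > log10(expected) + var.thresh`, raising `var.thresh` by 1 demands ten times more excess variance, which is why a higher threshold selects fewer genes.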
selectGenes(
  object,
  var.thresh = 0.1,
  alpha.thresh = 0.99,
  num.genes = NULL,
  tol = 1e-04,
  datasets.use = 1:length(object@raw.data),
  combine = "union",
  capitalize = FALSE,
  do.plot = FALSE,
  cex.use = 0.3,
  chunk = 1000
)
object | liger object, normalized with normalize() as in the example. |
var.thresh | Variance threshold; the main threshold used to identify variable genes. Genes whose expression variance exceeds the threshold (relative to mean expression) are selected, so a higher threshold selects fewer genes. Accepts a single value or a vector with a separate var.thresh for each dataset (default 0.1). |
alpha.thresh | Alpha threshold. Controls the upper bound for expected mean gene expression; a lower threshold gives a higher upper bound (default 0.99). |
num.genes | Number of genes to find for each dataset. Optimizes the value of var.thresh for each dataset to obtain this number of genes. Accepts a single value or a vector with the same length as the number of datasets (optional, default NULL). |
tol | Tolerance used for the optimization when num.genes is specified (default 0.0001). |
datasets.use | Datasets to include when discovering highly variable genes (default 1:length(object@raw.data), i.e., all datasets). |
combine | How to combine variable genes across datasets: either "union" or "intersection" (default "union"). |
capitalize | Capitalize gene names to match homologous genes (i.e., across species) (default FALSE). |
do.plot | Display a log plot of gene variance vs. gene expression for each dataset. Selected genes are plotted in green (default FALSE). |
cex.use | Point size for the plot (default 0.3). |
chunk | Size of chunks in the HDF5 file (default 1000). |
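When num.genes is given, var.thresh is tuned per dataset until roughly that many genes are selected, using tol as the stopping tolerance. The sketch below shows one way such a search can work: a bisection over the threshold. The package may use a different optimizer, so treat this as an assumption-laden illustration. `find_var_thresh`, `count_at`, and the bracket `[-1, 1]` are hypothetical. The search relies on the number of selected genes being non-increasing in the threshold.

```r
# Hypothetical helper (not part of the package): bisection search for a
# var.thresh that selects at most 'num.genes' genes.
# 'count_at' maps a threshold to the number of genes selected and is
# assumed to be non-increasing in the threshold.
find_var_thresh <- function(count_at, num.genes, lower = -1, upper = 1,
                            tol = 1e-4) {
  # A valid bracket needs count_at(lower) > num.genes >= count_at(upper)
  if (count_at(lower) <= num.genes) return(lower)
  if (count_at(upper) > num.genes) return(upper)
  while (upper - lower > tol) {
    mid <- (lower + upper) / 2
    if (count_at(mid) > num.genes) {
      lower <- mid  # too many genes selected: raise the threshold
    } else {
      upper <- mid  # at or below the target: try a lower threshold
    }
  }
  upper  # smallest threshold found that selects no more than num.genes
}

# Usage with made-up log10(observed / expected variance) ratios
log_ratio <- seq(-0.5, 0.95, by = 0.05)
count_at <- function(x) sum(log_ratio > x)
th <- find_var_thresh(count_at, num.genes = 10)
count_at(th)  # 10
```

Bisection is a natural fit here because the selected-gene count is a step function of the threshold: it has no useful gradient, but its monotonicity guarantees the bracket always contains the target.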
liger object with the var.genes slot set.
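The combine and capitalize arguments determine which genes end up in var.genes. The sketch below illustrates the set logic in base R, assuming capitalization means simple upper-casing with `toupper()` so that, for example, mouse "Cd4" matches human "CD4". `combine_var_genes` and the gene lists are hypothetical and chosen only for illustration.

```r
# Hypothetical helper (not the package API): combine per-dataset
# variable-gene lists by union or intersection.
combine_var_genes <- function(gene_sets, combine = c("union", "intersection"),
                              capitalize = FALSE) {
  combine <- match.arg(combine)
  if (capitalize) {
    # Upper-case names so homologous genes such as "Cd4" and "CD4" match
    gene_sets <- lapply(gene_sets, toupper)
  }
  op <- if (combine == "union") union else intersect
  Reduce(op, gene_sets)  # fold the set operation over all datasets
}

# Made-up variable-gene lists for a mouse and a human dataset
genes_by_dataset <- list(
  mouse = c("Cd4", "Gapdh", "Actb"),
  human = c("CD4", "GAPDH", "MALAT1")
)
combine_var_genes(genes_by_dataset, "intersection")                     # character(0)
combine_var_genes(genes_by_dataset, "intersection", capitalize = TRUE)  # "CD4" "GAPDH"
combine_var_genes(genes_by_dataset, capitalize = TRUE)  # "CD4" "GAPDH" "ACTB" "MALAT1"
```

Intersection keeps only genes variable in every dataset, which can be very restrictive across species unless names are harmonized first. Union keeps any gene variable in at least one dataset.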
## Not run:
# Given datasets Y and Z
ligerex <- createLiger(list(y_set = Y, z_set = Z))
ligerex <- normalize(ligerex)
# use default selectGenes settings (var.thresh = 0.1)
ligerex <- selectGenes(ligerex)
# select a smaller subset of genes
ligerex <- selectGenes(ligerex, var.thresh = 0.3)
## End(Not run)