Perform online iNMF on scaled datasets
Perform online integrative non-negative matrix factorization to represent multiple single-cell datasets in terms of H, W, and V matrices. It optimizes the iNMF objective function using online learning (non-negative least squares for H matrix, hierarchical alternating least squares for W and V matrices), where the number of factors is set by k. The function allows online learning in 3 scenarios: (1) fully observed datasets; (2) iterative refinement using continually arriving datasets; and (3) projection of new datasets without updating the existing factorization. All three scenarios require fixed memory independent of the number of cells.
For each dataset, this factorization produces an H matrix (cells by k), a V matrix (k by genes), and a shared W matrix (k by genes). The H matrices represent the cell factor loadings. W is identical among all datasets, as it represents the shared components of the metagenes across datasets. The V matrices represent the dataset-specific components of the metagenes.
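The matrix shapes described above can be illustrated with a small base R sketch. It is not the package's implementation: the dataset sizes and random values are made up, `inmf_objective` is a hypothetical helper, and the objective it computes (reconstruction error plus a lambda-weighted penalty on the dataset-specific term) is assumed to follow the form commonly given for iNMF.

```r
# Toy illustration of the iNMF matrix shapes; all sizes and values are made up.
set.seed(123)
k <- 3                          # number of factors (metagenes)
n_genes <- 10
n_cells <- c(d1 = 8, d2 = 6)    # two hypothetical datasets

# Non-negative random matrices standing in for a fitted factorization.
W <- matrix(runif(k * n_genes), nrow = k)                                # shared, k by genes
V <- lapply(n_cells, function(n) matrix(runif(k * n_genes), nrow = k))   # per dataset, k by genes
H <- lapply(n_cells, function(n) matrix(runif(n * k), nrow = n))         # per dataset, cells by k
E <- lapply(n_cells, function(n) matrix(runif(n * n_genes), nrow = n))   # toy data, cells by genes

# Assumed objective (hypothetical helper, not a package function):
#   sum_i ||E_i - H_i (W + V_i)||_F^2 + lambda * ||H_i V_i||_F^2
inmf_objective <- function(E, H, W, V, lambda = 5) {
  per_dataset <- mapply(function(Ei, Hi, Vi) {
    recon_err <- sum((Ei - Hi %*% (W + Vi))^2)   # fit of shared + specific metagenes
    penalty   <- lambda * sum((Hi %*% Vi)^2)     # cost of dataset-specific effects
    recon_err + penalty
  }, E, H, V)
  sum(per_dataset)
}

obj <- inmf_objective(E, H, W, V, lambda = 5)
```

Under this assumed form, raising `lambda` makes the dataset-specific term `H_i V_i` more expensive, which matches the description of `lambda` as penalizing dataset-specific effects.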
online_iNMF(
  object,
  X_new = NULL,
  projection = FALSE,
  W.init = NULL,
  V.init = NULL,
  H.init = NULL,
  A.init = NULL,
  B.init = NULL,
  k = 20,
  lambda = 5,
  max.epochs = 5,
  miniBatch_max_iters = 1,
  miniBatch_size = 5000,
  h5_chunk_size = 1000,
  seed = 123,
  verbose = TRUE
)
object |
|
X_new |
List of new datasets for scenario 2 or scenario 3. Each list element should be the name of an HDF5 file. |
projection |
Perform data integration by shared metagene (W) projection (scenario 3). (default FALSE) |
W.init |
Optional initialization for W. (default NULL) |
V.init |
Optional initialization for V (default NULL) |
H.init |
Optional initialization for H (default NULL) |
A.init |
Optional initialization for A (default NULL) |
B.init |
Optional initialization for B (default NULL) |
k |
Inner dimension of the factorization, i.e., the number of metagenes (default 20). A value in the range 20-50 works well for most analyses. |
lambda |
Regularization parameter (default 5.0). Larger values penalize dataset-specific effects more strongly (i.e., alignment should increase as lambda increases). We recommend always using the default value, except possibly for analyses with relatively small differences (biological replicates, male/female comparisons, etc.), in which case a lower value such as 1.0 may improve reconstruction quality. |
max.epochs |
Maximum number of epochs (complete passes through the data). (default 5) |
miniBatch_max_iters |
Maximum number of block coordinate descent (HALS algorithm) iterations to perform for each update of W and V (default 1). Changing this parameter is not recommended. |
miniBatch_size |
Total number of cells in each minibatch (default 5000). This is a reasonable default, but a smaller value such as 1000 may be necessary for analyzing very small datasets. In general, minibatch size should be no larger than the number of cells in the smallest dataset. |
h5_chunk_size |
Chunk size of input HDF5 files (default 1000). The chunk size should be no larger than the minibatch size. |
seed |
Random seed to allow reproducible results (default 123). |
verbose |
Print progress bar/messages (default TRUE) |
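The sizing guidance for `miniBatch_size` and `h5_chunk_size` can be sketched as a small base R helper. `suggest_batch_settings` is hypothetical and not part of the package; it only applies the two stated constraints (minibatch no larger than the smallest dataset, chunk no larger than the minibatch). The `approx_updates` value is a rough estimate that treats one epoch as one pass over all cells, and is not a statement about the function's internal accounting.

```r
# Hypothetical helper (not a package function): derive batch settings
# from the number of cells in each dataset.
suggest_batch_settings <- function(n_cells,
                                   miniBatch_size = 5000,
                                   h5_chunk_size = 1000,
                                   max.epochs = 5) {
  stopifnot(length(n_cells) > 0, all(n_cells > 0))
  # Minibatch should not exceed the smallest dataset.
  batch <- min(miniBatch_size, min(n_cells))
  # Chunk size should not exceed the minibatch size.
  chunk <- min(h5_chunk_size, batch)
  list(
    miniBatch_size = batch,
    h5_chunk_size  = chunk,
    # Rough count of minibatches drawn over all epochs (an estimate only).
    approx_updates = ceiling(sum(n_cells) * max.epochs / batch)
  )
}

# Example with made-up dataset sizes.
settings <- suggest_batch_settings(c(ctrl = 3000, stim = 12000))
```

The returned values could then be passed on as the `miniBatch_size` and `h5_chunk_size` arguments.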
liger object with H, W, V, A and B slots set.
## Not run:
# Requires preprocessed liger object
# Get factorization using 20 factors and mini-batch of 5000 cells
# (default setting, can be adjusted for ideal results)
ligerex <- online_iNMF(ligerex, k = 20, lambda = 5, miniBatch_size = 5000)
## End(Not run)
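Scenarios 2 and 3 differ from the call above only in the `X_new` and `projection` arguments, which the following sketch makes explicit. The wrapper names and the HDF5 file names are hypothetical, and the sketch assumes that the package providing `online_iNMF` is attached and that the new files have been preprocessed in the same way as the original data.

```r
# Hypothetical wrappers (not package functions). online_iNMF() is only
# called when a wrapper is invoked.

# Scenario 2: refine the existing factorization with newly arrived datasets.
refine_with_new_data <- function(object, new_files, ...) {
  # X_new is documented as a list whose elements are HDF5 file names.
  stopifnot(is.list(new_files),
            all(vapply(new_files, is.character, logical(1))))
  online_iNMF(object, X_new = new_files, projection = FALSE, ...)
}

# Scenario 3: project new datasets onto the shared metagenes (W)
# without updating the existing factorization.
project_new_data <- function(object, new_files, ...) {
  stopifnot(is.list(new_files),
            all(vapply(new_files, is.character, logical(1))))
  online_iNMF(object, X_new = new_files, projection = TRUE, ...)
}

## Not run:
# File names below are placeholders.
# ligerex <- refine_with_new_data(ligerex, list(batch2 = "batch2.h5"), k = 20, lambda = 5)
# ligerex <- project_new_data(ligerex, list(batch3 = "batch3.h5"), k = 20)
## End(Not run)
```

Keeping the two scenarios in separate wrappers makes the value of `projection` explicit at the call site, so a projection cannot be mistaken for a refinement that updates the factorization.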