uwot: umap_transform – R documentation

umap_transform

Add New Points to an Existing Embedding

Description

Carry out an embedding of new data using an existing embedding. Requires using the result of calling umap or tumap with ret_model = TRUE.

Usage

umap_transform(
  X = NULL,
  model = NULL,
  nn_method = NULL,
  init_weighted = TRUE,
  search_k = NULL,
  tmpdir = tempdir(),
  n_epochs = NULL,
  n_threads = NULL,
  n_sgd_threads = 0,
  grain_size = 1,
  verbose = FALSE,
  init = "weighted"
)

Arguments

`X`	The new data to be transformed, either a matrix or a data frame. Must have the same columns in the same order as the input data used to generate the `model`.
`model`	Data associated with an existing embedding.
`nn_method`	Optional pre-calculated nearest neighbor data. It must be a list consisting of two elements. `"idx"`: an `n_vertices x n_neighbors` matrix containing the integer indexes of the nearest neighbors in `X`. Each vertex is considered to be its own nearest neighbor, i.e. `idx[, 1] == 1:n_vertices`. `"dist"`: an `n_vertices x n_neighbors` matrix containing the distances of the nearest neighbors. Multiple sets of nearest neighbor data (e.g. from two different pre-calculated metrics) can be supplied by passing a list whose items are themselves nearest neighbor data lists. The `X` parameter is ignored when using pre-calculated nearest neighbor data.
`init_weighted`	If `TRUE`, then initialize the embedded coordinates of `X` using a weighted average of the coordinates of the nearest neighbors from the original embedding in `model`, where the weights used are the edge weights from the UMAP smoothed knn distances. Otherwise, use an unweighted average. This parameter will be deprecated and removed at version 1.0 of this package. Use the `init` parameter instead, replacing `init_weighted = TRUE` with `init = "weighted"` and `init_weighted = FALSE` with `init = "average"`.
`search_k`	Number of nodes to search during the neighbor retrieval. Larger values give more accurate results, but the search takes longer. Defaults to the value used in building the `model`.
`tmpdir`	Temporary directory to store nearest neighbor indexes during nearest neighbor search. Default is `tempdir()`. The index is only written to disk if `n_threads > 1`; otherwise, this parameter is ignored.
`n_epochs`	Number of epochs to use during the optimization of the embedded coordinates. A value between 30 and 100 is a reasonable trade-off between speed and thoroughness. By default, this value is set to one third of the number of epochs used to build the `model`.
`n_threads`	Number of threads to use (except during stochastic gradient descent). Default is half the number of concurrent threads supported by the system.
`n_sgd_threads`	Number of threads to use during stochastic gradient descent. If set to > 1, then results will not be reproducible, even if `set.seed` is called with a fixed seed before running.
`grain_size`	Minimum batch size for multithreading. If the number of items to process in a thread falls below this number, then no threads will be used. Used in conjunction with `n_threads` and `n_sgd_threads`.
`verbose`	If `TRUE`, log details to the console.
`init`	How to initialize the transformed coordinates. One of: `"weighted"` (the default): use a weighted average of the coordinates of the nearest neighbors from the original embedding in `model`, where the weights used are the edge weights from the UMAP smoothed knn distances; equivalent to `init_weighted = TRUE`. `"average"`: use the mean of the coordinates of the nearest neighbors from the original embedding in `model`; equivalent to `init_weighted = FALSE`. A matrix of user-specified input coordinates, which must have dimensions `(nrow(X), ncol(model$embedding))`. This parameter should be used in preference to `init_weighted`.
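
The difference between the `"weighted"` and `"average"` initializations can be illustrated in base R, without calling uwot at all. The sketch below is not the package's implementation: the toy embedding, the neighbor indexes, the weights, and the helper `init_from_neighbors` are all hypothetical, and it assumes the weights for each new point are normalized to sum to 1.

```r
# Hypothetical toy data: 4 training points already embedded in 2D.
train_embedding <- matrix(
  c(0, 0,
    1, 0,
    0, 1,
    1, 1),
  ncol = 2, byrow = TRUE
)

# For 2 new points: indexes of their 2 nearest training neighbors, plus
# made-up edge weights standing in for UMAP's smoothed knn weights.
nn_idx <- matrix(c(1, 2,
                   3, 4), ncol = 2, byrow = TRUE)
nn_weight <- matrix(c(0.75, 0.25,
                      0.50, 0.50), ncol = 2, byrow = TRUE)

# Hypothetical helper: average neighbor coordinates, optionally weighted.
init_from_neighbors <- function(embedding, idx, weights = NULL) {
  if (is.null(weights)) {
    # Unweighted case: every neighbor counts equally.
    weights <- matrix(1, nrow = nrow(idx), ncol = ncol(idx))
  }
  # Assumption: weights are normalized so each row sums to 1.
  weights <- weights / rowSums(weights)
  out <- matrix(0, nrow = nrow(idx), ncol = ncol(embedding))
  for (i in seq_len(nrow(idx))) {
    neighbors <- embedding[idx[i, ], , drop = FALSE]
    # Row r of `neighbors` is scaled by weights[i, r], then summed.
    out[i, ] <- colSums(neighbors * weights[i, ])
  }
  out
}

init_weighted <- init_from_neighbors(train_embedding, nn_idx, nn_weight)
init_average <- init_from_neighbors(train_embedding, nn_idx)
```

With these inputs the first new point starts closer to its more strongly weighted neighbor under the weighted scheme, and exactly midway between its two neighbors under the unweighted one.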

Details

Note that some settings are incompatible with producing a UMAP model via umap: external neighbor data (passed as a list to the nn_method parameter), and factor columns that were included in the UMAP calculation via the metric parameter. In the latter case, the model produced is based only on the numeric data. A transformation is still possible, but factor columns in the new data are ignored.
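
Because only numeric data contributes to the model in that case, and because `X` must have the same columns in the same order as the training data, it can help to make the column selection explicit before transforming. The following base R sketch shows one way to do that; the variable names are illustrative and this step is not required by the package.

```r
iris_train <- iris[1:100, ]
iris_test <- iris[101:150, ]

# Names of the numeric columns in the training data, in their original order.
numeric_cols <- names(iris_train)[vapply(iris_train, is.numeric, logical(1))]

# Apply the same selection, in the same order, to both data sets.
train_num <- iris_train[, numeric_cols, drop = FALSE]
test_num <- iris_test[, numeric_cols, drop = FALSE]

# Fail early if the two inputs do not line up column for column.
stopifnot(identical(colnames(train_num), colnames(test_num)))
```

Here the factor column `Species` is dropped from both data frames, so the training and new data are known to match.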

Value

A matrix of coordinates for X transformed into the space of the model.

Examples

library(uwot)

iris_train <- iris[1:100, ]
iris_test <- iris[101:150, ]

# You must set ret_model = TRUE to return extra data needed
iris_train_umap <- umap(iris_train, ret_model = TRUE)
iris_test_umap <- umap_transform(iris_test, iris_train_umap)
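
As a variation on the example above, the sketch below sets some of the optional arguments documented on this page: `init = "average"` for the unweighted initialization, and an explicit `n_epochs` within the suggested 30 to 100 range. The seed and the value 50 are arbitrary choices for illustration, and the code assumes the uwot package is installed.

```r
library(uwot)

# Arbitrary seed; results are only reproducible while n_sgd_threads <= 1.
set.seed(42)

iris_train <- iris[1:100, ]
iris_test <- iris[101:150, ]

# ret_model = TRUE is required so the result can be passed to umap_transform.
iris_train_umap <- umap(iris_train, ret_model = TRUE)

iris_test_avg <- umap_transform(
  iris_test,
  iris_train_umap,
  init = "average",  # unweighted mean of neighbor coordinates
  n_epochs = 50,     # arbitrary value in the suggested 30 to 100 range
  n_sgd_threads = 0
)

# One row of embedded coordinates per row of the new data.
dim(iris_test_avg)
```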

uwot

The Uniform Manifold Approximation and Projection (UMAP) Method for Dimensionality Reduction

v0.1.10

GPL (>= 3)

Authors

James Melville [aut, cre], Aaron Lun [ctb], Mohamed Nadhir Djekidel [ctb], Yuhan Hao [ctb]

Initial release