K-Medoids Clustering Variable Selection
Creates a specification of a recipe step that will partition numeric variables according to k-medoids clustering and select the cluster medoids.
step_kmedoids(
recipe,
...,
k = 5,
center = TRUE,
scale = TRUE,
method = c("pam", "clara"),
metric = "euclidean",
optimize = FALSE,
num_samp = 50,
samp_size = 40 + 2 * k,
replace = TRUE,
prefix = "KMedoids",
role = "predictor",
skip = FALSE,
id = recipes::rand_id("kmedoids")
)
tunable.step_kmedoids(x, ...)
recipe |
recipe object to which the step will be added. |
... |
one or more selector functions to choose which variables will be
used to compute the components. See |
k |
number of k-medoids clusterings of the variables. The value of
|
center, scale |
logicals indicating whether to center the original variables by their means and scale them by their median absolute deviations (MAD) prior to cluster partitioning, or functions or names of functions to use for the centering and scaling; neither is applied to the selected variables. |
method |
character string specifying one of the clustering methods
provided by the cluster package. The |
metric |
character string specifying the distance metric for calculating
dissimilarities between observations as |
optimize |
logical indicator or 0:5 integer level specifying
optimization for the |
num_samp |
number of sub-datasets to sample for the
|
samp_size |
number of cases to include in each sub-dataset. |
replace |
logical indicating whether to replace the original variables. |
prefix |
if the original variables are not replaced, the selected variables are added to the dataset with the character string prefix added to their names; otherwise, the original variable names are retained. |
role |
analysis role that added step variables should be assigned. By default, they are designated as model predictors. |
skip |
logical indicating whether to skip the step when the recipe is
baked. While all operations are baked when |
id |
unique character string to identify the step. |
x |
step_kmedoids object. |
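As a rough illustration of the sampling arguments, the sketch below calls cluster::clara directly on simulated data. It assumes, without confirmation from this page, that num_samp and samp_size play the roles of clara's samples and sampsize arguments; the data and object names are hypothetical.

```r
library(cluster)

set.seed(123)  # makes the simulated data reproducible

# Hypothetical data: 200 objects to cluster, 4 measurements each.
# In the recipe step, the clustered objects would be the variables.
x <- matrix(rnorm(200 * 4), nrow = 200)

k <- 5
fit <- clara(
  x, k = k,
  metric = "euclidean",
  samples = 50,                        # assumed analogue of num_samp
  sampsize = min(nrow(x), 40 + 2 * k)  # assumed analogue of samp_size
)

fit$i.med              # row indices of the k medoids
table(fit$clustering)  # number of objects assigned to each cluster
```

Each sub-dataset is clustered on its own and the best set of medoids across the sub-datasets is kept, which is why clara scales to more objects than pam.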
K-medoids clustering partitions variables into k groups such that the dissimilarity between the variables and their assigned cluster medoids is minimized. Cluster medoids are then returned as a set of k variables.
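The idea can be sketched outside of a recipe with cluster::pam, clustering the columns rather than the rows of the built-in attitude data. This is a minimal approximation under the defaults described above (mean centering, MAD scaling, Euclidean metric) and is not the step's actual implementation; the object names are illustrative.

```r
library(cluster)

# Predictors from the built-in attitude data (rating is the outcome)
x <- attitude[, setdiff(names(attitude), "rating")]

# Center by the mean and scale by the median absolute deviation
x_std <- scale(x, center = colMeans(x), scale = apply(x, 2, mad))

# Transpose so that the variables are the objects being clustered
fit <- pam(t(x_std), k = 3, metric = "euclidean")

# Variables chosen as cluster medoids
medoid_vars <- colnames(x_std)[fit$id.med]
medoid_vars

# Cluster membership of every variable
fit$clustering

# The selected variables on their original, unscaled values
head(x[, medoid_vars])
```

Standardization is used only to compute the dissimilarities; the last line returns the medoid variables unscaled, in line with the note that centering and scaling are not applied to the selected variables.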
Function step_kmedoids creates a new step whose class is of
the same name and inherits from step_sbf, adds it to the
sequence of existing steps (if any) in the recipe, and returns the updated
recipe. The tidy method returns a tibble with columns terms
(selectors or variables selected), cluster (cluster assignments),
selected (logical indicator of selected cluster medoids),
silhouette (silhouette values), and name (names of the selected
variables).
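To make the cluster, selected, and silhouette columns more concrete, the sketch below assembles a similar table by hand from a cluster::pam fit on the attitude predictors. It uses plain scale() standardization for brevity and only mimics the column layout described above; it is not the output of the tidy method, and tidy_like is a hypothetical name.

```r
library(cluster)

# Predictors from the built-in attitude data (rating is the outcome)
x <- attitude[, setdiff(names(attitude), "rating")]
vars <- colnames(x)

# Cluster the variables (rows after transposing) into 3 groups
fit <- pam(t(scale(x)), k = 3)

# silinfo$widths is sorted by cluster, so index it by variable name
sil <- fit$silinfo$widths[vars, "sil_width"]

tidy_like <- data.frame(
  terms = vars,
  cluster = unname(fit$clustering[vars]),
  selected = vars %in% vars[fit$id.med],
  silhouette = unname(sil)
)
tidy_like
```

Silhouette values near 1 indicate variables that sit well within their cluster, while values near 0 or below indicate variables that lie between clusters.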
Kaufman L and Rousseeuw PJ (1990). Finding Groups in Data: An Introduction to Cluster Analysis. Wiley: New York.
Reynolds A, Richards G, de la Iglesia B and Rayward-Smith V (2006). Clustering rules: a comparison of partitioning and hierarchical clustering algorithms. Journal of Mathematical Modelling and Algorithms 5, 475–504.
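library(recipes)
library(MachineShop)  # provides step_kmedoids()

# Define a recipe with rating as the outcome and add the k-medoids step
rec <- recipe(rating ~ ., data = attitude)
kmedoids_rec <- rec %>%
  step_kmedoids(all_predictors(), k = 3)

# Estimate the step on the training data and apply it
kmedoids_prep <- prep(kmedoids_rec, training = attitude)
kmedoids_data <- bake(kmedoids_prep, attitude)

# Plot the outcome and the selected medoid variables
pairs(kmedoids_data, lower.panel = NULL)

# Inspect the step before and after it has been prepped
tidy(kmedoids_rec, number = 1)
tidy(kmedoids_prep, number = 1)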