k-means sampling
Perform a k-means sampling on a matrix for multivariate calibration
naes(X, k, pc, iter.max = 10, method = 0, .center = TRUE, .scale = FALSE)
X |
a numeric matrix (optionally a data frame that can be coerced to a numerical matrix). |
k |
either the number of calibration samples to select or a set of cluster centres to initiate the k-means clustering. |
pc |
optional. If not specified, k-means is run directly on the variable
(Euclidean) space.
Alternatively, a PCA is performed before k-means and |
iter.max |
maximum number of iterations allowed for the k-means
clustering. Default is 10. |
method |
the method used for selecting calibration samples within each
cluster: either samples closest to the cluster centers ( |
.center |
logical value indicating whether the input matrix should be
centered before Principal Component Analysis. Default set to TRUE. |
.scale |
logical value indicating whether the input matrix should be
scaled before Principal Component Analysis. Default set to FALSE. |
K-means sampling is a simple procedure based on cluster analysis to select calibration samples from large multivariate datasets. The method can be described in three steps (Naes et al., 2002):
Perform a PCA and decide how many principal components to keep,
Carry out a k-means clustering on the principal component scores and choose the number of resulting clusters to be equal to the number of desired calibration samples,
Select one sample from each cluster.
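The three steps above can be sketched in base R as follows. This is an illustration only, not the source of naes(): the data are synthetic, the 99% variance threshold and k = 5 are arbitrary choices, and picking the sample closest to each cluster centre is just one possible rule for the third step.

```r
# Illustrative sketch of the three steps in base R (not the naes() source code).
set.seed(1)                               # k-means starting centres are random
X <- matrix(rnorm(200 * 10), nrow = 200)  # synthetic data: 200 samples, 10 variables

# 1. PCA; keep enough components to explain 99% of the variance (assumed threshold)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
expl <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
n_pc <- which(expl >= 0.99)[1]
scores <- pca$x[, seq_len(n_pc), drop = FALSE]

# 2. k-means on the scores, with k equal to the number of desired calibration samples
k <- 5
km <- kmeans(scores, centers = k, iter.max = 10)

# 3. One sample per cluster: here, the sample closest to its cluster centre
closest_to_centre <- function(cl) {
  idx <- which(km$cluster == cl)
  d <- rowSums(sweep(scores[idx, , drop = FALSE], 2, km$centers[cl, ])^2)
  idx[which.min(d)]
}
model <- vapply(seq_len(k), closest_to_centre, integer(1))
test <- setdiff(seq_len(nrow(X)), model)
```

Here `model` and `test` play the same role as the components of the same name returned by naes(): row indices of the selected samples and of the remaining observations.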
a list with components:
'model': numeric vector giving the row indices of the input data selected for calibration
'test': numeric vector giving the row indices of the remaining observations
'pc': if the pc argument is specified, a numeric matrix of the scaled pc scores
'cluster': integer vector indicating the cluster to which each point was assigned
'centers': a matrix of cluster centres
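Because 'model' and 'test' are row indices, they can be used directly to split the input matrix into calibration and remaining samples. The sketch below uses a stand-in list with made-up indices in place of a real naes() result, so that it runs without any package installed.

```r
# Stand-in for a naes() result; the index values are made up for illustration.
X <- matrix(seq_len(40), nrow = 10)     # toy data: 10 samples, 4 variables
sel <- list(model = c(2L, 5L, 9L),
            test  = setdiff(seq_len(10), c(2L, 5L, 9L)))

X_cal  <- X[sel$model, , drop = FALSE]  # calibration samples
X_test <- X[sel$test, , drop = FALSE]   # remaining observations
```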
Antoine Stevens & Leonardo Ramirez-Lopez
Naes, T., 1987. The design of calibration in near infra-red reflectance analysis by clustering. Journal of Chemometrics 1, 121-134.
Naes, T., Isaksson, T., Fearn, T., and Davies, T., 2002. A user friendly guide to multivariate calibration and classification. NIR Publications, Chichester, United Kingdom.
library(prospectr)
data(NIRsoil)
sel <- naes(NIRsoil$spc, k = 5, pc = .99, method = 0)
# clusters
plot(sel$pc[, 1:2], col = sel$cluster + 2)
# points selected for calibration with method = 0
points(sel$pc[sel$model, 1:2], col = 2, pch = 19, cex = 1)
# pre-defined centers can also be provided
sel2 <- naes(NIRsoil$spc, k = sel$centers, pc = .99, method = 1)
# points selected for calibration with method = 1
points(sel$pc[sel2$model, 1:2], col = 1, pch = 15, cex = 1)