kenStone

Kennard-Stone algorithm for calibration sampling


Description

Select calibration samples from a large multivariate data set using the Kennard-Stone algorithm.

Usage

kenStone(X, k, metric = "mahal", pc, group, .center = TRUE, .scale = FALSE)

Arguments

X

a numeric matrix.

k

number of calibration samples to be selected.

metric

distance metric to be used: 'euclid' (Euclidean distance) or 'mahal' (Mahalanobis distance, default).

pc

optional. If not specified, distances are computed in the Euclidean space. Otherwise, distances are computed in the principal component score space and pc is the number of principal components retained. If pc < 1, the number of principal components kept is the number of components needed to explain at least (pc * 100) percent of the total variance.

group

An optional factor (or vector that can be coerced to a factor by as.factor) of length equal to nrow(X), giving the identifier of related observations (e.g. samples of the same batch of measurements, samples of the same origin, or of the same soil profile). When one observation is selected by the procedure, all observations of the same group are removed together and assigned to the calibration set. This makes it possible to select calibration points that are independent of the remaining points.

.center

logical value indicating whether the input matrix should be centered before Principal Component Analysis. Default set to TRUE.

.scale

logical value indicating whether the input matrix should be scaled before Principal Component Analysis. Default set to FALSE.
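
The rule for pc < 1 can be sketched in base R as follows. This is only an illustration of "the number of components explaining at least (pc * 100) percent of the total variance"; the helper name n_components is hypothetical and the code is not taken from prospectr, whose internals may differ.

```r
# Hypothetical helper (not part of prospectr): number of principal components
# needed to explain at least a proportion `pc` of the total variance.
n_components <- function(X, pc, center = TRUE, scale = FALSE) {
  pca <- prcomp(X, center = center, scale. = scale)
  # cumulative proportion of variance explained by the first 1, 2, ... components
  explained <- cumsum(pca$sdev^2) / sum(pca$sdev^2)
  # first component count that reaches the requested proportion
  which(explained >= pc)[1]
}

set.seed(1)
# artificial data: 5 variables with very different variances
X <- matrix(rnorm(200 * 5), ncol = 5) %*% diag(c(10, 5, 1, 0.5, 0.1))
n_components(X, pc = 0.99)
```

The center and scale arguments of the helper play the same role as .center and .scale above: they only affect the PCA step.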

Details

The Kennard-Stone algorithm selects samples with a uniform distribution over the predictor space (Kennard and Stone, 1969). It starts by selecting the pair of points that are the farthest apart. They are assigned to the calibration set and removed from the list of points. Then, the procedure assigns remaining points to the calibration set by computing the distance between each unassigned point \(i_0\) and the selected points \(i\), and finding the point for which:

\[ d_{selected} = \max_{i_0} \left( \min_{i} \left( d_{i, i_0} \right) \right) \]

This essentially selects the point \(i_0\) which is the farthest from its closest neighbor \(i\) in the calibration set. The algorithm uses the Euclidean distance to select the points, but the Mahalanobis distance can also be used. This can be achieved by performing a PCA on the input data and computing the Euclidean distance on the truncated score matrix according to the following definition of the Mahalanobis \(H\) distance:

\[ H_{ij}^2 = \sum_{a=1}^{A} \frac{(\hat t_{ia} - \hat t_{ja})^2}{\hat \lambda_a} \]

where \(\hat t_{ia}\) is the \(a^{th}\) principal component score of point \(i\), \(\hat t_{ja}\) is the corresponding value for point \(j\), \(\hat \lambda_a\) is the eigenvalue of principal component \(a\) and \(A\) is the number of principal components included in the computation.
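
The selection rule and the \(H\) distance described above can be sketched in base R. The function ks_sketch below is a hypothetical, unoptimized illustration and not the prospectr implementation: it assumes k >= 2, ignores the group argument and computes the full distance matrix. The last lines check that, when all principal components are kept, the squared Euclidean distance on scores divided by the square root of their eigenvalues matches the squared Mahalanobis distance returned by stats::mahalanobis.

```r
# Hypothetical helper (not the prospectr code): max-min selection on a matrix.
ks_sketch <- function(X, k) {
  X <- as.matrix(X)
  D <- as.matrix(dist(X))  # pairwise Euclidean distances
  # start with the two points that are the farthest apart
  model <- unname(which(D == max(D), arr.ind = TRUE)[1, ])
  while (length(model) < k) {
    candidates <- setdiff(seq_len(nrow(X)), model)
    # distance from each unassigned point to its closest selected point
    d_min <- apply(D[candidates, model, drop = FALSE], 1, min)
    # keep the unassigned point whose closest selected point is the farthest
    model <- c(model, candidates[which.max(d_min)])
  }
  list(model = model, test = setdiff(seq_len(nrow(X)), model))
}

set.seed(42)
X <- matrix(rnorm(100 * 4), ncol = 4)
pca <- prcomp(X, center = TRUE, scale. = FALSE)
# divide each score column by sqrt(eigenvalue), i.e. by pca$sdev
scores <- sweep(pca$x, 2, pca$sdev, "/")
sel <- ks_sketch(scores, k = 10)  # Kennard-Stone with the H distance

# squared H distance between points 1 and 2, using all components
h2 <- sum((scores[1, ] - scores[2, ])^2)
# squared Mahalanobis distance between the same points in the original space
m2 <- as.numeric(mahalanobis(X[1, ], center = X[2, ], cov = cov(X)))
all.equal(h2, m2)
```

Passing the raw matrix X instead of the scaled scores to ks_sketch gives the plain Euclidean variant of the same rule.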

Value

a list with the following components:

  • model: numeric vector giving the row indices of the input data selected for calibration

  • test: numeric vector giving the row indices of the remaining observations

  • pc: if the pc argument is specified, a numeric matrix of the scaled pc scores
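
How model and test relate when group is supplied can be illustrated with a few lines of base R. This is a sketch of what the description of group implies, not code from prospectr; the objects group and picked are made up for the illustration. Since whole groups are assigned to the calibration set, model may presumably contain more rows than the points picked by the selection step.

```r
# Hypothetical illustration (not prospectr internals).
group <- factor(rep(c("a", "b", "c", "d"), each = 3))  # 12 rows in 4 groups
picked <- c(2, 10)  # rows chosen by the selection step (made-up values)

# every row sharing a group with a picked row joins the calibration set
model <- which(group %in% group[picked])
# the remaining rows form the test set
test <- setdiff(seq_along(group), model)

model  # 1 2 3 10 11 12
test   # 4 5 6 7 8 9
```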

Author(s)

Antoine Stevens & Leonardo Ramirez-Lopez

References

Kennard, R.W., and Stone, L.A., 1969. Computer aided design of experiments. Technometrics 11, 137-148.

Examples

library(prospectr)
data(NIRsoil)
# select 30 calibration samples in the space of the PCs explaining 99% of the variance
sel <- kenStone(NIRsoil$spc, k = 30, pc = .99)
plot(sel$pc[, 1:2], xlab = "PC1", ylab = "PC2")
# points selected for calibration
points(sel$pc[sel$model, 1:2], pch = 19, col = 2)
# Test on artificial data: a 20 x 20 grid (400 points, 2 variables) with small Gaussian noise
X <- as.matrix(expand.grid(1:20, 1:20)) + rnorm(800, 0, .1)
plot(X, xlab = "VAR1", ylab = "VAR2")
sel <- kenStone(X, k = 25, metric = "euclid")
points(X[sel$model, ], pch = 19, col = 2)

prospectr

Miscellaneous Functions for Processing and Sample Selection of Spectroscopic Data

v0.2.1
MIT + file LICENSE
Authors
Antoine Stevens [aut, cre] (<https://orcid.org/0000-0002-1588-7519>), Leonardo Ramirez-Lopez [aut, cre] (<https://orcid.org/0000-0002-5369-5120>), Guillaume Hans [ctb] (<https://orcid.org/0000-0002-6503-5760>)
Initial release
2020-10-23
