K-fold Cross Validation
This function calculates the predicted values at each point of the design and gives an estimation of criterion using K-fold cross-validation.
crossValidation(model, K)
model |
an output of the |
K |
the number of groups into which the data should be split to apply cross-validation |
A list with the following components:
Ypred |
a vector of predicted values obtained using K-fold cross-validation at the points of the design |
Q2 |
a real which is the estimation of the criterion |
folds |
a list which indicates the partitioning of the data into the folds |
RMSE_CV |
|
MAE_CV |
|
In the case of a Kriging model, other components to test the robustess of the procedure are proposed:
theta |
the range parameter theta estimated for each fold, |
trend |
the trend parameter estimated for each fold, |
shape |
the estimated shape parameter if the covariance structure is of type |
The principle of cross-validation is to split the data into K folds of approximately equal size A_{1}{A1}, ..., A_{K}{AK}. For k=1 to K, a model Y^(-k) is fitted from the data A1 U ... U AK and this model is validated on the fold Ak. Given a criterion of quality L (here, L could be the RMSE
or the MAE
criterion), the "evaluation" of the model consists in computing :
Lk = 1/(n/K) Sum (i in Ak) L (yi,Y^(-k)(xi).
The cross-validation criterion is the mean of the K criterion: L_CV1/K (L1+...+LK).
The Q2
criterion is defined as: Q2
=\code{R2}(\code{Y},\code{Ypred}) with Y
the response value and Ypred
the value fit by cross-validation.
When K
is equal to the number of observations, leave-one-out cross-validation
is performed.
D. Dupuy
## Not run: rm(list=ls()) # A 2D example Branin <- function(x1,x2) { x1 <- x1*15-5 x2 <- x2*15 (x2 - 5/(4*pi^2)*(x1^2) + 5/pi*x1 - 6)^2 + 10*(1 - 1/(8*pi))*cos(x1) + 10 } # Linear model on 50 points n <- 50 X <- matrix(runif(n*2),ncol=2,nrow=n) Y <- Branin(X[,1],X[,2]) modLm <- modelFit(X,Y,type = "Linear",formula=Y~X1+X2+X1:X2+I(X1^2)+I(X2^2)) R2(Y,modLm$model$fitted.values) crossValidation(modLm,K=10)$Q2 # kriging model : gaussian covariance structure, no trend, no nugget effect # on 16 points n <- 16 X <- data.frame(x1=runif(n),x2=runif(n)) Y <- Branin(X[,1],X[,2]) mKm <- modelFit(X,Y,type="Kriging",formula=~1, covtype="powexp") K <- 10 out <- crossValidation(mKm, K) par(mfrow=c(2,2)) plot(c(0,1:K),c(mKm$model@covariance@range.val[1],out$theta[,1]), xlab='',ylab='Theta1') plot(c(0,1:K),c(mKm$model@covariance@range.val[2],out$theta[,2]), xlab='',ylab='Theta2') plot(c(0,1:K),c(mKm$model@covariance@shape.val[1],out$shape[,1]), xlab='',ylab='p1',ylim=c(0,2)) plot(c(0,1:K),c(mKm$model@covariance@shape.val[2],out$shape[,2]), xlab='',ylab='p2',ylim=c(0,2)) par(mfrow=c(1,1)) ## End(Not run)
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.