Cross-Validation for cmls
Performs k-fold or generalized cross-validation to tune the constraint options for cmls. The model can be tuned over any combination of the arguments const, df, degree, and intercept.
cv.cmls(X, Y, nfolds = 2, foldid = NULL, parameters = NULL,
        const = "uncons", df = 10, degree = 3, intercept = TRUE,
        mse = TRUE, parallel = FALSE, cl = NULL, verbose = TRUE, ...)

X | Matrix of dimension n x p.
Y | Matrix of dimension n x m.
nfolds | Number of folds for k-fold cross-validation. Ignored if foldid is provided.
foldid | Factor or integer vector of length n giving the fold identification for each observation.
parameters | Parameters for tuning. Data frame with columns const, df, degree, and intercept. See Details.
const | Character vector specifying the constraints for tuning. See Details.
df | Integer vector specifying the degrees of freedom for tuning. See Details.
degree | Integer vector specifying the polynomial degrees for tuning. See Details.
intercept | Logical vector specifying the intercepts for tuning. See Details.
mse | If TRUE (default), the mean squared error is used as the CV loss.
parallel | Logical indicating whether the tuning should be computed in parallel. See Examples.
cl | Cluster created by makeCluster. See Examples.
verbose | If TRUE (default), tuning progress is printed.
... | Additional arguments to the cmls function (e.g., struc; see Examples).
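To make the roles of nfolds, foldid, and the loss concrete, the following is a minimal base-R sketch of k-fold cross-validation for an unconstrained multivariate least squares fit. It is an illustration only, not the internals of cv.cmls: the simulated data, the use of lm.fit, and the pooling of squared errors across folds are all assumptions made for this example.

```r
# Simulated data (hypothetical, for illustration only)
set.seed(1)
n <- 50; p <- 2; m <- 3
X <- matrix(rnorm(n * p), nrow = n, ncol = p)
B <- matrix(c(1, -1, 0.5, 0, 2, 1), nrow = p, ncol = m)
Y <- X %*% B + matrix(rnorm(n * m, sd = 0.5), nrow = n, ncol = m)

# Random fold assignment: each observation gets a label in 1..nfolds
nfolds <- 5
foldid <- sample(rep(seq_len(nfolds), length.out = n))

# Accumulate the held-out squared error over the folds
sse <- 0
for (k in seq_len(nfolds)) {
  test <- foldid == k
  # Fit on the training folds (intercept column added by hand)
  fit <- lm.fit(x = cbind(1, X[!test, , drop = FALSE]),
                y = Y[!test, , drop = FALSE])
  # Predict the held-out fold and add its squared error
  pred <- cbind(1, X[test, , drop = FALSE]) %*% fit$coefficients
  sse <- sse + sum((Y[test, , drop = FALSE] - pred)^2)
}

# Mean squared prediction error over all held-out entries
cvloss <- sse / (n * m)
cvloss
```

Supplying a fixed foldid, as in the Examples, keeps the fold assignment identical across calls, so losses from different runs are computed on the same splits.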
The parameters for tuning can be supplied via one of two options:
(A) Using the parameters argument. In this case, the argument parameters must be a data frame with columns const, df, degree, and intercept, where each row gives a combination of parameters for the CV tuning.
(B) Using the const, df, degree, and intercept arguments. In this case, the expand.grid function is used to create the parameters data frame, which contains all combinations of the arguments const, df, degree, and intercept. Duplicates are removed before the CV tuning.
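The two options can be sketched in base R as follows. This only illustrates the shape of the parameters data frame described above; the specific values are arbitrary, and the use of stringsAsFactors = FALSE and unique() is an assumption for this example rather than a statement about how cv.cmls builds the grid internally.

```r
# Option (B) equivalent: all combinations of the supplied vectors
grid <- expand.grid(const = c("uncons", "smooth"),
                    df = 5:7,
                    degree = 3,
                    intercept = TRUE,
                    stringsAsFactors = FALSE)
grid <- unique(grid)  # drop duplicate rows, if any
nrow(grid)            # 2 constraints x 3 df values = 6 rows

# Option (A): hand-picked combinations, one per row
params <- data.frame(const = c("uncons", "smooth", "smooth"),
                     df = c(10, 5, 15),
                     degree = c(3, 3, 3),
                     intercept = c(TRUE, TRUE, FALSE),
                     stringsAsFactors = FALSE)

# With the package loaded and data as in the Examples, this could be passed as:
# kcv <- cv.cmls(X = Xmat, Y = Ymat, parameters = params)
```

Option (A) is useful when only a few specific combinations are of interest, since option (B) always evaluates the full grid.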
best.parameters | Best combination of parameters, i.e., the combination that minimizes the cvloss.
top5.parameters | Top five combinations of parameters, i.e., the combinations that give the five smallest values of the cvloss.
full.parameters | Full set of parameters. Data frame with the cvloss for all combinations of parameters.
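The relationship between the three components can be sketched by ordering a table of losses. The data frame below is a hypothetical stand-in for full.parameters with made-up cvloss values (not output from cv.cmls), and the row order of the actual return values is not assumed here.

```r
# Hypothetical stand-in for kcv$full.parameters (made-up loss values)
full <- data.frame(df = 5:15,
                   cvloss = c(0.310, 0.292, 0.271, 0.262, 0.255, 0.259,
                              0.268, 0.283, 0.301, 0.322, 0.350))

# Order the rows from smallest to largest loss
ord <- order(full$cvloss)

best <- full[ord[1], ]           # combination with the smallest loss
top5 <- full[ord[seq_len(5)], ]  # five smallest losses

best$df
top5$df
```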
Nathaniel E. Helwig <helwig@umn.edu>
Helwig, N. E. (in prep). Constrained multivariate least squares in R.
# make X
set.seed(1)
n <- 50
m <- 20
p <- 2
Xmat <- matrix(rnorm(n*p), nrow = n, ncol = p)
# make B (which satisfies all constraints except monotonicity)
x <- seq(0, 1, length.out = m)
Bmat <- rbind(sin(2*pi*x), sin(2*pi*x+pi)) / sqrt(4.75)
struc <- rbind(rep(c(TRUE, FALSE), each = m / 2),
               rep(c(FALSE, TRUE), each = m / 2))
Bmat <- Bmat * struc
# make noisy data
Ymat <- Xmat %*% Bmat + rnorm(n*m, sd = 0.5)
# 5-fold CV: tune df (5,...,15) for const = "smooth"
kcv <- cv.cmls(X = Xmat, Y = Ymat, nfolds = 5,
               const = "smooth", df = 5:15)
kcv$best.parameters
kcv$top5.parameters
plot(kcv$full.parameters$df, kcv$full.parameters$cvloss, type = "b")
# sample foldid for 5-fold CV
set.seed(2)
foldid <- sample(rep(1:5, length.out = n))
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (no struc)
# using sequential computation (default)
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15)
})
kcv$best.parameters
kcv$top5.parameters
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (no struc)
# using parallel package for parallel computations
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
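system.time({
  # makeCluster, detectCores, and stopCluster come from the parallel package
  cl <- parallel::makeCluster(parallel::detectCores())
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15,
                 parallel = TRUE, cl = cl)
  parallel::stopCluster(cl)
})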
kcv$best.parameters
kcv$top5.parameters
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (w/ struc)
# using sequential computation (default)
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
system.time({
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15, struc = struc)
})
kcv$best.parameters
kcv$top5.parameters
# 5-fold CV: tune df (5,...,15) w/ all 20 relevant constraints (w/ struc)
# using parallel package for parallel computations
myconst <- as.character(const(print = FALSE)$label[-c(13:16)])
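system.time({
  # makeCluster, detectCores, and stopCluster come from the parallel package
  cl <- parallel::makeCluster(parallel::detectCores())
  kcv <- cv.cmls(X = Xmat, Y = Ymat, foldid = foldid,
                 const = myconst, df = 5:15, struc = struc,
                 parallel = TRUE, cl = cl)
  parallel::stopCluster(cl)
})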
kcv$best.parameters
kcv$top5.parameters