Conditional Variance Estimator (CVE).
This is the main function in the CVE package. It creates objects of class "cve" to estimate the mean subspace. Helper functions that require a "cve" object can then be applied to the output from this function.
Conditional Variance Estimation (CVE) is a sufficient dimension reduction (SDR) method for regressions studying E(Y|X), the conditional expectation of a response Y given a set of predictors X. This function provides methods for estimating the dimension and the subspace spanned by the columns of a p x k matrix B of minimal rank k such that
E(Y|X) = E(Y|B'X)
or, equivalently,
Y = g(B'X) + ε
where X has positive definite variance-covariance matrix Var(X) = Σ_X and is independent of ε, ε is a mean-zero random variable with finite variance Var(ε) = E(ε^2), g is an unknown, continuous, non-constant function, and B = (b_1, ..., b_k) is a real p x k matrix of rank k <= p.
Both the dimension k and the subspace span(B) are unknown. The CVE method makes very few assumptions.
A kernel matrix Bhat is estimated such that the column space of Bhat should be close to the mean subspace span(B). The primary output from this method is a set of orthonormal vectors, Bhat, whose span estimates span(B).
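Since only the subspace span(B) is identified, any other basis of the same subspace is an equally valid answer, so two bases are most easily compared through their orthogonal projection matrices. The base-R sketch below illustrates this under simple assumptions; the helper name proj and the matrices B, A, B2 and Q are hypothetical and not part of the CVE package.

```r
# Hypothetical helper (not part of the CVE package): orthogonal projection
# onto the column space of a matrix B with full column rank.
proj <- function(B) B %*% solve(crossprod(B), t(B))

set.seed(1)
p <- 5
k <- 2
B  <- matrix(rnorm(p * k), p, k)   # an arbitrary p x k basis of rank k
A  <- matrix(c(2, 1, 0, 3), k, k)  # an invertible k x k matrix (det = 6)
B2 <- B %*% A                      # a different basis of the same span

# Orthonormal basis of the same span, obtained from the QR decomposition
Q <- qr.Q(qr(B))

# All three bases yield the same projection matrix (up to rounding error)
max(abs(proj(B) - proj(B2)))
max(abs(proj(B) - tcrossprod(Q)))
```

For a basis with orthonormal columns, such as Q here, the projection simplifies to Q %*% t(Q), which is what tcrossprod(Q) computes.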
The method "central" implements Ensemble Conditional Variance Estimation (ECVE) as described in [2]. It augments the CVE method by applying an ensemble of functions (parameter func_list) to the response to estimate the central subspace. This corresponds to the generalization
F(Y|X) = F(Y|B'X)
or, equivalently,
Y = g(B'X, ε)
where F is the conditional cumulative distribution function.
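To see why transforming the response can help, consider a model in which the conditional mean carries no information about B'X while the conditional distribution does. The base-R sketch below uses g(u, e) = u * e, a choice made purely for illustration; it does not call the CVE package, and the variable names are hypothetical.

```r
set.seed(42)
n <- 20000
p <- 3
b <- c(1, 0, 0)                  # a single direction, so k = 1
x <- matrix(rnorm(n * p), n, p)  # predictors x ~ N(0, I_p)
eps <- rnorm(n)                  # error term, independent of x
index <- drop(x %*% b)           # reduced predictor b'x

# Model Y = g(b'X, eps) with the illustrative choice g(u, e) = u * e
y <- index * eps

# E(Y | X) = 0, so y itself is uncorrelated with b'x in the population ...
cor(y, index)
# ... whereas E(Y^2 | X) = (b'x)^2, so the transformed response y^2
# does depend on b'x.
cor(y^2, index^2)
```

In this setup the mean subspace is trivial, but the direction b is recovered in the mean of a transformed response, which is the idea behind applying an ensemble of functions to Y.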
cve(formula, data, method = "mean", max.dim = 10L, ...)
formula | an object of class |
data | an optional data frame, containing the data for the formula if supplied like |
method | This character string specifies the method of fitting. The options are |
max.dim | upper bounds for |
... | optional parameters passed on to |
an S3 object of class "cve" with components:
design matrix of predictor vector used for calculating cve-estimate,
n-dimensional vector of responses used for calculating cve-estimate,
Name of used method,
the matched call,
list of components V, L, B, loss, h for each k = min.dim, ..., max.dim. If k was supplied in the call, min.dim = max.dim = k.
B is the cve-estimate with dimension p x k.
V is the orthogonal complement of B.
L is the loss for each sample separately, such that its mean is loss.
loss is the value of the target function that is minimized, evaluated at V.
h is the bandwidth parameter used to calculate B, V, loss, L.
[1] Fertl, L. and Bura, E. (2021) "Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.08782>
[2] Fertl, L. and Bura, E. (2021) "Ensemble Conditional Variance Estimation for Sufficient Dimension Reduction" <arXiv:2102.13435>
For a detailed description of formula see formula.
library(CVE)

# set dimensions for simulation model
p <- 5
k <- 2
# create B for simulation
b1 <- rep(1 / sqrt(p), p)
b2 <- (-1)^seq(1, p) / sqrt(p)
B <- cbind(b1, b2)
# sample size
n <- 100
set.seed(21)

# create predictor data x ~ N(0, I_p)
x <- matrix(rnorm(n * p), n, p)
# simulate response variable
#     y = f(B'x) + err
# with f(x1, x2) = x1^2 + 2 * x2 and err ~ N(0, 0.25^2)
y <- (x %*% b1)^2 + 2 * (x %*% b2) + 0.25 * rnorm(n)

# calculate cve with method 'mean' for unknown k, with upper bound max.dim = 2
cve.obj.s <- cve(y ~ x, max.dim = 2) # default method 'mean'
# calculate cve with method 'weighted.mean' for k = 2
cve.obj.w <- cve(y ~ x, k = 2, method = 'weighted.mean')
B2 <- coef(cve.obj.s, k = 2)

# get projected X data (same as cve.obj.s$X %*% B2)
proj.X <- directions(cve.obj.s, k = 2)
# plot y against projected data
plot(proj.X[, 1], y)
plot(proj.X[, 2], y)

# create 10 new x points and y according to model
x.new <- matrix(rnorm(10 * p), 10, p)
y.new <- (x.new %*% b1)^2 + 2 * (x.new %*% b2) + 0.25 * rnorm(10)
# predict y.new
yhat <- predict(cve.obj.s, x.new, 2)
plot(y.new, yhat)

# projection matrix on span(B)
# the general formula is needed here: b1 and b2 have unit length but are not
# orthogonal for p = 5 (b1'b2 = -1/p), so B %*% t(B) is not a projection
PB <- B %*% solve(t(B) %*% B) %*% t(B)
# cve estimates for B with mean and weighted method
B.s <- coef(cve.obj.s, k = 2)
B.w <- coef(cve.obj.w, k = 2)
# same as B.s %*% t(B.s) since B.s is semi-orthogonal (same for B.w)
PB.s <- B.s %*% solve(t(B.s) %*% B.s) %*% t(B.s)
PB.w <- B.w %*% solve(t(B.w) %*% B.w) %*% t(B.w)
# compare estimation accuracy of mean and weighted cve estimate by
# Frobenius norm of difference of projections.
norm(PB - PB.s, type = 'F')
norm(PB - PB.w, type = 'F')