
solveEN

Coordinate Descent algorithm to solve Elastic-Net-type problems


Description

Computes the entire Elastic-Net solution path for the regression coefficients, for all values of the penalization parameter, via the Coordinate Descent (CD) algorithm (Friedman et al., 2007). It takes as inputs a variance matrix among predictors and a covariance vector between the response and the predictors.

Usage

solveEN(P, v, alpha = 1, lambda = NULL, nLambda = 100, 
   minLambda = .Machine$double.eps^0.5, scale = TRUE, 
   tol = 1e-05, maxIter = 1000, verbose = FALSE)

Arguments

P

(numeric matrix) Variance-covariance matrix of predictors. It can be of the "float32" type as per the 'float' R-package

v

(numeric vector) Covariance between response variable and predictors

lambda

(numeric vector) Penalization parameter sequence. Default is lambda=NULL, in which case a decreasing grid of 'nLambda' lambdas is generated, starting from a maximum equal to max(abs(v)/alpha) down to a minimum equal to zero. If alpha=0, the grid is generated starting from a maximum equal to 5 (see the sketch after this argument list)

nLambda

(integer) Number of lambdas generated when lambda=NULL

minLambda

(numeric) Minimum value of lambda generated when lambda=NULL

alpha

(numeric) Value between 0 and 1 determining the weights given to the L1 and L2-penalties: alpha for the L1-penalty and 1-alpha for the L2-penalty (see Details)

scale

(logical) TRUE or FALSE to scale the matrix P so that all predictors have unit variance (i.e., to a correlation matrix), and to scale each element of v by the standard deviation of the corresponding predictor, taken from the diagonal of P

tol

(numeric) Maximum error between two consecutive solutions of the CD algorithm to declare convergence

maxIter

(integer) Maximum number of iterations of the CD algorithm at each lambda step, in case convergence is not reached earlier

verbose

(logical) TRUE or FALSE indicating whether to print progress at each CD step
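
For reference, the default lambda grid described for the 'lambda' argument can be built along the following lines. This is a minimal sketch assuming a log-linear spacing between the maximum and 'minLambda'; the exact spacing used internally by solveEN may differ, and the function name 'lambdaGrid' is illustrative, not part of SFSI:

  # 'nLambda' decreasing values from max(abs(v)/alpha) (or 5 when alpha=0),
  # bounded below by 'minLambda'; log-linear spacing is an assumption
  lambdaGrid <- function(v, alpha = 1, nLambda = 100,
                         minLambda = .Machine$double.eps^0.5) {
    lambdaMax <- if (alpha > 0) max(abs(v)/alpha) else 5
    exp(seq(log(lambdaMax), log(minLambda), length.out = nLambda))
  }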

Details

Finds solutions for the regression coefficients in a linear model

y_i = x_i'β + e_i

where y_i is the response for the i-th observation, x_i = (x_i1,...,x_ip)' is a vector of p predictors assumed to have unit variance, β = (β_1,...,β_p)' is a vector of regression coefficients, and e_i is a residual.

The regression coefficients β are estimated as a function of the variance matrix among predictors (P) and the covariance vector between response and predictors (v) by minimizing the penalized mean squared error function

-v'β + (1/2) β'Pβ + λ J(β)

where λ is the penalization parameter and J(β) is a penalty function given by

(1/2)(1-α) ||β||_2^2 + α ||β||_1

where 0 ≤ α ≤ 1, and ||β||_1 = ∑|β_j| and ||β||_2^2 = ∑β_j^2 are the L1 and (squared) L2-norms, respectively.
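
To make the objective concrete, the following helper evaluates the penalized criterion above at a given coefficient vector; the name 'enObjective' is illustrative and not part of SFSI:

  # Evaluate -v'b + (1/2) b'P b + lambda*J(b), with J(b) the elastic-net penalty
  enObjective <- function(b, P, v, lambda, alpha = 1) {
    J <- 0.5*(1 - alpha)*sum(b^2) + alpha*sum(abs(b))
    -sum(v*b) + 0.5*drop(crossprod(b, P %*% b)) + lambda*J
  }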

The "partial residual" excluding the contribution of the predictor xij is

ei(j) = yi - x'iβ + xijβj

then the ordinary least-squares (OLS) coefficient of xij on this residual is (up-to a constant)

βj(ols) = vj - P'jβ + βj

where vj is the jth element of v and Pj is the jth column of the matrix P.

Coefficients are updated for each j=1,...,p from their current value β_j to a new value β_j(α,λ), given α and λ, by "soft-thresholding" their OLS estimate, iterating until convergence, as fully described in Friedman et al. (2007). A didactic sketch of this update is given below.
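
A minimal version of this update, for a single value of lambda and assuming unit-variance predictors (diag(P) equal to one) as stated above, might look as follows; the function names are illustrative and this is not the package's optimized implementation:

  # Soft-thresholding operator S(z, g) = sign(z)*max(|z| - g, 0)
  softThreshold <- function(z, g) sign(z)*pmax(abs(z) - g, 0)

  # Cyclic coordinate descent for one lambda (didactic sketch)
  cdElasticNet <- function(P, v, lambda, alpha = 1, tol = 1e-5, maxIter = 1000) {
    b <- rep(0, length(v))
    for (iter in seq_len(maxIter)) {
      b0 <- b
      for (j in seq_along(b)) {
        bOLS <- v[j] - sum(P[, j]*b) + b[j]  # OLS coefficient on the partial residual
        b[j] <- softThreshold(bOLS, lambda*alpha)/(1 + lambda*(1 - alpha))
      }
      if (max(abs(b - b0)) < tol) break      # converged: consecutive solutions agree
    }
    b
  }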

Value

Returns a list object containing the elements:

  • lambda: (vector) the sequence of values of the penalization parameter.

  • beta: (matrix) regression coefficients for each predictor (in columns) associated with each value of the penalization parameter lambda (in rows).

  • df: (vector) degrees of freedom, i.e., the number of non-zero coefficients associated with each value of lambda.

The returned object is of the class 'LASSO', for which the method fitted exists. The function 'plotPath' can also be used.
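
For instance, with XtX and Xty computed as in the Examples section below, the returned elements can be inspected as follows:

  fm = solveEN(XtX, Xty, alpha = 0.5)
  length(fm$lambda)   # number of penalty values
  dim(fm$beta)        # lambdas (rows) by predictors (columns)
  fm$df               # non-zero coefficients at each lambda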

Author(s)

Marco Lopez-Cruz (maraloc@gmail.com) and Gustavo de los Campos

References

Friedman J, Hastie T, Höfling H, Tibshirani R (2007). Pathwise coordinate optimization. The Annals of Applied Statistics, 1(2), 302–332.

Hoerl AE, Kennard RW (1970). Ridge Regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1), 55–67.

Tibshirani R (1996). Regression shrinkage and selection via the LASSO. Journal of the Royal Statistical Society B, 58(1), 267–288.

Zou H, Hastie T (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society B, 67(2), 301–320.

Examples

  require(SFSI)
  data(wheatHTP)
  
  y = as.vector(Y[,"YLD"])  # Response variable
  X = scale(WL)             # Predictors

  # Training and testing sets
  tst = 1:ceiling(0.3*length(y))
  trn = seq_along(y)[-tst]

  # Calculate covariances in training set
  XtX = var(X[trn,])
  Xty = cov(y[trn],X[trn,])
  
  # Run the penalized regression
  fm1 = solveEN(XtX,Xty,alpha=0.5) 
  
  # Predicted values
  yHat1 = fitted(fm1, X=X[trn,])  # training data
  yHat2 = fitted(fm1, X=X[tst,])  # testing data
  
  # Penalization vs correlation
  plot(-log(fm1$lambda),cor(y[trn],yHat1)[1,], main="training")
  plot(-log(fm1$lambda),cor(y[tst],yHat2)[1,], main="testing")
  
  
  if(requireNamespace("float")){
   # Using a 'float' type variable
   XtX2 = float::fl(XtX)
   fm2 = solveEN(XtX2,Xty,alpha=0.5)  
   max(abs(fm1$beta-fm2$beta))  # Check for discrepancies
  }

