Tuning Linear Models

Linear Model Selection and Regularization

Although linear regression models have no tuning parameter per se, we can still vary the predictors in the model to see whether the expected out-of-sample error goes down. We can perform a stepwise regression, which runs a greedy search over predictor combinations and returns the model with the lowest AIC it encounters. The Akaike Information Criterion (AIC) measures the relative goodness of fit of a model, penalized by the number of parameters. Given models that fit similarly well but differ in the number of predictors, it prefers the one with fewer predictors.
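For reference, the criterion is commonly written as follows, where \(k\) is the number of estimated parameters and \(\hat{L}\) is the maximized value of the likelihood (both symbols are notation introduced here for illustration). Lower values indicate a better trade-off between fit and complexity:

\[
\mathrm{AIC} = 2k - 2\ln(\hat{L})
\]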

The subset selection proceeds in one of three ways:

  • forward: starting with an empty model and adding predictors,
  • backward: starting with the full model and removing predictors, or
  • exhaustive: trying all possible subset combinations to find the optimal one.

Forward and backward selection estimate, at each step, all models that result from adding (removing) a single predictor to (from) the current model and choose the one that improves model quality the most. If model quality cannot be improved any further, the algorithm stops. Once the subset is found, a model is fit using least squares on the reduced set of variables. The subset found can also be used with different (non-linear) model families.
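The forward direction described above can be sketched with stats::step(). This is a minimal illustration, assuming the mtcars data and the intercept-only model as the starting point; the object names mod_null and mod_fwd are made up for this example.

```r
# Forward selection sketch: start from the intercept-only model and let
# step() add one predictor at a time for as long as the AIC keeps decreasing.
mod_null <- lm(mpg ~ 1, data = mtcars)   # empty model (intercept only)
mod_full <- lm(mpg ~ ., data = mtcars)   # full model, defines the upper scope

mod_fwd <- stats::step(
  mod_null,
  scope = list(lower = formula(mod_null), upper = formula(mod_full)),
  direction = "forward",
  trace = 0                              # suppress the step-by-step log
)

formula(mod_fwd)   # predictors that were added
AIC(mod_null)
AIC(mod_fwd)
```

Because the search is greedy, the forward and backward directions are not guaranteed to end up with the same subset.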

We can compare the full model with the model selected by stats::step(), which defaults to backward elimination when no scope is given, the vanilla R way:

mod_full <- lm(mpg ~ ., data = mtcars)
mod_step <- stats::step(mod_full)

AIC(mod_full)
AIC(mod_step)

Ridge and Lasso Regression

Shrinkage models take a different approach and regularize the model coefficients through a shrinkage parameter \(λ\). Ridge and Lasso regression both minimize the squared error plus a penalty on the size of the coefficients; the only difference is that Ridge penalizes the squared \(L_2\) norm of the coefficients whereas Lasso penalizes their \(L_1\) norm (the sum of absolute values). This also results in different parameter estimates: Ridge coefficients shrink smoothly towards zero whereas Lasso coefficients tend to be set to exactly zero. This makes Lasso regression a viable choice for feature selection as well.
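Written out for a linear model with intercept \(β_0\), coefficients \(β_j\) and observations \((x_i, y_i)\) (notation assumed here for illustration), the two criteria are commonly stated as:

\[
\hat{β}^{\text{ridge}} = \arg\min_{β} \sum_{i=1}^{n} \Big(y_i - β_0 - \sum_{j=1}^{p} x_{ij} β_j\Big)^2 + λ \sum_{j=1}^{p} β_j^2
\]

\[
\hat{β}^{\text{lasso}} = \arg\min_{β} \sum_{i=1}^{n} \Big(y_i - β_0 - \sum_{j=1}^{p} x_{ij} β_j\Big)^2 + λ \sum_{j=1}^{p} |β_j|
\]

Setting \(λ = 0\) recovers ordinary least squares in both cases. Software packages may scale the two terms differently, so a given value of \(λ\) is not directly comparable across implementations.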

We first fit a Ridge regression model using glmnet and parsnip. Note that a glmnet model is a mixture of a Ridge and a Lasso regression (an elastic net). The blend is set through the mixture parameter, which specifies the proportion of the Lasso penalty in the final mixture. We first fit a pure Ridge model using

mod_ridge <- linear_reg(mixture = 0) %>% 
  set_engine("glmnet") %>%
  fit(mpg ~ ., data = mtcars)
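To make the contrast between the two penalties concrete, the following sketch calls the glmnet engine directly rather than going through parsnip; glmnet names the mixing proportion alpha, so alpha = 0 is pure Ridge and alpha = 1 is pure Lasso. The penalty value lambda_demo = 1 is an arbitrary choice for illustration, not a tuned value.

```r
library(glmnet)

# glmnet expects a numeric predictor matrix and a response vector
x <- as.matrix(mtcars[, setdiff(names(mtcars), "mpg")])
y <- mtcars$mpg

fit_ridge <- glmnet(x, y, alpha = 0)   # pure Ridge
fit_lasso <- glmnet(x, y, alpha = 1)   # pure Lasso

lambda_demo <- 1   # arbitrary penalty, for illustration only

# Extract the coefficients at that penalty and drop the intercept
coef_ridge <- as.matrix(coef(fit_ridge, s = lambda_demo))[-1, 1]
coef_lasso <- as.matrix(coef(fit_lasso, s = lambda_demo))[-1, 1]

sum(coef_ridge == 0)   # number of Ridge coefficients that are exactly zero
sum(coef_lasso == 0)   # number of Lasso coefficients that are exactly zero
names(coef_lasso)[coef_lasso != 0]   # predictors kept by the Lasso
```

At this penalty, the Lasso fit should drop several predictors entirely, while the Ridge fit keeps all of them with shrunken coefficients.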

The resulting ridge fit contains a set of coefficient estimates, one for each \(λ\) on the regularization path. We can therefore use the multi_predict function, which returns the predictions for each \(λ\):

mod_ridge %>% 
  multi_predict(new_data = mtcars) %>% 
  bind_cols(mtcars) %>% 
  unnest(.pred) %>% 
  group_by(penalty) %>% 
  rmse(mpg, .pred) %>% 
  ggplot() + 
  geom_line(aes(penalty, .estimate)) + 
  ylab("Root-Mean-Squared Error")

The error is lowest as the penalty approaches zero, which corresponds to a vanilla linear regression model. This is expected, because the error is computed on the same data the model was fit on.

Optimization using tune

So far we have only optimized one model without any resampling and thus obtained penalty = 0, i.e. a plain linear regression, as the best-fitting model. What is still missing is a structured approach to specifying the tuning grid and running the parameter search itself. The package tune is still under development and can be installed through

#remotes::install_github("tidymodels/tune")
library(tune)

Since mtcars has very few data points, we omit the initial split and directly perform a cross-validation on the data set.

First, we specify the linear regression model. Instead of specifying the parameters penalty and mixture directly, we use tune() as a placeholder:

mod_glmnet <-
  linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

Next, we specify a workflow(), implemented in the workflows package, to be used by tune:

library(workflows)
mtcars_wflow <-
  workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(mod_glmnet)
mtcars_wflow
## ══ Workflow ═══════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: linear_reg()
## 
## ── Preprocessor ───────────────────────────────────────────────────────────
## mpg ~ .
## 
## ── Model ──────────────────────────────────────────────────────────────────
## Linear Regression Model Specification (regression)
## 
## Main Arguments:
##   penalty = tune()
##   mixture = tune()
## 
## Computational engine: glmnet

Before we can get started, we specify a tuning grid as

library(dials)
grid_df <- grid_regular(mtcars_wflow, levels = c(10, 3))
grid_df
## # A tibble: 30 x 2
##          penalty mixture
##            <dbl>   <dbl>
##  1 0.0000000001     0.05
##  2 0.00000000129    0.05
##  3 0.0000000167     0.05
##  4 0.000000215      0.05
##  5 0.00000278       0.05
##  6 0.0000359        0.05
##  7 0.000464         0.05
##  8 0.00599          0.05
##  9 0.0774           0.05
## 10 1                0.05
## # … with 20 more rows

and a cross-validation resampling:

cv_splits <- vfold_cv(mtcars, v = 3)
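Both ingredients can be inspected on their own. The sketch below builds the same kind of regular grid from the dials parameter objects penalty() and mixture() directly, and checks how a 3-fold split partitions the 32 rows of mtcars. The names grid_manual and cv_demo and the seed are assumptions made for this example.

```r
library(dials)
library(rsample)

# levels = c(10, 3): 10 penalty values crossed with 3 mixture values
grid_manual <- grid_regular(penalty(), mixture(), levels = c(10, 3))
nrow(grid_manual)   # 10 * 3 candidate parameter combinations

# Resampling is random, so fix a seed for reproducible folds
set.seed(123)
cv_demo <- vfold_cv(mtcars, v = 3)

# Each fold fits on the analysis rows and evaluates on the assessment rows
n_analysis   <- sapply(cv_demo$splits, function(s) nrow(analysis(s)))
n_assessment <- sapply(cv_demo$splits, function(s) nrow(assessment(s)))
n_analysis
n_assessment
```

With only about ten rows per assessment set, the resampled error estimates are noisy, so small differences between grid points should be read with caution.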

Finally, we can optimize our glmnet model and get a nice output through autoplot():

mtcars_glmnet <- tune_grid(mtcars_wflow, 
                         resamples = cv_splits, 
                         grid = grid_df, 
                         control = control_grid(verbose = TRUE))
mtcars_glmnet %>% autoplot()

mtcars_glmnet %>% 
  unnest(.metrics) %>% 
  filter(.metric=="rmse") %>% 
  group_by(penalty, mixture) %>% 
  summarize(est = mean(.estimate))

See also ?autoplot.tune_results for more info.
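Once the grid search has run, one possible way to continue is to pick the best parameter combination and refit on the full data. The following self-contained sketch repeats the setup from above and then uses collect_metrics(), select_best() and finalize_workflow() from tune; the seed and the names best_params and final_fit are assumptions for illustration, and the winning combination will vary with the random folds.

```r
library(tidymodels)

set.seed(2020)   # folds are random; the seed is an arbitrary choice

mod_glmnet <- linear_reg(penalty = tune(), mixture = tune()) %>%
  set_engine("glmnet")

mtcars_wflow <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(mod_glmnet)

cv_splits <- vfold_cv(mtcars, v = 3)
grid_df   <- grid_regular(penalty(), mixture(), levels = c(10, 3))

mtcars_glmnet <- tune_grid(mtcars_wflow, resamples = cv_splits, grid = grid_df)

# Resampled RMSE per parameter combination, best first
collect_metrics(mtcars_glmnet) %>%
  filter(.metric == "rmse") %>%
  arrange(mean)

# Combination with the lowest mean RMSE across the folds
best_params <- select_best(mtcars_glmnet, metric = "rmse")

# Plug the chosen values into the workflow and refit on all of mtcars
final_fit <- mtcars_wflow %>%
  finalize_workflow(best_params) %>%
  fit(data = mtcars)
```

The refitted workflow can then be passed to predict() with new data in the usual way.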
