## Linear Model Selection and Regularization

Although linear regression models have no tuning parameter per se, we can still vary the set of predictors in the model to see whether the expected out-of-sample error goes down. Stepwise regression performs a greedy search over predictor combinations and returns the model with the lowest AIC. The Akaike Information Criterion (AIC) measures the *relative* goodness of fit of a model, penalized by the number of estimated parameters. If two models fit similarly well but use a different number of predictors, the AIC prefers the one with fewer predictors.
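For reference, the standard definition, with \(k\) the number of estimated parameters and \(\hat{L}\) the maximized likelihood of the model:

\[
\mathrm{AIC} = 2k - 2\ln\hat{L}
\]

Lower values are better. Adding one predictor increases \(2k\) by 2, so it only lowers the AIC if it raises \(\ln\hat{L}\) by more than 1.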

Subset selection proceeds in one of three ways:

`forward`
: starting with an empty model and adding predictors,

`backward`
: starting with the full model and removing predictors, or

`exhaustive`
: trying all possible subset combinations to find the optimal one.
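All three strategies are available in the **leaps** package through `regsubsets()` and its `method` argument (`"exhaustive"`, `"forward"`, `"backward"`). A minimal sketch, assuming **leaps** is installed, that runs an exhaustive search on `mtcars`. Within each subset size `regsubsets()` keeps the model with the lowest residual sum of squares; choosing the size by BIC is our own choice for illustration:

```r
library(leaps)

# exhaustive search over all subsets of the 10 predictors, best model per size
subsets <- regsubsets(mpg ~ ., data = mtcars, nvmax = 10, method = "exhaustive")
subset_summary <- summary(subsets)

# one row per model size (1 to 10 predictors), TRUE where a predictor is included
subset_summary$which

# pick the size with the lowest BIC and show its least-squares coefficients
best_size <- which.min(subset_summary$bic)
coef(subsets, id = best_size)
```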

Forward and backward (stepwise) selection fit, at each step, every model that adds (removes) a single predictor to (from) the current model and keep the one that improves model quality the most. If no candidate improves model quality, the algorithm stops. Once the subset is found, a model is fit by least squares on the reduced set of variables. The selected subset can also be used with other, possibly non-linear, model families.
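The greedy forward step can be sketched in a few lines of base R. This is an illustrative implementation, not part of any package (`forward_select()` is a hypothetical helper), that uses the AIC as the measure of model quality:

```r
# Greedy forward selection by AIC (illustrative sketch, hypothetical helper)
forward_select <- function(response, candidates, data) {
  selected <- character(0)
  # start from the intercept-only model
  current_aic <- AIC(lm(reformulate("1", response = response), data = data))
  repeat {
    remaining <- setdiff(candidates, selected)
    if (length(remaining) == 0) break
    # AIC of every model that adds exactly one remaining predictor
    aics <- vapply(remaining, function(v) {
      AIC(lm(reformulate(c(selected, v), response = response), data = data))
    }, numeric(1))
    # stop when no candidate improves on the current model
    if (min(aics) >= current_aic) break
    selected <- c(selected, names(which.min(aics)))
    current_aic <- min(aics)
  }
  selected
}

selected <- forward_select("mpg", setdiff(names(mtcars), "mpg"), mtcars)
selected
```

The first predictor picked is the one with the strongest single-variable fit; the search then continues only as long as the AIC keeps dropping.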

We can fit the stepwise model and compare it with the full model the vanilla R way:

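```
# full model using all predictors
mod_full <- lm(mpg ~ ., data = mtcars)
# stepwise selection by AIC, starting from the full model
mod_step <- stats::step(mod_full)
# lower AIC is better
AIC(mod_full)
AIC(mod_step)
```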
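By default `step()` uses `direction = "both"`. To run the `forward` and `backward` strategies explicitly, we can set `direction` and, for forward selection, start from the intercept-only model with the full model's terms as the upper `scope`. A sketch (the names `mod_null`, `mod_fwd` and `mod_bwd` are ours):

```r
# intercept-only and full models
mod_null <- lm(mpg ~ 1, data = mtcars)
mod_full <- lm(mpg ~ ., data = mtcars)

# forward: start empty, add predictors up to the (expanded) full formula
mod_fwd <- step(mod_null, scope = formula(mod_full),
                direction = "forward", trace = 0)

# backward: start full, remove predictors
mod_bwd <- step(mod_full, direction = "backward", trace = 0)

# the two greedy searches need not end up with the same subset
formula(mod_fwd)
formula(mod_bwd)
```

`formula(mod_full)` is used instead of `mpg ~ .` because in `scope` a dot would refer to the starting model's terms, not to all columns of the data.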

## Ridge and Lasso Regression

Shrinkage models take a different approach: instead of dropping predictors, they regularize the model coefficients through a *shrinkage* parameter \(λ\). Ridge and Lasso regression both minimize the residual sum of squares plus a penalty on the size of the coefficients; the only difference is that Ridge penalizes the \(L_2\) norm of the coefficients (sum of squared coefficients), whereas Lasso penalizes the \(L_1\) norm (sum of absolute coefficients). This leads to different estimates: Ridge shrinks coefficients smoothly towards zero, whereas Lasso tends to set some of them to exactly zero. This also makes Lasso regression a viable choice for feature selection.
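Written out in the usual textbook form, with \(n\) observations, \(p\) predictors, intercept \(\beta_0\) and coefficients \(\beta_1, \ldots, \beta_p\), the two estimators are

\[
\hat{\beta}^{\text{ridge}} = \arg\min_{\beta_0, \beta} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + λ \sum_{j=1}^{p} \beta_j^2,
\]

\[
\hat{\beta}^{\text{lasso}} = \arg\min_{\beta_0, \beta} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + λ \sum_{j=1}^{p} \lvert\beta_j\rvert .
\]

The intercept is not penalized, and \(λ = 0\) recovers ordinary least squares in both cases. `glmnet` combines both penalties into the elastic net objective

\[
\frac{1}{2n} \sum_{i=1}^{n} \Big(y_i - \beta_0 - \sum_{j=1}^{p} \beta_j x_{ij}\Big)^2 + λ \Big[\frac{1-\alpha}{2} \sum_{j=1}^{p} \beta_j^2 + \alpha \sum_{j=1}^{p} \lvert\beta_j\rvert\Big],
\]

where \(\alpha\) corresponds to the `mixture` argument in **parsnip** (\(\alpha = 0\): Ridge, \(\alpha = 1\): Lasso). Because of the \(1/(2n)\) scaling, glmnet's \(λ\) values are on a different scale than in the textbook form.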

We first fit a Ridge regression model using `glmnet` and `parsnip`. Note that a `glmnet` model is a mixture of a Ridge and a Lasso regression (an *elastic net*). The mixture is set through the `mixture` parameter, which specifies the proportion of the Lasso (\(L_1\)) penalty in the total penalty, so a *pure* Ridge model uses `mixture = 0`:

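```
library(tidymodels)  # parsnip, dplyr, tidyr, yardstick, ggplot2, ...
# pure ridge: mixture = 0; glmnet still fits the whole lambda path,
# but recent parsnip versions require a single penalty value here
mod_ridge <- linear_reg(penalty = 0, mixture = 0) %>%
  set_engine("glmnet") %>%
  fit(mpg ~ ., data = mtcars)
```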

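Because `glmnet` fits the entire regularization path, the fitted ridge model contains a set of coefficients for every \(λ\) in a decreasing sequence. We can therefore use the `multi_predict()` function, which returns the predictions for each \(λ\), and compute the error for each penalty: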

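```
mod_ridge %>%
  # predictions for every lambda on glmnet's path (stored in mod_ridge$fit$lambda)
  multi_predict(new_data = mtcars, penalty = mod_ridge$fit$lambda) %>%
  bind_cols(mtcars) %>%
  unnest(.pred) %>%
  # training RMSE for each penalty value
  group_by(penalty) %>%
  rmse(mpg, .pred) %>%
  ggplot() +
  geom_line(aes(penalty, .estimate)) +
  ylab("Root-Mean-Squared Error")
```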

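The error is lowest for the smallest penalty, i.e. as \(λ\) approaches zero, which corresponds to a vanilla (unpenalized) linear regression model. This is expected: we evaluate the error on the same data the model was fit on, and since least squares minimizes exactly this training error, any shrinkage can only increase it.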
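To see the practical difference between the two penalties, we can count how many coefficients are exactly zero at the same \(λ\). A sketch that calls `glmnet()` directly rather than through **parsnip**; `alpha` is glmnet's name for `mixture`, and \(λ = 1\) is an arbitrary value chosen for illustration:

```r
library(glmnet)

x <- as.matrix(mtcars[, -1])  # all predictors except mpg
y <- mtcars$mpg

fit_ridge <- glmnet(x, y, alpha = 0)  # pure ridge
fit_lasso <- glmnet(x, y, alpha = 1)  # pure lasso

# coefficients at lambda = 1, dropping the intercept
beta_ridge <- as.matrix(coef(fit_ridge, s = 1))[-1, 1]
beta_lasso <- as.matrix(coef(fit_lasso, s = 1))[-1, 1]

# ridge only shrinks; lasso sets some coefficients exactly to zero
sum(beta_ridge == 0)
sum(beta_lasso == 0)
names(beta_lasso)[beta_lasso != 0]
```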

## Optimization using `tune`

So far we have only fit a single model without any resampling and thus obtained a penalty of zero, i.e. a plain linear regression, as the best-fitting model. What is still missing is a structured approach to specify the tuning grid and the parameter search itself. At the time of writing, the package **tune** was still under development and could be installed through

```
#remotes::install_github("tidymodels/tune")
library(tune)
```

Since `mtcars` has very few data points (32 rows), we omit the initial training/test split and directly perform cross-validation on the whole data set.

First, we specify the linear regression model, but instead of setting the parameters `penalty` and `mixture` directly we use `tune()` as a placeholder:

```
mod_glmnet <-
linear_reg(penalty = tune(), mixture = tune()) %>%
set_engine("glmnet")
```

Next, we specify a `workflow()`, implemented in the **workflows** package, which bundles the formula and the model for use by **tune**:

```
library(workflows)
mtcars_wflow <-
workflow() %>%
add_formula(mpg ~ .) %>%
add_model(mod_glmnet)
mtcars_wflow
```

```
## ══ Workflow ═══════════════════════════════════════════════════════════════
## Preprocessor: Formula
## Model: linear_reg()
##
## ── Preprocessor ───────────────────────────────────────────────────────────
## mpg ~ .
##
## ── Model ──────────────────────────────────────────────────────────────────
## Linear Regression Model Specification (regression)
##
## Main Arguments:
## penalty = tune()
## mixture = tune()
##
## Computational engine: glmnet
```

Before we can get started, we specify a regular tuning grid with 10 levels for `penalty` and 3 levels for `mixture`

```
library(dials)
grid_df <- grid_regular(mtcars_wflow, levels = c(10, 3))
grid_df
```

```
## # A tibble: 30 x 2
## penalty mixture
## <dbl> <dbl>
## 1 0.0000000001 0.05
## 2 0.00000000129 0.05
## 3 0.0000000167 0.05
## 4 0.000000215 0.05
## 5 0.00000278 0.05
## 6 0.0000359 0.05
## 7 0.000464 0.05
## 8 0.00599 0.05
## 9 0.0774 0.05
## 10 1 0.05
## # … with 20 more rows
```

and a 3-fold cross-validation resampling scheme, created with `vfold_cv()` from **rsample**:

`cv_splits <- vfold_cv(mtcars, v = 3)`
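The regular grid above is simply the Cartesian product of evenly spaced values for each parameter. Judging from the printed output, `penalty` is spaced evenly on a \(\log_{10}\) scale between \(10^{-10}\) and 1, and `mixture` runs from 0.05 to 1 (the **dials** defaults at the time; newer versions may use different ranges). Under that assumption, the same 30 combinations can be sketched in base R:

```r
# 10 penalty values, evenly spaced on the log10 scale from 1e-10 to 1
penalty <- 10^seq(-10, 0, length.out = 10)
# 3 mixture values, evenly spaced from 0.05 to 1
mixture <- seq(0.05, 1, length.out = 3)
# Cartesian product; expand.grid varies the first column fastest
grid_manual <- expand.grid(penalty = penalty, mixture = mixture)
nrow(grid_manual)
head(grid_manual)
```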

Finally, we can tune our `glmnet` model and get a nice overview of the results through `autoplot()`:

```
# tune penalty and mixture over grid_df with 3-fold cross-validation
mtcars_glmnet <- tune_grid(mtcars_wflow,
                           resamples = cv_splits,
                           grid = grid_df,
                           control = control_grid(verbose = TRUE))
# plot the resampled performance against the tuning parameters
mtcars_glmnet %>% autoplot()
# mean RMSE across the folds for each penalty/mixture combination
mtcars_glmnet %>%
  collect_metrics() %>%
  filter(.metric == "rmse")
```

See also `?autoplot.tune_results` for more information.
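Once tuning is done, a natural next step is to pick the best parameter combination and refit on all data. A self-contained sketch using **tune**'s `select_best()` and `finalize_workflow()`; it rebuilds the workflow from above, lets `tune_grid()` generate its own 10-candidate grid instead of `grid_df`, and uses an arbitrary seed for the folds:

```r
library(tidymodels)  # parsnip, workflows, tune, rsample, dials, dplyr

set.seed(42)  # arbitrary seed for reproducible folds
cv_splits <- vfold_cv(mtcars, v = 3)

mtcars_wflow <- workflow() %>%
  add_formula(mpg ~ .) %>%
  add_model(linear_reg(penalty = tune(), mixture = tune()) %>%
              set_engine("glmnet"))

# let tune_grid() create a 10-candidate grid itself
mtcars_glmnet <- tune_grid(mtcars_wflow, resamples = cv_splits, grid = 10)

# parameter combination with the lowest cross-validated RMSE
best_params <- select_best(mtcars_glmnet, metric = "rmse")

# plug the chosen values into the workflow and refit on all of mtcars
final_fit <- mtcars_wflow %>%
  finalize_workflow(best_params) %>%
  fit(data = mtcars)

preds <- predict(final_fit, new_data = mtcars)
```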
