## Linear Regression using Tidymodels

In the previous section, we saw how to fit a model with `lm()`

, `predict()`

and retrieve key model figures using `summary()`

.

We will now use the `linear_reg()`

function which is part of the **parsnip** package from the **tidymodels** framework. The advantage of using this function is not only the coverage of regularized models like `glmnet()`

but also the different supported back-ends including **stan**, **keras** or **spark** (see also `?linear_reg`

).

```
linear_reg() %>%
set_engine("lm") %>%
fit(mpg ~ ., data = mtcars)
```

Note, that the coefficients are equal to the vanilla `lm`

fit:

`lm(mpg ~ ., data = mtcars)`

```
##
## Call:
## lm(formula = mpg ~ ., data = mtcars)
##
## Coefficients:
## (Intercept) cyl disp hp drat
## 12.30337 -0.11144 0.01334 -0.02148 0.78711
## wt qsec vs am gear
## -3.71530 0.82104 0.31776 2.52023 0.65541
## carb
## -0.19942
```

Additionally, we can output the call which is performed by **parsnip** under the hood using `translate()`

:

```
linear_reg() %>%
set_engine("lm") %>%
translate()
```

## Parsnip predict

As opposed to the vanilla `lm()`

fitting function the `linear_reg()`

fit looks more complicated at first. However, compared to the `lm()`

fit `linear_reg()`

also different engines/back-ends which can be changed through the parameter in `set_engine()`

.

The fitted model can now be predicted using again the function `predict()`

:

```
mod <- linear_reg() %>%
set_engine("lm") %>%
fit(mpg ~ ., data = mtcars)
predict(mod, new_data = mtcars[1:5, ])
```

The **tidymodels** `predict()`

function has 2 differences to the base `predict()`

:

- The output is a
`tibble`

or `data.frame`

as opposed to a vector. - The parameter for
`new_data`

is `snake_case`

and not optional - thus it needs to be explicitly specified.

## Summary

**parsnip** does not provide a detailed model summary as provided by the vanilla `summary()`

output on the regression model object. The plain-text model output includes the model name, fitting time, the function call object and the coefficients:

```
mod <- linear_reg() %>%
set_engine("lm") %>%
fit(mpg ~ ., data = mtcars)
mod
```

```
## parsnip model object
##
## Fit time: 2ms
##
## Call:
## stats::lm(formula = formula, data = data)
##
## Coefficients:
## (Intercept) cyl disp hp drat
## 12.30337 -0.11144 0.01334 -0.02148 0.78711
## wt qsec vs am gear
## -3.71530 0.82104 0.31776 2.52023 0.65541
## carb
## -0.19942
```

However, it does not include statistical significance of coefficients, variance explained \(R^2\), etc. To retrieve also these model specific statistics you can directly access the model object which **parsnip** stores in a specific list data structure as can be seen here:

`str(mod, max.level = 1)`

```
## List of 5
## $ lvl : NULL
## $ spec :List of 5
## ..- attr(*, "class")= chr [1:2] "linear_reg" "model_spec"
## $ fit :List of 12
## ..- attr(*, "class")= chr "lm"
## $ preproc:List of 1
## $ elapsed: 'proc_time' Named num [1:5] 0.002 0 0.002 0 0
## ..- attr(*, "names")= chr [1:5] "user.self" "sys.self" "elapsed" "user.child" ...
## - attr(*, "class")= chr [1:2] "_lm" "model_fit"
```

To access the fitted model object `$fit`

and compute the standard `summary`

you can use

`summary(mod$fit)`

```
##
## Call:
## stats::lm(formula = formula, data = data)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.4506 -1.6044 -0.1196 1.2193 4.6271
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 12.30337 18.71788 0.657 0.5181
## cyl -0.11144 1.04502 -0.107 0.9161
## disp 0.01334 0.01786 0.747 0.4635
## hp -0.02148 0.02177 -0.987 0.3350
## drat 0.78711 1.63537 0.481 0.6353
## wt -3.71530 1.89441 -1.961 0.0633 .
## qsec 0.82104 0.73084 1.123 0.2739
## vs 0.31776 2.10451 0.151 0.8814
## am 2.52023 2.05665 1.225 0.2340
## gear 0.65541 1.49326 0.439 0.6652
## carb -0.19942 0.82875 -0.241 0.8122
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.65 on 21 degrees of freedom
## Multiple R-squared: 0.869, Adjusted R-squared: 0.8066
## F-statistic: 13.93 on 10 and 21 DF, p-value: 3.793e-07
```

## Plot

To finally plot the model predictions in a tidy fashion with the input data **ggplot2** we now focus on univariate regression models with only one predictor. The further steps include

- As a predictor we choose the weight
`wt`

and fit a new model `mod_wt`

. - Bind the original data with the predicted
`tibble`

. - Create a scatter plot including the regression line.

```
library(dplyr)
library(tibble)
mod_wt <- linear_reg() %>%
set_engine("lm") %>%
fit(mpg ~ wt, data = mtcars)
mtcars %>%
bind_cols(predict(mod_wt, new_data = mtcars)) %>%
ggplot(aes(x = wt)) +
geom_line(aes(y = .pred), group="lm") +
geom_point(aes(y = mpg))
```