Gradient Boosting with Component-wise Linear Models
Gradient boosting for optimizing arbitrary loss functions where component-wise linear models are utilized as base-learners.
```r
## S3 method for class 'formula'
glmboost(formula, data = list(), weights = NULL,
         offset = NULL, family = Gaussian(),
         na.action = na.pass, contrasts.arg = NULL,
         center = TRUE, control = boost_control(), oobweights = NULL, ...)

## S3 method for class 'matrix'
glmboost(x, y, center = TRUE, weights = NULL,
         offset = NULL, family = Gaussian(),
         na.action = na.pass, control = boost_control(), oobweights = NULL, ...)

## Default S3 method:
glmboost(x, ...)
```
| Argument | Description |
|---|---|
| formula | a symbolic description of the model to be fit. |
| data | a data frame containing the variables in the model. |
| weights | an optional vector of weights to be used in the fitting process. |
| offset | a numeric vector to be used as offset (optional). |
| family | a |
| na.action | a function which indicates what should happen when the data contain |
| contrasts.arg | a list, whose entries are contrasts suitable for input to the |
| center | logical indicating whether the predictor variables are centered before fitting. |
| control | a list of parameters controlling the algorithm. For more details see |
| oobweights | an additional vector of out-of-bag weights, which is used for the out-of-bag risk (i.e., if |
| x | design matrix. Sparse matrices of class |
| y | vector of responses. |
| ... | additional arguments passed to |
A (generalized) linear model is fitted using a boosting algorithm based on component-wise univariate linear models. The fit, i.e., the regression coefficients, can be interpreted in the usual way. The methodology is described in Buehlmann and Yu (2003), Buehlmann (2006), and Buehlmann and Hothorn (2007). Examples and further details are given in Hofner et al. (2014).
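The base-R sketch below illustrates the idea of component-wise linear boosting for squared-error loss only. In every iteration each column of the design matrix (the intercept included) is fitted on its own to the current negative gradient, only the best-fitting column is updated, and its coefficient is shrunken by a step length nu. The helper `cw_l2boost()`, the offset `mean(y)`, `nu = 0.1` and `mstop = 500` are assumptions chosen for this illustration; this is not the mboost implementation.

```r
## Component-wise L2 boosting with univariate linear base-learners.
## cw_l2boost() is a hypothetical helper for illustration, not mboost code.
cw_l2boost <- function(X, y, mstop = 500, nu = 0.1, center = TRUE) {
  X <- as.matrix(X)
  x_means <- colMeans(X)
  if (center) X <- sweep(X, 2, x_means)   # center every predictor column
  X <- cbind("(Intercept)" = 1, X)        # the intercept is one more component
  offset <- mean(y)                       # start from the mean of the response
  f <- rep(offset, length(y))
  beta <- setNames(numeric(ncol(X)), colnames(X))
  ss <- colSums(X^2)                      # x_j'x_j for every component j
  for (m in seq_len(mstop)) {
    u <- y - f                            # negative gradient of (y - f)^2 / 2
    xu <- drop(crossprod(X, u))           # x_j'u for every component j
    j <- which.max(xu^2 / ss)             # component whose fit has the smallest RSS
    b <- xu[j] / ss[j]                    # least-squares slope of u on x_j alone
    beta[j] <- beta[j] + nu * b           # shrunken update of one coefficient
    f <- f + nu * b * X[, j]              # update the fitted values accordingly
  }
  beta[1] <- beta[1] + offset             # add the offset to the intercept
  if (center) beta[1] <- beta[1] - sum(beta[-1] * x_means)  # undo centering
  beta
}

fit_c <- cw_l2boost(cars["speed"], cars$dist, center = TRUE)
fit_u <- cw_l2boost(cars["speed"], cars$dist, center = FALSE)
ols <- coef(lm(dist ~ speed, data = cars))
rbind(centered = fit_c, uncentered = fit_u, lm = ols)
```

With centering, the intercept column is orthogonal to the centered predictor, so each update of `speed` shrinks the remaining residual along that direction by the factor 1 - nu and the coefficients approach the `lm()` solution geometrically. Without centering the two components are strongly correlated and many more iterations are needed, which is the effect the coefficient-path plots in the examples show.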
Peter Buehlmann and Bin Yu (2003), Boosting with the L2 loss: regression and classification. Journal of the American Statistical Association, 98, 324–339.
Peter Buehlmann (2006), Boosting for high-dimensional linear models. The Annals of Statistics, 34(2), 559–583.
Peter Buehlmann and Torsten Hothorn (2007), Boosting algorithms: regularization, prediction and model fitting. Statistical Science, 22(4), 477–505.
Torsten Hothorn, Peter Buehlmann, Thomas Kneib, Matthias Schmid and Benjamin Hofner (2010), Model-based Boosting 2.0. Journal of Machine Learning Research, 11, 2109–2113.
Benjamin Hofner, Andreas Mayr, Nikolay Robinzonov and Matthias Schmid (2014), Model-based Boosting in R: A Hands-on Tutorial Using the R Package mboost. Computational Statistics, 29, 3–35. doi: 10.1007/s00180-012-0382-5
Available as vignette via: vignette(package = "mboost", "mboost_tutorial")
See mboost_fit for the generic boosting function, gamboost for boosted additive models, and blackboost for boosted trees.
See baselearners for possible base-learners.
See cvrisk for cross-validated stopping iteration.
Furthermore see boost_control, Family and methods.
```r
library("mboost")

### a simple two-dimensional example: cars data
cars.gb <- glmboost(dist ~ speed, data = cars,
                    control = boost_control(mstop = 2000),
                    center = FALSE)
cars.gb

### coefficients should coincide
cf <- coef(cars.gb, off2int = TRUE)     ## add offset to intercept
coef(cars.gb) + c(cars.gb$offset, 0)    ## add offset to intercept (by hand)
signif(cf, 3)
signif(coef(lm(dist ~ speed, data = cars)), 3)
## almost converged. With higher mstop the results get even better

### now we center the design matrix for
### much quicker "convergence"
cars.gb_centered <- glmboost(dist ~ speed, data = cars,
                             control = boost_control(mstop = 2000),
                             center = TRUE)

## plot coefficient paths of glmboost
par(mfrow = c(1, 2), mai = par("mai") * c(1, 1, 1, 2.5))
plot(cars.gb, main = "without centering")
plot(cars.gb_centered, main = "with centering")

### alternative loss function: absolute loss
cars.gbl <- glmboost(dist ~ speed, data = cars,
                     control = boost_control(mstop = 1000),
                     family = Laplace())
cars.gbl
coef(cars.gbl, off2int = TRUE)

### plot fit
par(mfrow = c(1, 1))
plot(dist ~ speed, data = cars)
lines(cars$speed, predict(cars.gb), col = "red")     ## quadratic loss
lines(cars$speed, predict(cars.gbl), col = "green")  ## absolute loss

### Huber loss with adaptive choice of delta
cars.gbh <- glmboost(dist ~ speed, data = cars,
                     control = boost_control(mstop = 1000),
                     family = Huber())
lines(cars$speed, predict(cars.gbh), col = "blue")   ## Huber loss
legend("topleft", col = c("red", "green", "blue"), lty = 1,
       legend = c("Gaussian", "Laplace", "Huber"), bty = "n")

### sparse high-dimensional example that makes use of the matrix
### interface of glmboost and uses the matrix representation from
### package Matrix
library("Matrix")
n <- 100
p <- 10000
ptrue <- 10
X <- Matrix(0, nrow = n, ncol = p)
X[sample(1:(n * p), floor(n * p / 20))] <- runif(floor(n * p / 20))
beta <- numeric(p)
beta[sample(1:p, ptrue)] <- 10
y <- drop(X %*% beta + rnorm(n, sd = 0.1))
mod <- glmboost(y = y, x = X, center = TRUE)  ### mstop needs tuning
coef(mod, which = which(beta > 0))
```
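The three families used in the examples differ in the negative gradient of their loss, which is what the component-wise base-learners are fitted to in every iteration. The sketch below writes these gradients out in base R. The `ngrad_*` helpers are hypothetical and not part of mboost, and using the median absolute residual as the default Huber delta is an assumed stand-in that may not match how `Huber()` adapts delta.

```r
## Negative gradients u = -dL(y, f)/df for the three losses in the examples.
## The ngrad_* helpers are hypothetical and not part of mboost.
ngrad_gaussian <- function(y, f) y - f          # L(y, f) = (y - f)^2 / 2
ngrad_laplace  <- function(y, f) sign(y - f)    # L(y, f) = |y - f|

## Huber loss: quadratic for |y - f| <= d, linear beyond. As an assumed
## stand-in for an adaptive choice, d defaults to the median absolute residual.
ngrad_huber <- function(y, f, d = median(abs(y - f))) {
  r <- y - f
  ifelse(abs(r) <= d, r, d * sign(r))           # residuals are clipped at +/- d
}

y <- c(1, 2, 3, 10)
f <- rep(2, 4)
ngrad_gaussian(y, f)       # the outlier keeps its full residual of 8
ngrad_laplace(y, f)        # every residual is reduced to its sign
ngrad_huber(y, f, d = 2)   # residuals beyond +/- 2 are clipped to +/- 2
```

This is why the Laplace and Huber fits in the examples are less affected by the largest stopping distances than the Gaussian fit: a single large residual contributes at most 1 (Laplace) or d (Huber) to the gradient instead of its full size.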