Importance of features in a model.
Creates a data.table of feature importances in a model.
xgb.importance(feature_names = NULL, model = NULL, trees = NULL, data = NULL, label = NULL, target = NULL)
feature_names: character vector of feature names. If the model already contains feature names, those would be used when feature_names = NULL (the default). A non-NULL feature_names can be provided to override the names stored in the model.
model: object of class xgb.Booster.
trees: (only for the gbtree booster) an integer vector of tree indices that should be included in the importance calculation. If set to NULL, all trees of the model are parsed. This can be useful, e.g., in multiclass classification to get feature importances for each class separately.
data: deprecated.
label: deprecated.
target: deprecated.
This function works for both linear and tree models.
For linear models, the importance is the absolute magnitude of the linear coefficients. To obtain a meaningful ranking by importance for a linear model, the features therefore need to be on the same scale (which you would also want anyway when using L1 or L2 regularization).
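As a minimal illustration of that point (not part of the package's shipped examples; the dataset and settings below are arbitrary), standardizing a dense feature matrix before fitting a gblinear booster makes the reported coefficient magnitudes comparable across features:

# illustrative sketch: scale() centers and standardizes each column, so the
# Weight values returned by xgb.importance() are comparable across features
x <- scale(as.matrix(mtcars[, -1]))
y <- as.numeric(mtcars$mpg > median(mtcars$mpg))
lin <- xgboost(data = x, label = y, booster = "gblinear",
               nthread = 1, nrounds = 20, objective = "binary:logistic")
xgb.importance(model = lin)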
For a tree model, a data.table with the following columns:
Features: names of the features used in the model;
Gain: fractional contribution of each feature to the model, based on the total gain of this feature's splits. A higher percentage means a more important predictive feature;
Cover: metric of the number of observations related to this feature;
Frequency: percentage representing the relative number of times a feature has been used in trees.
A linear model's importance data.table has the following columns:
Features: names of the features used in the model;
Weight: the linear coefficient of this feature;
Class: (only for multiclass models) class label.
If feature_names is not provided and the model does not have feature names, the index of each feature is used instead. Because the index is extracted from the model dump (which is produced by the C++ code), it starts at 0 (as in C/C++ or Python) rather than 1 (as usual in R).
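A short sketch of this behavior (the matrix and names below are made up for illustration): training on a matrix without column names yields zero-based feature indices, and passing feature_names restores readable output.

# illustrative sketch: an unnamed matrix, so the model stores no feature names
x <- matrix(rnorm(400), ncol = 4)
y <- as.numeric(x[, 1] + rnorm(100) > 0)
b <- xgboost(data = x, label = y, max_depth = 2, nthread = 1, nrounds = 3,
             objective = "binary:logistic", verbose = 0)
xgb.importance(model = b)                                    # features reported as "0", "1", ...
xgb.importance(feature_names = paste0("x", 1:4), model = b)  # readable names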
# binomial classification using gbtree:
data(agaricus.train, package='xgboost')
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, max_depth = 2,
eta = 1, nthread = 2, nrounds = 2, objective = "binary:logistic")
xgb.importance(model = bst)
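The resulting table can be passed to the package's plotting helper for a quick visual ranking (a small sketch; xgb.plot.importance() is exported by xgboost and, for tree models, typically ranks features by Gain):

# visualize the importance of the gbtree model fitted above
imp <- xgb.importance(model = bst)
xgb.plot.importance(imp)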
# binomial classification using gblinear:
bst <- xgboost(data = agaricus.train$data, label = agaricus.train$label, booster = "gblinear",
eta = 0.3, nthread = 1, nrounds = 20, objective = "binary:logistic")
xgb.importance(model = bst)
# multiclass classification using gbtree:
nclass <- 3
nrounds <- 10
mbst <- xgboost(data = as.matrix(iris[, -5]), label = as.numeric(iris$Species) - 1,
max_depth = 3, eta = 0.2, nthread = 2, nrounds = nrounds,
objective = "multi:softprob", num_class = nclass)
# all classes clumped together:
xgb.importance(model = mbst)
# inspect importances separately for each class:
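# (each boosting round adds one tree per class, in class order, so tree indices
#  0, nclass, 2*nclass, ... belong to the first class, 1, nclass+1, ... to the second, etc.)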
xgb.importance(model = mbst, trees = seq(from=0, by=nclass, length.out=nrounds))
xgb.importance(model = mbst, trees = seq(from=1, by=nclass, length.out=nrounds))
xgb.importance(model = mbst, trees = seq(from=2, by=nclass, length.out=nrounds))
# multiclass classification using gblinear:
mbst <- xgboost(data = scale(as.matrix(iris[, -5])), label = as.numeric(iris$Species) - 1,
booster = "gblinear", eta = 0.2, nthread = 1, nrounds = 15,
objective = "multi:softprob", num_class = nclass)
xgb.importance(model = mbst)