vip: vi_permute – R documentation

vip

vi_permute

Permutation-based variable importance

Description

Compute permutation-based variable importance scores for the predictors in a model.

Usage

vi_permute(object, ...)

## Default S3 method:
vi_permute(
  object,
  feature_names = NULL,
  train = NULL,
  target = NULL,
  metric = NULL,
  smaller_is_better = NULL,
  type = c("difference", "ratio"),
  nsim = 1,
  keep = TRUE,
  sample_size = NULL,
  sample_frac = NULL,
  reference_class = NULL,
  pred_fun = NULL,
  pred_wrapper = NULL,
  verbose = FALSE,
  progress = "none",
  parallel = FALSE,
  paropts = NULL,
  ...
)

Arguments

`object`	A fitted model object (e.g., a `"randomForest"` object).
`...`	Additional optional arguments. (Currently ignored.)
`feature_names`	Character vector giving the names of the predictor variables (i.e., features) of interest. If `NULL` (the default) then the internal `get_feature_names()` function will be called to try to extract them automatically. It is good practice to always specify this argument.
`train`	A matrix-like R object (e.g., a data frame or matrix) containing the training data. If `NULL` (the default) then the internal `get_training_data()` function will be called to try to extract it automatically. It is good practice to always specify this argument.
`target`	Either a character string giving the name (or position) of the target column in `train` or, if `train` only contains feature columns, a vector containing the target values used to train `object`.
`metric`	Either a function or character string specifying the performance metric to use in computing model performance (e.g., RMSE for regression or accuracy for binary classification). If `metric` is a function, then it requires two arguments, `actual` and `predicted`, and should return a single, numeric value. Ideally, this should be the same metric that was used to train `object`. See `list_metrics` for a list of built-in metrics.
`smaller_is_better`	Logical indicating whether or not a smaller value of `metric` is better. Default is `NULL`. Must be supplied if `metric` is a user-supplied function.
`type`	Character string specifying how to compare the baseline and permuted performance metrics. Current options are `"difference"` (the default) and `"ratio"`.
`nsim`	Integer specifying the number of Monte Carlo replications to perform. Default is 1. If `nsim > 1`, the results from each replication are simply averaged together (the standard deviation will also be returned).
`keep`	Logical indicating whether or not to keep the individual permutation scores for all `nsim` repetitions. If `TRUE` (the default) then the individual variable importance scores will be stored in an attribute called `"raw_scores"`. (Only used when `nsim > 1`.)
`sample_size`	Integer specifying the size of the random sample to use for each Monte Carlo repetition. Default is `NULL` (i.e., use all of the available training data). Cannot be specified with `sample_frac`. Can be used to reduce computation time with large data sets.
`sample_frac`	Proportion specifying the size of the random sample to use for each Monte Carlo repetition. Default is `NULL` (i.e., use all of the available training data). Cannot be specified with `sample_size`. Can be used to reduce computation time with large data sets.
`reference_class`	Character string specifying which response category represents the "reference" class (i.e., the class to which the predicted class probabilities correspond). Only needed for binary classification problems.
`pred_fun`	Deprecated. Use `pred_wrapper` instead.
`pred_wrapper`	Prediction function that requires two arguments, `object` and `newdata`. The required output depends on the `metric` being used. Regression: a numeric vector of predicted outcomes. Binary classification: a vector of predicted class labels (e.g., if using misclassification error) or a vector of predicted class probabilities for the reference class (e.g., if using log loss or AUC). Multiclass classification: a vector of predicted class labels (e.g., if using misclassification error) or a matrix/data frame of predicted class probabilities for each class (e.g., if using log loss or AUC).
`verbose`	Logical indicating whether or not to print information during the construction of variable importance scores. Default is `FALSE`.
`progress`	Character string giving the name of the progress bar to use. See `create_progress_bar` for details. Default is `"none"`.
`parallel`	Logical indicating whether or not to run `vi_permute()` in parallel (using a backend provided by the `foreach` package). Default is `FALSE`. If `TRUE`, a parallel backend must be registered for `foreach` beforehand.
`paropts`	List containing additional options to be passed on to `foreach` when `parallel = TRUE`.
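
The `metric` and `pred_wrapper` arguments above are easiest to understand from a concrete pair of functions. The following is a minimal sketch for a binary classification problem, assuming a logistic regression fit with `glm()` on simulated data. The names `prob_wrapper` and `logloss` and the data set are illustrative and not part of vip. The final call uses only arguments listed in the Usage section and runs only if vip is installed.

```r
# Simulated binary outcome; only x1 carries signal (illustrative data)
set.seed(42)
n <- 300
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
dat$y <- rbinom(n, size = 1, prob = plogis(2 * dat$x1))

fit <- glm(y ~ x1 + x2, data = dat, family = binomial)

# Prediction wrapper: takes `object` and `newdata`, returns P(y = 1) per row
prob_wrapper <- function(object, newdata) {
  as.numeric(predict(object, newdata = newdata, type = "response"))
}

# User-supplied metric: takes `actual` and `predicted`, returns one number
logloss <- function(actual, predicted) {
  eps <- 1e-15
  p <- pmin(pmax(predicted, eps), 1 - eps)  # guard against log(0)
  -mean(actual * log(p) + (1 - actual) * log(1 - p))
}

# Baseline performance on the unpermuted training data
baseline <- logloss(actual = dat$y, predicted = prob_wrapper(fit, dat))

# Optional: hand both functions to vi_permute() (requires the vip package)
if (requireNamespace("vip", quietly = TRUE)) {
  vis <- vip::vi_permute(
    fit,
    feature_names = c("x1", "x2"),
    train = dat,
    target = "y",
    metric = logloss,
    smaller_is_better = TRUE,  # must be supplied for a function metric
    pred_wrapper = prob_wrapper,
    nsim = 5
  )
  print(vis)
}
```

Because log loss needs probabilities rather than class labels, the wrapper requests `type = "response"`. A wrapper paired with a label-based metric such as misclassification error would return predicted classes instead.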

Details

Coming soon!
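
Until this section is filled in, the general idea of permutation importance can be sketched in base R. This is an illustration of the technique only, not vip's internal implementation. The helper `permute_importance()`, its `StDev` column name, and the simulated data are assumptions made for the example, and the sketch assumes a smaller-is-better metric.

```r
# Simulated regression data: x1 matters most, x2 less, x3 is pure noise
set.seed(101)
n <- 200
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
dat$y <- 3 * dat$x1 + 1 * dat$x2 + rnorm(n, sd = 0.5)

fit <- lm(y ~ ., data = dat)

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

# Hypothetical helper: shuffle one feature at a time and measure how much
# the (smaller-is-better) metric degrades relative to the baseline.
permute_importance <- function(object, train, target, metric, nsim = 10,
                               type = c("difference", "ratio")) {
  type <- match.arg(type)
  features <- setdiff(names(train), target)
  baseline <- metric(train[[target]], predict(object, newdata = train))
  scores <- sapply(features, function(f) {
    reps <- replicate(nsim, {
      shuffled <- train
      shuffled[[f]] <- sample(shuffled[[f]])  # break the feature/target link
      permuted <- metric(train[[target]], predict(object, newdata = shuffled))
      if (type == "difference") permuted - baseline else permuted / baseline
    })
    # Average over the Monte Carlo replications and report their spread
    c(Importance = mean(reps), StDev = sd(reps))
  })
  data.frame(Variable = features, t(scores), row.names = NULL)
}

imp <- permute_importance(fit, dat, target = "y", metric = rmse)
imp[order(-imp$Importance), ]
```

A feature the model relies on produces a large increase in error when shuffled, while a noise feature such as `x3` leaves the error close to the baseline.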

Value

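A tidy data frame (i.e., a "tibble" object) with two columns: `Variable` and `Importance`. If `nsim > 1`, the standard deviation of the scores across replications is also returned and, when `keep = TRUE`, the individual scores are stored in the `"raw_scores"` attribute.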

Examples

## Not run: 
# Load required packages
library(ggplot2)  # for ggtitle() function
library(nnet)     # for fitting neural networks
library(vip)      # for gen_friedman() and vip()

# Simulate training data
trn <- gen_friedman(500, seed = 101)  # ?vip::gen_friedman

# Inspect data
tibble::as_tibble(trn)

# Fit PPR and NN models (hyperparameters were chosen using the caret package
# with 5 repeats of 5-fold cross-validation)
pp <- ppr(y ~ ., data = trn, nterms = 11)
set.seed(0803) # for reproducibility
nn <- nnet(y ~ ., data = trn, size = 7, decay = 0.1, linout = TRUE,
           maxit = 500)

# Plot VI scores
set.seed(2021)  # for reproducibility
p1 <- vip(pp, method = "permute", target = "y", metric = "rsquared",
          pred_wrapper = predict) + ggtitle("PPR")
p2 <- vip(nn, method = "permute", target = "y", metric = "rsquared",
          pred_wrapper = predict) + ggtitle("NN")
gridExtra::grid.arrange(p1, p2, ncol = 2)  # gridExtra is not attached above

# Mean absolute error
mae <- function(actual, predicted) {
  mean(abs(actual - predicted))
}

# Permutation-based VIP with user-defined MAE metric
set.seed(1101)  # for reproducibility
vip(pp, method = "permute", target = "y", metric = mae,
    smaller_is_better = TRUE,
    pred_wrapper = function(object, newdata) predict(object, newdata)
) + ggtitle("PPR")

## End(Not run)

vip

Variable Importance Plots

v0.3.2

GPL (>= 2)

Authors

Brandon Greenwell [aut, cre] (<https://orcid.org/0000-0002-8120-0084>), Brad Boehmke [aut] (<https://orcid.org/0000-0002-3611-8516>), Bernie Gray [aut] (<https://orcid.org/0000-0001-9190-6032>)

Initial release