Stability Selection
Selection of influential variables or model components with error control.
## a method to compute stability selection paths for fitted mboost models
## S3 method for class 'mboost'
stabsel(x, cutoff, q, PFER, grid = 0:mstop(x),
        folds = subsample(model.weights(x), B = B),
        B = ifelse(sampling.type == "MB", 100, 50),
        assumption = c("unimodal", "r-concave", "none"),
        sampling.type = c("SS", "MB"),
        papply = mclapply, verbose = TRUE, FWER, eval = TRUE, ...)

## just a wrapper to stabsel(p, ..., eval = FALSE)
## S3 method for class 'mboost'
stabsel_parameters(p, ...)
x, p |
a fitted model of class mboost. |
cutoff |
cutoff between 0.5 and 1. Preferably a value between 0.6 and 0.9 should be used. |
q |
number of (unique) variables (or groups of variables, depending on the model) that are selected on each subsample. |
PFER |
upper bound for the per-family error rate. This specifies the number of falsely selected base-learners that is tolerated. See details. |
grid |
a numeric vector of the form 0:m (default: 0:mstop(x)). |
folds |
a weight matrix with number of rows equal to the number of observations; defaults to subsample(model.weights(x), B = B). |
assumption |
Defines the type of assumptions on the distributions of the selection probabilities and simultaneous selection probabilities. Only applicable for sampling.type = "SS". |
sampling.type |
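use the sampling scheme of Shah & Samworth (2013), i.e., with complementary pairs (sampling.type = "SS"), or the original sampling scheme of Meinshausen & Buehlmann (2010) (sampling.type = "MB"). |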
B |
number of subsampling replicates. By default, we use 50 complementary pairs for the error bounds of Shah & Samworth (2013) and 100 for the error bound derived in Meinshausen & Buehlmann (2010). As we use B complementary pairs in the former case, this leads to 2B subsamples. |
papply |
(parallel) apply function, defaults to mclapply. |
verbose |
logical (default: TRUE). |
FWER |
deprecated. Only for compatibility with older versions, use PFER instead. |
eval |
logical. Determines whether stability selection is evaluated (eval = TRUE; default) or only the parameter combination is returned (eval = FALSE). |
... |
additional arguments to parallel apply methods such as mclapply. |
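The complementary-pair scheme mentioned for sampling.type and B can be sketched in base R. This is only an illustration of why B pairs give 2B subsamples: the helper make_pair, the object folds_sketch, and the values of n and B are hypothetical, and the weight matrix that stabsel builds internally may differ in detail.

```r
## Illustrative sketch only: base R, not the mboost/stabs implementation.
set.seed(1)
n <- 10   # hypothetical number of observations
B <- 3    # hypothetical number of complementary pairs

## Each pair consists of two disjoint random halves of the observations
## (for odd n, one observation is left out of both halves).
make_pair <- function(n) {
  idx  <- sample(n)                     # random permutation of 1..n
  half <- floor(n / 2)
  first  <- idx[seq_len(half)]
  second <- idx[half + seq_len(half)]
  w <- matrix(0, nrow = n, ncol = 2)    # 0/1 weights, one column per subsample
  w[first, 1]  <- 1
  w[second, 2] <- 1
  w
}

## One column per subsample: B pairs lead to 2 * B columns.
folds_sketch <- do.call(cbind, lapply(seq_len(B), function(b) make_pair(n)))
dim(folds_sketch)
```

The resulting matrix has one row per observation, which matches the shape described for the folds argument.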
For details see stabsel in package stabs and Hofner et al. (2015).
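To make the link between cutoff, q and PFER concrete, the following sketch solves the error bound of Meinshausen & Buehlmann (2010), PFER <= q^2 / ((2 * cutoff - 1) * p), for the cutoff, where p is the number of candidate variables or base-learners. The helper name mb_cutoff and the input values are hypothetical; the bounds used for sampling.type = "SS" are different and not shown here.

```r
## Sketch of the Meinshausen & Buehlmann (2010) bound,
##   PFER <= q^2 / ((2 * cutoff - 1) * p),
## solved for the cutoff. `mb_cutoff` is a hypothetical helper and
## not the code used by stabsel_parameters().
mb_cutoff <- function(q, PFER, p) {
  cutoff <- (q^2 / (PFER * p) + 1) / 2
  min(1, cutoff)   # a selection frequency cannot exceed 1
}

mb_cutoff(q = 3, PFER = 1, p = 10)   # (9 / 10 + 1) / 2 = 0.95
```

A resulting cutoff close to 1 or close to 0.5 suggests that the chosen combination of q and PFER is not a sensible one for the given p.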
An object of class stabsel with a special print method.
The object has the following elements:
phat |
selection probabilities. |
selected |
elements with maximal selection probability greater than cutoff. |
max |
maximum of selection probabilities. |
cutoff |
cutoff used. |
q |
average number of selected variables used. |
PFER |
per-family error rate. |
sampling.type |
the sampling type used for stability selection. |
assumption |
the assumptions made on the selection probabilities. |
call |
the call. |
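How phat, max, cutoff and selected relate can be illustrated with a toy example. The numbers below are made up and are not output of stabsel; the sketch also assumes that phat holds one row per variable and one column per grid step, which should be checked against an actual stabsel object.

```r
## Toy illustration with made-up numbers (not output of stabsel()).
## Assumed layout: rows = variables, columns = grid steps.
phat <- matrix(c(0.10, 0.40, 0.80,
                 0.00, 0.20, 0.30,
                 0.50, 0.90, 0.98),
               nrow = 3, byrow = TRUE,
               dimnames = list(c("x1", "x2", "x3"), c("0", "1", "2")))
cutoff <- 0.95

## maximum selection probability per variable
max_phat <- apply(phat, 1, max)

## variables whose maximal selection probability exceeds the cutoff
selected <- which(max_phat > cutoff)
selected
```

In this toy case only x3 exceeds the cutoff and would be reported as selected.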
B. Hofner, L. Boccuto and M. Goeker (2015), Controlling false discoveries in high-dimensional situations: Boosting with stability selection. BMC Bioinformatics, 16:144.
N. Meinshausen and P. Buehlmann (2010), Stability selection. Journal of the Royal Statistical Society, Series B, 72, 417–473.
R.D. Shah and R.J. Samworth (2013), Variable selection with error control: another look at stability selection. Journal of the Royal Statistical Society, Series B, 75, 55–80.
## load the package that provides glmboost() and stabsel()
library("mboost")

## make data set available
data("bodyfat", package = "TH.data")
## set seed
set.seed(1234)

### low-dimensional example
mod <- glmboost(DEXfat ~ ., data = bodyfat)

## compute cutoff ahead of running stabsel to see if it is a sensible
## parameter choice.
##   p = ncol(bodyfat) - 1 (= Outcome) + 1 ( = Intercept)
stabsel_parameters(q = 3, PFER = 1, p = ncol(bodyfat) - 1 + 1,
                   sampling.type = "MB")
## the same:
stabsel(mod, q = 3, PFER = 1, sampling.type = "MB", eval = FALSE)

## Not run:
############################################################
## Do not run and check these examples automatically as
## they take some time (~ 10 seconds depending on the system)

## now run stability selection
(sbody <- stabsel(mod, q = 3, PFER = 1, sampling.type = "MB"))
opar <- par(mai = par("mai") * c(1, 1, 1, 2.7))
plot(sbody)
par(opar)

plot(sbody, type = "maxsel", ymargin = 6)
## End(Not run)