extraTrees SuperLearner wrapper
Description:

Wrapper for the extraTrees package, which implements extremely randomized trees (a variant of random forest), for use with SuperLearner.
Usage:

SL.extraTrees(Y, X, newX, family, obsWeights, id, ntree = 500,
  mtry = if (family$family == "gaussian") max(floor(ncol(X)/3), 1)
    else floor(sqrt(ncol(X))),
  nodesize = if (family$family == "gaussian") 5 else 1,
  numRandomCuts = 1, evenCuts = FALSE, numThreads = 1, quantile = FALSE,
  subsetSizes = NULL, subsetGroups = NULL, tasks = NULL,
  probOfTaskCuts = mtry/ncol(X), numRandomTaskCuts = 1, verbose = FALSE,
  ...)
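As a rough sketch of the interface, the wrapper can also be called directly, outside of SuperLearner(). This assumes the extraTrees package (and a working rJava setup) is installed; the obsWeights and id values below are placeholders, and the return value is assumed to follow the usual SuperLearner wrapper convention of a list with $pred and $fit.

library(SuperLearner)
data(Boston, package = "MASS")
Y = Boston$medv
X = Boston[, -14]
# Placeholder weights and ids; per the arguments below, id is not currently used.
fit = SL.extraTrees(Y = Y, X = X, newX = X, family = gaussian(),
                    obsWeights = rep(1, length(Y)), id = seq_along(Y))
head(fit$pred)  # predictions for newX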
Arguments:

Y: Outcome variable.

X: Covariate dataframe.

newX: Optional dataframe of covariates for which to predict the outcome.

family: "gaussian" for regression, "binomial" for binary classification.

obsWeights: Optional observation-level weights (supported but not tested).

id: Optional id to group observations from the same unit (not currently used).

ntree: Number of trees (default 500).

mtry: Number of features tested at each node. Default is max(floor(ncol(X)/3), 1) for regression and floor(sqrt(ncol(X))) for classification.

nodesize: Minimum size of terminal nodes (leaves). Default is 5 for regression and 1 for classification.

numRandomCuts: Number of random cuts for each (randomly chosen) feature (default 1, which corresponds to the official ExtraTrees method). The higher the number of cuts, the higher the chance of a good cut.

evenCuts: If FALSE (default), cutting thresholds are uniformly sampled. If TRUE, the feature's range is split into numRandomCuts even intervals and a cut is uniformly sampled from each interval.

numThreads: Number of CPU threads to use (default 1).

quantile: If TRUE, quantile regression is performed (default FALSE); regression data only. Predictions for the k-th quantile are then made with predict(et, newdata, quantile = k); see the sketch after this list.

subsetSizes: Subset size (one integer) or subset sizes (vector of integers; requires subsetGroups). If supplied, every tree is built from a random subset of size subsetSizes. NULL (default) means no subsetting, i.e. all samples are used.

subsetGroups: List specifying the subset group for each sample: from the samples in group g, each tree randomly selects subsetSizes[g] samples.

tasks: Vector of tasks, integers from 1 and up; NULL if no multi-task learning. (untested)

probOfTaskCuts: Probability of performing a task cut at a node (default mtry/ncol(X)). Used only if tasks is specified. (untested)

numRandomTaskCuts: Number of times a task cut is performed at a node (default 1). Used only if tasks is specified. (untested)

verbose: Verbosity of model fitting.

...: Any remaining arguments (not currently supported).
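For example, a minimal quantile-regression sketch using the extraTrees package directly, as described in the quantile argument above (this bypasses SL.extraTrees, which does not expose quantile prediction; the Boston data is used purely for illustration):

library(extraTrees)
data(Boston, package = "MASS")
x = as.matrix(Boston[, -14])
y = Boston$medv
# Fit in quantile mode, then predict the median (0.5 quantile).
et = extraTrees(x, y, quantile = TRUE)
pred_median = predict(et, x, quantile = 0.5)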
Details:

If Java runs out of memory (java.lang.OutOfMemoryError: Java heap space), then, assuming you have free memory available, you can increase the Java heap size by setting options(java.parameters = "-Xmx2g") before calling library(extraTrees).
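For example:

# Must run before library(extraTrees) starts the JVM via rJava.
options(java.parameters = "-Xmx2g")
library(extraTrees)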
References:

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.

Simm, J., de Abril, I. M., & Sugiyama, M. (2014). Tree-based ensemble multi-task learning method for classification and regression. IEICE Transactions on Information and Systems, 97(6), 1677-1681.
Examples:

data(Boston, package = "MASS")
Y = Boston$medv
# Remove outcome from covariate dataframe.
X = Boston[, -14]

set.seed(1)
# Sample rows to speed up example.
row_subset = sample(nrow(X), 30)

sl = SuperLearner(Y[row_subset], X[row_subset, ], family = gaussian(),
                  cvControl = list(V = 2),
                  SL.library = c("SL.mean", "SL.extraTrees"))
print(sl)
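A similar sketch for binary classification, continuing from the example above (dichotomizing medv purely for illustration):

Y_bin = as.numeric(Boston$medv > median(Boston$medv))
sl_bin = SuperLearner(Y_bin[row_subset], X[row_subset, ], family = binomial(),
                      cvControl = list(V = 2),
                      SL.library = c("SL.mean", "SL.extraTrees"))
print(sl_bin)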