extraTrees SuperLearner wrapper
Description:

Wrapper for the extraTrees package, which implements extremely randomized trees (a variant of random forest), for use with SuperLearner.
Usage:

SL.extraTrees(Y, X, newX, family, obsWeights, id, ntree = 500,
  mtry = if (family$family == "gaussian") max(floor(ncol(X)/3), 1)
    else floor(sqrt(ncol(X))),
  nodesize = if (family$family == "gaussian") 5 else 1,
  numRandomCuts = 1, evenCuts = FALSE, numThreads = 1, quantile = FALSE,
  subsetSizes = NULL, subsetGroups = NULL, tasks = NULL,
  probOfTaskCuts = mtry/ncol(X), numRandomTaskCuts = 1, verbose = FALSE,
  ...)
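As a rough sketch of the interface, the wrapper can also be called directly, outside of SuperLearner(). This assumes the extraTrees package (and a working rJava setup) is installed; the obsWeights and id values below are placeholders, and the return value is assumed to follow the usual SuperLearner wrapper convention of a list with $pred and $fit.

library(SuperLearner)
data(Boston, package = "MASS")
Y = Boston$medv
X = Boston[, -14]
# Placeholder weights and ids; per the arguments below, id is not currently used.
fit = SL.extraTrees(Y = Y, X = X, newX = X, family = gaussian(),
                    obsWeights = rep(1, length(Y)), id = seq_along(Y))
head(fit$pred)  # predictions for newX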
Arguments:

Y: Outcome variable.

X: Covariate dataframe.

newX: Optional dataframe of covariates for which to predict the outcome.

family: "gaussian" for regression, "binomial" for binary classification.

obsWeights: Optional observation-level weights (supported but not tested).

id: Optional id to group observations from the same unit (not currently used).

ntree: Number of trees (default 500).

mtry: Number of features tested at each node. Default is max(floor(ncol(X)/3), 1) for regression and floor(sqrt(ncol(X))) for classification.

nodesize: Minimum size of terminal nodes (leaves). Default is 5 for regression and 1 for classification.

numRandomCuts: Number of random cuts for each (randomly chosen) feature (default 1, which corresponds to the official ExtraTrees method). The higher the number of cuts, the higher the chance of a good cut.

evenCuts: If FALSE (default), cutting thresholds are uniformly sampled. If TRUE, the feature's range is split into numRandomCuts even intervals and a cut is uniformly sampled from each interval.

numThreads: Number of CPU threads to use (default 1).

quantile: If TRUE, quantile regression is performed (default FALSE); regression data only. Predictions for the k-th quantile are then made with predict(et, newdata, quantile = k); see the sketch after this list.

subsetSizes: Subset size (one integer) or subset sizes (vector of integers; requires subsetGroups). If supplied, every tree is built from a random subset of size subsetSizes. NULL (default) means no subsetting, i.e. all samples are used.

subsetGroups: List specifying the subset group for each sample: from the samples in group g, each tree randomly selects subsetSizes[g] samples.

tasks: Vector of tasks, integers from 1 and up; NULL if no multi-task learning. (untested)

probOfTaskCuts: Probability of performing a task cut at a node (default mtry/ncol(X)). Used only if tasks is specified. (untested)

numRandomTaskCuts: Number of times a task cut is performed at a node (default 1). Used only if tasks is specified. (untested)

verbose: Verbosity of model fitting.

...: Any remaining arguments (not currently supported).
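For example, a minimal quantile-regression sketch using the extraTrees package directly, as described in the quantile argument above (this bypasses SL.extraTrees, which does not expose quantile prediction; the Boston data is used purely for illustration):

library(extraTrees)
data(Boston, package = "MASS")
x = as.matrix(Boston[, -14])
y = Boston$medv
# Fit in quantile mode, then predict the median (0.5 quantile).
et = extraTrees(x, y, quantile = TRUE)
pred_median = predict(et, x, quantile = 0.5)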
Details:

If Java runs out of memory (java.lang.OutOfMemoryError: Java heap space), then, assuming you have free memory available, you can increase the Java heap size by setting options(java.parameters = "-Xmx2g") before calling library(extraTrees).
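For example:

# Must run before library(extraTrees) starts the JVM via rJava.
options(java.parameters = "-Xmx2g")
library(extraTrees)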
References:

Geurts, P., Ernst, D., & Wehenkel, L. (2006). Extremely randomized trees. Machine Learning, 63(1), 3-42.

Simm, J., de Abril, I. M., & Sugiyama, M. (2014). Tree-based ensemble multi-task learning method for classification and regression. IEICE Transactions on Information and Systems, 97(6), 1677-1681.
Examples:

data(Boston, package = "MASS")
Y = Boston$medv
# Remove outcome from covariate dataframe.
X = Boston[, -14]

set.seed(1)
# Sample rows to speed up example.
row_subset = sample(nrow(X), 30)

sl = SuperLearner(Y[row_subset], X[row_subset, ], family = gaussian(),
                  cvControl = list(V = 2),
                  SL.library = c("SL.mean", "SL.extraTrees"))
print(sl)
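A similar sketch for binary classification, continuing from the example above (dichotomizing medv purely for illustration):

Y_bin = as.numeric(Boston$medv > median(Boston$medv))
sl_bin = SuperLearner(Y_bin[row_subset], X[row_subset, ], family = binomial(),
                      cvControl = list(V = 2),
                      SL.library = c("SL.mean", "SL.extraTrees"))
print(sl_bin)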