Plot random variable importances
Plot the variable importances obtained from random permutations of the class labels together with the variable importances from the original data set.
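To make "random permutations of class labels" concrete, here is a minimal base R sketch. The seed and the class sizes are illustrative assumptions, and this is not the package's own code. Shuffling the labels keeps the class sizes but breaks any link between the labels and the predictors, so importances computed on such data indicate what to expect when no variable is related to the class.

```r
set.seed(123)  # illustrative seed (assumption)

# Hypothetical class labels: 20 samples of class A, 25 of class B
cl <- factor(c(rep("A", 20), rep("B", 25)))

# One random permutation of the class labels: class sizes are preserved,
# but the labels are no longer tied to the rows they originally described
cl_perm <- sample(cl)

table(cl)
table(cl_perm)
```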
randomVarImpsRFplot(randomImportances, forest, whichImp = "impsUnscaled", nvars = NULL, show.var.names = FALSE, vars.highlight = NULL, main = NULL, screeRandom = TRUE, lwdBlack = 1.5, lwdRed = 2, lwdLightblue = 1, cexPoint = 1, overlayTrue = FALSE, xlab = NULL, ylab = NULL, ...)
randomImportances |
A list structured like the object returned by randomVarImpsRF. |
forest |
A random forest fitted to the original data. This forest must have been fitted with importance = TRUE. |
whichImp |
The importance measure to use; exactly one value must be given. The default is "impsUnscaled". |
nvars |
If NULL, show the plot for the complete range of variables. If an integer, plot only the nvars most important variables. |
show.var.names |
If TRUE, show the variable names in the plot. Unless you are plotting few variables, it probably won't be of any use. |
vars.highlight |
A vector indicating the variables to highlight in the plot with a vertical blue segment. You need to pass here a vector of variable names, not variable positions. |
main |
The title for the plot. |
screeRandom |
If TRUE, order all the variable importances (i.e., those from both the original and the permuted class labels data sets) from largest to smallest before plotting. The plot will thus resemble a usual "scree plot". |
lwdBlack |
The width of the line to use for the importances from the original data set. |
lwdRed |
The width of the line to use for the average of the importances for the permuted data sets. |
lwdLightblue |
The width of the line for the importances for the individual permuted data sets. |
cexPoint |
|
overlayTrue |
If TRUE, the variable importance from the original data set will be plotted last, so you can see it even if it is buried in the middle of many green lines; this can help when the plot does not otherwise let you see the black line. |
xlab |
The title for the x-axis. |
ylab |
The title for the y-axis. |
... |
Additional arguments to plot. |
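As an illustration of how several of these arguments combine, the sketch below limits the plot to the most important variables, highlights two of them by name, and draws the original-data line last. The simulated data, the seed, and the values chosen for ntree, numrandom, nvars, and main are assumptions for illustration only; the calls follow the usage shown above and require the varSelRF and randomForest packages to be installed.

```r
library(randomForest)
library(varSelRF)

set.seed(1)  # illustrative seed (assumption)

# Simulated data: 45 samples, 30 variables; V1 and V2 differ between classes
x <- matrix(rnorm(45 * 30), ncol = 30)
x[1:20, 1:2] <- x[1:20, 1:2] + 2
colnames(x) <- paste0("V", seq_len(ncol(x)))
cl <- factor(c(rep("A", 20), rep("B", 25)))

# The forest must store importances, so fit with importance = TRUE
rf <- randomForest(x, cl, ntree = 200, importance = TRUE)
rf.rvi <- randomVarImpsRF(x, cl, rf, numrandom = 10, usingCluster = FALSE)

# vars.highlight expects variable names, so convert positions to names first
highlight <- colnames(x)[c(1, 2)]

randomVarImpsRFplot(rf.rvi, rf,
                    nvars = 15,                  # plot only the 15 most important
                    vars.highlight = highlight,  # names, not positions
                    overlayTrue = TRUE,          # original-data line drawn last
                    main = "Top 15 variables")
```

Passing positions such as c(1, 2) directly to vars.highlight would not match the documented requirement for names, which is why the conversion through colnames() comes first.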
Only used for its side effects of producing plots. In particular, you will see lines of three colors:
black |
Connects the variable importances from the original data set. |
green |
Connect the variable importances from the data sets with permuted class labels; there will be as many lines as there are permuted data sets. |
red |
Connects the average of the importances from the permuted data sets. |
Additionally, if you used a valid set of values for vars.highlight, these will be shown with a vertical blue segment.
These plots resemble the scree plots commonly used with principal component analysis, and the actual choice of colors was taken from the importance spectrum plots of Friedman & Meulman.
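The layout described above can be imitated with base graphics. What follows is only a sketch on made-up numbers, not the code of randomVarImpsRFplot: the objects orig_imps and perm_imps, their values, and the choice to average the permuted importances rank by rank are all illustrative assumptions. Each series is sorted from largest to smallest (scree style), the permuted-data lines are drawn first, then their average, and the original-data line goes on top.

```r
set.seed(42)   # illustrative seed (assumption)
n_vars <- 30   # number of variables (hypothetical)
n_perm <- 20   # number of permuted data sets (hypothetical)

# Made-up importances: two informative variables in the original data,
# pure noise in every permuted data set
orig_imps <- c(0.08, 0.06, abs(rnorm(n_vars - 2, sd = 0.005)))
names(orig_imps) <- paste0("V", seq_len(n_vars))
perm_imps <- matrix(abs(rnorm(n_vars * n_perm, sd = 0.005)), nrow = n_vars)

# Scree-style ordering: sort every series from largest to smallest
orig_sorted <- sort(orig_imps, decreasing = TRUE)
perm_sorted <- apply(perm_imps, 2, sort, decreasing = TRUE)

# Average of the permuted importances at each rank (an assumption of this sketch)
perm_mean <- rowMeans(perm_sorted)

# Permuted lines first, then their average, then the original data on top
matplot(perm_sorted, type = "l", lty = 1, lwd = 1, col = "green",
        ylim = range(orig_sorted, perm_sorted),
        xlab = "Variable (ordered by importance)", ylab = "Importance")
lines(perm_mean, col = "red", lwd = 2)
lines(orig_sorted, col = "black", lwd = 1.5)
```

In a picture like this, variables whose black line lies well above the green lines have importances larger than those seen with permuted class labels.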
Ramon Diaz-Uriarte rdiaz02@gmail.com
Breiman, L. (2001) Random forests. Machine Learning, 45, 5–32.
Diaz-Uriarte, R., Alvarez de Andres, S. (2005) Variable selection from random forests: application to gene expression data. Tech. report. http://ligarto.org/rdiaz/Papers/rfVS/randomForestVarSel.html
Friedman, J., Meulman, J. (2005) Clustering objects on subsets of attributes (with discussion). J. Royal Statistical Society, Series B, 66, 815–850.
library(randomForest)
library(varSelRF)

## simulated data: the first two variables differ between the classes
x <- matrix(rnorm(45 * 30), ncol = 30)
x[1:20, 1:2] <- x[1:20, 1:2] + 2
colnames(x) <- paste0("V", seq.int(ncol(x)))
cl <- factor(c(rep("A", 20), rep("B", 25)))

## the forest must be fitted with importance = TRUE
rf <- randomForest(x, cl, ntree = 200, importance = TRUE)
rf.rvi <- randomVarImpsRF(x, cl, rf, numrandom = 20, usingCluster = FALSE)

randomVarImpsRFplot(rf.rvi, rf)

## show the variable names, with axis labels perpendicular to the axes
op <- par(las = 2)
randomVarImpsRFplot(rf.rvi, rf, show.var.names = TRUE)
par(op)

## Not run:
## identical, but using a cluster
## make a small cluster, for the sake of illustration
library(parallel)
psockCL <- makeCluster(2, "PSOCK")
clusterSetRNGStream(psockCL, iseed = 789)
clusterEvalQ(psockCL, library(varSelRF))

rf.rvi <- randomVarImpsRF(x, cl, rf, numrandom = 20, usingCluster = TRUE, TheCluster = psockCL)
randomVarImpsRFplot(rf.rvi, rf)
stopCluster(psockCL)
## End(Not run)