Subsets/filters a data frame and drops the unused levels.
Subset(x, subset, select, drop = FALSE, resetRownames = TRUE, ...)
filterD(x, ..., except = NULL)
x | A data frame.
subset | A logical expression that indicates elements or rows to keep: missing values are taken as false.
select | An expression that indicates columns to select from a data frame.
drop | passed on to
resetRownames | A logical that indicates if the rownames should be reset after the subsetting (
... | Further arguments to be passed to or from other methods.
except | Indices of columns from which NOT to drop levels.
Students new to R often expect that when a data frame containing a factor variable is subsetted with subset or filtered with filter, any original levels that are no longer used will be dropped. This is not the case: the unused levels are retained, which often results in tables with empty cells and figures with empty bars. One remedy is to call drop.levels from gdata immediately after the subset or filter call. Because this two-step sequence quickly becomes repetitive, Subset and filterD combine the two steps into one function.
Subset is a wrapper for subset, with a check for objects that are not data frames and a call to drop.levels just before the data frame is returned. I also added an argument that allows the row names to be reset. filterD is a wrapper for filter from dplyr, followed by a call to drop.levels just before the data frame is returned. Otherwise, there is no new code here.
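The wrapper pattern described above can be sketched in base R alone. This is only an illustration under stated assumptions, not the code used by Subset or filterD: the helper name subset_drop and its arguments keep and reset_rownames are hypothetical, and base droplevels stands in for drop.levels from gdata.

```r
# Hypothetical helper (not the package's code): keep the rows flagged by a
# logical vector, then drop factor levels that are no longer used.
subset_drop <- function(x, keep, except = NULL, reset_rownames = TRUE) {
  stopifnot(is.data.frame(x), is.logical(keep), length(keep) == nrow(x))
  # Treat missing values in the condition as FALSE, as subset() does.
  res <- x[keep & !is.na(keep), , drop = FALSE]
  # droplevels() removes unused levels from every factor column;
  # 'except' gives indices of columns whose levels are left untouched.
  res <- droplevels(res, except = except)
  # Optionally renumber the rows from 1.
  if (reset_rownames) rownames(res) <- NULL
  res
}

# Unused level removed from Species.
set1 <- subset_drop(iris, iris$Species %in% c("setosa", "versicolor"))
levels(set1$Species)

# Species is column 5 of iris, so its levels are kept as they were.
set2 <- subset_drop(iris, iris$Species == "setosa", except = 5)
nlevels(set2$Species)
```

Taking the condition as a ready-made logical vector avoids the non-standard evaluation that subset and filter rely on, which keeps the sketch short at the cost of a slightly more verbose call.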
These functions are used only for data frames.
A data frame with the subsetted rows and selected variables.
Basic Data Manipulations.
Derek H. Ogle, derek@derekogle.com
See subset and filter from dplyr for similar functionality. See drop.levels in gdata and droplevels for related functionality.
## The problem -- note the unused level in the final table.
levels(iris$Species)
iris.set1 <- subset(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set1$Species)
xtabs(~Species,data=iris.set1)

## A simpler fix using Subset
iris.set2 <- Subset(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set2$Species)
xtabs(~Species,data=iris.set2)

## A simpler fix using filterD
iris.set3 <- filterD(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set3$Species)
xtabs(~Species,data=iris.set3)