Subsets/filters a data frame and drops the unused levels.
Subset(x, subset, select, drop = FALSE, resetRownames = TRUE, ...)
filterD(x, ..., except = NULL)
x | A data frame.
subset | A logical expression that indicates elements or rows to keep: missing values are taken as false.
select | An expression that indicates columns to select from a data frame.
drop | passed on to
resetRownames | A logical that indicates if the row names should be reset after the subsetting (
... | Further arguments to be passed to or from other methods.
except | Indices of columns from which NOT to drop levels.
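As a rough illustration of what excluding columns from level dropping looks like, the sketch below uses base R's droplevels, whose data frame method has an argument of the same name. It assumes that except in filterD behaves the same way, which this page does not confirm; the data frame df and its columns are made up for the example.

```r
# Hypothetical data: two factor columns, each with three levels.
df <- data.frame(
  g1 = factor(c("a", "b", "c")),
  g2 = factor(c("x", "y", "z"))
)

# Plain row indexing keeps all original levels in both columns.
sub <- df[df$g1 != "c", ]

# Drop unused levels everywhere except column 2 (g2).
kept <- droplevels(sub, except = 2)

levels(kept$g1)  # "a" "b"
levels(kept$g2)  # "x" "y" "z"
```

Here g1 loses its unused level while g2 keeps all three, even though only two of them still appear in the data.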
Students new to R often expect that when a factor variable is subset with subset or filtered with filter, any original levels that are no longer used will be dropped. This is not the case, which often results in tables with empty cells and figures with empty bars. One remedy is to call drop.levels from gdata immediately after the subset or filter call. This quickly becomes a repetitive two-step sequence; thus, Subset and filterD combine the two steps into one function.
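The two-step sequence described above can be sketched with base R alone. This version swaps in droplevels from base R for drop.levels from gdata, so it is an approximation of the pattern rather than the exact calls named in the text; the data frame fish is invented for the example.

```r
# Hypothetical data with a three-level factor.
fish <- data.frame(
  len = c(10, 12, 15, 9),
  species = factor(c("bass", "perch", "pike", "bass"))
)

# Step 1: subset. The unused level "pike" is still attached.
fish_sub <- subset(fish, species != "pike")
nlevels(fish_sub$species)  # 3
table(fish_sub$species)    # "pike" appears with a count of 0

# Step 2: drop the unused levels.
fish_sub <- droplevels(fish_sub)
nlevels(fish_sub$species)  # 2
```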
Subset is a wrapper to subset with a catch for non-data.frames and a specific call to drop.levels just before the data.frame is returned. I also added an argument to allow resetting the row names. filterD is a wrapper for filter from dplyr followed by drop.levels just before the data.frame is returned. Otherwise, there is no new code here.
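A minimal sketch of the wrapper idea, not the actual source of Subset, might look like the following. The function name subset_drop and its arguments keep and reset_rownames are hypothetical. To keep the example simple, it takes a logical vector instead of an unevaluated expression and uses base R's droplevels in place of drop.levels.

```r
# Hypothetical wrapper: filter rows, drop unused levels, optionally reset row names.
subset_drop <- function(x, keep, reset_rownames = TRUE) {
  # Catch inputs that are not data frames or a mismatched row selector.
  stopifnot(is.data.frame(x), is.logical(keep), length(keep) == nrow(x))
  # Treat missing values in the condition as FALSE.
  res <- x[keep & !is.na(keep), , drop = FALSE]
  # Drop unused factor levels just before returning.
  res <- droplevels(res)
  if (reset_rownames) rownames(res) <- NULL
  res
}

df <- data.frame(id = 1:4, grp = factor(c("a", "b", "c", "a")))
out <- subset_drop(df, df$grp != "b")
levels(out$grp)  # "a" "c"
rownames(out)    # "1" "2" "3"
```

Without the row name reset, the result would keep the original row names "1", "3", and "4".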
These functions are used only for data frames.
A data frame with the subsetted rows and selected variables.
Basic Data Manipulations.
Derek H. Ogle, derek@derekogle.com
See subset and filter from dplyr for similar functionality. See drop.levels in gdata and droplevels for related functionality.
## The problem -- note the unused level in the final table.
levels(iris$Species)
iris.set1 <- subset(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set1$Species)
xtabs(~Species,data=iris.set1)

## A simpler fix using Subset
iris.set2 <- Subset(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set2$Species)
xtabs(~Species,data=iris.set2)

## A simpler fix using filterD
iris.set3 <- filterD(iris,Species=="setosa" | Species=="versicolor")
levels(iris.set3$Species)
xtabs(~Species,data=iris.set3)