Split-Apply-Combine Computing
BY is an S3 generic that efficiently applies functions over vectors, or over the columns of matrices and data frames, by groups. Like dapply, it seeks to retain the structure and attributes of the data, but can also output to various standard formats. Simple parallelism is also available.
BY(x, ...)

## Default S3 method:
BY(x, g, FUN, ..., use.g.names = TRUE, sort = TRUE,
   expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
   return = c("same", "vector", "list"))

## S3 method for class 'matrix'
BY(x, g, FUN, ..., use.g.names = TRUE, sort = TRUE,
   expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
   return = c("same", "matrix", "data.frame", "list"))

## S3 method for class 'data.frame'
BY(x, g, FUN, ..., use.g.names = TRUE, sort = TRUE,
   expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
   return = c("same", "matrix", "data.frame", "list"))

## S3 method for class 'grouped_df'
BY(x, FUN, ..., use.g.names = FALSE, keep.group_vars = TRUE,
   expand.wide = FALSE, parallel = FALSE, mc.cores = 1L,
   return = c("same", "matrix", "data.frame", "list"))
x
    an atomic vector, matrix, data frame or similar object.
g
    a factor,
FUN
    a function, can be scalar- or vector-valued.
...
    further arguments to
use.g.names
    logical. Create group names and add them to the result as names (default method) or row names (matrix and data frame methods). No row names are generated for data.tables.
sort
    logical. Sort the groups? Internally passed to
expand.wide
    logical. If
parallel
    logical.
mc.cores
    integer. Argument to
return
    an integer or string indicating the type of object to return. The default
keep.group_vars
    grouped_df method: Logical.
BY is principally a wrapper around lapply(split(x, g), FUN, ...) that strongly optimizes attribute checking compared to base R functions. For more details, see the documentation for dapply, which works very similarly (apart from the splitting performed in BY). For larger tasks requiring split-apply-combine computing on data frames, use dplyr or data.table, or try to work with the Fast Statistical Functions.
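The split-apply-combine pattern that BY wraps can be sketched in base R. This is only an illustration of lapply(split(x, g), FUN, ...) followed by a combine step, not BY's internal implementation; the object names (sums, sums_by_species, scaled) are chosen for the example.

```r
# Base R sketch of the split-apply-combine pattern (not BY's actual internals)
v <- iris$Sepal.Length   # numeric vector
f <- iris$Species        # grouping factor

# Split the vector by group and apply a scalar-valued function to each piece
sums <- lapply(split(v, f), sum)

# Combine: one named element per group
sums_by_species <- unlist(sums)
print(sums_by_species)

# A vector-valued function returns one vector per group,
# each as long as the group it was computed on
scaled <- lapply(split(v, f), function(x) (x - mean(x)) / sd(x))
str(scaled)
```

BY performs the equivalent steps in a single call, for example BY(v, f, sum), and additionally takes care of restoring names and attributes on the result.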
BY is used internally in collap for functions that are not Fast Statistical Functions.
X where FUN was applied to every column split by g.
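To make "applied to every column split by g" concrete, here is a hedged base R sketch for a scalar-valued function. It only mimics the shape of the result for this simple case and is not how BY is implemented; the names X, g, by_col and res are hypothetical and used for illustration only.

```r
# Base R sketch: apply a function to every column of a data frame, split by g
X <- iris[sapply(iris, is.numeric)]   # numeric columns only
g <- iris$Species                     # grouping factor

# For each column: split by g, then apply sum to each group
by_col <- sapply(X, function(col) vapply(split(col, g), sum, numeric(1)))

# by_col is a matrix with one row per group and one column per variable;
# convert to a data frame to mirror a data frame input
res <- as.data.frame(by_col)
print(res)
```

The result has one row per level of g and one column per input column, with the group names as row names.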
library(collapse)                         # Provides BY, qM and num_vars

v <- iris$Sepal.Length                    # A numeric vector
f <- iris$Species                         # A factor. Vectors/lists will internally be converted to factor

## default vector method
BY(v, f, sum)                             # Sum by species
head(BY(v, f, scale))                     # Scale by species (please use fscale instead)
head(BY(v, f, scale, use.g.names = FALSE)) # Omitting auto-generated names
BY(v, f, quantile)                        # Species quantiles: by default stacked
BY(v, f, quantile, expand.wide = TRUE)    # Wide format

## matrix method
m <- qM(num_vars(iris))
BY(m, f, sum)                             # Also returns a matrix
BY(m, f, sum, return = "data.frame")      # Return as data.frame; also works for computations below
head(BY(m, f, scale))
head(BY(m, f, scale, use.g.names = FALSE))
BY(m, f, quantile)
BY(m, f, quantile, expand.wide = TRUE)
BY(m, f, quantile, expand.wide = TRUE,    # Return as list of matrices
   return = "list")

## data.frame method
BY(num_vars(iris), f, sum)                # Also returns a data.frame
BY(num_vars(iris), f, sum, return = 2)    # Return as matrix; also works for computations below
head(BY(num_vars(iris), f, scale))
head(BY(num_vars(iris), f, scale, use.g.names = FALSE))
BY(num_vars(iris), f, quantile)
BY(num_vars(iris), f, quantile, expand.wide = TRUE)
BY(num_vars(iris), f, quantile,           # Return as list of matrices
   expand.wide = TRUE, return = "list")

## grouped data frame method (faster than dplyr only for small data)
library(dplyr)
giris <- group_by(iris, Species)
giris %>% BY(sum)                         # Compute sum
giris %>% BY(sum, use.g.names = TRUE,     # Use row.names and
             keep.group_vars = FALSE)     # remove 'Species' and groups attribute
giris %>% BY(sum, return = "matrix")      # Return matrix
giris %>% BY(sum, return = "matrix",      # Matrix with row.names
             use.g.names = TRUE)
giris %>% BY(quantile)                    # Compute quantiles (output is stacked)
giris %>% BY(quantile,                    # Much better, also keeps 'Species'
             expand.wide = TRUE)