Fast Grouping / collapse Grouping Objects
GRP performs fast, ordered and unordered, groupings of vectors and data frames (or lists of vectors) using radixorderv. The output is a list-like object of class 'GRP' which can be printed, plotted and used as an efficient input to all of collapse's fast statistical and transformation functions / operators, as well as to collap, BY and TRA.
fgroup_by is similar to dplyr::group_by but faster. It creates a grouped data frame with a 'GRP' object attached, for faster dplyr-like programming with collapse's fast functions.
There are also several methods to convert to and from 'GRP' objects. Notable among these are GRP.grouped_df, which returns a 'GRP' object from a grouped data frame created with dplyr::group_by (or fgroup_by), and the duo GRP.factor and as.factor_GRP.
GRP(X, ...)

## Default S3 method:
GRP(X, by = NULL, sort = TRUE, decreasing = FALSE, na.last = TRUE,
    return.groups = TRUE, return.order = FALSE, call = TRUE, ...)

## S3 method for class 'factor'
GRP(X, ..., group.sizes = TRUE, drop = FALSE, return.groups = TRUE, call = TRUE)

## S3 method for class 'qG'
GRP(X, ..., group.sizes = TRUE, return.groups = TRUE, call = TRUE)

## S3 method for class 'pseries'
GRP(X, effect = 1L, ..., group.sizes = TRUE, return.groups = TRUE, call = TRUE)

## S3 method for class 'pdata.frame'
GRP(X, effect = 1L, ..., group.sizes = TRUE, return.groups = TRUE, call = TRUE)

## S3 method for class 'grouped_df'
GRP(X, ..., return.groups = TRUE, call = TRUE)

# Identify, get group names, and convert GRP object to factor
is.GRP(x)
GRPnames(x, force.char = TRUE)
as.factor_GRP(x, ordered = FALSE)

# Fast, class-agnostic version of dplyr::group_by for use with fast functions, see details
fgroup_by(X, ..., sort = TRUE, decreasing = FALSE, na.last = TRUE, return.order = FALSE)
gby(X, ..., sort = TRUE, decreasing = FALSE, na.last = TRUE, return.order = FALSE)

# Get grouping columns from a grouped data frame created with dplyr::group_by or fgroup_by
fgroup_vars(X, return = "data")

# Ungroup grouped data frame created with dplyr::group_by or fgroup_by
fungroup(X, ...)

## S3 method for class 'GRP'
print(x, n = 6, ...)

## S3 method for class 'GRP'
plot(x, breaks = "auto", type = "s", horizontal = FALSE, ...)
X | a vector, list of columns or data frame (default method), or a classed object (conversion / extractor methods).
x | a GRP object.
by | if
sort | logical. This argument only affects character vectors / columns passed. If
ordered | logical.
decreasing | logical. Should the sort order be increasing or decreasing? Can be a vector of length equal to the number of arguments in
na.last | logical. If missing values are encountered in grouping vector/columns, assign them to the last group (argument passed to radixorderv).
return.groups | logical. Include the unique groups in the created GRP object.
return.order | logical. Include the output from radixorderv.
group.sizes | logical.
drop | logical.
call | logical.
force.char | logical. Always output group names as character vector, even if a single numeric vector was passed to
effect | plm methods: Select which panel identifier should be used as grouping variable. 1L takes the first variable in the
return | an integer or string specifying what
n | integer. Number of groups to print out.
breaks | integer. Number of breaks in the histogram of group-sizes.
type | linetype for plot.
horizontal | logical.
... | for
GRP is a central function in the collapse package because it provides the key inputs for easy and efficient groupwise programming at the C/C++ level: (1) the number of groups, (2) an integer group-id indicating which values / rows belong to which group, and (3) the size of each group. Given this information, collapse's Fast Statistical Functions pre-allocate intermediate and result vectors of the right sizes and (in most cases) perform grouped statistical computations in a single pass through the data.
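To make the three pieces of information concrete, the following sketch groups a small toy vector and reads the corresponding elements from the resulting object by name. The vector x and the base R comparison at the end are illustrative assumptions, not part of the original documentation; the base R lines only mimic what the elements contain and say nothing about how collapse computes them.

```r
library(collapse)

x <- c(3, 1, 3, 2, 1, 3)   # toy grouping vector (illustrative data)
g <- GRP(x)

g$N.groups      # (1) number of groups
g$group.id      # (2) integer group-id for each element of x
g$group.sizes   # (3) size of each group

# Base R equivalents, for comparison only (not collapse's implementation)
f <- factor(x)
n_groups_base    <- nlevels(f)
group_id_base    <- as.integer(f)
group_sizes_base <- tabulate(group_id_base, n_groups_base)
```

Accessing the elements by name rather than by position keeps the sketch independent of the exact list layout.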
The sorting and ordering functionality of GRP only affects (2): groups receive different integer ids depending on whether the groups are sorted (sort = TRUE) and in which order (argument decreasing). This in turn changes the order of values / rows in the output of collapse functions. Note that sort = FALSE is only effective on character vectors; numeric grouping vectors always produce ordered groupings.
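A small sketch of how the sort and decreasing arguments change the group ids, and with them the order of grouped output. The vectors x and v are made-up illustration data, and the comments describe the behaviour stated in the text above rather than guaranteed output.

```r
library(collapse)

x <- c("b", "a", "b", "c", "a")   # toy character grouping vector
v <- c(10, 20, 30, 40, 50)        # toy values to aggregate

ids_sorted     <- GRP(x)$group.id                     # groups sorted alphabetically
ids_decreasing <- GRP(x, decreasing = TRUE)$group.id  # groups in reverse order
ids_unsorted   <- GRP(x, sort = FALSE)$group.id       # grouped, but not alphabetically ordered

# The group order carries over to the output of grouped computations
sum_sorted     <- fsum(v, GRP(x))
sum_decreasing <- fsum(v, GRP(x, decreasing = TRUE))
```

The group sums are the same in both cases; only the position of each group in the result differs.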
Next to GRP, there is the function fgroup_by as a significantly faster alternative to dplyr::group_by. It creates a grouped data frame by attaching a 'GRP' object to a data frame. collapse functions with a grouped_df method applied to that data frame will yield grouped computations. Note that fgroup_by can only be used in combination with collapse functions, not with dplyr verbs such as summarize or mutate. The converse is not true: you can group data with dplyr::group_by and then apply collapse functions. Note also that fgroup_by is class-agnostic, i.e. the classes of the data frame or list passed are preserved, and all standard methods (like subsetting with `[` or print methods) apply to the grouped object. Apart from the class 'grouped_df', which is added behind any classes the object might inherit (apart from 'data.frame'), a class 'GRP_df' is added in front. This class has print and subset (`[`) methods. Both first call the corresponding method for the object and then print / attach the grouping information. Below the printed object, print.GRP_df prints one line indicating the grouping variables, followed, in square brackets, by the following information: [number of groups | average group size (standard-deviation of group sizes)].
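The workflow described above can be sketched without any pipe operator, using only functions listed in the usage section. The variable names gdf, means, and vars are illustrative; the choice of mtcars and the grouping columns follows the examples of this page.

```r
library(collapse)

gdf <- fgroup_by(mtcars, cyl, vs, am)   # attach a 'GRP' object to the data frame
class(gdf)                              # 'GRP_df' in front, 'grouped_df' added behind

means <- fmean(gdf)                     # grouped means via the grouped_df method
vars  <- fgroup_vars(gdf, "names")      # names of the grouping columns
g     <- GRP(gdf)                       # extract the attached 'GRP' object

plain <- fungroup(gdf)                  # remove the grouping again
```

Because the grouping is computed once in fgroup_by, any further collapse function applied to gdf can reuse it.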
GRP is an S3 generic function with one default method supporting vector and list input and several conversion methods:
The conversion of factors to 'GRP' objects by GRP.factor involves obtaining the number of groups by calling ng <- fnlevels(f) and then computing the count of each level using tabulate(f, ng). The integer group-id (2) is already given by the factor itself after removing the levels and class attributes and replacing any missing values with ng + 1L. The levels are put in a list and moved to position (4) in the 'GRP' object, which is reserved for the unique groups. Going from factor to 'GRP' object thus only requires a tabulation of the levels, whereas creating a factor from a 'GRP' object using as.factor_GRP does not involve any computations, but may involve interactions if multiple grouping columns were used (which are then interacted to produce unique factor levels) or as.character conversions if the grouping column(s) were numeric (which are potentially expensive).
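The factor conversion steps described above can be traced in base R. This is only an illustration of the described logic under stated assumptions: it uses base nlevels in place of collapse's fnlevels, the toy factor f is made up, and it is not the package's actual implementation.

```r
f <- factor(c("a", "b", NA, "a", "c", "b", "a"))   # toy factor with one missing value

ng    <- nlevels(f)        # number of groups (the text uses fnlevels(f))
sizes <- tabulate(f, ng)   # count of each level; NA is ignored by tabulate

id <- unclass(f)           # integer codes of the factor
attributes(id) <- NULL     # drop the levels (and any other) attributes
id[is.na(id)] <- ng + 1L   # missing values get the id ng + 1L

groups <- list(levels(f))  # unique groups, stored as a list
```

Only tabulate does real work here, which matches the statement that going from a factor to a 'GRP' object requires just a tabulation of the levels.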
The method GRP.grouped_df takes the 'groups' attribute from a grouped data frame and converts it to a 'GRP' object. If the grouped data frame was generated using fgroup_by, all work is done already. If it was created using dplyr::group_by, a C routine is called to efficiently convert the grouping object.
A list-like object of class ‘GRP’ containing information about the number of groups, the observations (rows) belonging to each group, the size of each group, the unique group names / definitions, whether the groups are ordered or not and (optionally) the ordering vector used to perform the ordering. The object is structured as follows:
List-index | Element-name | Content type | Content description
[[1]] | N.groups | integer(1) | Number of groups
[[2]] | group.id | integer(NROW(X)) | An integer group-identifier
[[3]] | group.sizes | integer(N.groups) | Vector of group sizes
[[4]] | groups | unique(X) or NULL | Unique groups (same format as input, sorted if sort = TRUE), or NULL if return.groups = FALSE
[[5]] | group.vars | character | The names of the grouping variables
[[6]] | ordered | logical(2) | [1] TRUE if sort = TRUE, [2] TRUE if X already sorted
[[7]] | order | integer(NROW(X)) or NULL | Ordering vector from radixorderv, or NULL if return.order = FALSE (the default)
[[8]] | call | call() or NULL | The GRP() call, obtained from match.call(), or NULL if call = FALSE
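As a brief sketch of how these elements can be inspected, the code below creates a 'GRP' object with the optional ordering vector and reads its elements by name. The choice of grouping columns is illustrative, and element names are used instead of list positions as a cautious assumption that names are the more stable way to access the object.

```r
library(collapse)

g <- GRP(mtcars, ~ cyl + am, return.order = TRUE)

is.GRP(g)       # TRUE for 'GRP' objects
g$N.groups      # number of groups
g$group.sizes   # size of each group
g$groups        # unique combinations of cyl and am
g$group.vars    # names of the grouping variables
g$ordered       # logical flags describing the ordering

o <- g$order    # ordering vector, present because return.order = TRUE
sorted_ids <- g$group.id[o]   # group ids rearranged by the ordering vector
```

Rearranging group.id by the ordering vector yields the ids in non-decreasing order, which is one way to check that the two elements are consistent.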
library(collapse)

## default method
GRP(mtcars$cyl)
GRP(mtcars, ~ cyl + vs + am)        # Or GRP(mtcars, c("cyl","vs","am")) or GRP(mtcars, c(2,8:9))
g <- GRP(mtcars, ~ cyl + vs + am)   # Saving the object
print(g)                            # Printing it
plot(g)                             # Plotting it
GRPnames(g)                         # Retain group names
fsum(mtcars, g)                     # Compute the sum of mtcars, grouped by variables cyl, vs and am

## Convert factor to GRP object and vice-versa
GRP(iris$Species)
as.factor_GRP(g)

## dplyr integration
library(dplyr)
mtcars %>% group_by(cyl,vs,am) %>% GRP      # Get GRP object from a dplyr grouped tibble
mtcars %>% group_by(cyl,vs,am) %>% fmean    # Grouped mean using dplyr grouping
mtcars %>% fgroup_by(cyl,vs,am) %>% fmean   # Faster alternative with collapse grouping
mtcars %>% fgroup_by(cyl,vs,am)             # Print method for grouped data frame