Recode numeric variables into equal-ranged groups
Recode numeric variables into equal-ranged, grouped factors, i.e. a variable is cut into a smaller number of groups, where each group has the same value range. group_labels() creates the related value labels. group_var_if() and group_labels_if() are scoped variants of group_var() and group_labels(), where grouping is applied only to those variables that match the logical condition of predicate.
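The idea behind the scoped variants can be pictured with a small base-R sketch: a predicate function is applied to every column, and only the columns for which it returns TRUE are kept for further processing. The data frame and column names below are made up for illustration, and the code is not taken from the package; it only mirrors the selection step.

```r
# Made-up example data: two numeric columns and one character column.
df <- data.frame(
  age   = c(23, 37, 58),
  name  = c("a", "b", "c"),
  hours = c(4, 12, 40),
  stringsAsFactors = FALSE
)

# Apply the predicate to each column; vapply() guarantees a logical vector.
is_selected <- vapply(df, is.numeric, logical(1))

# Names of the columns that match the predicate.
selected <- names(df)[is_selected]
print(selected)
```

With is.numeric as predicate, only age and hours would be passed on to the grouping step.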
group_var(
  x,
  ...,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_var_if(
  x,
  predicate,
  size = 5,
  as.num = TRUE,
  right.interval = FALSE,
  n = 30,
  append = TRUE,
  suffix = "_gr"
)

group_labels(x, ..., size = 5, right.interval = FALSE, n = 30)

group_labels_if(x, predicate, size = 5, right.interval = FALSE, n = 30)
x | A vector or data frame.
... | Optional, unquoted names of variables that should be selected for further processing. Required, if
size | Numeric; group-size, i.e. the range for grouping. By default, for each 5 categories of
as.num | Logical, if
right.interval | Logical; if
n | Sets the maximum number of groups that are defined when auto-grouping is on (size = "auto").
append | Logical, if
suffix | Indicates which suffix will be added to each dummy variable. Use
predicate | A predicate function to be applied to the columns. The variables for which
If size is set to a specific value, the variable is recoded into several groups, where each group has a maximum range of size. Hence, the number of groups differs depending on the range of x.

If size = "auto", the variable is recoded into a maximum of n groups. Hence, regardless of the range of x, the same number of groups is always created, so the range within each group differs (depending on x's range).
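The difference between a fixed size and size = "auto" can be sketched in base R. The helpers count_groups() and auto_width() are hypothetical and written only for this illustration; they assume that groups are formed by breaks spaced size apart starting at zero, and that the automatic width is the range of x divided by n, rounded up. The package may compute this differently.

```r
# Hypothetical helper: count the occupied equal-range groups for a fixed size.
count_groups <- function(x, size) {
  # Breaks every `size` units, starting at 0 and extending past max(x).
  breaks <- seq(0, max(x) + size, by = size)
  # Drop empty groups so that only occupied ranges are counted.
  nlevels(droplevels(cut(x, breaks = breaks, right = FALSE)))
}

# Hypothetical helper: group width needed to cover the range of x with n groups.
auto_width <- function(x, n) {
  ceiling((max(x) - min(x)) / n)
}

narrow <- 50:59  # values span 10 units
wide   <- 50:99  # values span 50 units

# Fixed size: the number of groups grows with the range of x.
fixed_narrow <- count_groups(narrow, size = 5)
fixed_wide   <- count_groups(wide, size = 5)

# Automatic size: the width of each group grows with the range of x.
width_narrow <- auto_width(narrow, n = 5)
width_wide   <- auto_width(wide, n = 5)

print(c(fixed_narrow, fixed_wide))
print(c(width_narrow, width_wide))
```

Under these assumptions, a fixed size of 5 gives 2 groups for the narrow vector and 10 for the wide one, while the automatic width changes from 2 to 10 to keep the number of groups bounded.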
right.interval determines which boundary values are included when grouping is done. If FALSE (default), grouping starts with the lower bound of size. For example, for a variable ranging from 50 to 80, groups cover the ranges 50-54, 55-59, 60-64 etc. If TRUE, grouping starts with the upper bound of size. In this case, groups cover the ranges 46-50, 51-55, 56-60, 61-65 etc. Note: This will cover a range from 46-50 as first group, even if values from 46 to 49 are not present. See 'Examples'.
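The boundary behaviour can be illustrated with base R's cut(), whose right argument decides which end of each interval is closed. This is a sketch under the assumption that right.interval plays the same role as right, which is consistent with the comments in 'Examples'; it is not a statement about how the package is implemented.

```r
x <- c(50, 54, 55, 60)
breaks <- seq(45, 65, by = 5)

# right = FALSE: intervals are closed on the left, e.g. [50,55) covers 50-54.
left_closed <- as.character(cut(x, breaks = breaks, right = FALSE))

# right = TRUE: intervals are closed on the right, e.g. (45,50] covers 46-50.
right_closed <- as.character(cut(x, breaks = breaks, right = TRUE))

print(data.frame(x, left_closed, right_closed))
```

Note how the value 50 falls into the group starting at 50 in the first case, but into the group ending at 50 in the second, which is why a 46-50 group appears even when no values below 50 are present.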
If you want to split a variable into a certain number of equal-sized groups (instead of groups whose values all cover the same range), use the split_var() function.
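The distinction between equal-ranged and equal-sized groups is easiest to see on skewed data. The following base-R sketch uses cut() with evenly spaced breaks for the first case and quantile-based breaks for the second; it only illustrates the two concepts and is not meant to reproduce the exact output of group_var() or split_var().

```r
# Skewed example data: nine small values and one large outlier.
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)

# Equal range: every group spans 50 units, so the counts are unbalanced.
equal_range <- cut(x, breaks = seq(0, 150, by = 50), right = FALSE)
range_counts <- as.vector(table(droplevels(equal_range)))

# Equal size: breaks at the quantiles give each group the same number of values.
q <- quantile(x, probs = c(0, 0.5, 1))
equal_size <- cut(x, breaks = q, include.lowest = TRUE)
size_counts <- as.vector(table(equal_size))

print(range_counts)
print(size_counts)
```

Here the equal-range grouping puts nine values into one group and one value into another, while the quantile-based grouping yields two groups of five.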
group_var() also works on grouped data frames (see group_by()). In this case, grouping is applied to the subsets of variables in x. See 'Examples'.
For group_var(), a grouped variable, either as numeric or as factor (see parameter as.num). If x is a data frame, only the grouped variables will be returned.

For group_labels(), a string vector or a list of string vectors containing labels based on the grouped categories of x, formatted as "from lower bound to upper bound", e.g. "10-19" "20-29" "30-39" etc. See 'Examples'.
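The label format described above can be reproduced with a few lines of base R. The helper range_labels() is hypothetical and exists only for this illustration; it assumes integer-valued groups whose lower bounds are known and whose upper bound is the lower bound plus size minus one.

```r
# Hypothetical helper: build "lower-upper" labels for integer-valued groups.
range_labels <- function(lower, size) {
  paste0(lower, "-", lower + size - 1)
}

labs <- range_labels(c(10, 20, 30), size = 10)
print(labs)
```

For lower bounds 10, 20 and 30 with a size of 10, this yields the labels "10-19", "20-29" and "30-39".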
Variable label attributes (see, for instance, set_label()) are preserved. Usually you should use the same values for size and right.interval in group_labels() as used in the group_var() function if you want matching labels for the related recoded variable.
See also split_var() to split variables into equal-sized groups, group_str() for grouping string vectors, or rec_pattern() and rec() for another convenient way of recoding variables into smaller groups.
# load the package that provides group_var() and group_labels()
library(sjmisc)

age <- abs(round(rnorm(100, 65, 20)))
age.grp <- group_var(age, size = 10)
hist(age)
hist(age.grp)

age.grpvar <- group_labels(age, size = 10)
table(age.grp)
print(age.grpvar)

# histogram with EUROFAMCARE sample dataset
# variable not grouped
library(sjlabelled)
data(efc)
hist(efc$e17age, main = get_label(efc$e17age))

# bar plot with EUROFAMCARE sample dataset
# grouped variable
ageGrp <- group_var(efc$e17age)
ageGrpLab <- group_labels(efc$e17age)
barplot(table(ageGrp), main = get_label(efc$e17age), names.arg = ageGrpLab)

# within a pipe-chain
library(dplyr)
efc %>%
  select(e17age, c12hour, c160age) %>%
  group_var(size = 20)

# create vector with values from 50 to 80
dummy <- round(runif(200, 50, 80))
# labels with grouping starting at lower bound
group_labels(dummy)
# labels with grouping starting at upper bound
group_labels(dummy, right.interval = TRUE)

# works also with grouped data frames
mtcars %>%
  group_var(disp, size = 4, append = FALSE) %>%
  table()

mtcars %>%
  group_by(cyl) %>%
  group_var(disp, size = 4, append = FALSE) %>%
  table()