sjmisc: split_var – R documentation

split_var

Split numeric variables into smaller groups

Description

Recode numeric variables into equal-sized groups, i.e. a variable is cut into a smaller number of groups at specific cut points. split_var_if() is a scoped variant of split_var() that applies the transformation only to those variables for which predicate returns TRUE.

Usage

split_var(
  x,
  ...,
  n,
  as.num = FALSE,
  val.labels = NULL,
  var.label = NULL,
  inclusive = FALSE,
  append = TRUE,
  suffix = "_g"
)

split_var_if(
  x,
  predicate,
  n,
  as.num = FALSE,
  val.labels = NULL,
  var.label = NULL,
  inclusive = FALSE,
  append = TRUE,
  suffix = "_g"
)

Arguments

`x`	A vector or data frame.
`...`	Optional, unquoted names of variables that should be selected for further processing. Required if `x` is a data frame (rather than a vector) and only selected variables from `x` should be processed. You may also use functions like `:` or tidyselect's select-helpers. See 'Examples' or package-vignette.
`n`	The new number of groups that `x` should be split into.
`as.num`	Logical, if `TRUE`, return value will be numeric, not a factor.
`val.labels`	Optional character vector, to set value label attributes of recoded variable (see vignette Labelled Data and the sjlabelled-Package). If `NULL` (default), no value labels will be set.
`var.label`	Optional string, to set variable label attribute for the returned variable (see vignette Labelled Data and the sjlabelled-Package). If `NULL` (default), variable label attribute of `x` will be used (if present). If empty, variable label attributes will be removed.
`inclusive`	Logical; if `TRUE`, cut point values are included in the preceding group. This may be necessary if cutting a vector into groups does not yield proper ("equal-sized") groups. See 'Note' and 'Examples'.
`append`	Logical, if `TRUE` (the default) and `x` is a data frame, `x` including the new variables as additional columns is returned; if `FALSE`, only the new variables are returned.
`suffix`	String that is appended to the names of the new variables if `x` is a data frame (default `"_g"`). If `append = TRUE` and `suffix = ""`, the recoded variables replace (overwrite) the existing variables, see 'Value'.
`predicate`	A predicate function to be applied to the columns. The variables for which `predicate` returns `TRUE` are selected.

Details

split_var() splits a variable into equal-sized groups, where the number of groups is set by the n argument. To do so, the function cuts the variable into groups at the corresponding quantiles.
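
The quantile-based idea can be sketched in base R. This is only an illustration of the technique, not the code sjmisc uses internally; the helper name `split_at_quantiles()` is hypothetical.

```r
# Hypothetical helper: cut a numeric vector into n groups at its quantiles.
# A base R sketch of the idea, not sjmisc's implementation.
split_at_quantiles <- function(x, n) {
  # n groups need n + 1 break points at evenly spaced probabilities
  breaks <- stats::quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  # include.lowest = TRUE keeps the minimum in the first group;
  # labels = FALSE returns integer group codes instead of a factor
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

x <- 1:12
groups <- split_at_quantiles(x, n = 3)
table(groups)
```

With `x <- 1:12` and `n = 3`, each group holds four values. Note that `cut()` stops with an error if two quantiles coincide, which can happen for vectors with few unique values.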

By contrast, group_var() recodes a variable into groups that all cover the same value range (e.g., 1-5, 6-10, 11-15 etc.), so group sizes may differ.
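
The difference between the two strategies shows up on skewed data. The following base R sketch only mimics both ideas with `cut()`; it does not call group_var() or split_var(), and the example vector is made up.

```r
# Made-up, right-skewed example data
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 20)

# Equal value ranges (the group_var() idea): every bin is 5 units wide
by_range <- cut(x, breaks = seq(0, 20, by = 5), labels = FALSE)

# Equal group sizes (the split_var() idea): bins end at the tertiles
tertiles <- stats::quantile(x, probs = c(0, 1/3, 2/3, 1))
by_size <- cut(x, breaks = tertiles, include.lowest = TRUE, labels = FALSE)

table(by_range)  # uneven counts, and one bin stays empty
table(by_size)   # three groups of three values each
```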

split_var() also works on grouped data frames (see group_by). In this case, splitting is applied to the subsets of variables in x. See 'Examples'.
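
Splitting within subsets means that the cut points are computed separately for each group. A minimal base R sketch of that behaviour, assuming quantile-based cut points and using the hypothetical helper `split_at_quantiles()` (not part of sjmisc):

```r
# Hypothetical helper: cut a numeric vector into n groups at its quantiles
split_at_quantiles <- function(x, n) {
  breaks <- stats::quantile(x, probs = seq(0, 1, length.out = n + 1), na.rm = TRUE)
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Split disp into three groups separately within each level of cyl;
# ave() applies FUN to each subset and returns a vector in the original row order
disp_g <- ave(mtcars$disp, mtcars$cyl,
              FUN = function(v) split_at_quantiles(v, n = 3))

# Every cyl level now has cars in all three disp groups
table(mtcars$cyl, disp_g)
```

Because the tertiles are taken per `cyl` level, a displacement that counts as high among four-cylinder cars can still be low relative to the whole data set.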

Value

A grouped variable with equal sized groups. If x is a data frame, for append = TRUE, x including the grouped variables as new columns is returned; if append = FALSE, only the grouped variables will be returned. If append = TRUE and suffix = "", recoded variables will replace (overwrite) existing variables.

Note

If a vector has only a few unique values, splitting into equal-sized groups may fail. In this case, use the inclusive argument to shift a value at the cut point into the lower, preceding group to get equal-sized groups. See 'Examples'.
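
What moving the cut point value into the lower group means can be illustrated with the `right` argument of base R's `cut()`. This sketch assumes that inclusive behaves like closing the intervals on the right; it does not call split_var(), and the data are made up.

```r
# Made-up vector with few unique values; the median (3) is an observed value
x <- c(1, 2, 3, 3, 3, 4, 4, 4)
breaks <- c(min(x), stats::median(x), max(x))

# Cut point value goes to the upper group (comparable to inclusive = FALSE)
upper <- cut(x, breaks, right = FALSE, include.lowest = TRUE, labels = FALSE)

# Cut point value goes to the lower group (comparable to inclusive = TRUE)
lower <- cut(x, breaks, right = TRUE, include.lowest = TRUE, labels = FALSE)

table(upper)  # group sizes 2 and 6
table(lower)  # group sizes 5 and 3
```

All tied values stay together in both variants, so the groups are only as balanced as the ties allow.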

Examples

library(sjmisc)
data(efc)
# non-grouped
table(efc$neg_c_7)

# split into 3 groups
table(split_var(efc$neg_c_7, n = 3))

# split multiple variables into 3 groups
split_var(efc, neg_c_7, pos_v_4, e17age, n = 3, append = FALSE)
frq(split_var(efc, neg_c_7, pos_v_4, e17age, n = 3, append = FALSE))

# original
table(efc$e42dep)

# two groups, non-inclusive cut-point
# vector split leads to unequal group sizes
table(split_var(efc$e42dep, n = 2))

# two groups, inclusive cut-point
# group sizes are equal
table(split_var(efc$e42dep, n = 2, inclusive = TRUE))

# Unlike dplyr's ntile(), split_var() never splits a value
# into two different categories, i.e. you always get a clean
# separation of original categories
library(dplyr)

x <- dplyr::ntile(efc$neg_c_7, n = 3)
table(efc$neg_c_7, x)

x <- split_var(efc$neg_c_7, n = 3)
table(efc$neg_c_7, x)

# also works with grouped data frames
mtcars %>%
  split_var(disp, n = 3, append = FALSE) %>%
  table()

mtcars %>%
  group_by(cyl) %>%
  split_var(disp, n = 3, append = FALSE) %>%
  table()
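
The examples above do not cover split_var_if(). A minimal sketch based on the signature shown under 'Usage', assuming the sjmisc package is installed; the choice of the built-in `iris` data and of `is.numeric` as predicate is only for illustration.

```r
# Runs only if sjmisc is available; otherwise the example is skipped
if (requireNamespace("sjmisc", quietly = TRUE)) {
  # Split every numeric column of iris into 3 groups. Species is not
  # processed because is.numeric() returns FALSE for factors.
  iris_g <- sjmisc::split_var_if(iris, is.numeric, n = 3, append = FALSE)
  str(iris_g)
}
```

A predicate such as `is.numeric` selects every numeric column, including ones with very few unique values, for which splitting may fail (see 'Note'). A stricter predicate can be used to leave such columns out.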

sjmisc

Data and Variable Transformation Functions

v2.8.6

GPL-3

Authors

Daniel Lüdecke [aut, cre] (<https://orcid.org/0000-0002-8895-3206>), Iago Giné-Vázquez [ctb], Alexander Bartel [ctb] (<https://orcid.org/0000-0002-1280-6138>)

Initial release