sjmisc: dicho – R documentation

dicho

Dichotomize variables

Description

Dichotomizes variables into dummy variables (0/1). The split is made at the median, the mean, or a specific value (see dich.by). dicho_if() is a scoped variant of dicho() that recodes only those variables for which predicate returns TRUE.

Usage

dicho(
  x,
  ...,
  dich.by = "median",
  as.num = FALSE,
  var.label = NULL,
  val.labels = NULL,
  append = TRUE,
  suffix = "_d"
)

dicho_if(
  x,
  predicate,
  dich.by = "median",
  as.num = FALSE,
  var.label = NULL,
  val.labels = NULL,
  append = TRUE,
  suffix = "_d"
)

Arguments

`x`	A vector or data frame.
`...`	Optional, unquoted names of variables that should be selected for further processing. Required if `x` is a data frame (rather than a vector) and only selected variables from `x` should be processed. You may also use functions like `:` or tidyselect's select-helpers. See 'Examples' or package-vignette.
`dich.by`	Split criterion at which a variable is dichotomized. Must be one of the following (character values may be abbreviated): `"median"` or `"md"` (the default) splits `x` into two groups at the median of `x`; `"mean"` or `"m"` splits `x` into two groups at the mean of `x`; a numeric value splits `x` into two groups at that value. Note that the split value is inclusive, i.e. `dich.by = 10` splits `x` into one group with values from the lowest up to and including 10 and another group with values greater than 10.
`as.num`	Logical, if `TRUE`, return value will be numeric, not a factor.
`var.label`	Optional string, to set variable label attribute for the returned variable (see vignette Labelled Data and the sjlabelled-Package). If `NULL` (default), variable label attribute of `x` will be used (if present). If empty, variable label attributes will be removed.
`val.labels`	Optional character vector (of length two), to set value label attributes of dichotomized variable (see `set_labels`). If `NULL` (default), no value labels will be set.
`append`	Logical, if `TRUE` (the default) and `x` is a data frame, `x` including the new variables as additional columns is returned; if `FALSE`, only the new variables are returned.
`suffix`	Indicates which suffix will be added to each dummy variable. Use `"numeric"` to number dummy variables, e.g. x_1, x_2, x_3 etc. Use `"label"` to add value label, e.g. x_low, x_mid, x_high. May be abbreviated.
`predicate`	A predicate function to be applied to the columns. The variables for which `predicate` returns `TRUE` are selected.
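
The inclusive split rule described for `dich.by` can be sketched in base R. This is only an illustration of the documented rule, not sjmisc's implementation: `split_at()` is a hypothetical helper introduced here, and the handling of missing values (`na.rm = TRUE`) is an assumption.

```r
# Hypothetical helper (not part of sjmisc): values <= cut map to 0,
# values > cut map to 1, following the inclusive rule documented above.
split_at <- function(x, cut) {
  as.integer(x > cut)
}

x <- c(2, 10, 11, 25, NA)

# numeric split value: 10 itself falls into the lower group
split_at(x, 10)

# split at the median or the mean of the non-missing values
# (na.rm = TRUE is an assumption made for this sketch)
split_at(x, median(x, na.rm = TRUE))
split_at(x, mean(x, na.rm = TRUE))
```

Missing values stay missing in this sketch, because comparing `NA` with the split value yields `NA`.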

Details

dicho() also works on grouped data frames (see group_by). In this case, dichotomization is applied separately to each group's subset of the variables in x, so the split value (e.g. the median) is computed per group. See 'Examples'.
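
Why grouping changes the result can be seen with a base-R sketch that computes the split value once overall and once per group. It does not call dicho(); it only mirrors the idea of a per-group median split, using `ave()` on the built-in `mtcars` data.

```r
# Base-R illustration only; dicho() itself is not called here.
# Split disp at the overall median (values above the median become 1).
overall <- as.integer(mtcars$disp > median(mtcars$disp))

# Split disp at the median of each cyl group: ave() applies the function
# within each group and returns a vector in the original row order.
by_group <- ave(mtcars$disp, mtcars$cyl,
                FUN = function(v) as.integer(v > median(v)))

# Cross-tabulate the two codings to see where they disagree
table(overall = overall, by_group = by_group)
```

Rows in the off-diagonal cells of the table are cars that land in different groups depending on whether the median is taken overall or within `cyl`.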

Value

x, dichotomized. If x is a data frame, for append = TRUE, x including the dichotomized variables as new columns is returned; if append = FALSE, only the dichotomized variables will be returned. If append = TRUE and suffix = "", recoded variables will replace (overwrite) existing variables.

Note

Variable label attributes are preserved (unless changed via var.label-argument).

Examples

library(sjmisc)
data(efc)
summary(efc$c12hour)
# split at median
table(dicho(efc$c12hour))
# split at mean
table(dicho(efc$c12hour, dich.by = "mean"))
# split at 30: lowest value to 30 vs. above 30
table(dicho(efc$c12hour, dich.by = 30))

# sample data frame, values from 1-4
head(efc[, 6:10])

# dichotomized values (1 to 2 = 0, 3 to 4 = 1)
library(dplyr)
efc %>%
  select(6:10) %>%
  dicho(dich.by = 2) %>%
  head()

# dichotomize several variables in a data frame
dicho(efc, c12hour, e17age, c160age, append = FALSE)

# dichotomize and set labels
frq(dicho(
  efc, e42dep,
  var.label = "Dependency (dichotomized)",
  val.labels = c("lower", "higher"),
  append = FALSE
))

# also works with grouped data frames
mtcars %>%
  dicho(disp, append = FALSE) %>%
  table()

mtcars %>%
  group_by(cyl) %>%
  dicho(disp, append = FALSE) %>%
  table()

# dichotomizing grouped data frames leads to different
# results for a dichotomized variable, because the split
# value is different for each group.
# compare:
mtcars %>%
  group_by(cyl) %>%
  summarise(median = median(disp))

median(mtcars$disp)

# dichotomize only variables with more than 10 unique values
p <- function(x) dplyr::n_distinct(x) > 10
dicho_if(efc, predicate = p, append = FALSE)
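
How a predicate picks columns can be sketched in base R. This is an illustration of the selection step only, under assumptions: the data frame `df` and its column names are made up, and the predicate is a base-R equivalent of the one used above rather than anything taken from sjmisc's internals.

```r
# Hypothetical stand-in data; column names are made up for illustration.
df <- data.frame(
  id    = 1:20,
  score = c(3, 7, 1, 9, 4, 12, 15, 8, 2, 6,
            11, 5, 14, 10, 13, 20, 18, 16, 19, 17),
  group = rep(c(1, 2), 10)
)

# Base-R equivalent of the predicate used above (more than 10 unique values)
p <- function(x) length(unique(x)) > 10

# Apply the predicate to every column; TRUE marks columns that qualify
selected <- vapply(df, p, logical(1))
names(df)[selected]
```

Here `group` has only two distinct values, so it is not selected, while `id` and `score` are.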

sjmisc

Data and Variable Transformation Functions

v2.8.6

GPL-3

Authors

Daniel Lüdecke [aut, cre] (<https://orcid.org/0000-0002-8895-3206>), Iago Giné-Vázquez [ctb], Alexander Bartel [ctb] (<https://orcid.org/0000-0002-1280-6138>)

Initial release