finalfit: summary_factorlist – R documentation

summary_factorlist

Summarise a set of factors (or continuous variables) by a dependent variable

Description

A function that takes a single dependent variable with a vector of explanatory variable names (continuous or categorical variables) to produce a summary table.

Usage

summary_factorlist(.data, dependent = NULL, explanatory, cont = "mean",
  cont_nonpara = NULL, cont_cut = 5, cont_range = FALSE, p = FALSE,
  p_cont_para = "aov", p_cat = "chisq", column = TRUE,
  total_col = FALSE, orderbytotal = FALSE, digits = c(1, 1, 3, 1),
  na_include = FALSE, na_include_dependent = FALSE,
  na_complete_cases = FALSE, na_to_p = FALSE, fit_id = FALSE,
  add_dependent_label = FALSE, dependent_label_prefix = "Dependent: ",
  dependent_label_suffix = "", add_col_totals = FALSE,
  include_col_totals_percent = TRUE, col_totals_rowname = NULL,
  col_totals_prefix = "", add_row_totals = FALSE,
  include_row_missing_col = TRUE, row_totals_colname = "Total N",
  row_missing_colname = "Missing N", catTest = NULL)

Arguments

`.data`	Dataframe.
`dependent`	Character vector of length 1: name of dependent variable (2 to 5 factor levels).
`explanatory`	Character vector of any length: name(s) of explanatory variables.
`cont`	Summary for continuous explanatory variables: "mean" (standard deviation) or "median" (interquartile range). If "median", a non-parametric hypothesis test is performed (see `p_cont_para` below).
`cont_nonpara`	Numeric vector of form e.g. `c(1,2)`. Specify which variables to perform non-parametric hypothesis tests on and summarise with "median".
`cont_cut`	Numeric: number of unique values in continuous variable at which to consider it a factor.
`cont_range`	Logical. Median is shown with 1st and 3rd quartiles.
`p`	Logical: Include null hypothesis statistical test.
`p_cont_para`	Character. Continuous variable parametric test. Either "aov" (analysis of variance) or "t.test" (Welch two-sample t-test). Note that the continuous non-parametric test is always Kruskal-Wallis (`kruskal.test`), which in the two-group setting is equivalent to the Mann-Whitney U / Wilcoxon rank sum test. For a continuous dependent and a continuous explanatory variable, the parametric p-value returned is for the Pearson correlation coefficient. The non-parametric equivalent is the p-value for the Spearman correlation coefficient.
`p_cat`	Character. Categorical variable test. One of either "chisq" or "fisher".
`column`	Logical: Compute margins by column rather than row.
`total_col`	Logical: include a total column summing across factor levels.
`orderbytotal`	Logical: order final table by total column high to low.
`digits`	Number of digits to round to (1) mean/median, (2) standard deviation / interquartile range, (3) p-value, (4) count percentage.
`na_include`	Logical: make explanatory variables missing data explicit (`NA`).
`na_include_dependent`	Logical: make dependent variable missing data explicit.
`na_complete_cases`	Logical: include only rows with complete data.
`na_to_p`	Logical: include missing as group in statistical test.
`fit_id`	Logical: allows merging via `finalfit_merge`.
`add_dependent_label`	Logical. Add the dependent variable label to the top left of the table.
`dependent_label_prefix`	Add text before dependent label.
`dependent_label_suffix`	Add text after dependent label.
`add_col_totals`	Logical. Include column total n.
`include_col_totals_percent`	Include column percentage of total.
`col_totals_rowname`	Character. Row name for column totals.
`col_totals_prefix`	Character. Prefix to column totals, e.g. "N=".
`add_row_totals`	Logical. Include row totals. Note this differs from `total_col` above particularly for continuous explanatory variables.
`include_row_missing_col`	Logical. Include missing data total for each row. Only used when `add_row_totals` is `TRUE`.
`row_totals_colname`	Character. Column name for row totals.
`row_missing_colname`	Character. Column name for missing data totals for each row.
`catTest`	Deprecated. See `p_cat` above.
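
As an illustration of how several of these arguments combine, the following is a minimal sketch rather than a recommended analysis. It assumes the `colon_s` dataset and the column names used in the Examples section, and an installed finalfit version whose signature matches the Usage above.

```r
library(finalfit)
library(dplyr)
data(colon_s)

# Variable names are taken from the Examples section of this page
explanatory = c("age", "sex.factor", "obstruct.factor")
dependent = "mort_5yr"

tab_opts = colon_s %>%
  summary_factorlist(dependent, explanatory,
    cont = "median",             # summarise age as median (IQR); non-parametric test
    p = TRUE,                    # add a p-value column
    p_cat = "fisher",            # Fisher's exact test for categorical variables
    na_include = TRUE,           # show missing explanatory data explicitly
    add_dependent_label = TRUE)  # label the top left of the table
tab_opts
```

Here `na_to_p` is left at its default of `FALSE`, so, going by the argument description above, missing values are displayed but are not treated as a group in the hypothesis tests.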

Details

This function aims to produce publication-ready summary tables for categorical or continuous dependent variables. It usually takes a categorical dependent variable and produces a cross table: counts and percentages for categorical explanatory variables, and summary statistics for continuous explanatory variables. It also accepts a continuous dependent variable, in which case it reports the mean (standard deviation) or median (interquartile range) of the dependent variable, for use alongside linear regression models.
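
The continuous-dependent case might look like the sketch below. Using `age` as the dependent variable is an assumption made purely for illustration; it is chosen only because it is a continuous column of `colon_s` that appears in the Examples section.

```r
library(finalfit)
library(dplyr)
data(colon_s)

# Continuous dependent variable, categorical explanatory variables
explanatory = c("sex.factor", "obstruct.factor", "perfor.factor")
dependent = "age"

tab_cont = colon_s %>%
  summary_factorlist(dependent, explanatory,
    p = TRUE)  # default parametric test for continuous variables is "aov"
tab_cont
```

Setting `cont = "median"` in the same call should switch the summary to median (interquartile range), as described under Arguments.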

Value

Returns a factorlist dataframe.
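
Because the result is a data frame, it can be inspected and passed on to a table renderer. The sketch below assumes the knitr package is installed; `knitr::kable()` is one common choice, not something this function requires.

```r
library(finalfit)
library(dplyr)
data(colon_s)

explanatory = c("age.factor", "sex.factor")
dependent = "mort_5yr"

tab = colon_s %>%
  summary_factorlist(dependent, explanatory)

# Inspect the structure of the returned object
class(tab)
names(tab)

# Render as a plain table, for example in an R Markdown document
knitr::kable(tab, row.names = FALSE)
```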

Examples

library(finalfit)
library(dplyr)
# Load example dataset, modified version of survival::colon
data(colon_s)

# Table 1 - Patient demographics ----
explanatory = c("age", "age.factor", "sex.factor", "obstruct.factor")
dependent = "perfor.factor"
colon_s %>%
  summary_factorlist(dependent, explanatory, p=TRUE)

# summary_factorlist() is also commonly used to summarise any number of
# variables by an outcome variable (say dead yes/no).

# Table 2 - 5 yr mortality ----
explanatory = c("age.factor", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"
colon_s %>%
  summary_factorlist(dependent, explanatory)
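
As a follow-up to Table 2, the sketch below contrasts the two kinds of totals described under Arguments. It is illustrative only and assumes the same `colon_s` columns; the exact layout of the output may vary between finalfit versions.

```r
library(finalfit)
library(dplyr)
data(colon_s)

explanatory = c("age", "sex.factor", "obstruct.factor", "perfor.factor")
dependent = "mort_5yr"

# (a) total_col: a total column summing across the outcome levels
tab_total = colon_s %>%
  summary_factorlist(dependent, explanatory, total_col = TRUE)

# (b) add_row_totals / add_col_totals: per-variable totals and missing
#     counts, plus a row of column totals prefixed with "N="
tab_rowcol = colon_s %>%
  summary_factorlist(dependent, explanatory,
    add_row_totals = TRUE,
    add_col_totals = TRUE,
    col_totals_prefix = "N=")

tab_total
tab_rowcol
```

Comparing the `age` rows of the two tables is a quick way to see how `add_row_totals` differs from `total_col` for continuous explanatory variables.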

finalfit

Quickly Create Elegant Regression Results Tables and Plots when Modelling

v1.0.2

MIT + file LICENCE

Authors

Ewen Harrison [aut, cre], Tom Drake [aut], Riinu Ots [aut]
