Aggregating functions
The mosaic
package makes several summary statistic functions (like mean
and sd
)
formula aware.
mean_(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) mean(x, ...) median(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) range(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) sd(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) max(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) min(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) sum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) IQR(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) fivenum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) iqr(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) prod(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) sum(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) favstats(x, ..., data = NULL, groups = NULL, na.rm = TRUE) quantile(x, ..., data = NULL, groups = NULL, na.rm = getOption("na.rm", FALSE)) var(x, y = NULL, na.rm = getOption("na.rm", FALSE), ..., data = NULL) cor(x, y = NULL, ..., data = NULL) cov(x, y = NULL, ..., data = NULL)
x |
a numeric vector or a formula |
... |
additional arguments |
data |
a data frame in which to evaluate formulas (or bare names).
Note that the default is |
groups |
a grouping variable, typically a name of a variable in |
na.rm |
a logical indicating whether |
y |
a numeric vector or a formula |
Many of these functions mask core R functions to provide an additional formula
interface. Old behavior should be unchanged. But if the first argument is a formula,
that formula, together with data
are used to generate the numeric vector(s)
to be summarized. Formulas of the shape x ~ a
or ~ x | a
can be used to
produce summaries of x
for each subset defined by a
. Two-way aggregation
can be achieved using formulas of the form x ~ a + b
or x ~ a | b
. See
the examples.
Earlier versions of these functions supported a "bare name + data frame" interface. This functionality has been removed since it was (a) ambiguous in some cases, (b) unnecessary, and (c) difficult to maintain.
mean(HELPrct$age) mean( ~ age, data = HELPrct) mean( ~ drugrisk, na.rm = TRUE, data = HELPrct) mean(age ~ shuffle(sex), data = HELPrct) mean(age ~ shuffle(sex), data = HELPrct, .format = "table") # wrap in data.frame() to auto-convert awkward variable names data.frame(mean(age ~ shuffle(sex), data = HELPrct, .format = "table")) mean(age ~ sex + substance, data = HELPrct) mean( ~ age | sex + substance, data = HELPrct) mean( ~ sqrt(age), data = HELPrct) sum( ~ age, data = HELPrct) sd(HELPrct$age) sd( ~ age, data = HELPrct) sd(age ~ sex + substance, data = HELPrct) var(HELPrct$age) var( ~ age, data = HELPrct) var(age ~ sex + substance, data = HELPrct) IQR(width ~ sex, data = KidsFeet) iqr(width ~ sex, data = KidsFeet) favstats(width ~ sex, data = KidsFeet) cor(length ~ width, data = KidsFeet) cov(length ~ width, data = KidsFeet) tally(is.na(mcs) ~ is.na(pcs), data = HELPmiss) cov(mcs ~ pcs, data = HELPmiss) # NA because of missing data cov(mcs ~ pcs, data = HELPmiss, use = "complete") # ignore missing data # alternative approach using filter explicitly cov(mcs ~ pcs, data = HELPmiss %>% filter(!is.na(mcs) & !is.na(pcs)))
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.