Bayesian aggregate treatment effects model
Bayesian inference on parameters of an average treatment effects model that's appropriate to the supplied individual- or group-level data, using Hamiltonian Monte Carlo in Stan. (For the overall package help file, see baggr-package.)
baggr(
  data,
  model = NULL,
  pooling = "partial",
  effect = NULL,
  covariates = c(),
  prior_hypermean = NULL,
  prior_hypersd = NULL,
  prior_hypercor = NULL,
  prior_beta = NULL,
  prior = NULL,
  ppd = FALSE,
  test_data = NULL,
  quantiles = seq(0.05, 0.95, 0.1),
  outcome = "outcome",
  group = "group",
  treatment = "treatment",
  silent = FALSE,
  warn = TRUE,
  ...
)
data |
data frame with summary or individual level data to meta-analyse |
model |
if |
pooling |
Type of pooling;
choose from |
effect |
Label for effect. Will default to "mean" in most cases, "log OR" in logistic model,
quantiles in |
covariates |
Character vector with column names in |
prior_hypermean |
prior distribution for hypermean; you can use "plain text" notation like
|
prior_hypersd |
prior for hyper-standard deviation, used
by Rubin and |
prior_hypercor |
prior for hypercorrelation matrix, used by the |
prior_beta |
prior for regression coefficients if |
prior |
alternative way to specify all priors as a named list with |
ppd |
logical; use prior predictive distribution? (p.p.d.) Default is no.
If |
test_data |
data for cross-validation; NULL for no validation, otherwise a data frame
with the same columns as |
quantiles |
if |
outcome |
character; column name in (individual-level)
|
group |
character; column name in |
treatment |
character; column name in (individual-level) |
silent |
Whether to silence messages about prior settings and about other automatic behaviour. |
warn |
print an additional warning if Rhat exceeds 1.05 |
... |
extra options passed to Stan function, e.g. |
Running baggr requires 1/ data preparation, 2/ choice of model, 3/ choice of priors.
All three are discussed in depth in the package vignette (vignette("baggr")).
Data. For aggregate data models you need a data frame with columns
tau and se or tau, mu, se.tau, se.mu.
An additional column can be used to provide labels for each group
(by default column group is used if available, but this can be
customised – see the example below).
For individual level data three columns are needed: outcome, treatment, group. These
are identified by using the outcome, treatment and group arguments.
Many data preparation steps can be done through a helper function prepare_ma.
It can convert individual to summary-level data, calculate
odds/risk ratios (with/without corrections) in binary data, standardise variables and more.
Using it will automatically format data inputs to work with baggr().
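As a rough illustration of the two input shapes described above, the base R sketch below builds a small summary-level data frame and a simulated individual-level one, then reduces the latter to per-group effects by hand. All values are made up, and `summarise_group` is a hypothetical helper written for this example; it is not a claim about how prepare_ma computes its output.

```r
# Aggregate (summary-level) input: one row per group, with an effect
# estimate (tau) and its standard error (se). Values are made up.
df_summary <- data.frame(
  group = c("A", "B", "C"),
  tau   = c(0.8, -0.2, 0.5),
  se    = c(0.4, 0.5, 0.3)
)

# Individual-level input: one row per unit, with an outcome, a treatment
# indicator and a group label. Simulated purely for illustration.
set.seed(1)
n <- 60
df_individual <- data.frame(
  group     = rep(c("A", "B", "C"), each = n / 3),
  treatment = rep(c(0, 1), times = n / 2),
  outcome   = rnorm(n)
)

# Hypothetical helper: difference in means per group and its standard
# error (unequal-variance formula).
summarise_group <- function(d) {
  y1 <- d$outcome[d$treatment == 1]
  y0 <- d$outcome[d$treatment == 0]
  data.frame(
    group = d$group[1],
    tau   = mean(y1) - mean(y0),
    se    = sqrt(var(y1) / length(y1) + var(y0) / length(y0))
  )
}

# One row per group, in the tau/se layout used for aggregate models.
df_from_individual <- do.call(
  rbind,
  lapply(split(df_individual, df_individual$group), summarise_group)
)
```

In practice prepare_ma is the intended route for this conversion; the sketch only shows what kind of table an aggregate data model expects.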
Models. Available models are:
for the continuous variable means:
"rubin" model for average treatment effect, "mutau"
version which takes into account means of control groups, "full",
which works with individual-level data
for continuous variable quantiles: "quantiles" model (see Meager, 2019 in references)
for binary data: "logit" model can be used on individual-level data;
you can also analyse continuous statistics such as
log odds ratios and log risk ratios using the models listed above;
see vignette("baggr_binary") for tutorial with examples
If no model is specified, the function tries to infer the appropriate model automatically. The user can also specify the type of pooling; if none is given, the default is always partial pooling.
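To give some intuition for what the pooling choice means, here is a minimal base R sketch that does not call baggr at all. It assumes a normal model with known standard errors: under full pooling with a flat prior, the shared effect is the inverse-variance weighted mean, and under partial pooling each group estimate is shrunk towards the hypermean. The hyperparameter values below are assumptions fixed for illustration; in a fitted model they are estimated from the data.

```r
# Illustrative summary data (same values as df_pooled in the examples).
tau <- c(1, -1, .5, -.5, .7, -.7, 1.3, -1.3)
se  <- rep(1, 8)

# Full pooling: one shared effect. With known standard errors and a flat
# prior this is the inverse-variance weighted mean.
w <- 1 / se^2
full_pool_est <- sum(w * tau) / sum(w)
full_pool_se  <- sqrt(1 / sum(w))   # 1 / sqrt(8), about 0.354

# Partial pooling, conditional on ASSUMED hyperparameters: each group
# estimate is shrunk towards the hypermean in proportion to its noise.
hypermean <- 0   # assumed value, for illustration only
hypersd   <- 1   # assumed value, for illustration only
shrink      <- (1 / hypersd^2) / (1 / hypersd^2 + 1 / se^2)
partial_est <- shrink * hypermean + (1 - shrink) * tau
```

With these assumed values every group estimate is pulled halfway towards zero. A larger hyper-SD would mean less shrinkage, and a smaller one would move the result closer to full pooling.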
Covariates.
Both aggregate and individual-level data can include extra columns, given by the covariates argument
(specified as a character vector of column names), to be used in regression models.
We also refer to the impact of these covariates as fixed effects.
Two types of covariates may be present in your data:
In "rubin" and "mutau" models, covariates that change according to group unit.
In that case, the model accounting for the group covariates is a meta-regression model.
It can be modelled on summary-level data.
In "logit" and "full" models, covariates that change according to individual unit.
Then, the model can be called a mixed model. It has to be fitted to individual-level data.
Note that the first case can also be accounted for by using a mixed model.
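A group-level covariate in a meta-regression might be set up as sketched below. The `urban` column and its values are hypothetical. The weighted least squares fit is a swapped-in, non-Bayesian analogue, included only so the example yields a number without Stan. The baggr() call runs only if the package is installed and uses only arguments listed in the usage above.

```r
# Summary-level data with one hypothetical group-level covariate.
df_cov <- data.frame(
  group = paste0("site_", 1:8),
  tau   = c(1, -1, .5, -.5, .7, -.7, 1.3, -1.3),
  se    = rep(1, 8),
  urban = c(1, 0, 1, 0, 1, 0, 1, 0)   # made-up indicator
)

# Non-Bayesian analogue for intuition: inverse-variance weighted regression
# of the group effects on the covariate. Not what baggr() does internally.
wls <- lm(tau ~ urban, data = df_cov, weights = 1 / se^2)
coef(wls)

# Bayesian meta-regression, run only if baggr is available.
if (requireNamespace("baggr", quietly = TRUE)) {
  library(baggr)
  fit_cov <- baggr(df_cov, model = "rubin", covariates = c("urban"),
                   iter = 500)
  fixed_effects(fit_cov)   # coefficient for the urban column
}
```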
Priors. It is optional to specify priors yourself,
as the package will try to propose an appropriate
prior for the input data if you do not pass a prior argument.
To set the priors yourself, use prior_ arguments. For specifying many priors at once
(or re-using between models), a single prior = list(...) argument can be used instead.
Appropriate examples are given in vignette("baggr").
Outputs. By default, some outputs are printed. There is also a
plot method for baggr objects which you can access via baggr_plot (or simply plot()).
Other standard functions for working with baggr objects are:
treatment_effect for distribution of hyperparameters
group_effects for distributions of group-specific parameters
fixed_effects for coefficients in (meta-)regression
effect_draw and effect_plot for posterior predictive distributions
baggr_compare for comparing multiple baggr models
loocv for cross-validation
pp_check for posterior predictive checks
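A typical sequence after fitting might look like the sketch below. It assumes baggr is installed and that `summary = TRUE` is accepted by treatment_effect and group_effects as described in their own help pages; check ?treatment_effect and ?group_effects before relying on this.

```r
# Same illustrative data as in the examples section.
df_pooled <- data.frame(
  tau   = c(1, -1, .5, -.5, .7, -.7, 1.3, -1.3),
  se    = rep(1, 8),
  state = datasets::state.name[1:8]
)

# Fit and inspect only if baggr is available; iter is passed on to Stan.
if (requireNamespace("baggr", quietly = TRUE)) {
  library(baggr)
  fit <- baggr(df_pooled, group = "state", iter = 500)

  print(fit)                                    # default printed output
  te <- treatment_effect(fit, summary = TRUE)   # hyperparameter summaries
  ge <- group_effects(fit, summary = TRUE)      # group-specific summaries
}
```

Without `summary = TRUE` these functions are expected to return posterior draws rather than summaries, which is more convenient for custom plots or further calculations.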
baggr class structure: a list including the Stan model fit
alongside input data, pooling metrics and various model properties.
If test data is used, the mean value of -2*lpd is reported as mean_lpd.
Witold Wiecek, Rachael Meager
df_pooled <- data.frame("tau" = c(1, -1, .5, -.5, .7, -.7, 1.3, -1.3),
"se" = rep(1, 8),
"state" = datasets::state.name[1:8])
baggr(df_pooled) #baggr automatically detects the input data
# same model, but with correct labels,
# different pooling & passing some options to Stan
baggr(df_pooled, group = "state", pooling = "full", iter = 500)
# model with different (very informative) priors
baggr(df_pooled, prior_hypersd = normal(0, 2))
# "mu & tau" model, using a built-in dataset
# prepare_ma() can summarise individual-level data
ms <- microcredit_simplified
ms$outcome <- microcredit_simplified$consumerdurables + 1
microcredit_summary_data <- prepare_ma(ms)
baggr(microcredit_summary_data, model = "mutau",
pooling = "partial", prior_hypercor = lkj(1),
prior_hypersd = normal(0,10),
prior_hypermean = multinormal(c(0,0),matrix(c(10,3,3,10),2,2)))