Optimal Full Matching
In matchit, setting method = "full" performs optimal full matching, a form of subclassification in which all units, both treated and control (i.e., the "full" sample), are assigned to a subclass and receive at least one match. The matching is optimal in the sense that the sum of the absolute distances between the treated and control units in each subclass is as small as possible. The method relies on, and is a wrapper for, optmatch::fullmatch.
Advantages of optimal full matching include that the matching order does not need to be specified, that units do not need to be discarded, and that extreme within-subclass distances are less likely than with standard subclassification. The primary output of full matching is a set of matching weights that can be applied to the matched sample. In this way, full matching can be seen as a robust alternative to propensity score weighting: robust in the sense that the propensity score model does not need to be correct to estimate the treatment effect without bias.
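To make the idea of matching weights concrete, the sketch below derives ATT-style weights from subclass membership using base R only. The toy vectors treat and subclass are hypothetical, and the formula (treated units get weight 1, control units get the ratio of treated to control units in their subclass) is one common convention assumed here for illustration; any rescaling that matchit applies to the weights is not shown.

```r
# Hypothetical toy data: 8 units in 3 subclasses, each subclass
# containing at least one treated and one control unit.
treat    <- c(1, 0, 0, 1, 1, 0, 1, 0)
subclass <- c("a", "a", "a", "b", "b", "b", "c", "c")

# Number of treated and control units in each subclass.
n_treated <- tapply(treat == 1, subclass, sum)
n_control <- tapply(treat == 0, subclass, sum)

# Assumed ATT convention: treated units get weight 1; each control
# unit gets (treated count / control count) of its own subclass.
ratio   <- as.numeric((n_treated / n_control)[subclass])
weights <- ifelse(treat == 1, 1, ratio)

data.frame(treat, subclass, weights)
```

Under this convention the control weights within each subclass add up to the number of treated units in that subclass, so the weighted control group resembles the treated group in its distribution across subclasses.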
This page details the allowable arguments with method = "full". See matchit for an explanation of what each argument means in a general context and how it can be specified.
Below is how matchit is used for optimal full matching:
matchit(formula,
        data = NULL,
        method = "full",
        distance = "glm",
        link = "logit",
        distance.options = list(),
        estimand = "ATT",
        exact = NULL,
        mahvars = NULL,
        discard = "none",
        reestimate = FALSE,
        s.weights = NULL,
        caliper = NULL,
        std.caliper = TRUE,
        verbose = FALSE,
        ...)
formula | a two-sided
data | a data frame containing the variables named in
method | set here to
distance | the distance measure to be used. See
link | when
distance.options | a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to
estimand | a string containing the desired estimand. Allowable options include
exact | for which variables exact matching should take place. Exact matching is processed using
mahvars | for which variables Mahalanobis distance matching should take place when a distance measure other than
discard | a string containing a method for discarding units outside a region of common support. Only allowed when
reestimate | if
s.weights | the variable containing sampling weights to be incorporated into propensity score models and balance statistics.
caliper | the width(s) of the caliper(s) used for caliper matching. Calipers are processed by
std.caliper |
verbose |
... | additional arguments passed to
The arguments replace, m.order, and ratio are ignored with a warning.
Mahalanobis distance matching can be done one of two ways:
1) If no propensity score needs to be estimated, distance should be set to "mahalanobis", and Mahalanobis distance matching will occur on all the variables in formula. Arguments to discard and mahvars will be ignored, and a caliper can only be placed on named variables. For example, to perform simple Mahalanobis distance matching, the following could be run:
matchit(treat ~ X1 + X2, method = "full", distance = "mahalanobis")
With this code, the Mahalanobis distance is computed using X1 and X2, and matching occurs on this distance. The distance component of the matchit output will be empty.
2) If a propensity score needs to be estimated for any reason, e.g., for common support with discard or for creating a caliper, distance should be whatever method is used to estimate the propensity score or a vector of distance measures, i.e., it should not be "mahalanobis". Use mahvars to specify the variables used to create the Mahalanobis distance. For example, to perform Mahalanobis distance matching within a propensity score caliper, the following could be run:
matchit(treat ~ X1 + X2 + X3, method = "full", distance = "glm", caliper = .25, mahvars = ~ X1 + X2)
With this code, X1, X2, and X3 are used to estimate the propensity score (using the "glm" method, which by default is logistic regression), which is used to create a matching caliper. The actual matching occurs on the Mahalanobis distance computed only using X1 and X2, which are supplied to mahvars. Units whose propensity score difference is larger than the caliper will not be placed in the same subclass, and some treated units may therefore not receive a match. The estimated propensity scores will be included in the distance component of the matchit output. See Examples.
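The role of a standardized propensity score caliper can be sketched in base R as follows. The simulated data, the variable names, and the use of the overall sample standard deviation of the propensity score to convert the caliper into raw units are all assumptions made for illustration; matchit may compute the standardization factor differently.

```r
set.seed(42)

# Hypothetical simulated covariates and treatment indicator.
n  <- 200
X1 <- rnorm(n)
X2 <- rnorm(n)
X3 <- rnorm(n)
treat <- rbinom(n, 1, plogis(0.5 * X1 - 0.25 * X2))

# Propensity score from a logistic regression, analogous to
# distance = "glm" with link = "logit".
fit <- glm(treat ~ X1 + X2 + X3, family = binomial("logit"))
ps  <- fitted(fit)

# A caliper of .25 in standard deviation units, converted to raw
# units (assumption: overall sample SD of the propensity score).
caliper_width <- 0.25 * sd(ps)

# Absolute propensity score differences: treated units in rows,
# control units in columns.
ps_diff <- abs(outer(ps[treat == 1], ps[treat == 0], "-"))

# TRUE where a treated-control pairing falls within the caliper.
allowed <- ps_diff <= caliper_width

# Treated units with no control unit inside the caliper.
sum(rowSums(allowed) == 0)
```

Pairings marked FALSE in allowed correspond to the treated-control combinations that a caliper rules out before distances on the mahvars variables are considered.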
When sampling weights are supplied through the s.weights argument, the covariance matrix of the covariates used in the Mahalanobis distance is not weighted by the sampling weights.
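For readers who want to see what a Mahalanobis distance between treated and control units looks like, here is a minimal base R sketch. The data are simulated and hypothetical, and the unweighted covariance matrix of the full sample is used as the scaling matrix; this is an assumption for illustration, and the exact matrix used internally by matchit may differ.

```r
set.seed(1)

# Hypothetical covariates for 4 treated and 6 control units.
X     <- cbind(X1 = rnorm(10), X2 = rnorm(10))
treat <- rep(c(1, 0), times = c(4, 6))

# Unweighted covariance matrix of the covariates (no sampling
# weights enter this calculation).
S <- cov(X)

X_treated <- X[treat == 1, , drop = FALSE]
X_control <- X[treat == 0, , drop = FALSE]

# stats::mahalanobis() returns squared distances from each row of
# its first argument to `center`, so the square root is taken.
# Result: treated units in rows, control units in columns.
d <- t(apply(X_treated, 1, function(x) {
  sqrt(mahalanobis(X_control, center = x, cov = S))
}))

dim(d)
```

Each entry d[i, j] is the distance between treated unit i and control unit j; a matching algorithm would then choose subclasses that keep the total of such distances small.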
All outputs described in matchit are returned with method = "full" except for match.matrix. This is because matching strata are not indexed by treated units as they are in some other forms of matching.
Due to what appears to be a bug in optmatch (version 0.9-13), calipers can only be used when min.controls is left at its default.
The option "optmatch_max_problem_size" is automatically set to Inf during the matching process, which differs from its default in optmatch. This allows matching problems of any size to be run, but it may also let huge, infeasible problems through, which can take a long time or crash R. See optmatch::setMaxProblemSize for more details.
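The general pattern of changing an option only for the duration of a computation can be sketched in base R as below. The helper with_unlimited_problem_size is hypothetical and is not part of MatchIt or optmatch; it only illustrates how an option such as "optmatch_max_problem_size" can be set to Inf temporarily and then restored.

```r
# Hypothetical helper (not part of MatchIt or optmatch): evaluates
# `expr` with the option set to Inf, then restores the old value.
with_unlimited_problem_size <- function(expr) {
  # options() returns the previous values of the options it changes.
  old <- options(optmatch_max_problem_size = Inf)
  # Restore the previous value when the function exits, even on error.
  on.exit(options(old), add = TRUE)
  # `expr` is evaluated lazily, i.e., only here, after the change.
  force(expr)
}

# Usage: a finite limit is in place before and after the call.
options(optmatch_max_problem_size = 1e6)
inside <- with_unlimited_problem_size(
  getOption("optmatch_max_problem_size")
)
after <- getOption("optmatch_max_problem_size")
```

Restoring the option in on.exit keeps a user's own limit intact for any later direct calls to optmatch.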
In a manuscript, be sure to cite the following paper if using matchit with method = "full":
Hansen, B. B., & Klopfer, S. O. (2006). Optimal Full Matching and Related Designs via Network Flows. Journal of Computational and Graphical Statistics, 15(3), 609–627. doi: 10.1198/106186006X137047
For example, a sentence might read:
Optimal full matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the optmatch package (Hansen & Klopfer, 2006).
Theory is also developed in the following article:
Hansen, B. B. (2004). Full Matching in an Observational Study of Coaching for the SAT. Journal of the American Statistical Association, 99(467), 609–618. doi: 10.1198/016214504000000647
See also:
matchit, for a detailed explanation of the inputs and outputs of a call to matchit.
optmatch::fullmatch, which is the workhorse.
method_optimal, for optimal pair matching, which is a special case of optimal full matching and relies on similar machinery. Results from method = "optimal" can be replicated with method = "full" by setting min.controls, max.controls, and mean.controls to the desired ratio.
library("MatchIt")
data("lalonde")

# Optimal full PS matching
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "full")
m.out1
summary(m.out1)

# Optimal full Mahalanobis distance matching within a PS caliper
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "full", caliper = .01,
                  mahvars = ~ age + educ + re74 + re75)
m.out2
summary(m.out2)

# Optimal full Mahalanobis distance matching within calipers
# of 500 on re74 and re75
m.out3 <- matchit(treat ~ age + educ + re74 + re75,
                  data = lalonde, distance = "mahalanobis",
                  method = "full",
                  caliper = c(re74 = 500, re75 = 500),
                  std.caliper = FALSE)
m.out3
summary(m.out3, addlvariables = ~race + nodegree + married,
        data = lalonde)