Genetic Matching
In matchit, setting method = "genetic" performs genetic matching. Genetic matching is a form of nearest neighbor matching in which distances are computed as the generalized Mahalanobis distance, a generalization of the Mahalanobis distance that includes a scaling factor for each covariate representing the importance of that covariate to the distance. A genetic algorithm selects the scaling factors that maximize a criterion related to covariate balance. The criterion can be chosen by the user; by default, it is the smallest p-value in covariate balance tests among the covariates. This method relies on and is a wrapper for Matching::GenMatch and Matching::Match, which use rgenoud::genoud to perform the optimization using the genetic algorithm.
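To make the distance and the default criterion concrete, here is a minimal base R sketch. It assumes the form of the generalized Mahalanobis distance given by Diamond and Sekhon (2013), in which the difference between two units is premultiplied by the inverse Cholesky factor of the covariate covariance matrix and then weighted by the scaling factors. The function names gen_mahalanobis and min_balance_p and the simulated data are hypothetical and are not part of MatchIt or Matching; the balance tests that Matching::GenMatch actually runs may differ from the simple t-tests used here.

```r
# Hypothetical helper (not part of MatchIt or Matching):
# generalized Mahalanobis distance between two covariate vectors.
# S is the covariance matrix of the covariates; w holds one nonnegative
# scaling factor per covariate (the quantities the genetic algorithm tunes).
gen_mahalanobis <- function(x_i, x_j, S, w) {
  L <- t(chol(S))                    # lower Cholesky factor, S = L %*% t(L)
  z <- forwardsolve(L, x_i - x_j)    # z = L^{-1} (x_i - x_j)
  sqrt(sum(w * z^2))                 # sqrt(t(z) %*% diag(w) %*% z)
}

# Hypothetical, simplified balance criterion: the smallest p-value across
# covariates from two-sample t-tests comparing treated and control units.
min_balance_p <- function(X, treat) {
  p <- apply(X, 2, function(v) t.test(v[treat == 1], v[treat == 0])$p.value)
  min(p)
}

# Simulated data, for illustration only
set.seed(1)
n <- 60
X <- cbind(age  = rnorm(n, 30, 8),
           educ = rnorm(n, 11, 2),
           re74 = rexp(n, 1 / 3000))
treat <- rep(c(1, 0), each = n / 2)
S <- cov(X)

d_equal <- gen_mahalanobis(X[1, ], X[2, ], S, w = c(1, 1, 1))
d_age   <- gen_mahalanobis(X[1, ], X[2, ], S, w = c(10, 1, 1))
crit    <- min_balance_p(X, treat)
```

With all scaling factors equal to 1, the sketch reduces to the ordinary Mahalanobis distance. Raising a factor makes differences in the corresponding transformed component count for more, which is the lever the genetic algorithm adjusts while searching for the weights that give the best value of the balance criterion.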
This page details the allowable arguments with method = "genetic". See matchit for an explanation of what each argument means in a general context and how it can be specified.
Below is how matchit is used for genetic matching:
matchit(formula,
        data = NULL,
        method = "genetic",
        distance = "glm",
        link = "logit",
        distance.options = list(),
        estimand = "ATT",
        exact = NULL,
        mahvars = NULL,
        discard = "none",
        reestimate = FALSE,
        s.weights = NULL,
        replace = FALSE,
        m.order = NULL,
        caliper = NULL,
        ratio = 1,
        verbose = FALSE,
        ...)
formula |
a two-sided |
data |
a data frame containing the variables named in |
method |
set here to |
distance |
the distance measure to be used. See |
link |
when |
distance.options |
a named list containing additional arguments supplied to the function that estimates the distance measure as determined by the argument to |
estimand |
a string containing the desired estimand. Allowable options include |
exact |
for which variables exact matching should take place. |
mahvars |
when a distance measure other than |
discard |
a string containing a method for discarding units outside a region of common support. Only allowed when |
reestimate |
if |
s.weights |
the variable containing sampling weights to be incorporated into propensity score models and balance statistics. These are also supplied to |
replace |
whether matching should be done with replacement. |
m.order |
the order that the matching takes place. The default for |
caliper |
the width(s) of the caliper(s) used for caliper matching. See Details and Examples. |
std.caliper |
|
ratio |
how many control units should be matched to each treated unit for k:1 matching. Should be a single integer value. |
verbose |
|
... |
additional arguments passed to |
In genetic matching, covariates play three roles: 1) as the variables on which balance is optimized, 2) as the variables in the generalized Mahalanobis distance between units, and 3) in estimating the propensity score. Variables supplied to formula are always used for role (1), as the variables on which balance is optimized. When distance is not "mahalanobis", the covariates are also used to estimate the propensity score (unless it is supplied). When mahvars is specified, the named variables will form the covariates that go into the distance matrix. Otherwise, the variables in formula along with the propensity score will go into the distance matrix. This leads to three ways to use distance and mahvars to perform the matching:
1) When distance = "mahalanobis", no propensity score is estimated, and the covariates in formula are used to form the generalized Mahalanobis distance matrix. In this sense, "mahalanobis" signals that no propensity score is to be estimated and that the matching variables are those in formula, consistent with setting distance = "mahalanobis" with other methods.
2) When distance is not "mahalanobis" and mahvars is not specified, the covariates in formula along with the propensity score are used to form the generalized Mahalanobis distance matrix. This is the default and most typical use of method = "genetic" in matchit.
3) When distance is not "mahalanobis" and mahvars is specified, the covariates in mahvars are used to form the generalized Mahalanobis distance matrix. The covariates in formula are used to estimate the propensity score and have their balance optimized by the genetic algorithm. The propensity score is not included in the generalized Mahalanobis distance matrix.
When a caliper is specified, any variables mentioned in caliper, possibly including the propensity score, will be added to the matching variables used to form the generalized Mahalanobis distance matrix. This is because Matching doesn't allow for the separation of caliper variables and matching variables in genetic matching.
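The rules above for deciding which variables enter the generalized Mahalanobis distance matrix can be summarized in a small base R sketch. The function matching_vars is a hypothetical illustration written for this page, not a MatchIt function, and the label "distance" is only a placeholder standing in for the propensity score.

```r
# Hypothetical helper (not part of MatchIt): returns the names of the
# variables that would enter the generalized Mahalanobis distance matrix.
# "distance" is a placeholder label for the propensity score.
matching_vars <- function(formula_vars, distance = "glm", mahvars = NULL,
                          caliper_vars = character(0)) {
  if (identical(distance, "mahalanobis")) {
    # Case 1: no propensity score; match on the formula covariates.
    # (The text above does not describe combining this with mahvars,
    # so the sketch does not treat that combination specially.)
    vars <- formula_vars
  } else if (is.null(mahvars)) {
    # Case 2 (default): formula covariates plus the propensity score
    vars <- c(formula_vars, "distance")
  } else {
    # Case 3: only the mahvars covariates; propensity score excluded
    vars <- mahvars
  }
  # Caliper variables are always added to the matching variables
  union(vars, caliper_vars)
}

covs <- c("age", "educ", "race")
case1 <- matching_vars(covs, distance = "mahalanobis")
case2 <- matching_vars(covs)
case3 <- matching_vars(covs, mahvars = c("age", "educ"))
case3_caliper <- matching_vars(covs, mahvars = c("age", "educ"),
                               caliper_vars = c("distance", "educ"))
```

The last call shows the caliper rule in action: placing a caliper on the propensity score brings it back into the distance matrix even though mahvars would otherwise exclude it.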
The estimand argument controls whether control units are selected to be matched with treated units (estimand = "ATT") or treated units are selected to be matched with control units (estimand = "ATC"). The "focal" group (e.g., the treated units for the ATT) is typically made to be the smaller treatment group, and a warning will be thrown if it is not set that way unless replace = TRUE. Setting estimand = "ATC" is equivalent to swapping all treated and control labels for the treatment variable. When estimand = "ATC", the default m.order is "smallest", and the match.matrix component of the output will have the names of the control units as the rownames and be filled with the names of the matched treated units (opposite to when estimand = "ATT"). Note that the argument supplied to estimand doesn't necessarily correspond to the estimand actually targeted; it is merely a switch to trigger which treatment group is considered "focal". Also, while GenMatch() and Match() support the ATE as an estimand, matchit() only supports the ATT and ATC for genetic matching.
All outputs described in matchit are returned with method = "genetic". When replace = TRUE, the subclass component is omitted.
In a manuscript, be sure to cite the following papers if using matchit with method = "genetic":
Diamond, A., & Sekhon, J. S. (2013). Genetic matching for estimating causal effects: A general multivariate matching method for achieving balance in observational studies. Review of Economics and Statistics, 95(3), 932–945. doi: 10.1162/REST_a_00318
Sekhon, J. S. (2011). Multivariate and Propensity Score Matching Software with Automated Balance Optimization: The Matching package for R. Journal of Statistical Software, 42(1), 1–52. doi: 10.18637/jss.v042.i07
For example, a sentence might read:
Genetic matching was performed using the MatchIt package (Ho, Imai, King, & Stuart, 2011) in R, which calls functions from the Matching package (Diamond & Sekhon, 2013; Sekhon, 2011).
See also matchit for a detailed explanation of the inputs and outputs of a call to matchit, and Matching::GenMatch and Matching::Match, which do the work.
library("MatchIt")
data("lalonde")

# 1:1 genetic matching with PS as a covariate
m.out1 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "genetic",
                  pop.size = 10) # use much larger pop.size
m.out1
summary(m.out1)

# 2:1 genetic matching with replacement without PS
m.out2 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "genetic", replace = TRUE,
                  ratio = 2, distance = "mahalanobis",
                  pop.size = 10) # use much larger pop.size
m.out2
summary(m.out2)

# 1:1 genetic matching on just age, educ, re74, and re75
# within calipers on PS and educ; other variables are
# used to estimate PS
m.out3 <- matchit(treat ~ age + educ + race + nodegree +
                    married + re74 + re75, data = lalonde,
                  method = "genetic",
                  mahvars = ~ age + educ + re74 + re75,
                  caliper = c(.05, educ = 2),
                  std.caliper = c(TRUE, FALSE),
                  pop.size = 10) # use much larger pop.size
m.out3
summary(m.out3)