Construct a matched dataset from a matchit object
match.data and get_matches create a data frame with additional variables for the distance measure, matching weights, and subclasses after matching. This dataset can be used to estimate treatment effects after matching or subclassification. get_matches is most useful after matching with replacement; otherwise, match.data is more flexible. See Details below for the difference between them.
match.data(object, group = "all", distance = "distance", weights = "weights", subclass = "subclass", data = NULL, include.s.weights = TRUE, drop.unmatched = TRUE)

get_matches(object, distance = "distance", weights = "weights", subclass = "subclass", id = "id", data = NULL, include.s.weights = TRUE)
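object | a matchit object; the output of a call to matchit.
group | which group should comprise the matched dataset: "all" for all units, "treated" for just treated units, or "control" for just control units. Default is "all".
distance | a string containing the name that should be given to the variable containing the distance measure in the data frame output. Default is "distance".
weights | a string containing the name that should be given to the variable containing the matching weights in the data frame output. Default is "weights".
subclass | a string containing the name that should be given to the variable containing the subclasses or matched pair membership in the data frame output. Default is "subclass".
id | a string containing the name that should be given to the variable containing the unit IDs in the data frame output. Default is "id". Only used with get_matches.
data | a data frame containing the original dataset to which the computed output variables (distance, weights, and/or subclass) should be appended. Default is NULL, in which case the dataset from the original call to matchit is used.
include.s.weights | logical; whether to multiply the matching weights by the sampling weights supplied to matchit, if any. Default is TRUE.
drop.unmatched | logical; whether to drop unmatched units (those with a matching weight of zero) from the output. Default is TRUE.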
match.data creates a dataset with one row per unit. It will be identical to the dataset supplied except that several new columns will be added containing information related to the matching. When drop.unmatched = TRUE, the default, units with weights of zero (those discarded by common support or the caliper, or simply not matched) will be dropped from the dataset, leaving only the subset of matched units. The idea is for the output of match.data to be used as the data input in calls to glm or similar to estimate treatment effects in the matched sample. It is important to include the weights in the estimation of the effect and its standard error. The subclass column, when created, contains pair or subclass membership and should be used to estimate the effect and its standard error. Subclasses will only be included if there is a subclass component in the matchit object, which does not occur with matching with replacement, in which case get_matches should be used. See vignette("estimating-effects") for information on how to use match.data output to estimate effects.
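As a rough illustration of the workflow described above, the sketch below passes match.data output to lm with the matching weights. The matching specification and the choice of re78 as the outcome are assumptions made for this example only, and the sketch reports just a point estimate; standard errors need the additional handling described in vignette("estimating-effects").

```r
library("MatchIt")
data("lalonde", package = "MatchIt")

# Assumed specification for illustration: default 1:1 nearest-neighbor
# matching without replacement on the propensity score
m.out <- matchit(treat ~ age + educ + married + race + nodegree + re74 + re75,
                 data = lalonde)

# One row per matched unit, with distance, weights, and subclass appended
md <- match.data(m.out)

# Weighted outcome regression in the matched sample; the weights column
# created by match.data must be supplied to the weights argument
fit <- lm(re78 ~ treat, data = md, weights = weights)

# Point estimate of the treatment coefficient only
coef(fit)[["treat"]]
```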
get_matches is similar to match.data; the primary difference occurs when matching is performed with replacement, i.e., when units do not belong to a single matched pair. In this case, the output of get_matches will be a dataset that contains one row per unit for each pair it is a part of. For example, if matching was performed with replacement and a control unit was matched to two treated units, that control unit will have two rows in the output dataset, one for each pair it is a part of. Weights are computed for each row and are equal to the inverse of the number of control units in each control unit's subclass. Unmatched units are dropped. An additional column with unit IDs will be created (named using the id argument) to identify when the same unit is present in multiple rows. This dataset structure allows for the inclusion of both subclass membership and repeated use of units, unlike the output of match.data, which lacks subclass membership when matching is done with replacement. A match.matrix component of the matchit object must be present to use get_matches; in some forms of matching, it is absent, in which case match.data should be used instead. See vignette("estimating-effects") for information on how to use get_matches output to estimate effects after matching with replacement.
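The difference between the two outputs after matching with replacement can be inspected directly. The sketch below is only an illustration under an assumed specification (1:1 nearest-neighbor matching with replacement on the lalonde data); it compares the columns and row counts of the two data frames and tabulates how often each unit ID recurs in the get_matches output.

```r
library("MatchIt")
data("lalonde", package = "MatchIt")

# Assumed specification for illustration: 1:1 matching with replacement
m.rep <- matchit(treat ~ age + educ + married + race + nodegree + re74 + re75,
                 data = lalonde, replace = TRUE)

md <- match.data(m.rep)   # one row per matched unit
gm <- get_matches(m.rep)  # one row per unit per pair it belongs to

# Columns added by each function; per the text, match.data has no
# subclass column after matching with replacement
setdiff(names(md), names(lalonde))
setdiff(names(gm), names(lalonde))

# Row counts of the two outputs
c(match.data = nrow(md), get_matches = nrow(gm))

# How many units appear in 1, 2, ... rows of the get_matches output
table(table(gm$id))
```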
A data frame containing the data supplied in the data argument or in the original call to matchit with the computed output variables appended as additional columns, named according to the arguments above. For match.data, the group and drop.unmatched arguments control whether only subsets of the data are returned. See Details above for how match.data and get_matches differ. Note that get_matches sorts the data by subclass and treatment status, unlike match.data, which uses the order of the data.
The returned data frame will contain the variables in the original dataset or the dataset supplied to data, and the following columns:
distance | The propensity score, if estimated or supplied to the distance argument of matchit.
weights | The computed matching weights. These must be used in effect estimation to correctly incorporate the matching.
subclass | Matching strata membership. Units with the same value are in the same stratum.
id | The ID of each unit, corresponding to the row names in the original data or dataset supplied to data. Only included in get_matches output.
These columns will take on the names supplied to the corresponding arguments in the call to match.data or get_matches. See Examples for an example of renaming the distance column to "prop.score".
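The naming and subsetting arguments can be combined in a single call. The following sketch assumes a default 1:1 nearest-neighbor match on the lalonde data; the column names "prop.score" and "match.wt" are arbitrary choices for the illustration, not names used by the package.

```r
library("MatchIt")
data("lalonde", package = "MatchIt")

# Assumed specification for illustration
m.out <- matchit(treat ~ age + educ + married + race + nodegree + re74 + re75,
                 data = lalonde)

# Rename the distance and weights columns in the output
md.named <- match.data(m.out, data = lalonde,
                       distance = "prop.score", weights = "match.wt")
names(md.named)

# Return only the matched treated units
md.treated <- match.data(m.out, data = lalonde, group = "treated")
table(md.treated$treat)

# Keep unmatched units as well; they carry a matching weight of zero
md.full <- match.data(m.out, data = lalonde, drop.unmatched = FALSE)
table(md.full$weights == 0)
```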
If data or the original dataset supplied to matchit was a data.table or tbl, the match.data output will have the same class, but the get_matches output will always be a base R data.frame.
See vignette("estimating-effects") for uses of match.data() and get_matches() in estimating treatment effects.
library("MatchIt")
data("lalonde")

# 4:1 matching w/replacement
m.out1 <- matchit(treat ~ age + educ + married +
                    race + nodegree + re74 + re75,
                  data = lalonde, replace = TRUE,
                  caliper = .05, ratio = 4)

m.data1 <- match.data(m.out1, data = lalonde,
                      distance = "prop.score")
dim(m.data1) # one row per matched unit
head(m.data1, 10)

g.matches1 <- get_matches(m.out1, data = lalonde,
                          distance = "prop.score")
dim(g.matches1) # multiple rows per matched unit
head(g.matches1, 10)