Identify the row of y best matching each row of x
For each row of x[, by.x], find the best matching row of y[, by.y], with the best match defined by grep. and split.
grep. and split must either be missing or have the same length as by.x and by.y. If grep.[i] and split[i] are NA, do a complete match of x[, by.x[i]] and y[, by.y[i]]. Otherwise, for each row j, look for a match for strsplit(x[j, by.x[i]], split[i])[[1]][1] among strsplit(y[, by.y[i]], split[i]).
See details.
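The first-piece rule described above can be sketched in base R for a single column. This is only an illustration: the data, the space used as the split pattern, and the names x_col, y_col, split_i, x_first, y_parts and candidates are all assumptions, and exact equality stands in for whatever comparison grep. actually selects.

```r
# Hypothetical data: one column of x and the corresponding column of y.
x_col <- c("Rogers", "Smith")
y_col <- c("Smith", "Rogers (MI)", "Rogers (AL)", "Jones")
split_i <- " "  # assumed split pattern for this column

# First piece of each x value, i.e. strsplit(x[j, by.x[i]], split[i])[[1]][1].
x_first <- vapply(strsplit(x_col, split_i), function(p) p[1], character(1))

# All pieces of each y value, i.e. strsplit(y[, by.y[i]], split[i]).
y_parts <- strsplit(y_col, split_i)

# For each row j of x, the rows of y whose pieces contain x_first[j].
# Exact equality is used here as a stand-in for the grep. comparison.
candidates <- lapply(x_first, function(tok) {
  which(vapply(y_parts, function(p) tok %in% p, logical(1)))
})
```

Here "Rogers" has two candidate rows in y_col, so another column would be needed to settle on a single row, while "Smith" has exactly one candidate.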
match.data.frame(x, y, by, by.x=by, by.y=by, grep., split, sep=':')
x, y | data.frames
by, by.x, by.y | names of columns of
grep. | a character vector of the type of match for each element of Alternatives are NOTE: These alternatives are not examined if a unique match is found between
split | A character vector of
sep | a
1. Check by.x, by.y, grep. and split. If ((missing(by.x) | missing(by.y)) && missing(by)), by <- names(x).
2. fullMatch <- (is.na(grep.) & is.na(split)). Create keyfx and keyfy by pasting columns of x[, by.x[fullMatch]] and y[, by.y[fullMatch]]. Also create x. and y. = strsplit of x[, by.x[!fullMatch]].
3. Iterate over rows of x looking for the best match. This includes an inner loop over columns of x[, by.x[!fullMatch]], stopping on the first unique match. Return (-1) if no unique match is found.
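Step 2 can be sketched in base R as follows. This is a minimal illustration, not the function's actual code: the data frames are made up, every column is treated as a complete match, and the use of match() to look up the keys is an assumption.

```r
# Hypothetical inputs; all columns are matched completely (grep. and split NA).
x <- data.frame(state = c("AL", "MI", "NY"),
                surname = c("Rogers", "Rogers", "Smith"),
                stringsAsFactors = FALSE)
y <- data.frame(state = c("NY", "MI", "AL"),
                surname = c("Smith", "Rogers", "Rogers"),
                stringsAsFactors = FALSE)
by.x <- by.y <- c("state", "surname")
grep. <- c(NA, NA)
split <- c(NA, NA)
sep <- ":"

# Columns that need a complete match.
fullMatch <- is.na(grep.) & is.na(split)

# Paste the complete-match columns of each data.frame into one key per row.
keyfx <- do.call(paste, c(x[, by.x[fullMatch], drop = FALSE], sep = sep))
keyfy <- do.call(paste, c(y[, by.y[fullMatch], drop = FALSE], sep = sep))

# Assumed lookup: position of each x key among the y keys (NA if absent).
idx <- match(keyfx, keyfy)
```

With these inputs keyfx is "AL:Rogers", "MI:Rogers", "NY:Smith", and each key appears exactly once in keyfy. Note that match() returns only the first hit, so it would hide duplicate keys in y; the inner loop of step 3 exists to resolve such ties.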
an integer vector of length nrow(x) containing the index of the best matching row of y, or NA if no adequate match was found.
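One way to use such an index vector is to pull the matched rows of y and list the rows of x left unmatched. The sketch below hard-codes a hypothetical result in idx instead of calling match.data.frame, and assumes unmatched rows are marked NA as stated above.

```r
# Hypothetical reference table and a hand-written index vector.
reference <- data.frame(state = c("NY", "NY", "MI", "AL", "NY", "MI"),
                        surname = c("Smith", "Rogers", "Rogers (MI)",
                                    "Rogers (AL)", "Smith", "Jones"),
                        stringsAsFactors = FALSE)
idx <- c(4L, NA, 5L)  # assumed result: second row of x has no match

# Rows of reference aligned with the rows of x; NA indices give all-NA rows.
matched <- reference[idx, ]

# Rows of x for which no adequate match was found.
unmatched <- which(is.na(idx))
```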
Spencer Graves
newdata <- data.frame(state=c("AL", "MI", "NY"),
                      surname=c("Rogers", "Rogers", "Smith"),
                      givenName=c("Mike R.", "Mike K.", "Al"),
                      stringsAsFactors=FALSE)
reference <- data.frame(state=c("NY", "NY", "MI", "AL", "NY", "MI"),
                        surname=c("Smith", "Rogers", "Rogers (MI)",
                                  "Rogers (AL)", "Smith", "Jones"),
                        givenName=c("John", "Mike", "Mike", "Mike",
                                    "T. Albert", "Al Thomas"),
                        stringsAsFactors=FALSE)
newInRef <- match.data.frame(newdata, reference,
                             grep.=c(NA, 'agrep', 'agrep'))
all.equal(newInRef, c(4, 3, 5))