crunch: joinDatasets – R documentation

joinDatasets

Add columns from one dataset to another, joining on a key

Description

As base::merge() does for data.frames, this function takes two datasets, matches rows based on a specified key variable, and adds columns from one to the other.
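To make the comparison with base::merge() concrete, here is a minimal sketch using ordinary data.frames rather than Crunch datasets. The data and column names are made up for illustration. It mirrors the only join type the arguments below currently allow: a left outer join (`all.x = TRUE`, `all.y = FALSE`).

```r
# Base R analogue only: these are local data.frames, not CrunchDatasets.
x <- data.frame(id = c(1, 2, 3), age = c(34, 51, 27))
y <- data.frame(id = c(2, 3, 4),
                region = c("North", "South", "East"),
                stringsAsFactors = FALSE)

# all.x = TRUE keeps every row of x; all.y = FALSE drops rows found only in y.
joined <- merge(x, y, by = "id", all.x = TRUE, all.y = FALSE)
print(joined)
```

Rows of `x` with no match in `y` (here `id` 1) get a missing value in the added column, and rows that exist only in `y` (here `id` 4) are not carried over.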

Usage

joinDatasets(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = TRUE,
  all.y = FALSE,
  copy = TRUE
)

extendDataset(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = TRUE,
  all.y = FALSE,
  ...
)

## S3 method for class 'CrunchDataset'
merge(
  x,
  y,
  by = intersect(names(x), names(y)),
  by.x = by,
  by.y = by,
  all = FALSE,
  all.x = TRUE,
  all.y = FALSE,
  ...
)

Arguments

`x`	CrunchDataset to add data to
`y`	CrunchDataset to copy data from. May be filtered by rows and/or columns.
`by`	character, optional shortcut for specifying `by.x` and `by.y` by alias if the key variables have the same alias in both datasets.
`by.x`	CrunchVariable in `x` on which to join, or the alias (following `crunch.namekey.dataset`) of a variable. Must be of type numeric or text, and all of its values must be unique and non-missing.
`by.y`	CrunchVariable in `y` on which to join, or the alias (following `crunch.namekey.dataset`) of a variable. Must be of type numeric or text, and all of its values must be unique and non-missing.
`all`	logical: should all rows in x and y be kept, i.e. a "full outer" join? Only `FALSE` is currently supported.
`all.x`	logical: should all rows in x be kept, i.e. a "left outer" join? Only `TRUE` is currently supported.
`all.y`	logical: should all rows in y be kept, i.e. a "right outer" join? Only `FALSE` is currently supported.
`copy`	logical: whether to make a materialized (`TRUE`) or a virtual (`FALSE`) join. Default is `TRUE`. Virtual joins are not currently implemented, so the default is the only valid value.
`...`	additional arguments, ignored
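The requirements on `by.x` and `by.y` (numeric or text, unique, no missing values) can be checked locally before attempting a join. The helper below is a hypothetical sketch, not part of crunch; it operates on a plain R vector of key values, which you could likely obtain from a Crunch variable with `as.vector()`.

```r
# Hypothetical helper (not a crunch function): TRUE if a vector of key values
# satisfies the constraints described for by.x / by.y.
is_valid_join_key <- function(key) {
  (is.numeric(key) || is.character(key)) &&  # numeric or text only
    !anyNA(key) &&                           # no missing values
    anyDuplicated(key) == 0                  # all values unique
}

is_valid_join_key(c(1, 2, 3))              # valid numeric key
is_valid_join_key(c("a", "b", "a"))        # duplicated value
is_valid_join_key(c(1, NA, 3))             # missing value
is_valid_join_key(factor(c("a", "b")))     # neither numeric nor text
```

Only the first call returns `TRUE`; the others fail on duplication, missingness, and type respectively.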

Details

Since joining two datasets can sometimes produce unexpected results if the keys differ between the two datasets, you may want to follow the fork-edit-merge workflow for this operation. To do this, fork the dataset with forkDataset(), join the new data to the fork, ensure that the resulting dataset is correct, and merge it back to the original dataset with mergeFork(). For more, see vignette("fork-and-merge", package = "crunch").
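The workflow described above can be sketched as a small wrapper. The function name `join_via_fork` and its `check` argument are illustrative assumptions, not part of crunch; only forkDataset(), joinDatasets(), and mergeFork() come from the package. Calling it requires the crunch package and an authenticated session, so the block below only defines the function.

```r
# Hypothetical wrapper around the fork-edit-merge workflow.
#   ds       : CrunchDataset to extend
#   new_data : CrunchDataset to copy columns from
#   key      : alias of the key variable, shared by both datasets
#   check    : user-supplied function that inspects the joined fork and
#              returns TRUE if the result looks correct (assumption)
join_via_fork <- function(ds, new_data, key, check = function(fork) TRUE) {
  # 1. Fork the dataset so the original stays untouched.
  fork <- crunch::forkDataset(ds)

  # 2. Join the new data to the fork.
  fork <- crunch::joinDatasets(fork, new_data, by = key)

  # 3. Verify the result before touching the original.
  #    On failure the fork is left in place for inspection.
  if (!isTRUE(check(fork))) {
    stop("Joined fork failed the check; original dataset left unchanged.")
  }

  # 4. Merge the verified fork back into the original dataset.
  crunch::mergeFork(ds, fork)
}
```

A `check` function might, for example, compare the number of rows in the fork with the original, or confirm that the expected new aliases are present.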

Value

x extended by the columns of y, matched on the "by" variables.

crunch

Crunch.io Data Tools

v1.28.0

LGPL (>= 3)

Authors

Greg Freedman Ellis [aut, cre], Jonathan Keane [aut], Mike Malecki [aut], Neal Richardson [aut], Gordon Shotwell [aut]

Initial release