Join SQL tables
These are methods for the dplyr join generics. They are translated to the following SQL queries:
inner_join(x, y)
: SELECT * FROM x JOIN y ON x.a = y.a
left_join(x, y)
: SELECT * FROM x LEFT JOIN y ON x.a = y.a
right_join(x, y)
: SELECT * FROM x RIGHT JOIN y ON x.a = y.a
full_join(x, y)
: SELECT * FROM x FULL JOIN y ON x.a = y.a
semi_join(x, y)
: SELECT * FROM x WHERE EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
anti_join(x, y)
: SELECT * FROM x WHERE NOT EXISTS (SELECT 1 FROM y WHERE x.a = y.a)
## S3 method for class 'tbl_lazy' inner_join( x, y, by = NULL, copy = FALSE, suffix = NULL, auto_index = FALSE, ..., sql_on = NULL, na_matches = c("never", "na") ) ## S3 method for class 'tbl_lazy' left_join( x, y, by = NULL, copy = FALSE, suffix = NULL, auto_index = FALSE, ..., sql_on = NULL, na_matches = c("never", "na") ) ## S3 method for class 'tbl_lazy' right_join( x, y, by = NULL, copy = FALSE, suffix = NULL, auto_index = FALSE, ..., sql_on = NULL, na_matches = c("never", "na") ) ## S3 method for class 'tbl_lazy' full_join( x, y, by = NULL, copy = FALSE, suffix = NULL, auto_index = FALSE, ..., sql_on = NULL, na_matches = c("never", "na") ) ## S3 method for class 'tbl_lazy' semi_join( x, y, by = NULL, copy = FALSE, auto_index = FALSE, ..., sql_on = NULL, na_matches = c("never", "na") ) ## S3 method for class 'tbl_lazy' anti_join( x, y, by = NULL, copy = FALSE, auto_index = FALSE, ..., sql_on = NULL, na_matches = c("never", "na") )
x, y |
A pair of lazy data frames backed by database queries. |
by |
A character vector of variables to join by. If To join by different variables on To join by multiple variables, use a vector with length > 1.
For example, To perform a cross-join, generating all combinations of |
copy |
If This allows you to join tables across srcs, but it's potentially expensive operation so you must opt into it. |
suffix |
If there are non-joined duplicate variables in |
auto_index |
if |
... |
Other parameters passed onto methods. |
sql_on |
A custom join predicate as an SQL expression.
Usually joins use column equality, but you can perform more complex
queries by supply |
na_matches |
Should NA (NULL) values match one another?
The default, "never", is how databases usually work. |
Another tbl_lazy
. Use show_query()
to see the generated
query, and use collect()
to execute the query
and return data to R.
library(dplyr, warn.conflicts = FALSE) band_db <- tbl_memdb(dplyr::band_members) instrument_db <- tbl_memdb(dplyr::band_instruments) band_db %>% left_join(instrument_db) %>% show_query() # Can join with local data frames by setting copy = TRUE band_db %>% left_join(dplyr::band_instruments, copy = TRUE) # Unlike R, joins in SQL don't usually match NAs (NULLs) db <- memdb_frame(x = c(1, 2, NA)) label <- memdb_frame(x = c(1, NA), label = c("one", "missing")) db %>% left_join(label, by = "x") # But you can activate R's usual behaviour with the na_matches argument db %>% left_join(label, by = "x", na_matches = "na") # By default, joins are equijoins, but you can use `sql_on` to # express richer relationships db1 <- memdb_frame(x = 1:5) db2 <- memdb_frame(x = 1:3, y = letters[1:3]) db1 %>% left_join(db2) %>% show_query() db1 %>% left_join(db2, sql_on = "LHS.x < RHS.x") %>% show_query()
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.