Nested mutating joins

Nested mutating joins add columns from y to each of the nested data frames in .nest_data, matching observations based on the keys. There are four nested mutating joins:

Inner join

nest_inner_join() only keeps observations from .nest_data that have a matching key in y.

The most important property of an inner join is that unmatched rows in either input are not included in the result.
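As a minimal sketch of this behaviour, the toy data below nests orders by region and inner-joins a price table into each nested data frame. The orders and prices tables and their column names are invented for illustration and are not part of nplyr.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: orders nested by region.
orders <- tibble::tibble(
  region = c("east", "east", "west", "west"),
  item   = c("apple", "kiwi", "apple", "pear"),
  qty    = c(1, 2, 3, 4)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical lookup table; "kiwi" has no price and "plum" has no order.
prices <- tibble::tibble(
  item  = c("apple", "pear", "plum"),
  price = c(0.5, 0.8, 1.2)
)

inner <- orders %>% nest_inner_join(order_data, prices, by = "item")

inner$order_data[[1]]  # east: the unmatched "kiwi" row is dropped
inner$order_data[[2]]  # west: "apple" and "pear" are kept; "plum" never appears
```

The outer data frame still has one row per region; only the rows inside each nested data frame are filtered.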

Outer joins

There are three outer joins that keep observations that appear in at least one of the data frames:

nest_left_join() keeps all observations in .nest_data.
nest_right_join() keeps all observations in y.
nest_full_join() keeps all observations in .nest_data and y.
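The difference between the three outer joins is easiest to see by comparing row counts. The sketch below uses invented toy data (the orders and prices tables are assumptions for illustration only) and counts the rows in each nested data frame after each join.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: orders nested by region.
orders <- tibble::tibble(
  region = c("east", "east", "west", "west"),
  item   = c("apple", "kiwi", "apple", "pear"),
  qty    = c(1, 2, 3, 4)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical lookup table; "kiwi" has no price and "plum" has no order.
prices <- tibble::tibble(
  item  = c("apple", "pear", "plum"),
  price = c(0.5, 0.8, 1.2)
)

left  <- orders %>% nest_left_join(order_data, prices, by = "item")
right <- orders %>% nest_right_join(order_data, prices, by = "item")
full  <- orders %>% nest_full_join(order_data, prices, by = "item")

# Rows per nested data frame, in the order east, west.
left_n  <- vapply(left$order_data, nrow, integer(1))   # all order rows
right_n <- vapply(right$order_data, nrow, integer(1))  # all price rows
full_n  <- vapply(full$order_data, nrow, integer(1))   # union of both
```

Because y is joined to every nested data frame separately, its unmatched rows are appended to each one by the right and full joins.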

Usage

nest_inner_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

nest_left_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

nest_right_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

nest_full_join(
  .data,
  .nest_data,
  y,
  by = NULL,
  copy = FALSE,
  suffix = c(".x", ".y"),
  ...,
  keep = FALSE
)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames.

y

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

by

A character vector of variables to join by or a join specification created with join_by().

If NULL, the default, nest_*_join() will perform a natural join, using all variables in common across each object in .nest_data and y. A message lists the variables so you can check they're correct; suppress the message by supplying by explicitly.

To join on different variables between the objects in .nest_data and y, use a named vector. For example, by = c("a" = "b") will match .nest_data$a to y$b for each object in .nest_data.

To join by multiple variables, use a vector with length >1. For example, by = c("a", "b") will match .nest_data$a to y$a and .nest_data$b to y$b for each object in .nest_data. Use a named vector to match different variables in .nest_data and y. For example, by = c("a" = "b", "c" = "d") will match .nest_data$a to y$b and .nest_data$c to y$d for each object in .nest_data.

To perform a cross-join, generating all combinations of each object in .nest_data and y, use by = character().
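The named-vector form of by can be sketched as follows. The data and the column names item and product are hypothetical; the point is only that the key is called something different in the nested data frames and in y.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: orders nested by region, keyed by `item`.
orders <- tibble::tibble(
  region = c("east", "east", "west"),
  item   = c("apple", "kiwi", "pear"),
  qty    = c(1, 2, 3)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical lookup table whose key column is called `product`.
prices <- tibble::tibble(
  product = c("apple", "pear"),
  price   = c(0.5, 0.8)
)

# Match `item` in each nested data frame to `product` in y.
joined <- orders %>%
  nest_left_join(order_data, prices, by = c("item" = "product"))

names(joined$order_data[[1]])
```

With the default keep = FALSE, only the key column from the nested data frames (item) appears in the output.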

copy

If the objects in .nest_data and y are not from the same data source and copy = TRUE, then y will be copied into the same src as .nest_data. (The applicability of this parameter to nplyr still needs to be reviewed in more detail.)

suffix

If there are non-joined duplicate variables in .nest_data and y, these suffixes will be added to the output to disambiguate them. Should be a character vector of length 2.
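A short sketch of suffix, assuming it is forwarded unchanged to the underlying dplyr join. The tables and the custom suffixes "_order" and "_stock" are invented for illustration; both tables carry a non-key column named qty.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: `qty` here is the quantity ordered.
orders <- tibble::tibble(
  region = c("east", "west"),
  item   = c("apple", "pear"),
  qty    = c(1, 3)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical stock table: `qty` here is the quantity in stock.
stock <- tibble::tibble(
  item = c("apple", "pear"),
  qty  = c(10, 20)
)

joined <- orders %>%
  nest_left_join(order_data, stock, by = "item",
                 suffix = c("_order", "_stock"))

names(joined$order_data[[1]])
```

The first suffix is applied to the column from the nested data frame and the second to the column from y.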

...

Other parameters passed on to methods. These include:

na_matches : Should two NA or two NaN values match?
- "na", the default, treats two NA or two NaN values as equal.
- "never" treats two NA or two NaN values as different, and will never match them together or to any other values.
multiple : Handling of rows in .nest_data with multiple matches in y.
- "all" returns every match detected in y.
- "any" returns one match detected in y, with no guarantees on which match will be returned. It is often faster than "first" and "last" if you just need to detect if there is at least one match.
- "first" returns the first match detected in y.
- "last" returns the last match detected in y.
- "warning" throws a warning if multiple matches are detected, and then falls back to "all".
- "error" throws an error if multiple matches are detected.
unmatched : How should unmatched keys that would result in dropped rows be handled?
- "drop" drops unmatched keys from the result.
- "error" throws an error if unmatched keys are detected.
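The following sketch illustrates multiple and na_matches on invented data. It assumes dplyr 1.1.0 or later (where the multiple argument exists) and that these arguments are forwarded through ... to the underlying dplyr join; the orders and prices tables are hypothetical.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data with a missing key in the "east" group.
orders <- tibble::tibble(
  region = c("east", "east", "west"),
  item   = c("apple", NA, "apple"),
  qty    = c(1, 2, 3)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical lookup table: "apple" appears twice and one key is NA.
prices <- tibble::tibble(
  item  = c("apple", "apple", NA),
  price = c(0.5, 0.6, 9.9)
)

# Keep only the first match in y for each row of the nested data frames.
first <- orders %>%
  nest_left_join(order_data, prices, by = "item", multiple = "first")

# Keep all matches, but never let NA keys match each other.
never <- orders %>%
  nest_left_join(order_data, prices, by = "item",
                 multiple = "all", na_matches = "never")

first$order_data[[1]]  # one "apple" row, and the NA key matched to y's NA row
never$order_data[[1]]  # two "apple" rows, and the NA key left unmatched
```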

keep

Should the join keys from both .nest_data and y be preserved in the output?
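A small sketch of keep = TRUE, assuming the argument is passed straight through to the dplyr join. The toy tables are invented for illustration.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: orders nested by region.
orders <- tibble::tibble(
  region = c("east", "east", "west"),
  item   = c("apple", "kiwi", "pear"),
  qty    = c(1, 2, 3)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical lookup table; "kiwi" has no price.
prices <- tibble::tibble(
  item  = c("apple", "pear"),
  price = c(0.5, 0.8)
)

kept <- orders %>%
  nest_left_join(order_data, prices, by = "item", keep = TRUE)

# Both copies of the key are retained and disambiguated with the suffixes.
names(kept$order_data[[1]])
```

Keeping both keys makes unmatched rows easy to spot: the y-side key (item.y here) is NA wherever no match was found.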

Value

An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. The order of the rows and columns of each object in .nest_data is preserved as much as possible. Each object in .nest_data has the following properties:

The rows depend on the join:
- For nest_inner_join(), a subset of rows in each object in .nest_data.
- For nest_left_join(), all rows in each object in .nest_data.
- For nest_right_join(), a subset of rows in each object in .nest_data, followed by unmatched y rows.
- For nest_full_join(), all rows in each object in .nest_data, followed by unmatched y rows.
Output columns include all columns from each object in .nest_data and all non-key columns from y. If keep = TRUE, the key columns from y are included as well.
If non-key columns in any object in .nest_data and y have the same name, suffixes are added to disambiguate. If keep = TRUE and key columns in .nest_data and y have the same name, suffixes are added to disambiguate these as well.
If keep = FALSE, output columns included in by are coerced to their common type between the objects in .nest_data and y.
Groups are taken from .nest_data.

Details

nest_inner_join(), nest_left_join(), nest_right_join(), and nest_full_join() are largely wrappers for dplyr::inner_join(), dplyr::left_join(), dplyr::right_join(), and dplyr::full_join() and maintain the functionality of these verbs within each nested data frame. For more information on inner_join(), left_join(), right_join(), or full_join(), please refer to the documentation in dplyr.
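To make the wrapper relationship concrete, the sketch below compares nest_inner_join() with a hand-written equivalent that maps dplyr::inner_join() over the list-column. This illustrates the idea only and is not necessarily how nplyr is implemented internally; the toy data is invented.

```r
library(dplyr)
library(nplyr)

# Hypothetical toy data: orders nested by region.
orders <- tibble::tibble(
  region = c("east", "east", "west", "west"),
  item   = c("apple", "kiwi", "apple", "pear"),
  qty    = c(1, 2, 3, 4)
) %>%
  tidyr::nest(order_data = -region)

# Hypothetical lookup table.
prices <- tibble::tibble(
  item  = c("apple", "pear", "plum"),
  price = c(0.5, 0.8, 1.2)
)

# Nested join via nplyr.
nested <- orders %>% nest_inner_join(order_data, prices, by = "item")

# Hand-written equivalent: apply the dplyr join to each nested data frame.
manual <- orders %>%
  mutate(order_data = purrr::map(
    order_data,
    function(df) dplyr::inner_join(df, prices, by = "item")
  ))

# Compare the nested data frames element by element.
same <- mapply(
  function(a, b) isTRUE(all.equal(as.data.frame(a), as.data.frame(b))),
  nested$order_data, manual$order_data
)
```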

Examples

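library(dplyr)  # provides the %>% pipe
library(nplyr)

# Nest the gapminder data by continent and load the country code lookup table
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
gm_codes <- gapminder::country_codes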

gm_nest %>% nest_inner_join(country_data, gm_codes, by = "country")
#> # A tibble: 5 × 2
#>   continent country_data      
#>   <fct>     <list>            
#> 1 Asia      <tibble [396 × 7]>
#> 2 Europe    <tibble [360 × 7]>
#> 3 Africa    <tibble [624 × 7]>
#> 4 Americas  <tibble [300 × 7]>
#> 5 Oceania   <tibble [24 × 7]> 
gm_nest %>% nest_left_join(country_data, gm_codes, by = "country")
#> # A tibble: 5 × 2
#>   continent country_data      
#>   <fct>     <list>            
#> 1 Asia      <tibble [396 × 7]>
#> 2 Europe    <tibble [360 × 7]>
#> 3 Africa    <tibble [624 × 7]>
#> 4 Americas  <tibble [300 × 7]>
#> 5 Oceania   <tibble [24 × 7]> 
gm_nest %>% nest_right_join(country_data, gm_codes, by = "country")
#> # A tibble: 5 × 2
#>   continent country_data      
#>   <fct>     <list>            
#> 1 Asia      <tibble [550 × 7]>
#> 2 Europe    <tibble [517 × 7]>
#> 3 Africa    <tibble [759 × 7]>
#> 4 Americas  <tibble [462 × 7]>
#> 5 Oceania   <tibble [209 × 7]>
gm_nest %>% nest_full_join(country_data, gm_codes, by = "country")
#> # A tibble: 5 × 2
#>   continent country_data      
#>   <fct>     <list>            
#> 1 Asia      <tibble [550 × 7]>
#> 2 Europe    <tibble [517 × 7]>
#> 3 Africa    <tibble [759 × 7]>
#> 4 Americas  <tibble [462 × 7]>
#> 5 Oceania   <tibble [209 × 7]>