Summarise each group in nested data frames to fewer rows

nest_summarise() creates a new set of nested data frames. Each will have one (or more) rows for each combination of grouping variables; if there are no grouping variables, the output will have a single row summarising all observations in .nest_data. Each nested data frame will contain one column for each grouping variable and one column for each of the summary statistics that you have specified.

nest_summarise() and nest_summarize() are synonyms.

Usage

nest_summarise(.data, .nest_data, ..., .groups = NULL)

nest_summarize(.data, .nest_data, ..., .groups = NULL)

Arguments

.data

A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).

.nest_data

A list-column containing data frames.

...

Name-value pairs of summary functions. The name will be the name of the variable in the result.

The value can be:

A vector of length 1, e.g. min(x), n(), or sum(is.na(y)).
A vector of length n, e.g., quantile().
A data frame, to add multiple columns from a single expression.
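
The following is a minimal sketch of two of the value kinds listed above. Because nest_summarise() forwards these expressions to dplyr::summarise() (see Details), the sketch calls dplyr::summarise() on a plain data frame; the toy data and the column names n_missing, lo, and hi are made up for illustration.

```r
# Toy input; `x` and the result names below are illustrative only
df <- tibble::tibble(x = c(4, 8, NA, 15))

# Length-1 values: each named expression becomes one column
single <- dplyr::summarise(df, n = dplyr::n(), n_missing = sum(is.na(x)))

# Data frame value: an unnamed data frame is unpacked into several columns
ranges <- dplyr::summarise(
  df,
  tibble::tibble(lo = min(x, na.rm = TRUE), hi = max(x, na.rm = TRUE))
)

names(single)  # "n" "n_missing"
names(ranges)  # "lo" "hi"
```

Note that dplyr 1.1.0 and later warn when a summary expression returns more than one row per group (the length-n case) and recommend dplyr::reframe() for that purpose.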

.groups

Grouping structure of the result. Refer to dplyr::summarise() for more up-to-date information.
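
As a rough illustration of how .groups might be used, the sketch below groups each nested data frame by two variables and then summarises with .groups = "drop" and .groups = "keep". These option names are taken from dplyr::summarise(); the toy data and the names site, g1, g2, val, and d are hypothetical.

```r
# Toy data: one site, two grouping variables inside the nested frame
toy <- tibble::tibble(
  site = c("s1", "s1", "s1", "s1"),
  g1   = c("a", "a", "b", "b"),
  g2   = c("x", "y", "x", "x"),
  val  = c(1, 2, 3, 4)
)
toy_nest <- tidyr::nest(toy, d = -site)

# Group every nested data frame by g1 and g2
grouped <- nplyr::nest_group_by(toy_nest, d, g1, g2)

# "drop" removes all grouping from the summarised nested frames
dropped <- nplyr::nest_summarise(grouped, d, total = sum(val), .groups = "drop")

# "keep" retains the same grouping variables as the input
kept <- nplyr::nest_summarise(grouped, d, total = sum(val), .groups = "keep")

dplyr::group_vars(dropped$d[[1]])
dplyr::group_vars(kept$d[[1]])
```

With .groups left as NULL, dplyr picks a default based on the number of rows returned per group and prints a message describing the resulting grouping.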

Value

An object of the same type as .data. Each object in the column .nest_data will usually be of the same type as the input. Each object in .nest_data has the following properties:

The rows come from the underlying group_keys().
The columns are a combination of the grouping keys and the summary expressions that you provide.
The grouping structure is controlled by the .groups argument; the output may be another grouped_df, a tibble, or a rowwise data frame.
Data frame attributes are not preserved, because nest_summarise() fundamentally creates a new data frame for each object in .nest_data.

Details

nest_summarise() is largely a wrapper for dplyr::summarise() and maintains the functionality of summarise() within each nested data frame. For more information on summarise(), please refer to the documentation in dplyr.
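
To make the wrapper relationship concrete, here is a hand-written approximation that applies dplyr::summarise() to every element of a list-column with purrr::map(). It is a sketch of the idea only, not the package's actual implementation; the toy data and the names g, x, data, and total are assumptions.

```r
library(dplyr)

# Toy data: two groups, nested into a list-column named `data`
df <- tibble::tibble(g = c("a", "a", "b"), x = c(1, 2, 10))
nested <- tidyr::nest(df, data = -g)

# Roughly what nest_summarise(nested, data, total = sum(x)) is expected to do:
# summarise each nested data frame and store the result back in the column
by_hand <- nested %>%
  mutate(data = purrr::map(data, ~ summarise(.x, total = sum(x))))

by_hand$data[[1]]$total  # 3
by_hand$data[[2]]$total  # 10
```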

Examples

gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)

# a summary applied to an ungrouped tbl returns a single row
gm_nest %>%
  nest_summarise(
    country_data,
    n = dplyr::n(),
    median_pop = median(pop)
  )
#> # A tibble: 5 × 2
#>   continent country_data    
#>   <fct>     <list>          
#> 1 Asia      <tibble [1 × 2]>
#> 2 Europe    <tibble [1 × 2]>
#> 3 Africa    <tibble [1 × 2]>
#> 4 Americas  <tibble [1 × 2]>
#> 5 Oceania   <tibble [1 × 2]>

# usually, you'll want to group first
gm_nest %>%
  nest_group_by(country_data, country) %>%
  nest_summarise(
    country_data,
    n = dplyr::n(),
    median_pop = median(pop)
  )
#> # A tibble: 5 × 2
#>   continent country_data     
#>   <fct>     <list>           
#> 1 Asia      <tibble [33 × 3]>
#> 2 Europe    <tibble [30 × 3]>
#> 3 Africa    <tibble [52 × 3]>
#> 4 Americas  <tibble [25 × 3]>
#> 5 Oceania   <tibble [2 × 3]>