nest_slice() lets you index rows in nested data frames by their (integer) locations. It allows you to select, remove, and duplicate rows. It is accompanied by a number of helpers for common use cases:
- nest_slice_head() and nest_slice_tail() select the first or last rows of each nested data frame in .nest_data.
- nest_slice_sample() randomly selects rows from each data frame in .nest_data.
- nest_slice_min() and nest_slice_max() select the rows with the lowest or highest values of a variable within each nested data frame in .nest_data.
If .nest_data is a grouped data frame, the operation will be performed on each group, so that (e.g.) nest_slice_head(df, nested_dfs, n = 5) will return the first five rows in each group for each nested data frame.
Usage
nest_slice(.data, .nest_data, ..., .preserve = FALSE)
nest_slice_head(.data, .nest_data, ...)
nest_slice_tail(.data, .nest_data, ...)
nest_slice_min(.data, .nest_data, order_by, ..., with_ties = TRUE)
nest_slice_max(.data, .nest_data, order_by, ..., with_ties = TRUE)
nest_slice_sample(.data, .nest_data, ..., weight_by = NULL, replace = FALSE)
Arguments
- .data
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).
- .nest_data
A list-column containing data frames.
- ...
For nest_slice(): Integer row values. Provide either positive values to keep, or negative values to drop. The values provided must be either all positive or all negative. Indices beyond the number of rows in the input are silently ignored.
For the nest_slice_*() helpers, these arguments are passed on to methods. Additionally:
n, prop
Provide either n, the number of rows, or prop, the proportion of rows to select. If neither is supplied, n = 1 will be used. If a negative value of n or prop is provided, the specified number or proportion of rows will be removed. If n is greater than the number of rows in the group (or prop > 1), the result will be silently truncated to the group size. If the proportion of a group size does not yield an integer number of rows, the absolute value of prop * nrow(.nest_data) is rounded down.
- .preserve
Relevant when .nest_data is grouped. If .preserve = FALSE (the default), the grouping structure is recalculated based on the resulting data; otherwise the grouping is kept as is.
- order_by
Variable or function of variables to order by.
- with_ties
Should ties be kept together? The default, TRUE, may return more rows than you request. Use FALSE to ignore ties and return the first n rows.
- weight_by
Sampling weights. This must evaluate to a vector of non-negative numbers the same length as the input. Weights are automatically standardised to sum to 1.
- replace
Should sampling be performed with (TRUE) or without (FALSE, the default) replacement?
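As a rough illustration of the n, prop, and with_ties rules above, the sketch below calls the underlying dplyr helpers directly on a small made-up data frame. The object toy is hypothetical, and the sketch assumes the nested helpers forward these arguments unchanged to their dplyr counterparts for each nested data frame.

```r
library(dplyr)

# Hypothetical five-row data frame, used only for illustration.
toy <- data.frame(id = 1:5, score = c(2, 2, 3, 5, 8))

# n larger than the number of rows: the result is truncated to all 5 rows.
head_all <- slice_head(toy, n = 10)

# Negative n: the last 2 rows are removed, leaving 3.
head_drop <- slice_head(toy, n = -2)

# prop * nrow = 0.5 * 5 = 2.5, which is rounded down to 2 rows.
head_prop <- slice_head(toy, prop = 0.5)

# score has a tie for the minimum (two rows with score == 2).
min_ties <- slice_min(toy, score, n = 1)                       # keeps both tied rows
min_no_ties <- slice_min(toy, score, n = 1, with_ties = FALSE) # keeps only one row

sapply(list(head_all, head_drop, head_prop, min_ties, min_no_ties), nrow)
```

The same arguments would be supplied to nest_slice_head(), nest_slice_min(), and so on after the .nest_data column.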
Value
An object of the same type as .data. Each object in the column .nest_data will also be of the same type as the input. Each object in .nest_data has the following properties:
- Each row may appear 0, 1, or many times in the output.
- Columns are not modified.
- Groups are not modified.
- Data frame attributes are preserved.
Details
nest_slice() and its helpers are largely wrappers for dplyr::slice() and its helpers, and they maintain the functionality of slice() and its helpers within each nested data frame. For more information on slice() or its helpers, please refer to the documentation in dplyr.
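To make the wrapper idea concrete, here is a minimal sketch of slicing every data frame in a list-column by hand with dplyr and purrr. It is not the package's implementation. The built-in mtcars data and the names cars_nest, car_data, and by_hand are assumptions chosen only for illustration.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Nest the built-in mtcars data by cylinder count (illustrative data only).
cars_nest <- mtcars %>% nest(car_data = -cyl)

# Apply dplyr::slice() to each nested data frame in the list-column,
# keeping the 1st, 3rd, and 5th rows of every one.
by_hand <- cars_nest %>%
  mutate(car_data = map(car_data, ~ slice(.x, 1, 3, 5)))

# Number of rows remaining in each nested data frame.
map_int(by_hand$car_data, nrow)
```

Under that reading, nest_slice(cars_nest, car_data, 1, 3, 5) would express the same operation without the explicit mutate() and map() calls.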
See also
Other single table verbs: nest_arrange(), nest_filter(), nest_mutate(), nest_rename(), nest_select(), nest_summarise()
Examples
library(nplyr)
# nest the gapminder data by continent; each row holds one continent's data frame
gm_nest <- gapminder::gapminder %>% tidyr::nest(country_data = -continent)
# select the 1st, 3rd, and 5th rows in each data frame in country_data
gm_nest %>% nest_slice(country_data, 1, 3, 5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [3 × 5]>
#> 2 Europe <tibble [3 × 5]>
#> 3 Africa <tibble [3 × 5]>
#> 4 Americas <tibble [3 × 5]>
#> 5 Oceania <tibble [3 × 5]>
# or select all but the 1st, 3rd, and 5th rows:
gm_nest %>% nest_slice(country_data, -1, -3, -5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [393 × 5]>
#> 2 Europe <tibble [357 × 5]>
#> 3 Africa <tibble [621 × 5]>
#> 4 Americas <tibble [297 × 5]>
#> 5 Oceania <tibble [21 × 5]>
# first and last rows based on existing order:
gm_nest %>% nest_slice_head(country_data, n = 5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [5 × 5]>
#> 2 Europe <tibble [5 × 5]>
#> 3 Africa <tibble [5 × 5]>
#> 4 Americas <tibble [5 × 5]>
#> 5 Oceania <tibble [5 × 5]>
gm_nest %>% nest_slice_tail(country_data, n = 5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [5 × 5]>
#> 2 Europe <tibble [5 × 5]>
#> 3 Africa <tibble [5 × 5]>
#> 4 Americas <tibble [5 × 5]>
#> 5 Oceania <tibble [5 × 5]>
# rows with minimum and maximum values of a variable:
gm_nest %>% nest_slice_min(country_data, lifeExp, n = 5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [5 × 5]>
#> 2 Europe <tibble [5 × 5]>
#> 3 Africa <tibble [5 × 5]>
#> 4 Americas <tibble [5 × 5]>
#> 5 Oceania <tibble [5 × 5]>
gm_nest %>% nest_slice_max(country_data, lifeExp, n = 5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [5 × 5]>
#> 2 Europe <tibble [5 × 5]>
#> 3 Africa <tibble [5 × 5]>
#> 4 Americas <tibble [5 × 5]>
#> 5 Oceania <tibble [5 × 5]>
# randomly select rows with or without replacement:
gm_nest %>% nest_slice_sample(country_data, n = 5)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [5 × 5]>
#> 2 Europe <tibble [5 × 5]>
#> 3 Africa <tibble [5 × 5]>
#> 4 Americas <tibble [5 × 5]>
#> 5 Oceania <tibble [5 × 5]>
gm_nest %>% nest_slice_sample(country_data, n = 5, replace = TRUE)
#> # A tibble: 5 × 2
#> continent country_data
#> <fct> <list>
#> 1 Asia <tibble [5 × 5]>
#> 2 Europe <tibble [5 × 5]>
#> 3 Africa <tibble [5 × 5]>
#> 4 Americas <tibble [5 × 5]>
#> 5 Oceania <tibble [5 × 5]>