
nest_extract() is used to extract capturing groups from a column in a nested data frame using regular expressions into a new column. If the groups don't match, or the input is NA, the output will be NA.


nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)



.data
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).


.nest_data
A list-column containing data frames.


col
Column name or position within .nest_data (must be present within all nested data frames in .nest_data). This is passed to tidyselect::vars_pull().

This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).


into
Names of new variables to create, as a character vector. Use NA to omit the variable in the output.


regex
A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
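
The relationship between the capture groups and the names of the new variables can be illustrated with a small sketch. It calls tidyr::extract() directly on a made-up, un-nested data frame (the object names df, both and second_only are illustrative only), on the assumption that nest_extract() applies the same matching rules inside each nested data frame.

```r
library(tidyr)

# Toy data: two matching strings, one NA, and one string without a hyphen
df <- data.frame(comb = c("a-b", "c-d", NA, "nodash"),
                 stringsAsFactors = FALSE)

# Two capture groups in the regex, so two names are supplied
both <- tidyr::extract(df, col = comb, into = c("var1", "var2"),
                       regex = "([[:alnum:]]+)-([[:alnum:]]+)")

# NA in place of a name drops the corresponding group from the output
second_only <- tidyr::extract(df, col = comb, into = c(NA, "var2"),
                              regex = "([[:alnum:]]+)-([[:alnum:]]+)")
```

Rows where the input is NA or the pattern does not match end up as NA in every new column.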


remove
If TRUE, remove input column from output data frame.


convert
If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.

NB: this will cause string "NA"s to be converted to NAs.
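
A short sketch of the effect of the conversion, again using tidyr::extract() directly on invented data (df, as_text and converted are illustrative names) and assuming that nest_extract() behaves the same way within each nested data frame:

```r
library(tidyr)

# Toy data in which one key is the literal string "NA"
df <- data.frame(x = c("id-1", "id-2", "NA-3"),
                 stringsAsFactors = FALSE)

# Default: every new column stays character
as_text <- tidyr::extract(df, col = x, into = c("key", "num"),
                          regex = "([[:alnum:]]+)-([[:alnum:]]+)")

# convert = TRUE: type.convert() is applied to the new columns
converted <- tidyr::extract(df, col = x, into = c("key", "num"),
                            regex = "([[:alnum:]]+)-([[:alnum:]]+)",
                            convert = TRUE)
```

With the default, num is a character column and the third key is the string "NA"; with convert = TRUE, num becomes an integer column and that key becomes a missing value.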


...
Additional arguments passed on to tidyr::extract() methods.


An object of the same type as .data. Each object in the column .nest_data will have new columns created according to the capture groups specified in the regular expression.


nest_extract() is a wrapper for tidyr::extract() and maintains the functionality of extract() within each nested data frame. For more information on extract() please refer to the documentation in 'tidyr'.
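
To make the idea of a wrapper concrete, the sketch below applies tidyr::extract() to each nested data frame by hand with dplyr::mutate() and purrr::map(). This is an approximation for illustration, not the package's actual implementation, and the data and names (df, nested, grp_data, manual) are invented.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Toy data with a grouping column to nest by
df <- tibble(
  grp  = c("g1", "g1", "g2"),
  comb = c("a-b", NA, "c-d")
)
nested <- tidyr::nest(df, grp_data = -grp)

# Run tidyr::extract() on every data frame stored in the list-column
manual <- nested %>%
  mutate(grp_data = map(
    grp_data,
    ~ tidyr::extract(.x, col = comb, into = c("var1", "var2"),
                     regex = "([[:alnum:]]+)-([[:alnum:]]+)")
  ))
```

The outer data frame keeps its shape, while each nested data frame gains var1 and var2 in place of comb.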

See also


gm <- gapminder::gapminder 

gm <- 
  gm %>% 
  dplyr::mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"),
                              size = nrow(gm),
                              replace = TRUE))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)

gm_nest %>% 
  nest_extract(.nest_data = country_data,
               col = comb,
               into = c("var1","var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")
#> # A tibble: 5 × 2
#>   continent country_data      
#>   <fct>     <list>            
#> 1 Asia      <tibble [396 × 7]>
#> 2 Europe    <tibble [360 × 7]>
#> 3 Africa    <tibble [624 × 7]>
#> 4 Americas  <tibble [300 × 7]>
#> 5 Oceania   <tibble [24 × 7]>