Extract a character column into multiple columns using regex groups in a column of nested data frames
Source: R/nest_extract.R
nest_extract() uses a regular expression to extract capturing groups from a character column in each nested data frame into new columns. If the groups don't match, or the input is NA, the output will be NA.
Usage
nest_extract(
  .data,
  .nest_data,
  col,
  into,
  regex = "([[:alnum:]]+)",
  remove = TRUE,
  convert = FALSE,
  ...
)
Arguments
- .data
A data frame, data frame extension (e.g., a tibble), or a lazy data frame (e.g., from dbplyr or dtplyr).
- .nest_data
A list-column containing data frames
- col
Column name or position within .nest_data (must be present within all nested data frames in .nest_data). This is passed to tidyselect::vars_pull().
This argument is passed by expression and supports quasiquotation (you can unquote column names or column positions).
- into
Names of new variables to create, as a character vector. Use NA to omit the variable in the output.
- regex
A string representing a regular expression used to extract the desired values. There should be one group (defined by ()) for each element of into.
- remove
If TRUE, remove input column from output data frame.
- convert
If TRUE, will run type.convert() with as.is = TRUE on new columns. This is useful if the component columns are integer, numeric or logical.
NB: this will cause string "NA"s to be converted to NAs.
- ...
Additional arguments passed on to tidyr::extract() methods.
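The interaction between into and convert is easier to see on a small input. The following is a minimal sketch, assuming the nplyr, tidyr and tibble packages are installed; the toy data and the names df, nested and res are hypothetical and not part of the package.

```r
library(nplyr)

# Hypothetical toy data: a grouping column and a "letter-number" code column.
df <- tibble::tibble(
  grp  = c("x", "x", "y"),
  code = c("a-1", "b-2", NA)
)

# One nested data frame per value of grp, stored in the list-column `data`.
nested <- tidyr::nest(df, data = -grp)

res <- nest_extract(
  nested,
  data,
  col     = code,
  into    = c(NA, "num"),  # NA omits the first capture group from the output
  regex   = "([[:alpha:]]+)-([[:digit:]]+)",
  convert = TRUE           # type.convert() turns the digit strings into integers
)

# Inspect the nested data frame for group "x".
res$data[[1]]
```

Because remove defaults to TRUE, the input column code is dropped, so each nested data frame here should contain only the num column.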
Value
An object of the same type as .data. Each object in the column .nest_data will have new columns created according to the capture groups specified in the regular expression.
Details
nest_extract() is a wrapper for tidyr::extract() and maintains the functionality of extract() within each nested data frame. For more information on extract(), please refer to the documentation in 'tidyr'.
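To make the wrapper relationship concrete, the same result can be approximated by hand by mapping tidyr::extract() over the list-column. This is an illustrative sketch only, not the package's actual implementation; it assumes tidyr, dplyr, purrr and tibble are installed, and the toy data and the name by_hand are hypothetical.

```r
# Hypothetical toy data with a hyphen-separated column.
df <- tibble::tibble(
  grp  = c("x", "x", "y"),
  code = c("a-b", "c-d", "e-f")
)
nested <- tidyr::nest(df, data = -grp)

# Apply tidyr::extract() to every nested data frame in the list-column.
by_hand <- dplyr::mutate(
  nested,
  data = purrr::map(
    data,
    function(d) {
      tidyr::extract(
        d,
        col   = code,
        into  = c("var1", "var2"),
        regex = "([[:alnum:]]+)-([[:alnum:]]+)"
      )
    }
  )
)

# Each nested data frame now holds var1 and var2 instead of code.
by_hand$data[[1]]
```

Calling nest_extract() with the same col, into and regex arguments on nested should give a comparable result without the explicit mutate() and map() calls.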
See also
Other tidyr verbs: nest_drop_na(), nest_fill(), nest_replace_na(), nest_separate(), nest_unite()
Examples
set.seed(123)
gm <- gapminder::gapminder
gm <-
  gm %>%
  dplyr::mutate(comb = sample(c(NA, "a-b", "a-d", "b-c", "d-e"),
                              size = nrow(gm),
                              replace = TRUE))
gm_nest <- gm %>% tidyr::nest(country_data = -continent)
gm_nest %>%
  nest_extract(country_data,
               col = comb,
               into = c("var1","var2"),
               regex = "([[:alnum:]]+)-([[:alnum:]]+)")
#> # A tibble: 5 × 2
#>   continent country_data
#>   <fct>     <list>
#> 1 Asia      <tibble [396 × 7]>
#> 2 Europe    <tibble [360 × 7]>
#> 3 Africa    <tibble [624 × 7]>
#> 4 Americas  <tibble [300 × 7]>
#> 5 Oceania   <tibble [24 × 7]>
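The printed result only summarises the list-column, so the new columns are not visible. One way to inspect them is to flatten the result with tidyr::unnest(). The sketch below uses a small hypothetical data set (toy, toy_nest, out and flat are illustrative names) rather than gapminder, and assumes nplyr, tidyr and tibble are installed. It also illustrates the NA behaviour for missing and non-matching input, with remove = FALSE so the original column stays visible.

```r
library(nplyr)

# Hypothetical toy data: one matching value, one NA, and one value
# without a hyphen, which the regex below cannot match.
toy <- tibble::tibble(
  continent = c("Asia", "Asia", "Europe"),
  comb      = c("a-b", NA, "bd")
)
toy_nest <- tidyr::nest(toy, country_data = -continent)

out <- nest_extract(
  toy_nest,
  country_data,
  col    = comb,
  into   = c("var1", "var2"),
  regex  = "([[:alnum:]]+)-([[:alnum:]]+)",
  remove = FALSE  # keep comb alongside the new columns
)

# Flatten the list-column to see var1 and var2 next to comb.
flat <- tidyr::unnest(out, country_data)
flat
```

Only the first row should have non-missing values in var1 and var2; the NA input and the non-matching "bd" both yield NA in the new columns.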