behead()
takes one level of headers from a pivot table and
makes it part of the data. Think of it like tidyr::gather()
, except that
it works when there is more than one row of headers (or more than one column
of row-headers), and it only works on tables that have first come through
as_cells()
or tidyxl::xlsx_cells()
.
Arguments
- cells
Data frame. The cells of a pivot table, usually the output of
as_cells()
ortidyxl::xlsx_cells()
, or of a subsequent operation on those outputs.- direction
The direction between a data cell and its header, one of
"up"
,"right"
,"down"
,"left"
,"up-left"
,"up-right"
,"right-up"
,"right-down"
,"down-right"
,"down-left"
,"left-down"
,"left-up"
. See?direction
."up-ish"
,"down-ish"
,"left-ish"
and"right-ish"
aren't available because they require certain ambiguities that are better handled by usingenhead()
directly rather than viabehead()
.- name
A name to give the new column that will be created, e.g.
"location"
if the headers are locations. Quoted ("location"
, notlocation
) because it doesn't refer to an actual object.- values
Optional. The column of
cells
to use as the values of each header. Given as a bare variable name. If omitted (the default), thetypes
argument will be used instead.- types
The name of the column that names the data type of each cell. Usually called
data_types
(the default), this is a character column that names the other columns incells
that contain the values of each cell. E.g. a cell with a character value will have"character"
in this column. Unquoted(data_types
, not"data_types"
) because it refers to an actual object.- formatters
A named list of functions for formatting each data type in a set of headers of mixed data types, e.g. when some headers are dates and others are characters. These can be given as
character = toupper
orcharacter = ~ toupper(.x)
, similar to purrr::map.- drop_na
logical Whether to filter out headers that have
NA
in thevalue
column. Default:TRUE
. This can happen with the output oftidyxl::xlsx_cells()
, when an empty cell exists because it has formatting applied to it, but should be ignored.- ...
Passed to dplyr::filter. logical predicates defined in terms of the variables in
.data
. Multiple conditions are combined with&
. Only rows where the condition evaluates toTRUE
are kept.The arguments in
...
are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See the dplyrvignette("programming")
for an introduction to these concepts.
Examples
# A simple table with a row of headers
(x <- data.frame(a = 1:2, b = 3:4))
#> a b
#> 1 1 3
#> 2 2 4
# Make a tidy representation of each cell
(cells <- as_cells(x, col_names = TRUE))
#> # A tibble: 6 × 5
#> row col data_type chr int
#> <int> <int> <chr> <chr> <int>
#> 1 1 1 chr a NA
#> 2 2 1 int NA 1
#> 3 3 1 int NA 2
#> 4 1 2 chr b NA
#> 5 2 2 int NA 3
#> 6 3 2 int NA 4
# Strip the cells in row 1 (the original headers) and use them as data
behead(cells, "N", foo)
#> # A tibble: 4 × 6
#> row col data_type chr int foo
#> <int> <int> <chr> <chr> <int> <chr>
#> 1 2 1 int NA 1 a
#> 2 3 1 int NA 2 a
#> 3 2 2 int NA 3 b
#> 4 3 2 int NA 4 b
# More complex example: pivot table with several layers of headers
(x <- purpose$`up-left left-up`)
#> X2 X3 X4 X5 X6 X7
#> 1 <NA> <NA> Female <NA> Male <NA>
#> 2 <NA> <NA> 0 - 6 7 - 10 0 - 6 7 - 10
#> 3 Bachelor's degree 15 - 24 7000 27000 <NA> 13000
#> 4 <NA> 25 - 44 12000 137000 9000 81000
#> 5 <NA> 45 - 64 10000 64000 7000 66000
#> 6 <NA> 65+ <NA> 18000 7000 17000
#> 7 Certificate 15 - 24 29000 161000 30000 190000
#> 8 <NA> 25 - 44 34000 179000 31000 219000
#> 9 <NA> 45 - 64 30000 210000 23000 199000
#> 10 <NA> 65+ 12000 77000 8000 107000
#> 11 Diploma 15 - 24 <NA> 14000 9000 11000
#> 12 <NA> 25 - 44 10000 66000 8000 47000
#> 13 <NA> 45 - 64 6000 68000 5000 58000
#> 14 <NA> 65+ 5000 41000 1000 34000
#> 15 No Qualification 15 - 24 10000 43000 12000 37000
#> 16 <NA> 25 - 44 11000 36000 21000 50000
#> 17 <NA> 45 - 64 19000 91000 17000 75000
#> 18 <NA> 65+ 16000 118000 9000 66000
#> 19 Postgraduate qualification 15 - 24 <NA> 6000 <NA> <NA>
#> 20 <NA> 25 - 44 5000 86000 7000 60000
#> 21 <NA> 45 - 64 6000 55000 6000 68000
#> 22 <NA> 65+ <NA> 13000 <NA> 18000
# Make a tidy representation
cells <- as_cells(x)
head(cells)
#> # A tibble: 6 × 4
#> row col data_type chr
#> <int> <int> <chr> <chr>
#> 1 1 1 chr NA
#> 2 2 1 chr NA
#> 3 3 1 chr Bachelor's degree
#> 4 4 1 chr NA
#> 5 5 1 chr NA
#> 6 6 1 chr NA
tail(cells)
#> # A tibble: 6 × 4
#> row col data_type chr
#> <int> <int> <chr> <chr>
#> 1 17 6 chr 75000
#> 2 18 6 chr 66000
#> 3 19 6 chr NA
#> 4 20 6 chr 60000
#> 5 21 6 chr 68000
#> 6 22 6 chr 18000
# Strip the headers and make them into data
tidy <-
cells %>%
behead("up-left", Sex) %>%
behead("up", `Sense of purpose`) %>%
behead("left-up", `Highest qualification`) %>%
behead("left", `Age group (Life-stages)`) %>%
dplyr::mutate(count = as.integer(chr)) %>%
dplyr::select(-row, -col, -data_type, -chr)
head(tidy)
#> # A tibble: 6 × 5
#> Sex `Sense of purpose` Highest qualificatio…¹ Age group (Life-stag…² count
#> <chr> <chr> <chr> <chr> <int>
#> 1 Female 0 - 6 Bachelor's degree 15 - 24 7000
#> 2 Female 0 - 6 Bachelor's degree 25 - 44 12000
#> 3 Female 0 - 6 Bachelor's degree 45 - 64 10000
#> 4 Female 0 - 6 Bachelor's degree 65+ NA
#> 5 Female 7 - 10 Bachelor's degree 15 - 24 27000
#> 6 Female 7 - 10 Bachelor's degree 25 - 44 137000
#> # ℹ abbreviated names: ¹`Highest qualification`, ²`Age group (Life-stages)`
# Check against the provided 'tidy' version of the data.
dplyr::anti_join(tidy, purpose$Tidy)
#> Joining with `by = join_by(Sex, `Sense of purpose`, `Highest qualification`,
#> `Age group (Life-stages)`)`
#> # A tibble: 1 × 5
#> Sex `Sense of purpose` `Highest qualification` Age group (Life-stag…¹ count
#> <chr> <chr> <chr> <chr> <int>
#> 1 Male 0 - 6 Postgraduate qualificat… 15 - 24 NA
#> # ℹ abbreviated name: ¹`Age group (Life-stages)`
# The provided 'tidy' data is missing a row for Male 15-24-year-olds with a
# postgraduate qualification and a sense of purpose between 0 and 6. That
# seems to have been an oversight by Statistics New Zealand.
cells <- tibble::tribble(
~X1, ~adult, ~juvenile,
"LION", 855, 677,
"male", 496, 322,
"female", 359, 355,
"TIGER", 690, 324,
"male", 381, 222,
"female", 309, 102
)
cells <- as_cells(cells, col_names = TRUE)
cells %>%
behead_if(chr == toupper(chr), direction = "left-up", name = "species") %>%
behead("left", "sex") %>%
behead("up", "age") %>%
dplyr::select(species, sex, age, population = dbl)
#> # A tibble: 12 × 4
#> species sex age population
#> <chr> <chr> <chr> <dbl>
#> 1 LION NA adult 855
#> 2 LION male adult 496
#> 3 LION female adult 359
#> 4 LION NA juvenile 677
#> 5 LION male juvenile 322
#> 6 LION female juvenile 355
#> 7 TIGER NA adult 690
#> 8 TIGER male adult 381
#> 9 TIGER female adult 309
#> 10 TIGER NA juvenile 324
#> 11 TIGER male juvenile 222
#> 12 TIGER female juvenile 102