behead()
takes one level of headers from a pivot table and
makes it part of the data. Think of it like tidyr::gather()
, except that
it works when there is more than one row of headers (or more than one column
of row-headers), and it only works on tables that have first come through
as_cells()
or tidyxl::xlsx_cells()
.
behead( cells, direction, name, values = NULL, types = data_type, formatters = list(), drop_na = TRUE ) behead_if( cells, ..., direction, name, values = NULL, types = data_type, formatters = list(), drop_na = TRUE )
cells | Data frame. The cells of a pivot table, usually the output of
|
---|---|
direction | The direction between a data cell and its header, one of
|
name | A name to give the new column that will be created, e.g.
|
values | Optional. The column of |
types | The name of the column that names the data type of each cell.
Usually called |
formatters | A named list of functions for formatting each data type in
a set of headers of mixed data types, e.g. when some headers are dates and
others are characters. These can be given as |
drop_na | logical Whether to filter out headers that have |
... | Passed to dplyr::filter. logical predicates defined in terms of
the variables in The arguments in |
A data frame
#> a b #> 1 1 3 #> 2 2 4#> # A tibble: 6 x 5 #> row col data_type chr int #> <int> <int> <chr> <chr> <int> #> 1 1 1 chr a NA #> 2 2 1 int NA 1 #> 3 3 1 int NA 2 #> 4 1 2 chr b NA #> 5 2 2 int NA 3 #> 6 3 2 int NA 4# Strip the cells in row 1 (the original headers) and use them as data behead(cells, "N", foo)#> # A tibble: 4 x 6 #> row col data_type chr int foo #> <int> <int> <chr> <chr> <int> <chr> #> 1 2 1 int NA 1 a #> 2 3 1 int NA 2 a #> 3 2 2 int NA 3 b #> 4 3 2 int NA 4 b# More complex example: pivot table with several layers of headers (x <- purpose$`up-left left-up`)#> X2 X3 X4 X5 X6 X7 #> 1 <NA> <NA> Female <NA> Male <NA> #> 2 <NA> <NA> 0 - 6 7 - 10 0 - 6 7 - 10 #> 3 Bachelor's degree 15 - 24 7000 27000 <NA> 13000 #> 4 <NA> 25 - 44 12000 137000 9000 81000 #> 5 <NA> 45 - 64 10000 64000 7000 66000 #> 6 <NA> 65+ <NA> 18000 7000 17000 #> 7 Certificate 15 - 24 29000 161000 30000 190000 #> 8 <NA> 25 - 44 34000 179000 31000 219000 #> 9 <NA> 45 - 64 30000 210000 23000 199000 #> 10 <NA> 65+ 12000 77000 8000 107000 #> 11 Diploma 15 - 24 <NA> 14000 9000 11000 #> 12 <NA> 25 - 44 10000 66000 8000 47000 #> 13 <NA> 45 - 64 6000 68000 5000 58000 #> 14 <NA> 65+ 5000 41000 1000 34000 #> 15 No Qualification 15 - 24 10000 43000 12000 37000 #> 16 <NA> 25 - 44 11000 36000 21000 50000 #> 17 <NA> 45 - 64 19000 91000 17000 75000 #> 18 <NA> 65+ 16000 118000 9000 66000 #> 19 Postgraduate qualification 15 - 24 <NA> 6000 <NA> <NA> #> 20 <NA> 25 - 44 5000 86000 7000 60000 #> 21 <NA> 45 - 64 6000 55000 6000 68000 #> 22 <NA> 65+ <NA> 13000 <NA> 18000#> # A tibble: 6 x 4 #> row col data_type chr #> <int> <int> <chr> <chr> #> 1 1 1 chr NA #> 2 2 1 chr NA #> 3 3 1 chr Bachelor's degree #> 4 4 1 chr NA #> 5 5 1 chr NA #> 6 6 1 chr NA#> # A tibble: 6 x 4 #> row col data_type chr #> <int> <int> <chr> <chr> #> 1 17 6 chr 75000 #> 2 18 6 chr 66000 #> 3 19 6 chr NA #> 4 20 6 chr 60000 #> 5 21 6 chr 68000 #> 6 22 6 chr 18000# Strip the headers and make them into data tidy <- cells %>% behead("up-left", Sex) %>% behead("up", `Sense of purpose`) %>% behead("left-up", `Highest qualification`) %>% behead("left", `Age group (Life-stages)`) %>% dplyr::mutate(count = as.integer(chr)) %>% dplyr::select(-row, -col, -data_type, -chr) head(tidy)#> # A tibble: 6 x 5 #> Sex `Sense of purpose` `Highest qualificati… `Age group (Life-stage… count #> <chr> <chr> <chr> <chr> <int> #> 1 Female 0 - 6 Bachelor's degree 15 - 24 7000 #> 2 Female 0 - 6 Bachelor's degree 25 - 44 12000 #> 3 Female 0 - 6 Bachelor's degree 45 - 64 10000 #> 4 Female 0 - 6 Bachelor's degree 65+ NA #> 5 Female 7 - 10 Bachelor's degree 15 - 24 27000 #> 6 Female 7 - 10 Bachelor's degree 25 - 44 137000#>#> # A tibble: 1 x 5 #> Sex `Sense of purpose` `Highest qualification` `Age group (Life-stag… count #> <chr> <chr> <chr> <chr> <int> #> 1 Male 0 - 6 Postgraduate qualificat… 15 - 24 NA# The provided 'tidy' data is missing a row for Male 15-24-year-olds with a # postgraduate qualification and a sense of purpose between 0 and 6. That # seems to have been an oversight by Statistics New Zealand. cells <- tibble::tribble( ~X1, ~adult, ~juvenile, "LION", 855, 677, "male", 496, 322, "female", 359, 355, "TIGER", 690, 324, "male", 381, 222, "female", 309, 102 ) cells <- as_cells(cells, col_names = TRUE) cells %>% behead_if(chr == toupper(chr), direction = "left-up", name = "species") %>% behead("left", "sex") %>% behead("up", "age") %>% dplyr::select(species, sex, age, population = dbl)#> # A tibble: 12 x 4 #> species sex age population #> <chr> <chr> <chr> <dbl> #> 1 LION NA adult 855 #> 2 LION male adult 496 #> 3 LION female adult 359 #> 4 LION NA juvenile 677 #> 5 LION male juvenile 322 #> 6 LION female juvenile 355 #> 7 TIGER NA adult 690 #> 8 TIGER male adult 381 #> 9 TIGER female adult 309 #> 10 TIGER NA juvenile 324 #> 11 TIGER male juvenile 222 #> 12 TIGER female juvenile 102