behead() takes one level of headers from a pivot table and makes it part of the data. Think of it like tidyr::gather(), except that it works when there is more than one row of headers (or more than one column of row-headers), and it only works on tables that have first come through as_cells() or tidyxl::xlsx_cells().

behead(
  cells,
  direction,
  name,
  values = NULL,
  types = data_type,
  formatters = list(),
  drop_na = TRUE
)

behead_if(
  cells,
  ...,
  direction,
  name,
  values = NULL,
  types = data_type,
  formatters = list(),
  drop_na = TRUE
)

Arguments

cells

Data frame. The cells of a pivot table, usually the output of as_cells() or tidyxl::xlsx_cells(), or of a subsequent operation on those outputs.

direction

The direction between a data cell and its header, one of "up", "right", "down", "left", "up-left", "up-right", "right-up", "right-down", "down-right", "down-left", "left-down", "left-up". See ?direction. "up-ish", "down-ish", "left-ish" and "right-ish" aren't available because they require certain ambiguities that are better handled by using enhead() directly rather than via behead().

name

A name to give the new column that will be created, e.g. "location" if the headers are locations. Quoted ("location", not location) because it doesn't refer to an actual object.

values

Optional. The column of cells to use as the values of each header. Given as a bare variable name. If omitted (the default), the types argument will be used instead.

types

The name of the column that names the data type of each cell. Usually called data_types (the default), this is a character column that names the other columns in cells that contain the values of each cell. E.g. a cell with a character value will have "character" in this column. Unquoted(data_types, not "data_types") because it refers to an actual object.

formatters

A named list of functions for formatting each data type in a set of headers of mixed data types, e.g. when some headers are dates and others are characters. These can be given as character = toupper or character = ~ toupper(.x), similar to purrr::map.

drop_na

logical Whether to filter out headers that have NA in the value column. Default: TRUE. This can happen with the output of tidyxl::xlsx_cells(), when an empty cell exists because it has formatting applied to it, but should be ignored.

...

Passed to dplyr::filter. logical predicates defined in terms of the variables in .data. Multiple conditions are combined with &. Only rows where the condition evaluates to TRUE are kept.

The arguments in ... are automatically quoted and evaluated in the context of the data frame. They support unquoting and splicing. See the dplyr vignette("programming") for an introduction to these concepts.

Value

A data frame

Examples

# A simple table with a row of headers (x <- data.frame(a = 1:2, b = 3:4))
#> a b #> 1 1 3 #> 2 2 4
# Make a tidy representation of each cell (cells <- as_cells(x, col_names = TRUE))
#> # A tibble: 6 x 5 #> row col data_type chr int #> <int> <int> <chr> <chr> <int> #> 1 1 1 chr a NA #> 2 2 1 int NA 1 #> 3 3 1 int NA 2 #> 4 1 2 chr b NA #> 5 2 2 int NA 3 #> 6 3 2 int NA 4
# Strip the cells in row 1 (the original headers) and use them as data behead(cells, "N", foo)
#> # A tibble: 4 x 6 #> row col data_type chr int foo #> <int> <int> <chr> <chr> <int> <chr> #> 1 2 1 int NA 1 a #> 2 3 1 int NA 2 a #> 3 2 2 int NA 3 b #> 4 3 2 int NA 4 b
# More complex example: pivot table with several layers of headers (x <- purpose$`up-left left-up`)
#> X2 X3 X4 X5 X6 X7 #> 1 <NA> <NA> Female <NA> Male <NA> #> 2 <NA> <NA> 0 - 6 7 - 10 0 - 6 7 - 10 #> 3 Bachelor's degree 15 - 24 7000 27000 <NA> 13000 #> 4 <NA> 25 - 44 12000 137000 9000 81000 #> 5 <NA> 45 - 64 10000 64000 7000 66000 #> 6 <NA> 65+ <NA> 18000 7000 17000 #> 7 Certificate 15 - 24 29000 161000 30000 190000 #> 8 <NA> 25 - 44 34000 179000 31000 219000 #> 9 <NA> 45 - 64 30000 210000 23000 199000 #> 10 <NA> 65+ 12000 77000 8000 107000 #> 11 Diploma 15 - 24 <NA> 14000 9000 11000 #> 12 <NA> 25 - 44 10000 66000 8000 47000 #> 13 <NA> 45 - 64 6000 68000 5000 58000 #> 14 <NA> 65+ 5000 41000 1000 34000 #> 15 No Qualification 15 - 24 10000 43000 12000 37000 #> 16 <NA> 25 - 44 11000 36000 21000 50000 #> 17 <NA> 45 - 64 19000 91000 17000 75000 #> 18 <NA> 65+ 16000 118000 9000 66000 #> 19 Postgraduate qualification 15 - 24 <NA> 6000 <NA> <NA> #> 20 <NA> 25 - 44 5000 86000 7000 60000 #> 21 <NA> 45 - 64 6000 55000 6000 68000 #> 22 <NA> 65+ <NA> 13000 <NA> 18000
# Make a tidy representation cells <- as_cells(x) head(cells)
#> # A tibble: 6 x 4 #> row col data_type chr #> <int> <int> <chr> <chr> #> 1 1 1 chr NA #> 2 2 1 chr NA #> 3 3 1 chr Bachelor's degree #> 4 4 1 chr NA #> 5 5 1 chr NA #> 6 6 1 chr NA
tail(cells)
#> # A tibble: 6 x 4 #> row col data_type chr #> <int> <int> <chr> <chr> #> 1 17 6 chr 75000 #> 2 18 6 chr 66000 #> 3 19 6 chr NA #> 4 20 6 chr 60000 #> 5 21 6 chr 68000 #> 6 22 6 chr 18000
# Strip the headers and make them into data tidy <- cells %>% behead("up-left", Sex) %>% behead("up", `Sense of purpose`) %>% behead("left-up", `Highest qualification`) %>% behead("left", `Age group (Life-stages)`) %>% dplyr::mutate(count = as.integer(chr)) %>% dplyr::select(-row, -col, -data_type, -chr) head(tidy)
#> # A tibble: 6 x 5 #> Sex `Sense of purpose` `Highest qualificati… `Age group (Life-stage… count #> <chr> <chr> <chr> <chr> <int> #> 1 Female 0 - 6 Bachelor's degree 15 - 24 7000 #> 2 Female 0 - 6 Bachelor's degree 25 - 44 12000 #> 3 Female 0 - 6 Bachelor's degree 45 - 64 10000 #> 4 Female 0 - 6 Bachelor's degree 65+ NA #> 5 Female 7 - 10 Bachelor's degree 15 - 24 27000 #> 6 Female 7 - 10 Bachelor's degree 25 - 44 137000
# Check against the provided 'tidy' version of the data. dplyr::anti_join(tidy, purpose$Tidy)
#> Joining, by = c("Sex", "Sense of purpose", "Highest qualification", "Age group (Life-stages)")
#> # A tibble: 1 x 5 #> Sex `Sense of purpose` `Highest qualification` `Age group (Life-stag… count #> <chr> <chr> <chr> <chr> <int> #> 1 Male 0 - 6 Postgraduate qualificat… 15 - 24 NA
# The provided 'tidy' data is missing a row for Male 15-24-year-olds with a # postgraduate qualification and a sense of purpose between 0 and 6. That # seems to have been an oversight by Statistics New Zealand. cells <- tibble::tribble( ~X1, ~adult, ~juvenile, "LION", 855, 677, "male", 496, 322, "female", 359, 355, "TIGER", 690, 324, "male", 381, 222, "female", 309, 102 ) cells <- as_cells(cells, col_names = TRUE) cells %>% behead_if(chr == toupper(chr), direction = "left-up", name = "species") %>% behead("left", "sex") %>% behead("up", "age") %>% dplyr::select(species, sex, age, population = dbl)
#> # A tibble: 12 x 4 #> species sex age population #> <chr> <chr> <chr> <dbl> #> 1 LION NA adult 855 #> 2 LION male adult 496 #> 3 LION female adult 359 #> 4 LION NA juvenile 677 #> 5 LION male juvenile 322 #> 6 LION female juvenile 355 #> 7 TIGER NA adult 690 #> 8 TIGER male adult 381 #> 9 TIGER female adult 309 #> 10 TIGER NA juvenile 324 #> 11 TIGER male juvenile 222 #> 12 TIGER female juvenile 102