Data cells in a table are associated with header cells by proximity.
enhead()
joins a data frame of data cells to a data frame of header cells,
choosing the nearest header cells in the given direction. See ?direction
.
enhead(data_cells, header_cells, direction, drop = TRUE)
data_cells | Data frame of data cells with at least the columns 'row'
and 'column', which are |
---|---|
header_cells | Data frame of header cells with at least the columns 'row' and 'column', which are numeric/integer vectors. |
direction | The direction between a data cell and its header, one of
|
drop | Logical vector length 1. Whether data cells that can't be
associated with a header should be dropped. Default: |
#> #>#>#> #>#>#> #># Load some pivoted data (x <- purpose$`up-left left-up`)#> X2 X3 X4 X5 X6 X7 #> 1 <NA> <NA> Female <NA> Male <NA> #> 2 <NA> <NA> 0 - 6 7 - 10 0 - 6 7 - 10 #> 3 Bachelor's degree 15 - 24 7000 27000 <NA> 13000 #> 4 <NA> 25 - 44 12000 137000 9000 81000 #> 5 <NA> 45 - 64 10000 64000 7000 66000 #> 6 <NA> 65+ <NA> 18000 7000 17000 #> 7 Certificate 15 - 24 29000 161000 30000 190000 #> 8 <NA> 25 - 44 34000 179000 31000 219000 #> 9 <NA> 45 - 64 30000 210000 23000 199000 #> 10 <NA> 65+ 12000 77000 8000 107000 #> 11 Diploma 15 - 24 <NA> 14000 9000 11000 #> 12 <NA> 25 - 44 10000 66000 8000 47000 #> 13 <NA> 45 - 64 6000 68000 5000 58000 #> 14 <NA> 65+ 5000 41000 1000 34000 #> 15 No Qualification 15 - 24 10000 43000 12000 37000 #> 16 <NA> 25 - 44 11000 36000 21000 50000 #> 17 <NA> 45 - 64 19000 91000 17000 75000 #> 18 <NA> 65+ 16000 118000 9000 66000 #> 19 Postgraduate qualification 15 - 24 <NA> 6000 <NA> <NA> #> 20 <NA> 25 - 44 5000 86000 7000 60000 #> 21 <NA> 45 - 64 6000 55000 6000 68000 #> 22 <NA> 65+ <NA> 13000 <NA> 18000#> # A tibble: 6 x 4 #> row col data_type chr #> <int> <int> <chr> <chr> #> 1 3 1 chr Bachelor's degree #> 2 7 1 chr Certificate #> 3 11 1 chr Diploma #> 4 15 1 chr No Qualification #> 5 19 1 chr Postgraduate qualification #> 6 3 2 chr 15 - 24# Select the cells containing the values data_cells <- filter(cells, row >= 3, col >= 3) %>% transmute(row, col, count = as.integer(chr)) head(data_cells)#> # A tibble: 6 x 3 #> row col count #> <int> <int> <int> #> 1 3 3 7000 #> 2 4 3 12000 #> 3 5 3 10000 #> 4 7 3 29000 #> 5 8 3 34000 #> 6 9 3 30000# Select the headers qualification <- filter(cells, col == 1) %>% select(row, col, qualification = chr) age <- filter(cells, col == 2) %>% select(row, col, age = chr) gender <- filter(cells, row == 1) %>% select(row, col, gender = chr) satisfaction <- filter(cells, row == 2) %>% select(row, col, satisfaction = chr) # From each data cell, search for the nearest one of each of the headers data_cells %>% enhead(gender, "up-left") %>% enhead(satisfaction, "up") %>% enhead(qualification, "left-up") %>% enhead(age, "left") %>% select(-row, -col)#> # A tibble: 72 x 5 #> count gender satisfaction qualification age #> <int> <chr> <chr> <chr> <chr> #> 1 7000 Female 0 - 6 Bachelor's degree 15 - 24 #> 2 12000 Female 0 - 6 Bachelor's degree 25 - 44 #> 3 10000 Female 0 - 6 Bachelor's degree 45 - 64 #> 4 27000 Female 7 - 10 Bachelor's degree 15 - 24 #> 5 137000 Female 7 - 10 Bachelor's degree 25 - 44 #> 6 64000 Female 7 - 10 Bachelor's degree 45 - 64 #> 7 18000 Female 7 - 10 Bachelor's degree 65+ #> 8 9000 Male 0 - 6 Bachelor's degree 25 - 44 #> 9 7000 Male 0 - 6 Bachelor's degree 45 - 64 #> 10 7000 Male 0 - 6 Bachelor's degree 65+ #> # … with 62 more rows# The `drop` argument controls what happens when for some cells there is no # header in the given direction. When `drop = TRUE` (the default), cells that # can't be joined to a header are dropped. Otherwise they are kept. enhead(data_cells, gender, "up")#> # A tibble: 33 x 4 #> row col count gender #> <int> <int> <int> <chr> #> 1 3 3 7000 Female #> 2 4 3 12000 Female #> 3 5 3 10000 Female #> 4 7 3 29000 Female #> 5 8 3 34000 Female #> 6 9 3 30000 Female #> 7 10 3 12000 Female #> 8 12 3 10000 Female #> 9 13 3 6000 Female #> 10 14 3 5000 Female #> # … with 23 more rowsenhead(data_cells, gender, "up", drop = FALSE)#> # A tibble: 72 x 4 #> row col count gender #> <int> <int> <int> <chr> #> 1 3 3 7000 Female #> 2 4 3 12000 Female #> 3 5 3 10000 Female #> 4 7 3 29000 Female #> 5 8 3 34000 Female #> 6 9 3 30000 Female #> 7 10 3 12000 Female #> 8 12 3 10000 Female #> 9 13 3 6000 Female #> 10 14 3 5000 Female #> # … with 62 more rows