Main
Importing the data
Spreadsheet cells are imported with the xlsx_cells()
function, which returns a data frame of all the cells in all the requested sheets. By default, every sheet is imported, but we don’t have to worry about that in this case because there is only one sheet in the file. We can also straightaway discard rows above 14 and below 56, and columns beyond 20.
cells <-
xlsx_cells(path) %>%
dplyr::filter(!is_blank, between(row, 14L, 56L), col <= 20) %>%
select(row, col, data_type, numeric, character, date)
Cell formatting isn’t required for this vignette, but if it were, it would be imported via xlsx_formats(path)
.
formatting <- xlsx_formats(path)
Importing one of the multiples
The small multiples each have exactly one ‘Fixed Price’ header cell, so begin by filtering for those cells, and then move the selection up one row to get the title cells. The title cells are the top-left corner cell of each table.
title <-
dplyr::filter(cells, character == "Fixed Price") %>%
select(row, col) %>%
mutate(row = row - 1L) %>%
inner_join(cells, by = c("row", "col"))
Use these title cells to partition the sheet.
partitions <- partition(cells, title)
Taking one of the partitions, unpivot with behead()
. The compass directions "NNW"
and "N"
express the direction from each data cell to its header. "NNW"
means “look up and then left to find the nearest header.”
partitions$cells[[1]] %>%
behead("NNW", "title") %>%
behead("NNW", "price") %>%
behead("N", "bid_offer") %>%
print(n = Inf)
## # A tibble: 24 x 9
## row col data_type numeric character date title price
## <int> <int> <chr> <dbl> <chr> <dttm> <chr> <chr>
## 1 17 17 numeric 1.89 <NA> NA IF NWPL R… Fixed…
## 2 17 18 numeric 1.91 <NA> NA IF NWPL R… Fixed…
## 3 18 17 numeric 2.06 <NA> NA IF NWPL R… Fixed…
## 4 18 18 numeric 2.08 <NA> NA IF NWPL R… Fixed…
## 5 19 17 numeric 2.40 <NA> NA IF NWPL R… Fixed…
## 6 19 18 numeric 2.42 <NA> NA IF NWPL R… Fixed…
## 7 20 17 numeric 2.59 <NA> NA IF NWPL R… Fixed…
## 8 20 18 numeric 2.61 <NA> NA IF NWPL R… Fixed…
## 9 21 17 numeric 2.58 <NA> NA IF NWPL R… Fixed…
## 10 21 18 numeric 2.60 <NA> NA IF NWPL R… Fixed…
## 11 22 17 numeric 3.36 <NA> NA IF NWPL R… Fixed…
## 12 22 18 numeric 3.38 <NA> NA IF NWPL R… Fixed…
## 13 23 17 numeric 2.63 <NA> NA IF NWPL R… Fixed…
## 14 23 18 numeric 2.65 <NA> NA IF NWPL R… Fixed…
## 15 19 19 numeric -0.565 <NA> NA IF NWPL R… Basis
## 16 19 20 numeric -0.545 <NA> NA IF NWPL R… Basis
## 17 20 19 numeric -0.494 <NA> NA IF NWPL R… Basis
## 18 20 20 numeric -0.474 <NA> NA IF NWPL R… Basis
## 19 21 19 numeric -0.585 <NA> NA IF NWPL R… Basis
## 20 21 20 numeric -0.565 <NA> NA IF NWPL R… Basis
## 21 22 19 numeric -0.295 <NA> NA IF NWPL R… Basis
## 22 22 20 numeric -0.275 <NA> NA IF NWPL R… Basis
## 23 23 19 numeric -0.530 <NA> NA IF NWPL R… Basis
## 24 23 20 numeric -0.510 <NA> NA IF NWPL R… Basis
## # … with 1 more variable: bid_offer <chr>
The same procedure can be mapped to every small multiple.
unpivoted <-
purrr::map_dfr(partitions$cells,
~ .x %>%
behead("NNW", "title") %>%
behead("NNW", "price") %>%
behead("N", "bid_offer")) %>%
select(-data_type, -character, -date)
unpivoted
## # A tibble: 240 x 6
## row col numeric title price bid_offer
## <int> <int> <dbl> <chr> <chr> <chr>
## 1 17 17 1.89 IF NWPL Rocky Mountains Fixed Price BID
## 2 17 18 1.91 IF NWPL Rocky Mountains Fixed Price OFFER
## 3 18 17 2.06 IF NWPL Rocky Mountains Fixed Price BID
## 4 18 18 2.08 IF NWPL Rocky Mountains Fixed Price OFFER
## 5 19 17 2.40 IF NWPL Rocky Mountains Fixed Price BID
## 6 19 18 2.42 IF NWPL Rocky Mountains Fixed Price OFFER
## 7 20 17 2.59 IF NWPL Rocky Mountains Fixed Price BID
## 8 20 18 2.61 IF NWPL Rocky Mountains Fixed Price OFFER
## 9 21 17 2.58 IF NWPL Rocky Mountains Fixed Price BID
## 10 21 18 2.60 IF NWPL Rocky Mountains Fixed Price OFFER
## # … with 230 more rows
So far, only the column headers have been joined, but there are also row headers on the left-hand side of the spreadsheet. The following code incorporates these into the final dataset.
row_headers <-
cells %>%
dplyr::filter(between(row, 17, 56), between(col, 2, 4)) %>%
# Concatenate rows like "Dec-01", "to", "Mar-02"
mutate(character = ifelse(!is.na(character),
character,
format(date, origin="1899-12-30", "%b-%y"))) %>%
select(row, col, character) %>%
nest(-row) %>%
mutate(row_header = map(data,
~ str_trim(paste(.x$character, collapse = " ")))) %>%
unnest(row_header) %>%
mutate(col = 2L) %>%
select(row, row_header)
## Warning: All elements of `...` must be named.
## Did you want `data = c(col, character)`?
unpivoted <- left_join(unpivoted, row_headers, by = "row")
unpivoted
## # A tibble: 240 x 7
## row col numeric title price bid_offer row_header
## <int> <int> <dbl> <chr> <chr> <chr> <chr>
## 1 17 17 1.89 IF NWPL Rocky Mount… Fixed Pri… BID Cash
## 2 17 18 1.91 IF NWPL Rocky Mount… Fixed Pri… OFFER Cash
## 3 18 17 2.06 IF NWPL Rocky Mount… Fixed Pri… BID ROM
## 4 18 18 2.08 IF NWPL Rocky Mount… Fixed Pri… OFFER ROM
## 5 19 17 2.40 IF NWPL Rocky Mount… Fixed Pri… BID Dec-01
## 6 19 18 2.42 IF NWPL Rocky Mount… Fixed Pri… OFFER Dec-01
## 7 20 17 2.59 IF NWPL Rocky Mount… Fixed Pri… BID Dec-01 to Mar-…
## 8 20 18 2.61 IF NWPL Rocky Mount… Fixed Pri… OFFER Dec-01 to Mar-…
## 9 21 17 2.58 IF NWPL Rocky Mount… Fixed Pri… BID Apr-02 to Oct-…
## 10 21 18 2.60 IF NWPL Rocky Mount… Fixed Pri… OFFER Apr-02 to Oct-…
## # … with 230 more rows