9.4 Toronto Transit Commission
This table shows the number of trips recorded on the Toronto Transit Commission per year, by type of ticket, person, vehicle, and weeday/weekend/holiday.
Sharla Gelfand’s annotated screenshot explains the structure, and see her excellent blog post for how she wrangled it with standard tidyverse tools. I show here an alternative method with tidyxl and unpivotr.
Download the file. Original source.
9.4.1 The full code listing
cells <-
xlsx_cells(smungs::toronto_transit) %>%
dplyr::filter(!is_blank, row >= 6)
fare <-
cells %>%
dplyr::filter(col == 2,
!str_detect(character, "^ "),
!str_detect(character, "TOTAL")) %>%
select(row, col, fare = character)
cells %>%
behead("up", "year", formatters = list(character = str_trim)) %>%
behead("left-up", "context") %>%
behead("left", "media", formatters = list(character = str_trim)) %>%
enhead(fare, "left-up") %>%
dplyr::filter(!str_detect(media, "TOTAL")) %>%
separate(year, c("year", "note"), sep = " ", fill = "right") %>%
select(year, context, fare, media, count = numeric)
## # A tibble: 1,188 x 5
## year context fare media count
## <chr> <chr> <chr> <chr> <dbl>
## 1 2017 WHO ADULT TOKENS 76106
## 2 2016 WHO ADULT TOKENS 102073
## 3 2015 WHO ADULT TOKENS 110945
## 4 2014 WHO ADULT TOKENS 111157
## 5 2013 WHO ADULT TOKENS 112360
## 6 2012 WHO ADULT TOKENS 117962
## 7 2011 WHO ADULT TOKENS 124748
## 8 2010 WHO ADULT TOKENS 120366
## 9 2009 WHO ADULT TOKENS 114686
## 10 2008 WHO ADULT TOKENS 94210
## # … with 1,178 more rows
9.4.2 Step by step
Although the annotations point out that there are really three separate tables
(WHO
, WHERE
and WHEN
), they can be imported as one.
Column 2 has two levels of headers in it: the fare in bold (“ADULT”, “BUS”, etc.), and the media used to pay for it indented by a few spaces (“TOKENS”, “WEEKLY PASS”, etc.).
Because behead()
can’t distinguish between different levels of headers in the
same column, we need to put the bold fare headers into a separate variable on
their own, and enhead()
them back onto the rest of the table later.
Unfortunately the fare headers in the “WHEN” context aren’t bold, so rather than filter for bold headers, instead we filter for headers that aren’t indented by spaces. We also filter out any “TOTAL” headers.
cells <-
xlsx_cells(smungs::toronto_transit) %>%
dplyr::filter(!is_blank, row >= 6)
fare <-
cells %>%
dplyr::filter(col == 2,
!str_detect(character, "^ "), # Filter out indented headers
!str_detect(character, "TOTAL")) %>% # Filteroout totals
select(row, col, fare = character)
fare
## # A tibble: 7 x 3
## row col fare
## <int> <int> <chr>
## 1 7 2 ADULT
## 2 21 2 SENIOR/STUDENT
## 3 31 2 CHILDREN
## 4 43 2 BUS
## 5 46 2 RAIL
## 6 53 2 WEEKDAY
## 7 54 2 WEEKEND/HOLIDAY
ttc <-
cells %>%
behead("up", "year") %>%
behead("left-up", "context") %>%
behead("left", "media") %>%
enhead(fare, "left-up") %>%
dplyr::filter(!str_detect(media, "TOTAL")) %>%
select(year, context, fare, media, count = numeric)
ttc
## # A tibble: 1,188 x 5
## year context fare media count
## <chr> <chr> <chr> <chr> <dbl>
## 1 "2017" WHO ADULT " TOKENS" 76106
## 2 "2016" WHO ADULT " TOKENS" 102073
## 3 " 2015 *" WHO ADULT " TOKENS" 110945
## 4 "2014" WHO ADULT " TOKENS" 111157
## 5 "2013" WHO ADULT " TOKENS" 112360
## 6 "2012" WHO ADULT " TOKENS" 117962
## 7 "2011" WHO ADULT " TOKENS" 124748
## 8 "2010" WHO ADULT " TOKENS" 120366
## 9 "2009" WHO ADULT " TOKENS" 114686
## 10 "2008" WHO ADULT " TOKENS" 94210
## # … with 1,178 more rows
There’s a bit more cosmetic cleaning to do. The indentation can be trimmed from
the media
and the year
headers, and the asterisk removed from the year 2015 *
.
ttc %>%
mutate(year = str_trim(year), media = str_trim(media)) %>%
separate(year, c("year", "note"), sep = " ", fill = "right") %>%
select(-note)
## # A tibble: 1,188 x 5
## year context fare media count
## <chr> <chr> <chr> <chr> <dbl>
## 1 2017 WHO ADULT TOKENS 76106
## 2 2016 WHO ADULT TOKENS 102073
## 3 2015 WHO ADULT TOKENS 110945
## 4 2014 WHO ADULT TOKENS 111157
## 5 2013 WHO ADULT TOKENS 112360
## 6 2012 WHO ADULT TOKENS 117962
## 7 2011 WHO ADULT TOKENS 124748
## 8 2010 WHO ADULT TOKENS 120366
## 9 2009 WHO ADULT TOKENS 114686
## 10 2008 WHO ADULT TOKENS 94210
## # … with 1,178 more rows