9.5 Ground water

Tweet by Beck Frydenborg about tidying data with R

If the cells containing U didn’t exist, then this spreadsheet would be a textbook example of unpivoting a pivot table. There are two rows of column headers, as well as two columns of row headers, so you would use behead() for each header.

Download the file. Synthesised from the original tweet.

x <-
  xlsx_cells(smungs::groundwater) %>%
  dplyr::filter(!is_blank) %>%
  select(row, col, data_type, character, numeric) %>%
  behead("up-left", "sample-type") %>%
  behead("up-left", "site") %>%
  behead("left", "parameter") %>%
  behead("left", "unit")
x
## # A tibble: 17 x 9
##      row   col data_type character numeric `sample-type` site  parameter            unit 
##    <int> <int> <chr>     <chr>       <dbl> <chr>         <chr> <chr>                <chr>
##  1     4     3 numeric   <NA>        3.2   ground water  A     Nitrogen, Kjeldahl   mn/L 
##  2     6     3 numeric   <NA>        0.025 ground water  A     Nitrate Nitrite as N mg/L 
##  3     6     4 character U          NA     ground water  A     Nitrate Nitrite as N mg/L 
##  4     8     3 numeric   <NA>        0.04  ground water  A     Phosphorus as P      mg/L 
##  5     4     5 numeric   <NA>        1.2   ground water  B     Nitrogen, Kjeldahl   mn/L 
##  6     6     5 numeric   <NA>        0.025 ground water  B     Nitrate Nitrite as N mg/L 
##  7     6     6 character U          NA     ground water  B     Nitrate Nitrite as N mg/L 
##  8     8     5 numeric   <NA>        0.17  ground water  B     Phosphorus as P      mg/L 
##  9     4     7 numeric   <NA>        0.5   ground water  C     Nitrogen, Kjeldahl   mn/L 
## 10     6     7 numeric   <NA>        0.025 ground water  C     Nitrate Nitrite as N mg/L 
## 11     6     8 character U          NA     ground water  C     Nitrate Nitrite as N mg/L 
## 12     8     7 numeric   <NA>        0.062 ground water  C     Phosphorus as P      mg/L 
## 13     4     9 numeric   <NA>        0.4   ground water  D     Nitrogen, Kjeldahl   mn/L 
## 14     6     9 numeric   <NA>        0.025 ground water  D     Nitrate Nitrite as N mg/L 
## 15     6    10 character U          NA     ground water  D     Nitrate Nitrite as N mg/L 
## 16     8     9 numeric   <NA>        0.04  ground water  D     Phosphorus as P      mg/L 
## 17     8    10 character J3         NA     ground water  D     Phosphorus as P      mg/L

So what to do about the U cells? We don’t know what they mean, but perhaps they are some kind of flag, to inform the interpretation of the numbers. If that’s the case, then they should appear in the same row of the final data frame as the numbers.

Something like tidyr::spread() would work, except that instead of spreading the values in just one column, we need to spread the values in both the character and numeric columns, depending on the value in the data_type column. This is what spatter() is for.

x %>%
  select(-col) %>%
  spatter(data_type) %>%
  select(-row)
## # A tibble: 12 x 6
##    `sample-type` site  parameter            unit  character numeric
##    <chr>         <chr> <chr>                <chr> <chr>       <dbl>
##  1 ground water  A     Nitrogen, Kjeldahl   mn/L  <NA>        3.2  
##  2 ground water  B     Nitrogen, Kjeldahl   mn/L  <NA>        1.2  
##  3 ground water  C     Nitrogen, Kjeldahl   mn/L  <NA>        0.5  
##  4 ground water  D     Nitrogen, Kjeldahl   mn/L  <NA>        0.4  
##  5 ground water  A     Nitrate Nitrite as N mg/L  U           0.025
##  6 ground water  B     Nitrate Nitrite as N mg/L  U           0.025
##  7 ground water  C     Nitrate Nitrite as N mg/L  U           0.025
##  8 ground water  D     Nitrate Nitrite as N mg/L  U           0.025
##  9 ground water  A     Phosphorus as P      mg/L  <NA>        0.04 
## 10 ground water  B     Phosphorus as P      mg/L  <NA>        0.17 
## 11 ground water  C     Phosphorus as P      mg/L  <NA>        0.062
## 12 ground water  D     Phosphorus as P      mg/L  J3          0.04

Compare that with the results of spread(), which can only spread one value column at a time.

x %>%
  select(-col) %>%
  spread(data_type, character)
## # A tibble: 17 x 7
##      row `sample-type` site  parameter            unit  character numeric
##    <int> <chr>         <chr> <chr>                <chr> <chr>     <chr>  
##  1     4 ground water  D     Nitrogen, Kjeldahl   mn/L  <NA>      <NA>   
##  2     4 ground water  C     Nitrogen, Kjeldahl   mn/L  <NA>      <NA>   
##  3     4 ground water  B     Nitrogen, Kjeldahl   mn/L  <NA>      <NA>   
##  4     4 ground water  A     Nitrogen, Kjeldahl   mn/L  <NA>      <NA>   
##  5     6 ground water  A     Nitrate Nitrite as N mg/L  <NA>      <NA>   
##  6     6 ground water  B     Nitrate Nitrite as N mg/L  <NA>      <NA>   
##  7     6 ground water  C     Nitrate Nitrite as N mg/L  <NA>      <NA>   
##  8     6 ground water  D     Nitrate Nitrite as N mg/L  <NA>      <NA>   
##  9     6 ground water  A     Nitrate Nitrite as N mg/L  U         <NA>   
## 10     6 ground water  B     Nitrate Nitrite as N mg/L  U         <NA>   
## 11     6 ground water  C     Nitrate Nitrite as N mg/L  U         <NA>   
## 12     6 ground water  D     Nitrate Nitrite as N mg/L  U         <NA>   
## 13     8 ground water  A     Phosphorus as P      mg/L  <NA>      <NA>   
## 14     8 ground water  D     Phosphorus as P      mg/L  <NA>      <NA>   
## 15     8 ground water  C     Phosphorus as P      mg/L  <NA>      <NA>   
## 16     8 ground water  B     Phosphorus as P      mg/L  <NA>      <NA>   
## 17     8 ground water  D     Phosphorus as P      mg/L  J3        <NA>
x %>%
  select(-col) %>%
  spread(data_type, numeric)
## # A tibble: 17 x 7
##      row `sample-type` site  parameter            unit  character numeric
##    <int> <chr>         <chr> <chr>                <chr>     <dbl>   <dbl>
##  1     4 ground water  A     Nitrogen, Kjeldahl   mn/L         NA   3.2  
##  2     4 ground water  B     Nitrogen, Kjeldahl   mn/L         NA   1.2  
##  3     4 ground water  C     Nitrogen, Kjeldahl   mn/L         NA   0.5  
##  4     4 ground water  D     Nitrogen, Kjeldahl   mn/L         NA   0.4  
##  5     6 ground water  A     Nitrate Nitrite as N mg/L         NA  NA    
##  6     6 ground water  B     Nitrate Nitrite as N mg/L         NA  NA    
##  7     6 ground water  C     Nitrate Nitrite as N mg/L         NA  NA    
##  8     6 ground water  D     Nitrate Nitrite as N mg/L         NA  NA    
##  9     6 ground water  A     Nitrate Nitrite as N mg/L         NA   0.025
## 10     6 ground water  B     Nitrate Nitrite as N mg/L         NA   0.025
## 11     6 ground water  C     Nitrate Nitrite as N mg/L         NA   0.025
## 12     6 ground water  D     Nitrate Nitrite as N mg/L         NA   0.025
## 13     8 ground water  D     Phosphorus as P      mg/L         NA  NA    
## 14     8 ground water  A     Phosphorus as P      mg/L         NA   0.04 
## 15     8 ground water  B     Phosphorus as P      mg/L         NA   0.17 
## 16     8 ground water  C     Phosphorus as P      mg/L         NA   0.062
## 17     8 ground water  D     Phosphorus as P      mg/L         NA   0.04