Creating nests without tidyr

How to construct nested data frames from scratch, without tidyr

Duncan Garmonsway
June 22, 2016

Unless you begin with an unnested data frame, creating a nested data frame needs a little trick. Here it is.

Nested data frames

The tidyr package has a handy function for nesting data frames. Hadley Wickham describes it thus:

In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.

Here’s a small example:


library(dplyr)
library(tidyr)

iris_nested <-
  iris %>%
  group_by(Species) %>%
  sample_n(2) %>%
  nest
iris_nested
## # A tibble: 3 x 2
##   Species    data            
##   <fct>      <list>          
## 1 setosa     <tibble [2 × 4]>
## 2 versicolor <tibble [2 × 4]>
## 3 virginica  <tibble [2 × 4]>
iris_nested %>% str
## Classes 'tbl_df', 'tbl' and 'data.frame':    3 obs. of  2 variables:
##  $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
##  $ data   :List of 3
##   ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of  4 variables:
##   .. ..$ Sepal.Length: num  5.1 5.2
##   .. ..$ Sepal.Width : num  3.8 3.4
##   .. ..$ Petal.Length: num  1.6 1.4
##   .. ..$ Petal.Width : num  0.2 0.2
##   ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of  4 variables:
##   .. ..$ Sepal.Length: num  5.7 5.6
##   .. ..$ Sepal.Width : num  3 3
##   .. ..$ Petal.Length: num  4.2 4.1
##   .. ..$ Petal.Width : num  1.2 1.3
##   ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of  4 variables:
##   .. ..$ Sepal.Length: num  7.2 6.3
##   .. ..$ Sepal.Width : num  3 2.5
##   .. ..$ Petal.Length: num  5.8 5
##   .. ..$ Petal.Width : num  1.6 1.9
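
For comparison, the same one-row-per-group shape can be put together in base R. This is only a sketch and not part of the original post: the names pieces and iris_base are made up for illustration, there is no sampling step, and the list column is attached after the data frame already exists.

```r
# Split the measurement columns into one data frame per species.
pieces <- split(iris[, names(iris) != "Species"], iris$Species)

# One row per group; the factor keeps the original level order.
iris_base <- data.frame(
  Species = factor(names(pieces), levels = levels(iris$Species))
)

# Attach the per-group data frames as a list column.
iris_base$data <- unname(pieces)

dim(iris_base)
vapply(iris_base$data, nrow, integer(1))
```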

Interestingly, the nested column isn’t an atomic vector like ordinary columns; it’s a list. Lists are in fact one kind of vector: the non-atomic kind, whose elements can themselves be vectors and other lists, whereas integer, character and similar vectors are the atomic kind, whose elements cannot be broken down any further. This is nicely explained in Advanced R by Hadley Wickham.


is.atomic(vector(mode = "character", length = 2))
## [1] TRUE
is.atomic(vector(mode = "list", length = 2))
## [1] FALSE
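
A few further base R checks make the same point from the other side. This is a small sketch, with variable names chosen only for illustration: a list still counts as a vector, and its elements can differ in type and length, which is what lets a column hold whole data frames.

```r
chr <- vector(mode = "character", length = 2)
lst <- list(1:3, "a", iris[1:2, ])

is.vector(chr)   # atomic vectors are vectors
is.vector(lst)   # and so are lists
is.list(chr)
is.list(lst)
typeof(lst)
lengths(lst)     # each element has its own length
```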

Please say it’s a data frame

A data frame is a list of equal-length vectors, so it handles list-type columns perfectly well, but data-frame-construction functions don’t accept them directly. So when I tried to create one from scratch (rather than by nesting an existing data frame as above), I lost a lot of time mucking about with data.frame() and the like.


data.frame(X1 = 1:2, X2 = list(iris, mtcars))
## Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 150, 32
as.data.frame(list(X1 = 1:2, X2 = list(iris, mtcars)))
## Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 150, 32
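
The error message hints at what went wrong. As far as I can tell, data.frame() treats a list argument as a bundle of columns rather than as a single column, so iris and mtcars were each unpacked and their 150 and 32 rows collided. A minimal sketch with a small made-up list (the names a, b, spread and result are only for illustration) shows the unpacking:

```r
# A list argument is spread into one column per element ...
spread <- data.frame(X1 = 1:2, X2 = list(a = 3:4, b = 5:6))
names(spread)

# ... so elements of different lengths cannot be lined up as columns.
result <- tryCatch(
  data.frame(X1 = 1:2, X2 = list(a = 1:3, b = 1:5)),
  error = function(e) conditionMessage(e)
)
result
```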

It’s a data frame because I say so

The answer is simply to tell R that the list is a data frame, by setting its class and giving it a “row.names” attribute.


x <- list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4]))
# structure() returns a modified copy, printed here; x itself stays a plain list
structure(x, class = c("tbl_df", "data.frame"), row.names = 1:2)
##   X1                                                         X2
## 1  1                                         5.1, 4.9, 3.5, 3.0
## 2  2 4.7, 4.6, 5.0, 3.2, 3.1, 3.6, 1.3, 1.5, 1.4, 0.2, 0.2, 0.2
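
Note that structure() returns a new object and leaves x itself untouched. Here is a sketch of the same trick with the result assigned, using the plain "data.frame" class so that it needs no packages. It is followed by a base R alternative that is not from the original post: wrapping the list in I() asks data.frame() to keep it as a single column. The names nested and via_I are made up for illustration.

```r
x <- list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4]))

# Assign the result; x itself is still a plain list afterwards.
nested <- structure(x, class = "data.frame", row.names = 1:2)
is.data.frame(x)
is.data.frame(nested)
dim(nested)

# Alternative: protect the list with I() so data.frame() does not unpack it.
via_I <- data.frame(
  X1 = 1:2,
  X2 = I(list(iris[1:2, 1:2], iris[3:5, 1:4]))
)
dim(via_I)
class(via_I$X2)  # the column carries the extra "AsIs" class
```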

Invading the nest

Accessing the nested column by the usual subsetting operators, $, [ and [[, is a little clumsy.


x$X2 # Returns the list of data frames
## [[1]]
##   Sepal.Length Sepal.Width
## 1          5.1         3.5
## 2          4.9         3.0
## 
## [[2]]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
x$X2[2] # Returns the second data frame, wrapped in a list
## [[1]]
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2

x$X2[[2]] # Returns the second data frame -- probably what you want
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
x[["X2"]][[2]] # Returns the second data frame -- also probably what you want
##   Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3          4.7         3.2          1.3         0.2
## 4          4.6         3.1          1.5         0.2
## 5          5.0         3.6          1.4         0.2
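
Because the nested column is an ordinary list, the usual list tools apply to it as well. A small sketch, using the same x as above (redefined here so the block runs on its own), that summarises each nested data frame without pulling them out one by one:

```r
x <- list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4]))

# One value per nested data frame.
vapply(x$X2, nrow, integer(1))
vapply(x$X2, ncol, integer(1))

# Column names of each nested data frame, kept as a list
# because the data frames differ in width.
lapply(x$X2, names)
```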

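[Edit: The code below no longer works in the latest version of R. Note that x, as defined above, is still a plain list, since the result of structure() was printed rather than assigned, and a two-index [ on a plain list gives exactly this “incorrect number of dimensions” error.]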


x[2, "X2"] # Returns the second data frame wrapped in another data frame
## Error in x[2, "X2"]: incorrect number of dimensions
x[2, "X2", drop = TRUE] # Same -- ignores "drop"
## Error in x[2, "X2", drop = TRUE]: incorrect number of dimensions
x[2, "X2"][1, "X2"][1, "X2"] # The `[` goes around in circles
## Error in x[2, "X2"]: incorrect number of dimensions
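
For reference, here is a sketch of how similar calls behave once the object really is a data frame. It assumes the plain "data.frame" class rather than "tbl_df", so the results need not match what tibble’s own [ method does, and the names nested and cell are made up for illustration.

```r
nested <- structure(
  list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4])),
  class = "data.frame",
  row.names = 1:2
)

# Row 2 of a single column: the column is dropped to a list of length one.
cell <- nested[2, "X2"]
class(cell)
length(cell)

# Unwrap it, or go straight to the element with [[ and two indices.
cell[[1]]
nested[[2, "X2"]]
```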

Reuse

Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at https://github.com/nacnudus/duncangarmonsway, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".

Citation

For attribution, please cite this work as

Garmonsway (2016, June 22). Duncan Garmonsway: Creating nests without tidyr. Retrieved from https://nacnudus.github.io/duncangarmonsway/posts/2016-06-22-nests/

BibTeX citation

@misc{garmonsway2016creating,
  author = {Garmonsway, Duncan},
  title = {Duncan Garmonsway: Creating nests without tidyr},
  url = {https://nacnudus.github.io/duncangarmonsway/posts/2016-06-22-nests/},
  year = {2016}
}