How to construct nested data frames from scratch, without tidyr
Unless you begin with an unnested data frame, creating a nested data frame needs a little trick. Here it is.
The tidyr package has a handy function for nesting data frames. Hadley Wickham describes it thus:
In a grouped data frame, you have one row per observation, and additional metadata define the groups. In a nested data frame, you have one row per group, and the individual observations are stored in a column that is a list of data frames. This is a useful structure when you have lists of other objects (like models) with one element per group.
Here’s a small example:
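library(dplyr)
library(tidyr)
iris_nested <-
  iris %>%
  group_by(Species) %>%
  sample_n(2) %>%
  nest()  # collapse each group's rows into one list-column cell
iris_nested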
## # A tibble: 3 x 2
## Species data
## <fct> <list>
## 1 setosa <tibble [2 × 4]>
## 2 versicolor <tibble [2 × 4]>
## 3 virginica <tibble [2 × 4]>
iris_nested %>% str
## Classes 'tbl_df', 'tbl' and 'data.frame': 3 obs. of 2 variables:
## $ Species: Factor w/ 3 levels "setosa","versicolor",..: 1 2 3
## $ data :List of 3
## ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 4 variables:
## .. ..$ Sepal.Length: num 5.1 5.2
## .. ..$ Sepal.Width : num 3.8 3.4
## .. ..$ Petal.Length: num 1.6 1.4
## .. ..$ Petal.Width : num 0.2 0.2
## ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 4 variables:
## .. ..$ Sepal.Length: num 5.7 5.6
## .. ..$ Sepal.Width : num 3 3
## .. ..$ Petal.Length: num 4.2 4.1
## .. ..$ Petal.Width : num 1.2 1.3
## ..$ :Classes 'tbl_df', 'tbl' and 'data.frame': 2 obs. of 4 variables:
## .. ..$ Sepal.Length: num 7.2 6.3
## .. ..$ Sepal.Width : num 3 2.5
## .. ..$ Petal.Length: num 5.8 5
## .. ..$ Petal.Width : num 1.6 1.9
Interestingly, the nested column isn’t an atomic vector like ordinary columns; it’s a list. Lists are in fact one kind of vector, the non-atomic kind (composed of parts, i.e. vectors and other lists), whereas integer, character, etc. vectors are the atomic kind (not composed of parts). This is nicely explained in Advanced R by Hadley Wickham.
is.atomic(vector(mode = "character", length = 2))
## [1] TRUE
is.atomic(vector(mode = "list", length = 2))
## [1] FALSE
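To make the distinction a little more concrete, here is a minimal sketch using only base R predicates. The object names are illustrative and not part of the original example.

```r
# Illustrative names; only base R is used.
mixed <- list(1L, "a", iris[1:2, ])  # a list may hold vectors and data frames

is.vector(mixed)  # lists count as vectors ...
is.atomic(mixed)  # ... but not as atomic ones

typeof(iris)      # a data frame is stored as a list of columns
atomic_cols <- vapply(iris, is.atomic, logical(1))
all(atomic_cols)  # every ordinary iris column is atomic
```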
Data frames, which are lists of vectors, handle list-type columns perfectly well, but data-frame-construction functions don’t. So when I tried to create one from scratch (rather than by nesting an existing data frame as above), I lost a lot of time mucking about with data.frame() and the like.
data.frame(X1 = 1:2, X2 = list(iris, mtcars))
## Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 150, 32
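The error appears to come from data.frame() converting each element of the list into its own set of columns, and then finding that the row counts disagree. The sketch below, with illustrative names that are not from the original post, shows that expansion on two pieces with matching row counts. As an aside, it also shows base R's I() wrapper, which, as far as I know, keeps the list as a single column.

```r
# Two pieces with the same number of rows, so data.frame() does not fail ...
pieces <- list(iris[1:2, 1:2], iris[3:4, 3:4])
expanded <- data.frame(X1 = 1:2, X2 = pieces)
dim(expanded)  # ... but the list is spread over several ordinary columns

# I() marks the list "as is", so it stays a single list column
kept_whole <- data.frame(X1 = 1:2, X2 = I(list(iris, mtcars)))
dim(kept_whole)
identical(kept_whole$X2[[2]], mtcars)
```

Note that the I() route leaves the column with the extra class "AsIs", which some code may not expect.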
The answer is to simply tell R that the data structure is a data frame by setting its class and giving it a “row.names” attribute.
x <- list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4]))
structure(x, class = c("tbl_df", "data.frame"), row.names = 1:2)
## X1 X2
## 1 1 5.1, 4.9, 3.5, 3.0
## 2 2 4.7, 4.6, 5.0, 3.2, 3.1, 3.6, 1.3, 1.5, 1.4, 0.2, 0.2, 0.2
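The same trick can be wrapped in a small function. This is only a sketch under stated assumptions: new_nested_df() is a hypothetical helper, not something from the original post or from any package, and it sets just the base "data.frame" class so that it runs without tibble installed. The usage part builds one row per species with base split().

```r
# Hypothetical helper (not from the original post): turn a named list of
# equal-length columns into a data frame by setting attributes directly.
new_nested_df <- function(cols) {
  stopifnot(is.list(cols), !is.null(names(cols)))
  n <- unique(lengths(cols))
  if (length(n) != 1L) {
    stop("all columns must have the same length")
  }
  structure(cols, class = "data.frame", row.names = seq_len(n))
}

# One row per species, with that species' measurements in a list column
pieces <- split(iris[, 1:4], iris$Species)
iris_by_species <- new_nested_df(list(
  Species = factor(names(pieces), levels = levels(iris$Species)),
  data    = unname(pieces)
))

dim(iris_by_species)
vapply(iris_by_species$data, nrow, integer(1))
```

The length check matters because structure() itself does no validation: it would happily attach a row.names attribute that disagrees with the columns.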
Accessing the nested column by the usual subsetting operators, $, [ and [[, is a little clumsy.
x$X2 # Returns the list of data frames
## [[1]]
## Sepal.Length Sepal.Width
## 1 5.1 3.5
## 2 4.9 3.0
## [[2]]
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
x$X2[2] # Returns the second data frame, wrapped in a list
## [[1]]
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
x$X2[[2]] # Returns the second data frame -- probably what you want
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
x[["X2"]][[2]] # Returns the second data frame -- also probably what you want
## Sepal.Length Sepal.Width Petal.Length Petal.Width
## 3 4.7 3.2 1.3 0.2
## 4 4.6 3.1 1.5 0.2
## 5 5.0 3.6 1.4 0.2
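When the goal is to do something with every nested data frame rather than pull out one of them, applying a function over the list column can be less clumsy than indexing. A minimal sketch, with illustrative result names and base R only:

```r
x <- list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4]))

# Apply a function to every nested data frame instead of indexing one at a time
rows_per_cell <- vapply(x$X2, nrow, integer(1))
cols_per_cell <- vapply(x$X2, ncol, integer(1))

# Mean of the first column (Sepal.Length) within each nested data frame
first_col_means <- vapply(x$X2, function(d) mean(d[[1]]), numeric(1))
```

vapply() is used rather than sapply() so that the type and length of each result are checked.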
[Edit: The code below no longer works in the latest version of R.]
x[2, "X2"] # Returns the second data frame wrapped in another data frame
## Error in x[2, "X2"]: incorrect number of dimensions
x[2, "X2", drop = TRUE] # Same -- ignores "drop"
## Error in x[2, "X2", drop = TRUE]: incorrect number of dimensions
x[2, "X2"][1, "X2"][1, "X2"] # The `[` goes around in circles
## Error in x[2, "X2"]: incorrect number of dimensions
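One likely reason for these errors is that x is still a plain list: the structure() call earlier printed its result without storing it, and two-index `[` is not defined for a plain list. The following is a sketch under assumptions, not the original code: the result is stored under the hypothetical name nested, and only the base "data.frame" class is set so that no tibble method is involved. With the tibble package loaded and the "tbl_df" class set, the behaviour of `[` may differ.

```r
x <- list(X1 = 1:2, X2 = list(iris[1:2, 1:2], iris[3:5, 1:4]))

# Store the result this time; class "data.frame" alone avoids needing tibble
nested <- structure(x, class = "data.frame", row.names = 1:2)

cell <- nested[2, "X2"]                # single column, so `drop` defaults to TRUE
kept <- nested[2, "X2", drop = FALSE]  # stays a one-row, one-column data frame

is.list(cell)                          # the data frame is still wrapped in a list
identical(cell[[1]], iris[3:5, 1:4])   # `[[` unwraps it
is.data.frame(kept)
```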
Text and figures are licensed under Creative Commons Attribution CC BY 4.0. Source code is available at, unless otherwise noted. The figures that have been reused from other sources don't fall under this license and can be recognized by a note in their caption: "Figure from ...".
For attribution, please cite this work as
Garmonsway (2016, June 22). Duncan Garmonsway: Creating nests without tidyr. Retrieved from