Pack cell values from separate columns per data type into one list-column
Usage
pack(
cells,
types = data_type,
name = "value",
drop_types = TRUE,
drop_type_cols = TRUE
)
unpack(cells, values = value, name = "data_type", drop_packed = TRUE)Arguments
- cells
A data frame of cells, one row per cell. For
pack()it must have a column that names, for each cell/row, which of the other columns the value is in. Forunpack()it must have a list-column of cell values, where each element is named according to the data type of the value.- types
For
pack(), the name of the column that that names, for each cell/row, which of the other columns the value is in.- name
A string. For
pack(), the name to give the new list-column of values. Forunpack(), the name to give the new column that will name, for each cell, which of the other columns the value is in.- drop_types
For
pack(), whether to drop the column named bytypes.- drop_type_cols
For
pack(), whether to drop the original columns of cell values.- values
For
unpack(), the name of the list-column of cell values.- drop_packed
For
unpack(), whether to drop the column named byvalues.
Details
When cells are represented by rows of a data frame, the values of the cells
will be in different columns according to their data type. For example, the
value of a cell containing text will be in a column called chr (or
character if it came via tidyxl). A column called data_type names, for
each cell, which column its value is in.
pack() rearranges the cell values in a different way, so that they are all
in one column, by
taking each cell value, from whichever column.
making it an element of a list.
naming each element according to the column it came from.
making the list into a new list-column of the original data frame.
By default, the original columns are dropped, and so is the data_type
column.
unpack() is the complement.
This can be useful for dropping all columns of cells except the ones that
contain data. For example, tidyxl::xlsx_cells() returns a very wide data
frame, and to make it narrow you might do:
But what if you don't know in advance that the data types you need are
character, numeric and date? You might also need logical and
error.
Instead, pack() all the data types into a single column, select it, and
then unpack.
Examples
# A normal data frame
w <- data.frame(foo = 1:2,
bar = c("a", "b"),
stringsAsFactors = FALSE)
w
#> foo bar
#> 1 1 a
#> 2 2 b
# The same data, represented by one row per cell, with integer values in the
# `int` column and character values in the `chr` column.
x <- as_cells(w)
x
#> # A tibble: 4 × 5
#> row col data_type chr int
#> <int> <int> <chr> <chr> <int>
#> 1 1 1 int NA 1
#> 2 2 1 int NA 2
#> 3 1 2 chr a NA
#> 4 2 2 chr b NA
# pack() and unpack() are complements
pack(x)
#> # A tibble: 4 × 3
#> row col value
#> <int> <int> <named list>
#> 1 1 1 <int [1]>
#> 2 2 1 <int [1]>
#> 3 1 2 <chr [1]>
#> 4 2 2 <chr [1]>
unpack(pack(x))
#> # A tibble: 4 × 5
#> row col data_type chr int
#> <int> <int> <chr> <chr> <int>
#> 1 1 1 int NA 1
#> 2 2 1 int NA 2
#> 3 1 2 chr a NA
#> 4 2 2 chr b NA
# Drop non-data columns from a wide data frame of cells from tidyxl
if (require(tidyxl)) {
cells <- tidyxl::xlsx_cells(system.file("extdata", "purpose.xlsx", package = "unpivotr"))
cells
pack(cells) %>%
dplyr::select(row, col, value) %>%
unpack()
}
#> Loading required package: tidyxl
#> # A tibble: 1,721 × 7
#> row col data_type blank character error numeric
#> <int> <int> <chr> <lgl> <chr> <chr> <dbl>
#> 1 1 1 character NA The data is 'Sense of purpose by h… NA NA
#> 2 2 1 character NA It can be found in the section 'Pe… NA NA
#> 3 3 1 character NA The description provided by Statis… NA NA
#> 4 5 1 character NA The 2014 New Zealand General Socia… NA NA
#> 5 6 1 character NA It provides new and redeveloped da… NA NA
#> 6 7 1 character NA Symbols used in this table: NA NA
#> 7 8 1 character NA S - Data has been suppressed. NA NA
#> 8 9 1 character NA * - Relative sampling error of 50 … NA NA
#> 9 10 1 character NA For more tables using the NZGSS 20… NA NA
#> 10 11 1 character NA Data quality NA NA
#> # ℹ 1,711 more rows