include_blank_cells = FALSEhad a bug that returned blank cells as an empty row in the
_as dates when the underscore is followed by a date-ish character like
include_blank_cells = FALSEin
maybe_xlsx()is provided for checking whether a file might be in the xlsx format. It is impossible to be sure from the magic number alone, because the magic numbers are either common to all zip files, or common to other Microsoft Office files (e.g. .doc, .ppt).
tidy_xlsx(), which has been deprecated.
xlsx_cells()returns a single data frame of all the cells in scope (the whole workbook, or chosen sheets), rather than a list of separate data frames for each sheet.
xlsx_formats()performs orders of magnitude faster.
xlsx_validation()imports validation rules from cells that restrict data input, such as cells that require a selection from a drop-down list. See the vignette
vignette("data-validation-rules", package = "tidyxl").
xlsx_names()imports defined names (aka named ranges/formulas), which can be used to filter for particular ranges of cells by name. Use
is_range()to filter for ones that are named ranges, and then read joining rules to cells for how to join cell ranges to cell addresses. This will become easier in a future release.
is_range()checks whether a formula is simply ranges of cells.
xlex()tokenises formulas. This is useful for detecting spreadsheet smells like embedded constants and deep nesting. There is a demo Shiny app, and a vignette
vignette("smells", package = "tidyxl"). A vector of Excel function names
excel_functionscan be used to separated built-in functions from custom functions. More experimental features will be implemented in the off-CRAN package lexl before becoming part of tidyxl.
xlsx_cells()$character_formattedis a new column for the in-cell formatting of text (#5). This is for when different parts of text in a single cell have been formatted differently from one another.
is_date_format()checks whether a number format string is a date format. This is useful if a cell formula contains a number formatting string (e.g.
TEXT(45678,"yyyy")), and you need to know that the constant 45678 is a date in order to recover it at full resolution (rather than parsing the character output “2025” as a year).
xlsx_color_theme()and it’s British alias
xlsx_colour_theme()returns the theme colour palette used in a file. This is useful to monitor use of a corporate standard theme.
xlsx_color_standardand it’s British alias
xlsx_colour_standardare data frames of the standard Excel palette (
B2(for more details see issue #7).
tidy_xlsx()has been deprecated in favour of
xlsx_cells(), which returns a data frame of all the cells in the workbook (or in the requested sheets), and
xlsx_formats(), which returns a lookup list of cell formats.
tidy_xlsx()and one of it’s replacments
contenthas been replaced by
is_blank, a logical value indicating whether the cell contains data. Please replace
!is_blankto filter out blank cells (ones with formatting but no value).
formula_typehas been replaced by
is_array, a logical value indicating whether the cell’s formula is an array formula or not. In Excel array formulas are represented visually by being surrounded by curly braces
tidy_xlsx(), theme colours are given by name rather than by number, e.g.
"[Cyan]0%") are no longer mis-detect as dates (#21).
is_date_format()tests whether a number format is a date format.
xlsx_formats()is now thoroughly tested, and several relatively minor bugs fixed. For example,
"none"when a pattern fill has not been set, and escape-backslashes are consistently omitted from numFmts.
"1"to support files created by the
"Normal") instead of by index integer. All the vectors under
x$formats$styleare named according to the style names.
x$data$sheet$style_format_idhas been renamed to
x$data$sheet$style_formatand its type changed from integer (index into style formats) to character (still an index, but looking up the named vectors by name). There are examples in the README and vignette.