Changelog
Source:NEWS.md
tidyxl 1.0.10
- Fixed a bug in the support for formatted strings, which sometimes weren’t being imported (#92)
tidyxl 1.0.8
CRAN release: 2022-09-01
- tidyxl is now licensed as MIT (#63).
- Fixed a unicode bug (#64)
- Fixed two crashes (#71, #73)
- Fixed a rare bug in number formats (#52, #75, @da11an)
- New feature: Import outline levels of rows and columns (#77, #78)
- New feature: Import raw cell value into
content
column (#80) - Compatibility with upcoming Rcpp release (#86, @Enchufa2)
tidyxl 1.0.7
CRAN release: 2020-11-16
- Update namespace in C++ code for compatibility with the latest version of PEGTL, wrapped in the {piton} package.
tidyxl 1.0.4
CRAN release: 2019-01-02
- Compatibility: imports cell data validation rules from files created by Office 365 (#46).
tidyxl 1.0.3
CRAN release: 2018-06-21
- Noticeably faster for large files.
- Omission of blank cells with
include_blank_cells = FALSE
had a bug that returned blank cells as an empty row in thexlsx_cells()
data frame.
tidyxl 1.0.2
- Correctly constructs formulas where references are preceded by operators, e.g.
-F10
(#26 @cablegui). - No longer misinterprets date formats that use underscores
_
as dates when the underscore is followed by a date-ish character likeM
(#24). - Optionally omits blank cells with
include_blank_cells = FALSE
inxlsx_cells()
(#25). - Doesn’t crash reading files with certain colour themes (#34 @dan-fahey).
tidyxl 1.0.1
CRAN release: 2018-03-26
- Filetype checking is based on the file signature or “magic number”, rather than the filename extension. A new function
maybe_xlsx()
is provided for checking whether a file might be in the xlsx format. It is impossible to be sure from the magic number alone, because the magic numbers are either common to all zip files, or common to other Microsoft Office files (e.g. .doc, .ppt). - Fixed a CRAN warning.
tidyxl 1.0.0
CRAN release: 2017-11-25
New features
-
xlsx_cells()
andxlsx_formats()
replacetidy_xlsx()
, which has been deprecated.xlsx_cells()
returns a single data frame of all the cells in scope (the whole workbook, or chosen sheets), rather than a list of separate data frames for each sheet.xlsx_formats()
performs orders of magnitude faster. -
xlsx_validation()
imports validation rules from cells that restrict data input, such as cells that require a selection from a drop-down list. See the vignettevignette("data-validation-rules", package = "tidyxl")
. -
xlsx_names()
imports defined names (aka named ranges/formulas), which can be used to filter for particular ranges of cells by name. Useis_range()
to filter for ones that are named ranges, and then read joining rules to cells for how to join cell ranges to cell addresses. This will become easier in a future release. -
is_range()
checks whether a formula is simply ranges of cells. -
xlex()
tokenises formulas. This is useful for detecting spreadsheet smells like embedded constants and deep nesting. There is a demo Shiny app, and a vignettevignette("smells", package = "tidyxl")
. A vector of Excel function namesexcel_functions
can be used to separated built-in functions from custom functions. More experimental features will be implemented in the off-CRAN package lexl before becoming part of tidyxl. -
xlsx_cells()$character_formatted
is a new column for the in-cell formatting of text (#5). This is for when different parts of text in a single cell have been formatted differently from one another. -
is_date_format()
checks whether a number format string is a date format. This is useful if a cell formula contains a number formatting string (e.g.TEXT(45678,"yyyy")
), and you need to know that the constant 45678 is a date in order to recover it at full resolution (rather than parsing the character output “2025” as a year). -
xlsx_color_theme()
and it’s British aliasxlsx_colour_theme()
returns the theme colour palette used in a file. This is useful to monitor use of a corporate standard theme. -
xlsx_color_standard
and it’s British aliasxlsx_colour_standard
are data frames of the standard Excel palette (red
,blue
, etc.). - Shared formulas are propogated to all the cells that use the same formula definition. Relative cell references are handled, so that the formula
=A1*2
in cellB1
becomes=A2*2
in cellB2
(for more details see issue #7). - Formatting of alignment and cell protection is returned (#20).
Breaking changes and deprecations
-
tidy_xlsx()
has been deprecated in favour ofxlsx_cells()
, which returns a data frame of all the cells in the workbook (or in the requested sheets), andxlsx_formats()
, which returns a lookup list of cell formats. - In
tidy_xlsx()
and one of it’s replacmentsxlsx_cells()
- the column
content
has been replaced byis_blank
, a logical value indicating whether the cell contains data. Please replace!is.na(content)
with!is_blank
to filter out blank cells (ones with formatting but no value). - the column
formula_type
has been replaced byis_array
, a logical value indicating whether the cell’s formula is an array formula or not. In Excel array formulas are represented visually by being surrounded by curly braces{}
. - The order of columns has been changed so that the more useful columns are visible in narrow consoles.
- the column
- in
xlsx_formats()
andtidy_xlsx()
, theme colours are given by name rather than by number, e.g."accent6"
instead of4
.
Minor fixes and improvements
- Certain unusual custom number formats that specify colours (e.g.
"[Cyan]0%"
) are no longer mis-detect as dates (#21).is_date_format()
tests whether a number format is a date format. -
xlsx_formats()
is now thoroughly tested, and several relatively minor bugs fixed. For example,xlsx_formats(path)$local$fill$patternFill$patternType
consistently returnsNA
and never"none"
when a pattern fill has not been set, and escape-backslashes are consistently omitted from numFmts.
New dependency
xlex()
, is_range()
and the handling of relative references in shared formulas requires a dependency on the piton package, which wraps the PEGTL C++ parser generator.
tidyxl 0.2.3
CRAN release: 2017-07-08
- Imports dates on or before 1900-02-28 dates correctly, and only warns on the impossible date 1900-02-29, returning NA (following readxl).
- Fixes subsecond rounding (following readxl (fixes #14))
- Imports styles correctly from LibreOffice files (interprets ‘true’ and ‘false’ as booleans, as well as the 0 and 1 used by Microsoft Excel, and defaults to ‘true’ when not present, e.g. applyFill)
- Fixes a bug that caused some LibreOffice files to crash R, when styles were declared with gaps in the sequence of xfIds.
- Imports comments attached to blank cells (fixes #10)
- Includes the sheet and cell address in warnings of impossible datetimes from 1900-02-29.
tidyxl 0.2.1.9000
- Checks the value of the
date1904
attribute for"false"
or"1"
to support files created by theopenxlsx
package (#8). - Fixed a bug that only imported the first line of multiline comments (#9).
- Encodes cell and comment text as UTF-8 (#12).
- Finds worksheets more reliably in files not created by Excel (part of #13).
- Falls back to default styles when none defined (#13).
- Imports dates with greater precision (part of #14).
- Fixed the order of worksheets (#15)
tidyxl 0.2.1
CRAN release: 2017-01-29
- Fixed a major bug: dates were parsed incorrectly because the offsets for the 1900 and 1904 systems were the wrong way around.
- Added a warning when dates suffer from the Excel off-by-one bug.
- Fixed a bug that misinterpreted some number formats as dates (confused by the ‘d’ in ‘[Red]’, which looks like a ‘d’ in ‘d/m/y’.)
- Added support for xlsx files created by Gnumeric (a single, unnamed cell formatting style).
- Fixed the checkUserInterrupt to work every 1000th cell.
- Added many tests.
- Removed lots of redundant code.
tidyxl 0.2.0
- Breaking change: Look up style formats by style name (e.g.
"Normal"
) instead of by index integer. All the vectors underx$formats$style
are named according to the style names.x$data$sheet$style_format_id
has been renamed tox$data$sheet$style_format
and its type changed from integer (index into style formats) to character (still an index, but looking up the named vectors by name). There are examples in the README and vignette. - Simplified some variable names in the C++ code.