
id_tbl

Tabular ICU data classes


Description

To simplify handling of tabular ICU data, ricu provides two S3 classes, id_tbl and ts_tbl. Both classes essentially consist of a data.table object alongside some metadata, and S3 dispatch is used to enable more natural behavior for some data manipulation tasks. For example, when merging two tables, a more sensible default for the by argument can be chosen if columns representing patient ID and timestamp information can be identified.

Usage

id_tbl(..., id_vars = 1L)

is_id_tbl(x)

as_id_tbl(x, id_vars = NULL, by_ref = FALSE)

ts_tbl(..., id_vars = 1L, index_var = NULL, interval = NULL)

is_ts_tbl(x)

as_ts_tbl(x, id_vars = NULL, index_var = NULL, interval = NULL, by_ref = FALSE)

validate_tbl(x)

Arguments

...

Forwarded to data.table::data.table() or included for generic consistency

id_vars

Column name(s) to be used as id column(s)

x

Object to query/operate on

by_ref

Logical flag indicating whether to perform the operation by reference

index_var

Column name of the index column

interval

Time series interval length specified as scalar-valued difftime object

Details

The two classes are designed for two often encountered data scenarios:

  • id_tbl objects can be used to represent static (with respect to relevant time scales) patient data such as patient age. Such an object is simply a data.table combined with a non-zero length character vector valued attribute marking the columns that track patient ID information (id_vars). All further columns are considered data_vars.

  • ts_tbl objects are used for grouped time series data. A data.table object is again augmented by attributes: a non-zero length character vector identifying patient ID columns (id_vars), a string naming the column that holds time stamps (index_var), and a scalar difftime object giving the time series step size (interval). Again, all further columns are treated as data_vars.
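The two scenarios above can be sketched as follows. This is a minimal illustration rather than an excerpt from the package: the column names pid, age, time and hr are made up, and the accessor functions id_vars(), index_var(), interval() and data_vars() are assumed to be exported by the installed ricu version.

```r
library(ricu)

# Static data: one row per patient, `pid` is marked as the ID column
static <- id_tbl(pid = 1:3, age = c(64, 71, 58), id_vars = "pid")

# Grouped time series: two patients with hourly measurements
series <- ts_tbl(
  pid = rep(1:2, each = 3),
  time = hours(rep(0:2, 2)),
  hr = c(80, 82, 79, 95, 97, 93),
  id_vars = "pid", index_var = "time", interval = hours(1L)
)

# Inspect the metadata attached to each object (assumed accessors)
id_vars(static)
data_vars(static)

id_vars(series)
index_var(series)
interval(series)
data_vars(series)
```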

Owing to the nested structure of the required metadata, ts_tbl inherits from id_tbl. Furthermore, both classes inherit from data.table. As such, data.table reference semantics are available for some operations, indicated by the presence of a by_ref argument. By default, by_ref is set to FALSE, as this is in line with base R behavior, at the cost of potentially incurring unnecessary data copies. Some care has to be taken when passing by_ref = TRUE and enabling by-reference operations, as this can have side effects (see examples).
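The kind of side effect that reference semantics can cause is easiest to see on a plain data.table. The sketch below uses only data.table functions and made-up column names; operations called with by_ref = TRUE are assumed to behave analogously to the set* functions shown here.

```r
library(data.table)

dt <- data.table(id = 1:3, val = c(0.5, 1.2, 0.9))

# Plain assignment does not copy a data.table: both names refer to one object
alias <- dt
setnames(alias, "val", "value")
names(dt)  # the rename is visible through `dt` as well

# An explicit copy decouples the objects, at the cost of duplicating the data
safe <- copy(dt)
setnames(safe, "value", "val")
names(dt)  # unaffected by the rename applied to `safe`
```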

For instantiating ts_tbl objects, both index_var and interval can be automatically determined if not specified. For the index column, the only requirement is that a single difftime column is present, while for the time step, the minimal difference between two consecutive observations is chosen (and all differences are therefore required to be multiples of the minimum difference).
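The rule for choosing the time step can be illustrated in base R. The helper infer_interval() below is hypothetical and not part of ricu; it is a simplified sketch that pools all time points and ignores grouping by patient ID.

```r
# Hypothetical helper (not part of ricu): infer a step size from a difftime vector
infer_interval <- function(x) {
  stopifnot(inherits(x, "difftime"))

  # gaps between consecutive distinct time points
  steps <- diff(sort(unique(as.numeric(x))))
  if (length(steps) == 0L) stop("need at least two distinct time points")

  step <- min(steps)

  # every gap has to be a whole multiple of the smallest gap
  ratio <- steps / step
  if (any(abs(ratio - round(ratio)) > 1e-8)) {
    stop("time differences are not multiples of the minimal difference")
  }

  as.difftime(step, units = units(x))
}

infer_interval(as.difftime(c(0, 2, 4, 8), units = "hours"))  # 2 hours

# Gaps of 2 and 3 hours are not multiples of a common minimal step
tryCatch(
  infer_interval(as.difftime(c(0, 2, 5), units = "hours")),
  error = function(e) conditionMessage(e)
)
```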

Upon instantiation, the data might be rearranged: columns are reordered such that ID columns are moved to the front, followed by the index column and a data.table::key() is set on meta columns, causing rows to be sorted accordingly. Moving meta columns to the front is done for reasons of convenience for printing, while setting a key on meta columns is done to improve efficiency of subsequent transformations such as merging or grouped operations. Furthermore, NA values in either ID or index columns are not allowed and therefore corresponding rows are silently removed.
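The rearrangement steps described above correspond roughly to the following operations on a plain data.table. This is a sketch of the described effect using only data.table functions and invented column names, not the code ricu runs internally.

```r
library(data.table)

dt <- data.table(
  val = c(3.1, 2.7, 5.0, 4.2),
  rel_time = as.difftime(c(2, 1, NA, 1), units = "hours"),
  id = c(2L, 1L, 1L, 2L)
)

meta <- c("id", "rel_time")

# 1. drop rows with NA in an ID or index column
dt <- dt[!is.na(id) & !is.na(rel_time)]

# 2. move meta columns to the front (ID first, then index)
setcolorder(dt, meta)

# 3. set a key on the meta columns, which also sorts the rows
setkeyv(dt, meta)

names(dt)
key(dt)
dt
```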

Coercion between id_tbl and ts_tbl by default keeps intersecting attributes fixed and new attributes are by default inferred as for class instantiation. Each class comes with a class-specific implementation of the S3 generic function validate_tbl() which returns TRUE if the object is considered valid or a string outlining the type of validation failure that was encountered. Validity requires

  1. inheriting from data.table and unique column names

  2. for id_tbl that all columns named in id_vars (a non-zero length character vector) are available

  3. for ts_tbl that the string-valued index_var column is available and does not intersect with id_vars and that the index column obeys the specified interval.
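Because validate_tbl() returns either TRUE or a string, its result should be checked with isTRUE() rather than used directly in a condition. The wrapper assert_valid_tbl() below is a hypothetical convenience function, not part of ricu, sketched to show the pattern.

```r
library(ricu)

# Hypothetical helper (not part of ricu): turn a validation failure into an error
assert_valid_tbl <- function(x) {
  res <- validate_tbl(x)
  if (!isTRUE(res)) stop("invalid table: ", res, call. = FALSE)
  invisible(x)
}

tbl <- id_tbl(a = 1:10, b = rnorm(10))
assert_valid_tbl(tbl)

# Renaming the ID column on a copy via setnames() leaves id_vars out of date
broken <- data.table::setnames(data.table::copy(tbl), c("c", "b"))
msg <- tryCatch(assert_valid_tbl(broken), error = conditionMessage)
msg
```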

Finally, inheritance can be checked by calling is_id_tbl() and is_ts_tbl(). Note that due to ts_tbl inheriting from id_tbl, is_id_tbl() returns TRUE for both id_tbl and ts_tbl objects, while is_ts_tbl() only returns TRUE for ts_tbl objects.
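The behavior of the two testers follows from how S3 inheritance works. The sketch below uses base R only, with stand-in objects whose class vectors are assumed to mirror those of id_tbl and ts_tbl objects.

```r
# Stand-in objects; the class vectors are an assumption made for illustration
id_like <- structure(list(), class = c("id_tbl", "data.table", "data.frame"))
ts_like <- structure(list(), class = c("ts_tbl", "id_tbl", "data.table", "data.frame"))

inherits(id_like, "id_tbl")  # TRUE
inherits(ts_like, "id_tbl")  # TRUE, since ts_tbl sits in front of id_tbl

inherits(id_like, "ts_tbl")  # FALSE
inherits(ts_like, "ts_tbl")  # TRUE
```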

Value

Constructors id_tbl()/ts_tbl(), as well as coercion functions as_id_tbl()/as_ts_tbl() return id_tbl/ts_tbl objects respectively, while inheritance testers is_id_tbl()/is_ts_tbl() return logical flags and validate_tbl() returns either TRUE or a string describing the validation failure.

Relationship to data.table

Both id_tbl and ts_tbl inherit from data.table and as such, functions intended for use with data.table objects can be applied to id_tbl and ts_tbl as well. There are some caveats, however: many functions introduced by data.table are not S3 generic and would therefore have to be masked in order to retain control over how they operate on objects inheriting from data.table. Take for example the function data.table::setnames(), which changes column names by reference. Using this function, the name of an ID column of an id_tbl object can be changed without updating the attribute marking the column as such, thus leaving the object in an inconsistent state. Instead of masking the function setnames(), an alternative is provided as rename_cols(). Where it is possible to seamlessly insert the appropriate function (such as base::names<-() or base::colnames<-()), this is done; otherwise, the responsibility for not using data.table::setnames() in a way that breaks the id_tbl object is left to the user.

Owing to data.table heritage, one of the functions that is often called on id_tbl and ts_tbl objects is the base S3 generic base::`[`(). As this function is capable of modifying the object in a way that makes it incompatible with the attached metadata, an attempt is made at preserving as much as possible and, if all else fails, a data.table object is returned instead of an object inheriting from id_tbl. If, for example, the index column is removed from a ts_tbl (or modified in a way that makes it incompatible with the interval specification), an id_tbl is returned. If, however, the ID column is removed, the only sensible thing to return is a data.table (see examples).

Examples

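library(ricu)

# an id_tbl is not a ts_tbl
tbl <- id_tbl(a = 1:10, b = rnorm(10))
is_id_tbl(tbl)
is_ts_tbl(tbl)

# a ts_tbl is also an id_tbl; index column and interval are inferred
dat <- data.frame(a = 1:10, b = hours(1:10), c = rnorm(10))
tbl <- as_ts_tbl(dat, "a")
is_id_tbl(tbl)
is_ts_tbl(tbl)

# coercion with the default by_ref = FALSE leaves the input untouched
tmp <- as_id_tbl(tbl)
is_ts_tbl(tbl)
is_ts_tbl(tmp)

# coercion by reference also affects the input object (side effect)
tmp <- as_id_tbl(tbl, by_ref = TRUE)
is_ts_tbl(tbl)
is_ts_tbl(tmp)

# names<-() keeps the id_vars attribute in sync with the column names
tbl <- id_tbl(a = 1:10, b = rnorm(10))
names(tbl) <- c("c", "b")
tbl

# data.table::setnames() does not update id_vars: validation fails
tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(data.table::setnames(tbl, c("c", "b")))

# rename_cols() is the safe alternative
tbl <- id_tbl(a = 1:10, b = rnorm(10))
validate_tbl(rename_cols(tbl, c("c", "b")))

tbl <- ts_tbl(a = rep(1:2, each = 5), b = hours(rep(1:5, 2)), c = rnorm(10))
# index column dropped: an id_tbl is returned
tbl[, c("a", "c"), with = FALSE]
# ID column dropped: a data.table is returned
tbl[, c("b", "c"), with = FALSE]
# index column no longer compatible with the interval: an id_tbl is returned
tbl[, list(a, b = as.double(b), c)]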

ricu

Intensive Care Unit Data with R

v0.1.3
GPL-3
Authors
Nicolas Bennett [aut, cre], Drago Plecko [aut], Ida-Fong Ukor [aut]
Initial release
