attach_src

Data attach utilities


Description

Making a dataset available to ricu consists of three steps: downloading (download_src()), importing (import_src()) and attaching (attach_src()). While downloading and importing are one-time procedures, attaching the dataset is repeated every time the package is loaded. Briefly, downloading fetches the raw dataset from the internet (most likely in .csv format), importing applies some preprocessing so that the data can be accessed more efficiently, and attaching sets up the data for use by the package.

Usage

attach_src(x, ...)

## S3 method for class 'src_cfg'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

## S3 method for class 'character'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

setup_src_env(x, env, ...)

## S3 method for class 'src_cfg'
setup_src_env(x, env, data_dir = src_data_dir(x), ...)

new_src_tbl(files, col_cfg, tbl_cfg, prefix, src_env)

as_src_tbl(x, ...)

new_src_env(x, env = new.env(parent = data_env()))

as_src_env(x)

Arguments

x

Data source to attach

...

Forwarded to further calls to attach_src()

assign_env

Environment in which the data source will become available

data_dir

Directory used to look for fst::fst() files; NULL calls data_dir() using the source name as subdir argument

env

Environment where data proxy objects are created

files

File names of fst files that will be used to create a prt object (see also prt::new_prt())

col_cfg

Coerced to col_cfg by calling as_col_cfg()

tbl_cfg

Coerced to tbl_cfg by calling as_tbl_cfg()

prefix

Character vector of data source name(s) (used as class prefix)

src_env

The data source environment (as src_env object)

Details

Attaching a dataset sets up two types of S3 classes: a single src_env object, containing as many src_tbl objects as there are tables associated with the dataset. A src_env is an environment with an id_cfg attribute, as well as sub-classes as specified by the data source class_prefix configuration setting (see load_src_cfg()). All src_env objects created by calling attach_src() represent environments that are direct descendants of the data environment and are bound to the respective dataset name within that environment. attach_src() does not immediately instantiate a src_env object; instead, it creates a promise using base::delayedAssign(), which evaluates to a src_env upon first access. This allows data sources whose data is missing to be set up in a way that prompts the user to download and import the data on first access.
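
The deferred set-up described above can be illustrated with base R alone. The following is a minimal sketch, not ricu's implementation: data_env, make_src() and the name demo_src are hypothetical stand-ins, and only base::delayedAssign() is taken from the text.

```r
# Hypothetical stand-in for the data environment; not ricu's actual object.
data_env <- new.env()

# Counter that records how often the constructor below is run.
n_evals <- 0L

# Hypothetical constructor: a real set-up step would check for files and
# build the source environment; here it only returns a classed environment.
make_src <- function(name) {
  n_evals <<- n_evals + 1L
  structure(new.env(), class = c(paste0(name, "_env"), "src_env"))
}

# Register a promise under the dataset name; make_src() is not called yet.
delayedAssign("demo_src", make_src("demo_src"),
              eval.env = environment(), assign.env = data_env)

before <- n_evals                           # constructor has not run
obj    <- get("demo_src", envir = data_env) # first access forces the promise
obj2   <- get("demo_src", envir = data_env) # later accesses reuse the value
after  <- n_evals                           # constructor ran exactly once
```

Because the expression is only evaluated on first access, any prompting or error handling placed inside the constructor is postponed until the data source is actually used.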

Additionally, attach_src() creates an active binding using base::makeActiveBinding(), binding a function to the dataset name within the environment passed as assign_env, which retrieves the respective src_env from the data environment. This shortcut is set up for convenience, such that, for example, the MIMIC-III demo dataset is available not only as ricu::data::mimic_demo, but also as ricu::mimic_demo (or, if the package namespace is attached, simply as mimic_demo). The ricu namespace contains objects mimic, mimic_demo, eicu, etc., which are used as such links when loading the package. However, new data sets can be set up and accessed in the same way.
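
The linking mechanism can be sketched in base R as follows. This is an illustration under assumptions rather than the package's code: data_env, assign_env, the name demo_src and the table names are hypothetical; base::makeActiveBinding() and base::bindingIsActive() are the only functions relied upon.

```r
data_env   <- new.env()  # hypothetical stand-in for the data environment
assign_env <- new.env()  # plays the role of the assign_env argument

# Hypothetical payload stored under the dataset name.
assign("demo_src", list(tables = c("tbl_a", "tbl_b")), envir = data_env)

# Bind a zero-argument function to the same name in assign_env; every read
# of the name calls the function, which looks the object up in data_env.
makeActiveBinding(
  "demo_src",
  function() get("demo_src", envir = data_env),
  assign_env
)

via_link  <- get("demo_src", envir = assign_env)
is_active <- bindingIsActive("demo_src", assign_env)

# A change in data_env is visible through the link without rebinding.
assign("demo_src", list(tables = "tbl_c"), envir = data_env)
via_link_after <- get("demo_src", envir = assign_env)
```

Since the binding re-runs the lookup on each access, the shortcut never holds a stale copy of the object it points to.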

If set up correctly, the user does not need to call attach_src() directly. When the package is loaded, the default data sources are attached automatically. This default can be controlled by setting the environment variable RICU_SRC_LOAD to a comma-separated list of data source names before loading the library. Setting this environment variable as

Sys.setenv(RICU_SRC_LOAD = "mimic_demo,eicu_demo")

will change the default from loading MIMIC-III, eICU, their respective demo datasets and HiRID to loading just the two demo datasets. To set an environment variable at startup of the R session, refer to base::.First.sys() (documented under ?Startup).
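
How such a comma-separated setting might be turned into a vector of source names is sketched below. The helper parse_src_load() is hypothetical and not part of ricu, and the default vector is an assumption derived from the data sources named in this section; the package's own parsing may differ.

```r
# Hypothetical helper, not part of ricu: split a comma-separated setting
# into source names, falling back to a default when the variable is unset.
parse_src_load <- function(
  value = Sys.getenv("RICU_SRC_LOAD", unset = NA),
  default = c("mimic", "mimic_demo", "eicu", "eicu_demo", "hirid")  # assumed
) {
  if (is.na(value)) {
    return(default)
  }
  srcs <- trimws(strsplit(value, ",", fixed = TRUE)[[1L]])
  srcs[nzchar(srcs)]  # drop empty entries such as those from "a,,b"
}

demo_only <- parse_src_load("mimic_demo, eicu_demo")
fallback  <- parse_src_load(NA)
```

Passing the value explicitly, as done here, keeps the sketch free of side effects on the session's environment variables.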

The src_env promise for each data source is created using the S3 generic function setup_src_env(). This function checks whether all required files are available in data_dir. If files are missing, the user is prompted to download them in interactive sessions; otherwise an error is thrown. As soon as all required data is available, a src_tbl object is created per table and assigned to the src_env.
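
The availability check can be pictured roughly as below. Both helpers, the directory and the file names are hypothetical and only mirror the behaviour described in the text (prompt when interactive, error otherwise); they are not ricu functions.

```r
# Hypothetical helper: names of required files that are absent from data_dir.
missing_files <- function(data_dir, required) {
  required[!file.exists(file.path(data_dir, required))]
}

# Hypothetical check: succeed if nothing is missing, raise an error in
# non-interactive use, and only report the gap when a user could respond.
check_src_files <- function(data_dir, required, ask = interactive()) {
  absent <- missing_files(data_dir, required)
  if (length(absent) == 0L) {
    return(invisible(TRUE))
  }
  if (!ask) {
    stop("Missing files in ", data_dir, ": ", paste(absent, collapse = ", "))
  }
  message("Missing files: ", paste(absent, collapse = ", "),
          "; a download would be offered here.")
  invisible(FALSE)
}

# Temporary directory holding one of two required (made-up) files.
tmp <- tempfile("src_dir_")
dir.create(tmp)
invisible(file.create(file.path(tmp, "tbl_a.fst")))

absent <- missing_files(tmp, c("tbl_a.fst", "tbl_b.fst"))
```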

The S3 class src_tbl inherits from prt, which represents a partitioned fst file. In addition to the prt object, meta data in the form of col_cfg and tbl_cfg is associated with a src_tbl object (see load_src_cfg()). Furthermore, as with src_env, sub-classes are added as specified by the source configuration class_prefix entry. This allows certain functionality, for example data loading, to be adapted to data source-specific requirements.
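
The effect of prefix-derived sub-classes on method dispatch can be shown with a small S3 sketch. Everything named here (new_demo_tbl(), describe_tbl() and the prefixes) is hypothetical, and the prt layer is left out for brevity; the point is only that a source-specific method runs before the general src_tbl method.

```r
# Hypothetical constructor: prepend a class derived from the prefix.
new_demo_tbl <- function(data, prefix) {
  structure(list(data = data), class = c(paste0(prefix, "_tbl"), "src_tbl"))
}

# Hypothetical generic with a general method and a source-specific one.
describe_tbl <- function(x, ...) UseMethod("describe_tbl")

describe_tbl.src_tbl <- function(x, ...) {
  "generic src_tbl handling"
}

describe_tbl.demo_tbl <- function(x, ...) {
  # Source-specific step first, then fall through to the general method.
  paste("demo-specific step, then", NextMethod())
}

x <- new_demo_tbl(data.frame(id = 1:3), prefix = "demo")
y <- new_demo_tbl(data.frame(id = 1:3), prefix = "other")

res_x <- describe_tbl(x)  # dispatches to describe_tbl.demo_tbl
res_y <- describe_tbl(y)  # no other_tbl method, falls back to src_tbl
```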

Value

The constructors new_src_env()/new_src_tbl() as well as coercion functions as_src_env()/as_src_tbl() return src_env and src_tbl objects respectively. The function attach_src() is called for side effects and returns NULL invisibly, while setup_src_env() instantiates and returns a src_env object.

Examples

## Not run: 

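# Set an empty source list before loading the package
Sys.setenv(RICU_SRC_LOAD = "")
library(ricu)

# Inspect the data environment before attaching
ls(envir = data)
exists("mimic_demo")

# Attach the MIMIC-III demo data source manually
attach_src("mimic_demo")

# Inspect the data environment again after attaching
ls(envir = data)
exists("mimic_demo")

# Access the attached data source by name
mimic_demo
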
## End(Not run)

ricu

Intensive Care Unit Data with R

v0.1.3
GPL-3
Authors
Nicolas Bennett [aut, cre], Drago Plecko [aut], Ida-Fong Ukor [aut]
Initial release
