Data attach utilities
Making a dataset available to ricu consists of three steps: downloading
(download_src()), importing (import_src()) and attaching (attach_src()).
While downloading and importing are one-time procedures, attaching the
dataset is repeated every time the package is loaded. Briefly, downloading
fetches the raw dataset from the internet (most likely in .csv format),
importing preprocesses the data so that it can be accessed more efficiently,
and attaching sets up the data for use by the package.
attach_src(x, ...)

## S3 method for class 'src_cfg'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

## S3 method for class 'character'
attach_src(x, assign_env = NULL, data_dir = src_data_dir(x), ...)

setup_src_env(x, env, ...)

## S3 method for class 'src_cfg'
setup_src_env(x, env, data_dir = src_data_dir(x), ...)

new_src_tbl(files, col_cfg, tbl_cfg, prefix, src_env)

as_src_tbl(x, ...)

new_src_env(x, env = new.env(parent = data_env()))

as_src_env(x)
x | Data source to attach
... | Forwarded to further calls to
assign_env | Environment in which the data source will become available
data_dir | Directory used to look for
env | Environment where data proxy objects are created
files | File names of
col_cfg | Coerced to
tbl_cfg | Coerced to
prefix | Character vector valued data source name(s) (used as class prefix)
src_env | The data source environment (as
Attaching a dataset sets up two types of S3 classes: a single src_env
object, containing as many src_tbl objects as tables are associated with
the dataset. A src_env is an environment with an id_cfg attribute, as well
as sub-classes as specified by the data source class_prefix configuration
setting (see load_src_cfg()). All src_env objects created by calling
attach_src() represent environments that are direct descendants of the data
environment and are bound to the respective dataset name within that
environment. attach_src() does not immediately instantiate a src_env
object; rather, it creates a promise using base::delayedAssign(), which
evaluates to a src_env upon first access. This allows data sources to be
set up even where the data is missing, in a way that prompts the user to
download and import the data when it is first accessed.
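The lazy instantiation described above can be illustrated with base R alone. The following is a minimal sketch, not ricu's implementation: the environment data_env, the counter n_loaded and the function load_source() are hypothetical stand-ins introduced only for this example.

```r
# Hypothetical stand-in for the data environment (not ricu's `data` object).
data_env <- new.env()

# Counter recording how often the hypothetical loader has run.
n_loaded <- 0L

# Hypothetical loader: returns an environment carrying a src_env class.
load_source <- function(name) {
  n_loaded <<- n_loaded + 1L
  structure(new.env(), class = c(paste0(name, "_env"), "src_env"))
}

# Create a promise; load_source() is not evaluated at this point.
delayedAssign("mimic_demo", load_source("mimic_demo"), assign.env = data_env)
stopifnot(n_loaded == 0L)

# The first access forces the promise; the second access reuses its value.
x <- get("mimic_demo", envir = data_env)
y <- get("mimic_demo", envir = data_env)
stopifnot(n_loaded == 1L)
```

Because the expression is only evaluated on first access, any check for missing data (and a prompt to download it) can be deferred until the data source is actually used.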
Additionally, attach_src() creates an active binding using
base::makeActiveBinding(), binding a function to the dataset name within
the environment passed as assign_env, which retrieves the respective
src_env from the data environment. This shortcut is set up for convenience,
such that, for example, the MIMIC-III demo dataset is available not only as
ricu::data::mimic_demo, but also as ricu::mimic_demo (or, if the package
namespace is attached, simply as mimic_demo). The ricu namespace contains
objects mimic, mimic_demo, eicu, etc., which are used as such links when
loading the package. However, new datasets can be set up and accessed in
the same way.
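How an active binding can act as such a link is sketched below using base R only. The environments data_env and assign_env are hypothetical stand-ins for the data environment and the assign_env argument, and the placeholder object is not a real src_env.

```r
data_env   <- new.env()  # hypothetical stand-in for the data environment
assign_env <- new.env()  # hypothetical stand-in for `assign_env`

# Placeholder object bound to the dataset name in the data environment.
assign("mimic_demo", structure(list(), class = "src_env"), envir = data_env)

# Bind a function to the same name in assign_env; it is called on each
# access and forwards to the object stored in data_env.
makeActiveBinding(
  "mimic_demo",
  function() get("mimic_demo", envir = data_env),
  assign_env
)

via_link <- get("mimic_demo", envir = assign_env)
direct   <- get("mimic_demo", envir = data_env)
```

Both lookups yield the same object, since the active binding merely redirects to the data environment each time the name is accessed.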
If set up correctly, it is not necessary for the user to directly call
attach_src(). When the package is loaded, the default data sources are
attached automatically. This default can be changed by setting the
environment variable RICU_SRC_LOAD to a comma-separated list of data source
names before loading the library. For example,
Sys.setenv(RICU_SRC_LOAD = "mimic_demo,eicu_demo")
changes the default from loading both MIMIC-III and eICU, alongside the
respective demo datasets, and HiRID, to loading just the two demo datasets.
For setting an environment variable upon startup of the R session, refer to
base::.First.sys().
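As an illustration of how a comma-separated value of this kind can be turned into a vector of data source names, consider the following sketch. The parsing shown here is an assumption made for the example; ricu's internal handling of RICU_SRC_LOAD may differ.

```r
# Set the variable before the package would be loaded.
Sys.setenv(RICU_SRC_LOAD = "mimic_demo,eicu_demo")

# Assumed parsing logic (not taken from ricu): split on commas and trim
# surrounding whitespace.
src_names <- strsplit(Sys.getenv("RICU_SRC_LOAD"), ",", fixed = TRUE)[[1L]]
src_names <- trimws(src_names)
```

To make the setting persistent across sessions, the same name/value pair can be placed in an .Renviron file, which R reads at startup.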
The src_env promise for each data source is created using the S3 generic
function setup_src_env(). This function checks if all required files are
available from data_dir. If files are missing, the user is prompted for
download in interactive sessions and an error is thrown otherwise. As soon
as all required data is available, a src_tbl object is created per table
and assigned to the src_env.
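The availability check described above might look roughly as follows. This is a sketch under stated assumptions: check_src_files() is a hypothetical helper that is not part of ricu, the file name admissions.fst is invented for the example, and the interactive branch only prints a message instead of offering a download.

```r
# Hypothetical helper (not part of ricu): verify that all files exist.
check_src_files <- function(files, data_dir) {
  missing <- files[!file.exists(file.path(data_dir, files))]
  if (length(missing) == 0L) {
    return(invisible(TRUE))
  }
  if (interactive()) {
    # A real implementation would offer to download the data here.
    message("Missing files: ", paste(missing, collapse = ", "))
    return(invisible(FALSE))
  }
  stop("Missing files in ", data_dir, ": ", paste(missing, collapse = ", "))
}

# Temporary directory with one invented file, for demonstration purposes.
tmp <- tempfile("src_data")
dir.create(tmp)
file.create(file.path(tmp, "admissions.fst"))

ok <- check_src_files("admissions.fst", tmp)
```

Calling the helper with a file name that is absent from tmp raises an error in a non-interactive session, mirroring the behaviour described for setup_src_env().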
The S3 class src_tbl inherits from prt, which represents a partitioned fst
file. In addition to the prt object, metadata in the form of col_cfg and
tbl_cfg is associated with a src_tbl object (see load_src_cfg()).
Furthermore, as with src_env, sub-classes are added as specified by the
source configuration class_prefix entry. This allows certain functionality,
for example data loading, to be adapted to data source-specific
requirements.
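The role of prefix-derived sub-classes can be sketched with plain S3 dispatch. All names below (describe_tbl(), new_demo_tbl(), and the classes mimic_tbl and eicu_tbl) are hypothetical and chosen only for illustration; they are not claimed to match the classes ricu actually creates.

```r
# Hypothetical generic with a default method for all src_tbl objects.
describe_tbl <- function(x, ...) UseMethod("describe_tbl")
describe_tbl.src_tbl <- function(x, ...) "generic src_tbl handling"

# Hypothetical source-specific method that extends the default behaviour.
describe_tbl.mimic_tbl <- function(x, ...) {
  paste("mimic-specific step, then", NextMethod())
}

# Hypothetical constructor: prepend a prefix-derived sub-class.
new_demo_tbl <- function(prefix) {
  structure(list(), class = c(paste0(prefix, "_tbl"), "src_tbl"))
}

a <- describe_tbl(new_demo_tbl("mimic"))  # dispatches to the mimic method
b <- describe_tbl(new_demo_tbl("eicu"))   # falls back to the src_tbl method
```

Since the prefix-derived class comes first in the class vector, a source-specific method takes precedence where one exists, and the shared src_tbl method is used otherwise.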
The constructors new_src_env()/new_src_tbl() as well as coercion functions
as_src_env()/as_src_tbl() return src_env and src_tbl objects respectively.
The function attach_src() is called for side effects and returns NULL
invisibly, while setup_src_env() instantiates and returns a src_env object.
## Not run:

Sys.setenv(RICU_SRC_LOAD = "")
library(ricu)

ls(envir = data)
exists("mimic_demo")

attach_src("mimic_demo")

ls(envir = data)
exists("mimic_demo")

mimic_demo

## End(Not run)