pcalg: rmvDAG – R documentation


rmvDAG

Generate Multivariate Data according to a DAG

Description

Generate multivariate data with dependency structure specified by a (given) DAG (Directed Acyclic Graph) with nodes corresponding to random variables. The DAG has to be topologically ordered.
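
As a rough illustration of what "topologically ordered" means here, the base-R sketch below represents a DAG by a weight matrix and checks that every edge points from a lower-numbered node to a higher-numbered one. The helper `is_topo_ordered()`, the matrices `W_ok` and `W_bad`, and the convention that `W[j, i]` holds the weight of the edge i -> j are assumptions made for this illustration; none of them is part of pcalg.

```r
## Hypothetical helper (not part of pcalg): TRUE if all edges go from a
## lower-numbered to a higher-numbered node.
## Assumed convention: W[j, i] != 0 means there is an edge i -> j.
is_topo_ordered <- function(W) {
  stopifnot(is.matrix(W), nrow(W) == ncol(W))
  ## edges i -> j with i < j sit strictly below the diagonal,
  ## so the diagonal and the upper triangle must be all zero
  all(W[upper.tri(W, diag = TRUE)] == 0)
}

## Edges 1 -> 2 (weight 0.5) and 2 -> 3 (weight 0.8): ordered
W_ok <- matrix(0, 3, 3)
W_ok[2, 1] <- 0.5
W_ok[3, 2] <- 0.8

## Edge 3 -> 1: violates the ordering
W_bad <- matrix(0, 3, 3)
W_bad[1, 3] <- 0.7

is_topo_ordered(W_ok)
is_topo_ordered(W_bad)
```

Under this assumed convention, a topologically ordered DAG has a strictly lower triangular weight matrix, which is why a single triangle check is enough.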

Usage

rmvDAG(n, dag,
       errDist = c("normal", "cauchy", "t4", "mix", "mixt3", "mixN100"),
       mix = 0.1, errMat = NULL, back.compatible = FALSE,
       use.node.names = !back.compatible)

Arguments

`n`	number of samples that should be drawn. (integer)
`dag`	a graph object describing the DAG; must contain weights for all the edges. The nodes must be topologically sorted. (For topological sorting use `tsort` from the RBGL package.)
`errDist`	string specifying the error distribution of each node. Currently, the options "normal", "t4", "cauchy", "mix", "mixt3" and "mixN100" are supported. The first three generate standard normal, t(df=4) and standard Cauchy random numbers, respectively. The options containing the word "mix" create standard normal random variables with a mix of outliers. The outliers for the options "mix", "mixt3" and "mixN100" are drawn from a standard Cauchy, t(df=3) and N(0,100) distribution, respectively. The fraction of outliers is determined by the `mix` argument.
`mix`	for the mixture error distributions (`"mix"`, `"mixt3"`, `"mixN100"`), `mix` specifies the fraction of “outlier” samples (i.e., Cauchy, t_3 or N(0,100)).
`errMat`	numeric n x p matrix specifying the error vectors E_i (see Details), instead of specifying `errDist` (and maybe `mix`).
`back.compatible`	logical indicating if the data generated should be the same as with pcalg version 1.0-6 and earlier (where `wgtMatrix()` differed).
`use.node.names`	logical indicating if the column names of the result matrix should equal `nodes(dag)`. This is the sensible choice but was introduced later, hence the default `!back.compatible`.
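
To make the "mix" options more concrete, here is a minimal base-R sketch of one way such contaminated errors could be drawn: standard normal values in which each entry is replaced by a standard Cauchy draw with probability `mix`. The helper `r_mix_errors()` is hypothetical and is not pcalg's internal code; it only mirrors the description of the `"mix"` option above.

```r
## Hypothetical helper (not pcalg's code): n x p matrix of standard normal
## errors in which each entry is a Cauchy outlier with probability `mix`.
r_mix_errors <- function(n, p, mix = 0.1) {
  stopifnot(mix >= 0, mix <= 1)
  total <- n * p
  ## flag each entry as an outlier with probability `mix`
  is_outlier <- rbinom(total, size = 1, prob = mix) == 1
  e <- rnorm(total)
  ## overwrite the flagged entries with standard Cauchy draws
  e[is_outlier] <- rcauchy(sum(is_outlier))
  matrix(e, nrow = n, ncol = p)
}

set.seed(1)
E <- r_mix_errors(n = 1000, p = 5, mix = 0.3)
dim(E)
```

A matrix of this shape (n rows, p columns) is the kind of object the `errMat` argument expects, so custom error distributions can presumably be supplied this way instead of through `errDist`.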

Details

Each node is visited in topological order. For each node i we generate an n-dimensional value X_i (one entry per sample) in the following way: Let X_1,…,X_k denote the values of all neighbors of i with lower order (i.e., its parents). Let w_1,…,w_k be the weights of the corresponding edges. Furthermore, generate a random vector E_i according to the specified error distribution. Then, the value of X_i is computed as

X_i = w_1*X_1 + … + w_k*X_k + E_i.

If node i has no neighbors with lower order, X_i is set to E_i.
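
The recursion above can be sketched in base R as follows. This is an illustrative reimplementation under stated assumptions, not pcalg's actual code: the function `sim_from_weights()` is hypothetical, the DAG is given as a weight matrix `W` in which `W[j, i]` is the weight of the edge i -> j, and the nodes are assumed to be topologically ordered.

```r
## Hypothetical sketch (not pcalg's code) of the recursion described above.
## Assumed convention: W[j, i] is the weight of edge i -> j and the nodes
## are topologically ordered, so W is strictly lower triangular.
sim_from_weights <- function(W, E) {
  p <- ncol(W)
  stopifnot(nrow(W) == p, ncol(E) == p)
  X <- E                           # start from the errors: X_i = E_i
  for (j in seq_len(p)[-1]) {      # nodes 2, ..., p in topological order
    lower <- seq_len(j - 1)        # nodes with lower order than j
    ## add the weighted sum of the already generated columns;
    ## non-parents have weight 0 and contribute nothing
    X[, j] <- X[, j] + X[, lower, drop = FALSE] %*% W[j, lower]
  }
  X
}

## Chain 1 -> 2 -> 3 with weights 0.5 and 2
W <- matrix(0, 3, 3)
W[2, 1] <- 0.5
W[3, 2] <- 2

set.seed(42)
n <- 4
E <- matrix(rnorm(n * 3), nrow = n)
X <- sim_from_weights(W, E)
```

Each column of `X` holds the n sampled values of one node, so the first column equals the first error column and every later column adds the weighted parent columns to its own errors.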

Value

An n x p matrix with the generated data. The p columns correspond to the nodes (i.e., random variables) and each of the n rows corresponds to a sample.

Author(s)

Markus Kalisch (kalisch@stat.math.ethz.ch) and Martin Maechler.

Examples

library(pcalg)

## generate random DAG
p <- 20
rDAG <- randomDAG(p, prob = 0.2, lB = 0.1, uB = 1)

if (require(Rgraphviz)) {
  ## plot the DAG
  plot(rDAG, main = "randomDAG(20, prob = 0.2, ..)")
}

## generate 1000 samples of DAG using standard normal error distribution
n <- 1000
d.normMat <- rmvDAG(n, rDAG, errDist="normal")

## generate 1000 samples of DAG using standard t(df=4) error distribution
d.t4Mat <- rmvDAG(n, rDAG, errDist="t4")

## generate 1000 samples of DAG using standard normal errors with
## 30 percent Cauchy outliers
d.mixMat <- rmvDAG(n, rDAG, errDist = "mix", mix = 0.3)

require(MASS) ## for mvrnorm()
Sigma <- toeplitz(ARMAacf(0.2, lag.max = p - 1))
dim(Sigma) # p x p
## *Correlated* normal error matrix "e_i" (against model assumption)
eMat <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
d.CnormMat <- rmvDAG(n, rDAG, errMat = eMat)

pcalg

Methods for Graphical Models and Causal Inference

v2.7-2

GPL (>= 2)

Authors

Markus Kalisch [aut, cre], Alain Hauser [aut], Martin Maechler [aut], Diego Colombo [ctb], Doris Entner [ctb], Patrik Hoyer [ctb], Antti Hyttinen [ctb], Jonas Peters [ctb], Nicoletta Andri [ctb], Emilija Perkovic [ctb], Preetam Nandy [ctb], Philipp Ruetimann [ctb], Daniel Stekhoven [ctb], Manuel Schuerch [ctb], Marco Eigenmann [ctb], Leonard Henckel [ctb], Joris Mooij [ctb]

Initial release

2021-4-20