compareDag

Compare two DAGs


Description

Returns several graph metrics for comparing two DAGs. The metrics are derived from what is known in statistical classification as a confusion matrix or error matrix.

Usage

compareDag(ref, test, node.names = NULL)

Arguments

ref

a matrix or a formula statement (see details for format) defining the reference network structure, a directed acyclic graph (DAG). Row names must be set on the matrix, or node names must be supplied in node.names when the DAG is given via a formula statement.

test

a matrix or a formula statement (see details for format) defining the test network structure, a directed acyclic graph (DAG). Row names must be set on the matrix, or node names must be supplied in node.names when the DAG is given via a formula statement.

node.names

a vector of names if the DAGs are given via formula, see details.

Details

This R function returns standard directed acyclic graph (DAG) comparison metrics. In statistical classification, these metrics are derived from a confusion matrix, also called an error matrix. They help visualize the differences between DAGs. When comparing the true structure to a learned structure, or two learned structures to each other, they allow the user to estimate the performance of the learning algorithm. To compute the metrics, a contingency table is built from a weighted difference of the adjacency matrices of the two graphs.

The returned metrics are based on the following quantities:

TP = True Positive
TN = True Negative
FP = False Positive
FN = False Negative
CP = Condition Positive (ref)
CN = Condition Negative (ref)
PCP = Predicted Condition Positive (test)
PCN = Predicted Condition Negative (test)
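As a rough illustration of the weighted difference mentioned above, the following base R sketch derives the four counts from two adjacency matrices. It is not taken from the abn source code. The weight of 0.5, the object names diff.m and counts, and the choice to count diagonal cells as true negatives are assumptions made for this example only.

```r
# Adjacency matrices with the same values as in the Examples section.
# matrix() fills column by column.
nodes  <- c("a", "b", "c")
ref.m  <- matrix(c(0,0,0, 1,0,0, 1,0,0), nrow = 3, dimnames = list(nodes, nodes))
test.m <- matrix(c(0,1,0, 0,0,0, 1,0,0), nrow = 3, dimnames = list(nodes, nodes))

# Weighted difference (assumed weight 0.5): each cell takes one of four values.
#    1    arc only in ref    -> FN
#    0.5  arc in both        -> TP
#    0    arc in neither     -> TN
#   -0.5  arc only in test   -> FP
diff.m <- ref.m - 0.5 * test.m

# Contingency counts over all cells, diagonal included (assumption).
counts <- c(TP = sum(diff.m == 0.5),
            TN = sum(diff.m == 0),
            FP = sum(diff.m == -0.5),
            FN = sum(diff.m == 1))
print(counts)
```

Using a weight other than 1 for the test matrix keeps the four cases distinguishable in a single subtraction, which is why a plain difference ref.m - test.m would not be enough here.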

True Positive Rate

= \frac{\sum TP}{\sum CP}

False Positive Rate

= \frac{\sum FP}{\sum CN}

Accuracy

= \frac{\sum TP + \sum TN}{Total population}

G-measure

= \sqrt{\frac{\sum TP}{\sum TP + \sum FP} \cdot \frac{\sum TP}{\sum TP + \sum FN}}

F1-Score

= \frac{2 \sum TP}{2 \sum TP + \sum FN + \sum FP}

Positive Predictive Value

= \frac{\sum TP}{\sum PCP}

False Omission Rate

= \frac{\sum FN}{\sum PCN}

Hamming-Distance: Number of changes needed to match the matrices.
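The ratio-based formulas above can be sketched in a few lines of base R. The function name dag_metrics and the names of its list elements are hypothetical and do not necessarily match what compareDag returns. The counts passed in are illustrative values. The Hamming distance is left out of this sketch because it is defined on the matrices rather than on the four counts.

```r
# Hypothetical helper (not part of abn): metrics from the four counts.
dag_metrics <- function(TP, TN, FP, FN) {
  CP  <- TP + FN   # condition positive: arcs present in ref
  CN  <- FP + TN   # condition negative: arcs absent in ref
  PCP <- TP + FP   # predicted condition positive: arcs present in test
  PCN <- FN + TN   # predicted condition negative: arcs absent in test
  list(
    TPR       = TP / CP,
    FPR       = FP / CN,
    Accuracy  = (TP + TN) / (CP + CN),
    G.measure = sqrt((TP / (TP + FP)) * (TP / (TP + FN))),
    F1        = 2 * TP / (2 * TP + FN + FP),
    PPV       = TP / PCP,
    FOR       = FN / PCN
  )
}

# Illustrative counts for a 3-node comparison (9 cells in total).
m <- dag_metrics(TP = 1, TN = 6, FP = 1, FN = 1)
str(m)
```

If either graph has no arcs, some denominators are zero and the corresponding metrics are undefined (R returns NaN for 0/0), so a caller may want to check for that case.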

Either ref or test can be provided as a formula statement (similar to GLM input). A typical formula is ~ node1|parent1:parent2 + node2:node3|parent3. The formula statement has to start with ~. In this example, node1 has two parents (parent1 and parent2), and node2 and node3 share the same parent, parent3. The parent names have to match exactly those given in node.names. The separators are as follows: : separates children from each other or parents from each other, | separates children (left side) from parents (right side), + separates terms, and . stands for all the variables in node.names.
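To make the formula syntax concrete, the sketch below expands a formula statement into an adjacency matrix using base R only. The helper formula_to_adj is hypothetical and is not how abn parses formulas internally. It assumes that rows hold children and columns hold parents, and it does not handle the . shorthand.

```r
# Hypothetical helper (not part of abn): formula statement -> adjacency matrix.
# Assumed convention: rows are children, columns are parents.
formula_to_adj <- function(f, node.names) {
  adj <- matrix(0, nrow = length(node.names), ncol = length(node.names),
                dimnames = list(node.names, node.names))
  # Turn the formula into text, then drop "~" and all whitespace.
  txt <- gsub("[~[:space:]]", "", paste(deparse(f), collapse = ""))
  # "+" separates terms.
  for (term in strsplit(txt, "+", fixed = TRUE)[[1]]) {
    # "|" separates children (left) from parents (right).
    sides <- strsplit(term, "|", fixed = TRUE)[[1]]
    if (length(sides) != 2) stop("each term needs the form children|parents")
    # ":" separates several children or several parents.
    children <- strsplit(sides[1], ":", fixed = TRUE)[[1]]
    parents  <- strsplit(sides[2], ":", fixed = TRUE)[[1]]
    stopifnot(all(c(children, parents) %in% node.names))
    adj[children, parents] <- 1
  }
  adj
}

nodes <- c("node1", "node2", "node3", "parent1", "parent2", "parent3")
adj <- formula_to_adj(~ node1|parent1:parent2 + node2:node3|parent3, nodes)
print(adj)
```

The resulting matrix has four non-zero cells: node1 depends on parent1 and parent2, while node2 and node3 each depend on parent3.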

Value

A list of DAG comparison metrics: True Positive Rate, False Positive Rate, Accuracy, G-measure, F1-Score, Positive Predictive Value, False Omission Rate, and Hamming-Distance.

Author(s)

Gilles Kratzer

References

Sammut, Claude, and Geoffrey I. Webb. (2017). Encyclopedia of machine learning and data mining. Springer.

Further information about abn can be found at:
http://r-bayesian-networks.org

Examples

library(abn)

# Adjacency matrices; note that matrix() fills column by column.
test.m <- matrix(data = c(0,1,0,
                          0,0,0,
                          1,0,0), nrow = 3, ncol = 3)

ref.m <- matrix(data = c(0,0,0,
                         1,0,0,
                         1,0,0), nrow = 3, ncol = 3)

# Set matching row and column names on both matrices.
colnames(test.m) <- rownames(test.m) <- colnames(ref.m) <- rownames(ref.m) <- c("a", "b", "c")

compareDag(ref = ref.m, test = test.m)

abn

Modelling Multivariate Data with Additive Bayesian Networks

v2.5-0
GPL (>= 2)
Authors
Gilles Kratzer [aut, cre] (<https://orcid.org/0000-0002-5929-8935>), Fraser Iain Lewis [aut] (<https://orcid.org/0000-0003-4580-2712>), Reinhard Furrer [ctb] (<https://orcid.org/0000-0002-6319-2332>), Marta Pittavino [ctb] (<https://orcid.org/0000-0002-1232-1034>)
Initial release
2021-04-21
