prob_dist

Probabilistic set distance


Description

Probabilistic set distance

Usage

prob_dist(U, kNN = 5, robMaha = FALSE, ncores = 1)

Arguments

U

A matrix in which to detect outlying rows, e.g. PC scores.

kNN

Number of nearest neighbors to use. Default is 5.

robMaha

Whether to use a robust Mahalanobis distance instead of the standard Euclidean distance. Default is FALSE (Euclidean distance).

ncores

Number of cores to use. Default is 1.
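
To illustrate the distance choice behind robMaha, here is a minimal base-R sketch showing that a Mahalanobis distance is a Euclidean distance computed after whitening the data. It uses the classical covariance from `cov()` as a stand-in for a robust estimate, and the toy matrix `X_toy` is made up for the illustration. Whether prob_dist applies the transformation in exactly this way is an assumption, not something this page states.

```r
# Illustration only: classical covariance used as a stand-in for a robust one.
set.seed(1)
mix <- matrix(c(2, 1, 0, 0, 1, 0.5, 0, 0, 1), nrow = 3)  # arbitrary mixing matrix
X_toy <- matrix(rnorm(300), ncol = 3) %*% mix             # correlated toy data

S <- cov(X_toy)
eig <- eigen(S, symmetric = TRUE)

# Whitening: rotate onto the eigenvectors and divide by sqrt(eigenvalue)
W <- eig$vectors %*% diag(1 / sqrt(eig$values))
Z <- X_toy %*% W

# Squared Euclidean distance between rows 1 and 2 after whitening
d_whitened <- sum((Z[1, ] - Z[2, ])^2)
# Squared Mahalanobis distance between the same rows in the original space
d_maha <- mahalanobis(X_toy[1, ], center = X_toy[2, ], cov = S)

all.equal(d_whitened, d_maha)
```

With a Mahalanobis-type distance, directions of large variance no longer dominate the nearest-neighbor search, which can matter when the columns of U have very different scales.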

References

Kriegel, Hans-Peter, et al. "LoOP: local outlier probabilities." Proceedings of the 18th ACM conference on Information and knowledge management. ACM, 2009.
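
The quantity defined in the reference above can be sketched in base R as follows. This is a brute-force illustration under stated assumptions, not the bigutilsr implementation: `pdist_sketch` is a hypothetical helper, it uses plain Euclidean distances, and it drops the constant scaling factor lambda from the paper because it cancels in the ratio. The output names `self` and `nn` are chosen here for the illustration.

```r
# Hypothetical helper (not part of bigutilsr): brute-force probabilistic set
# distance based on the k nearest neighbors of each row.
pdist_sketch <- function(U, k = 5) {
  D <- as.matrix(dist(U))               # all pairwise Euclidean distances
  n <- nrow(D)
  nn_idx  <- matrix(0L, n, k)
  nn_dist <- matrix(0, n, k)
  for (i in seq_len(n)) {
    ord <- order(D[i, ])
    ord <- ord[ord != i][seq_len(k)]    # drop the point itself, keep k nearest
    nn_idx[i, ]  <- ord
    nn_dist[i, ] <- D[i, ord]
  }
  # Root mean squared distance from each point to its k neighbors
  self <- sqrt(rowMeans(nn_dist^2))
  # Average of the same quantity over each point's neighbors
  nn <- rowMeans(matrix(self[as.vector(nn_idx)], n, k))
  list(self = self, nn = nn)
}

set.seed(1)
U_toy <- rbind(matrix(rnorm(200), ncol = 2), c(10, 10))  # row 101 is far away
res <- pdist_sketch(U_toy, k = 5)
ratio <- res$self / res$nn              # large values suggest local outliers
which.max(ratio)
```

A point whose own distance to its neighbors is much larger than the typical value among those neighbors gets a high ratio. For real data, a nearest-neighbor search that avoids the full n-by-n distance matrix is preferable.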

Examples

library(bigutilsr)

# Example data shipped with the package
X <- readRDS(system.file("testdata", "three-pops.rds", package = "bigutilsr"))
# Truncated SVD of the scaled data; svds() is provided by the RSpectra package
svd <- RSpectra::svds(scale(X), k = 10)
U <- svd$u

test <- prob_dist(U)
plof <- test$dist.self / test$dist.nn
plof_ish <- test$dist.self / sqrt(test$dist.nn)
# Rows above the tukey_mc_up() threshold are drawn in red (col = 2)
plot(U[, 1:2], col = (plof_ish > tukey_mc_up(plof_ish)) + 1, pch = 20)
plot(U[, 3:4], col = (plof_ish > tukey_mc_up(plof_ish)) + 1, pch = 20)
plot(U[, 5:6], col = (plof_ish > tukey_mc_up(plof_ish)) + 1, pch = 20)

bigutilsr

Utility Functions for Large-scale Data

v0.3.4
GPL-3
Authors
Florian Privé [aut, cre]
Initial release
2021-04-08