DDoutlier: KNN_SUM – R documentation

DDoutlier

KNN_SUM

Sum of distances to the k nearest neighbors

Description

Calculates the sum of distances from each observation to its k nearest neighbors, for use as an outlier score. Neighbors are found with a kd-tree.

Usage

KNN_SUM(dataset, k=5)

Arguments

`dataset`	The dataset whose observations are scored. Each observation gets the summed distance to its k nearest neighbors
`k`	The number of nearest neighbors to use. k must be smaller than the number of observations in dataset

Details

KNN_SUM computes, for each observation, the sum of the distances to its k nearest neighbors. The neighbors are found with a kd-tree, using the kNN() function from the 'dbscan' package. The resulting score is useful for outlier detection in clustering and other multidimensional domains.
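
To make the score concrete, the following is a minimal brute-force sketch in base R. It is not the package's implementation: it skips the kd-tree and builds the full distance matrix instead, and it assumes Euclidean distance and that an observation is never counted as its own neighbor. The helper name knn_sum_bruteforce is hypothetical and used only for illustration.

```r
# Brute-force sketch of a summed k-nearest-neighbor distance score.
# Assumptions: Euclidean distance; an observation is not its own neighbor.
# knn_sum_bruteforce is a hypothetical name, not part of DDoutlier.
knn_sum_bruteforce <- function(dataset, k = 5) {
  x <- as.matrix(dataset)
  n <- nrow(x)
  stopifnot(is.numeric(x), k >= 1, k < n)

  d <- as.matrix(dist(x))  # full n x n Euclidean distance matrix
  diag(d) <- Inf           # exclude each observation's distance to itself

  # For each observation, sum the k smallest distances to the others
  apply(d, 1, function(row) sum(sort(row)[seq_len(k)]))
}

scores <- knn_sum_bruteforce(iris[, 1:4], k = 5)

# Row indices of the six observations with the largest summed distance
head(order(scores, decreasing = TRUE))
```

The sketch needs O(n^2) memory for the distance matrix, so it is only practical for small datasets. A kd-tree search avoids materializing all pairwise distances.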

Value

A vector with one summed distance per observation. The greater the summed distance, the more outlying the observation.

Author(s)

Jacob H. Madsen

Examples

# Load the package, create a dataset and set an optional k
library(DDoutlier)
X <- iris[, 1:4]
K <- 5

# Compute an outlier score for each observation
outlier_score <- KNN_SUM(dataset = X, k = K)

# Label each score with its row index, then sort so the most outlying observations come first
names(outlier_score) <- 1:nrow(X)
sort(outlier_score, decreasing = TRUE)

# Inspect the distribution of outlier scores
hist(outlier_score)

DDoutlier

Distance & Density-Based Outlier Detection

v0.1.0

MIT + file LICENSE

Authors

Jacob H. Madsen <jacob.madsen1@mail.com>

Initial release
