Distance-based outlier detection based on user-given neighborhood size
Function to calculate how many observations are within a certain sized neighborhood as an outlier score. Outliers are classified according to a user-given threshold of observations to be within the neighborhood. Suggested by Knorr, M., & Ng, R. T. (1997)
DB(dataset, d = 1, fraction = 0.05)
dataset |
The dataset for which observations are classified as outliers/inliers |
d |
The radius of the neighborhood |
fraction |
The proportion of the number of observations to be within the neighborhood for observations to be classified as inliers. If the proportion of observations within the neighborhood is less than the given fraction, observations are classified as outliers |
DB computes a neighborhood for each observation given a radius (argument 'd') and returns the number of neighbors within the neighborhood. Observations are classified as inliers or outliers, based on a proportion (argument 'fraction') of observations to be within the neighborhood
neighbors |
The number of neighbors within the neighborhood |
classification |
Binary classification of observations as inlier or outlier |
Jacob H. Madsen
Knorr, M., & Ng, R. T. (1997). A Unified Approach for Mining Outliers. In Conf. of the Centre for Advanced Studies on Collaborative Research (CASCON). Toronto, Canada. pp. 236-248. DOI: 10.1145/782010.782021
# Create dataset X <- iris[,1:4] # Classify observations cls_observations <- DB(dataset=X, d=1, fraction=0.05)$classification # Remove outliers from dataset X <- X[cls_observations=='Inlier',]
Please choose more modern alternatives, such as Google Chrome or Mozilla Firefox.