
hist_out

Outlier detection (histogram)


Description

Outlier detection based on departure from the histogram. Suitable for compact values (there must be a gap between the main values and the outliers).
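To illustrate the idea, here is a minimal base-R sketch of gap-based outlier detection. It is not the code of hist_out(). The rule it applies is an assumption made for illustration: a gap is an empty histogram bin, only gaps lying beyond the pmax_out and 1 - pmax_out quantiles count, and the search is repeated on the remaining values until no such gap is left. hist_gap_sketch() is a hypothetical name, and base R's "Scott" rule stands in for the robust default of hist_out().

```r
# Minimal sketch of histogram-gap outlier detection (illustration only,
# NOT bigutilsr's implementation). Assumptions: outliers are separated from
# the bulk by at least one empty bin, and a gap only counts if it lies in the
# outer `pmax_out` proportion of the data on that side.
hist_gap_sketch <- function(x, breaks = "Scott", pmax_out = 0.2) {
  lim <- c(-Inf, Inf)
  repeat {
    h <- hist(x, breaks = breaks, plot = FALSE)
    q <- stats::quantile(x, c(pmax_out, 1 - pmax_out), names = FALSE)
    empty <- which(h$counts == 0)
    # Empty bins lying entirely below q[1] (left) or above q[2] (right)
    left  <- empty[h$breaks[empty + 1] <= q[1]]
    right <- empty[h$breaks[empty] >= q[2]]
    if (length(left) == 0 && length(right) == 0) break
    # New limits: the innermost gap on each side
    if (length(left) > 0)  lim[1] <- h$breaks[max(left) + 1]
    if (length(right) > 0) lim[2] <- h$breaks[min(right)]
    n_before <- length(x)
    x <- x[x >= lim[1] & x <= lim[2]]
    # Stop if nothing was removed (e.g. with a fixed vector of breaks)
    if (length(x) == n_before) break
  }
  list(x = x, lim = lim)
}

set.seed(1)
x2 <- c(rnorm(1000), rnorm(50, mean = 7))
str(hist_gap_sketch(x2))
```

Because the histogram is recomputed on the remaining values at each step, the bins get narrower as outliers are removed. This is why a second, smaller gap can be found on a later pass.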

Usage

hist_out(x, breaks = nclass.scottRob, pmax_out = 0.2, nboot = NULL)

Arguments

x

Numeric vector (with compact values).

breaks

Same parameter as for hist(). The default uses a robust version of Scott's rule. You can also use "FD" or nclass.FD to get slightly more bins.
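Scott's rule picks a bin width of 3.5 σ n^(-1/3). A few far-away outliers inflate σ and widen the bins, which can hide the gap between the main values and the outliers. A robust variant can be sketched by replacing the standard deviation with a robust scale estimate. The sketch below uses stats::mad(). Whether nclass.scottRob uses the same estimator is an assumption, and nclass_scott_mad() is a hypothetical name.

```r
# Hypothetical robust variant of Scott's rule (not bigutilsr's code):
# same formula as grDevices::nclass.scott(), but the standard deviation is
# replaced by the median absolute deviation, which outliers barely affect.
nclass_scott_mad <- function(x) {
  h <- 3.5 * stats::mad(x) * length(x)^(-1/3)
  if (h > 0) max(1, ceiling(diff(range(x)) / h)) else 1L
}

set.seed(1)
x2 <- c(rnorm(1000), rnorm(50, mean = 7))
grDevices::nclass.scott(x2)  # classical rule: sd inflated by the outliers
nclass_scott_mad(x2)         # robust rule: more, narrower bins
h <- hist(x2, breaks = nclass_scott_mad, plot = FALSE)
```

hist() accepts a function for breaks and calls it on the data to get a suggested number of cells. The actual number of bins can still differ slightly because hist() passes that suggestion to pretty().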

pmax_out

Proportion of values on each side that can be considered outliers at each step. Default is 0.2.

nboot

Number of bootstrap replicates used to estimate the limits more robustly. Default is NULL (no bootstrap, even though I would recommend using it).
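The bootstrap idea can be sketched as follows. The limits are recomputed on resampled copies of x, every replicate is kept (as all_lim is), and the replicates are aggregated into a single pair of limits. This is an assumption about how nboot works, not the package's code. boot_limits() and simple_limits() are hypothetical. The "median ± 5 MADs" rule only stands in for the histogram-based limits, and taking the median of the replicates is an arbitrary choice.

```r
# Minimal sketch of bootstrapping outlier limits (an assumption about how
# `nboot` could work, not bigutilsr's code). `limit_fun` is any function
# returning c(lower, upper) for a numeric vector.
boot_limits <- function(x, limit_fun, nboot = 999) {
  # One column per bootstrap replicate; rows are the lower/upper limits
  all_lim <- replicate(nboot, limit_fun(sample(x, replace = TRUE)))
  # Aggregate the replicates (the median is an arbitrary, robust choice)
  lim <- apply(all_lim, 1, stats::median)
  list(x = x[x >= lim[1] & x <= lim[2]], lim = lim, all_lim = all_lim)
}

# Hypothetical stand-in rule: median +/- 5 MADs
simple_limits <- function(x) {
  stats::median(x) + c(-5, 5) * stats::mad(x)
}

set.seed(1)
x3 <- c(rnorm(1000), rnorm(50, mean = 6))
res <- boot_limits(x3, simple_limits, nboot = 199)
res$lim
length(x3) - length(res$x)  # number of values flagged as outliers
```

Passing the rule as a function keeps the resampling loop independent of how the limits are computed on each replicate.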

Value

A list with

  • x: the initial vector, whose outliers have been removed,

  • lim: lower and upper limits for outlier removal,

  • all_lim: all bootstrap replicates for lim (if nboot is not NULL).

Examples

library(bigutilsr)
set.seed(1)
x <- rnorm(1000)
str(hist_out(x))

# Easy to separate
x2 <- c(x, rnorm(50, mean = 7))
hist(x2, breaks = nclass.scottRob)
str(hist_out(x2))

# More difficult to separate
x3 <- c(x, rnorm(50, mean = 6))
hist(x3, breaks = nclass.scottRob)
str(hist_out(x3))
str(hist_out(x3, nboot = 999))

bigutilsr

Utility Functions for Large-scale Data

Version: v0.3.4
License: GPL-3
Authors: Florian Privé [aut, cre]
Initial release: 2021-04-08
