
outliers

Outliers data


Description

This dataset is approximately bell-shaped but contains some outliers. It is intended for demonstration: if students are tempted to throw out every outlier, have them work with these data (or use a scaled, centered, and shuffled version as the errors in a regression problem) and see how many discard three quarters of the data before rethinking their strategy.
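The strategy warned against here usually means applying the 1.5 × IQR rule (Tukey's fences) over and over until nothing is flagged. That is what o.chuck in the Examples does recursively. Below is a minimal loop-based sketch of the same idea in base R. The names tukey_flag, prune_repeatedly, and the argument k are hypothetical illustrations, not part of TeachingDemos.

```r
# Hypothetical helper: flag values outside Tukey's fences,
# i.e. below Q1 - k * IQR or above Q3 + k * IQR.
# Quartiles use quantile()'s default (type 7), as o.chuck does.
tukey_flag <- function(x, k = 1.5) {
  qq <- quantile(x, c(0.25, 0.75), names = FALSE)
  r <- k * diff(qq)
  x < qq[1] - r | x > qq[2] + r
}

# Hypothetical helper: drop flagged values, recompute the fences on what is
# left, and repeat until nothing is flagged; also count the removal rounds.
prune_repeatedly <- function(x, k = 1.5) {
  rounds <- 0L
  repeat {
    flag <- tukey_flag(x, k)
    if (!any(flag)) break
    x <- x[!flag]
    rounds <- rounds + 1L
  }
  list(kept = x, rounds = rounds)
}

res <- prune_repeatedly(c(1:10, 100))
res$kept    # the values 1 through 10 (100 was removed)
res$rounds  # 1
```

With TeachingDemos attached, length(prune_repeatedly(outliers)$kept) should agree with length(o.chuck(outliers)), because both use the same quartiles and fences. The rounds count shows how many passes the pruning took.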

Usage

data(outliers)

Format

A numeric vector of length 100. The output of str(outliers) is: num [1:100] -1.548 0.172 -0.638 0.233 -0.228 ...

Details

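This is simulated data meant to demonstrate "outliers". In the generating code shown in the Examples, 25 standard normal values are followed by 75 values added one at a time, alternating between the upper and lower tails. Each added value lies more than 1.55 times the interquartile range beyond the quartile on its side, so it falls outside the usual 1.5 × IQR outlier fence when it is added.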

Source

Simulated; see the Examples section.

Examples

library(TeachingDemos)  # provides the outliers data and char2seed()
data(outliers)
qqnorm(outliers)
qqline(outliers)
hist(outliers)

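o.chuck <- function(x) {  # recursively throw away outliers (1.5 * IQR rule)
	qq <- quantile(x, c(1,3)/4, names=FALSE)  # lower and upper quartiles
	r <- diff(qq) * 1.5                       # 1.5 times the IQR
	tst <- x < qq[1] - r | x > qq[2] + r      # points outside the fences
	if(any(tst)) {
		cat('Removing ', paste(x[tst], collapse=', '), '\n')
		x <- x[!tst]
		out <- Recall(x)   # recompute the fences on the reduced data
	} else {
		out <- x
	}
	out
}

x <- o.chuck( outliers )
length(x)  # number of values left after repeated removal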

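if(require(MASS)) {  # MASS provides rlm() for robust regression
  char2seed('robust')
  x <- 1:100
  # errors: the outliers data centered, scaled, shuffled, then multiplied by 10
  y <- 3 + 2*x + sample(scale(outliers))*10

  plot(x,y)
  fit <- lm(y~x)                                   # ordinary least squares
  abline(fit, col='red')

  fit.r <- rlm(y~x)                                # robust M-estimation
  abline(fit.r, col='blue', lty='dashed')

  # print() is needed: inside if(), only the last value is auto-printed
  print(rbind(lm = coef(fit), rlm = coef(fit.r)))  # compare intercepts and slopes
  length(o.chuck(resid(fit)))                      # residuals left after repeated removal
}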



### The data was generated using code similar to:

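char2seed('outlier')

outliers <- rnorm(25)   # an approximately normal core of 25 values

dir <- 1                # +1: add to the upper tail, -1: add to the lower tail

while( length(outliers) < 100 ){
	# quartiles computed with a +Inf or -Inf placeholder standing in for the new point;
	# names=FALSE keeps "25%"/"75%" labels off the appended values
	qq <- quantile(c(outliers, dir*Inf), c(1,3)/4, names=FALSE)
	# new point: beyond the upper quartile (dir = 1) or lower quartile (dir = -1)
	# by 1.55 * IQR plus a half-normal amount
	outliers <- c(outliers,
		qq[ 1.5 + dir/2 ] + dir*1.55*diff(qq) + dir*abs(rnorm(1)) )
	dir <- -dir             # alternate tails
}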
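The key to the generator is the ±Inf placeholder: the quartiles are computed as if the new point were already the most extreme value on its side. The point is then placed 1.55 × IQR plus a half-normal amount beyond that quartile, so it should be flagged by the 1.5 × IQR rule the moment it is added. The sketch below checks that property. It is a reconstruction under stated assumptions: set.seed() replaces char2seed() so the code runs without TeachingDemos, the seed is arbitrary, the resulting values will not match data(outliers), and the helper is_fenced_out is hypothetical.

```r
# Assumption: set.seed() with an arbitrary seed instead of char2seed('outlier'),
# so these values differ from the packaged outliers data.
set.seed(1)
x <- rnorm(25)
dir <- 1
while (length(x) < 100) {
  # quartiles with a +Inf/-Inf placeholder for the point about to be added
  qq <- quantile(c(x, dir * Inf), c(1, 3) / 4, names = FALSE)
  x <- c(x, qq[1.5 + dir / 2] + dir * 1.55 * diff(qq) + dir * abs(rnorm(1)))
  dir <- -dir
}

# Hypothetical helper: is the last element of v outside the 1.5 * IQR fences
# computed from v itself (the same rule o.chuck uses)?
is_fenced_out <- function(v) {
  qq <- quantile(v, c(1, 3) / 4, names = FALSE)
  r <- 1.5 * diff(qq)
  p <- v[length(v)]
  p < qq[1] - r || p > qq[2] + r
}

# Check every appended point (positions 26 to 100) against the data up to
# and including itself.
all_outside <- all(vapply(26:100, function(n) is_fenced_out(x[1:n]), logical(1)))
all_outside
```

If all_outside is TRUE, each of the 75 added points was an outlier relative to the data at the moment it was added. That helps explain why repeated pruning, as in o.chuck, keeps peeling points off.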

TeachingDemos: Demonstrations for Teaching and Learning
Version: 2.12
License: Artistic-2.0
Author: Greg Snow
Initial release: 2020-04-01
