
deleteBogusRows

Remove rows in which the proportion of missing data exceeds a threshold.


Description

If cases are mostly missing, delete them. When data is imported from other sources, it often happens that some noise rows exist at the bottom of the input. A row that is missing on more than, say, 90% of its columns is probably useless information. We invented this to deal with the problem that MS Excel users often include a marginal note at the bottom of a spreadsheet.
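The rule described here can be sketched in base R. This is a minimal illustration, not the kutils implementation: the helper name dropMostlyMissingRows is hypothetical, and the strict "greater than" comparison is an assumption taken from the phrase "exceeds a threshold" in the title.

```r
# Hypothetical helper, not part of kutils: keep only rows whose share of
# NA values does not exceed pm.
dropMostlyMissingRows <- function(x, pm = 0.9) {
    propMissing <- rowMeans(is.na(x))      # proportion of NA in each row
    x[propMissing <= pm, , drop = FALSE]   # drop = FALSE keeps the 2-d shape
}

set.seed(1234)
toy <- matrix(rnorm(3 * 10), nrow = 3, ncol = 10)
## Simulate a "marginal note" row: 9 of its 10 cells are missing
toy <- rbind(toy, c(32, rep(NA, 9)))
cleaned <- dropMostlyMissingRows(toy, pm = 0.8)
dim(cleaned)
```

Because is.na() returns a logical matrix for both matrices and data frames, the same row-wise proportion works for either input type.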

Usage

deleteBogusRows(dframe, pm = 0.9, drop = FALSE, verbose = TRUE,
  n = 25)

Arguments

dframe

A data frame or matrix

pm

"proportion missing data" to be tolerated in a row. Default 0.9. Rows in which the proportion of missing values exceeds pm are removed.

drop

Default FALSE. If the data frame result is reduced to one row, should R's default drop behavior "demote" it to a column vector?

verbose

Default TRUE. Should a report be printed summarizing the information to be deleted?

n

Default 25: limit on number of values to print in verbose diagnostic output. If set to NULL or NA, then all of the column values will be printed for the bogus rows.
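The drop argument refers to base R's subsetting behavior. The sketch below shows that behavior on a small matrix using only base R; the object names are illustrative and nothing here calls deleteBogusRows itself.

```r
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("a", "b"), c("x", "y", "z")))
## Default subsetting (drop = TRUE) discards the dim attribute
oneRowDropped <- m[1, ]
## drop = FALSE keeps a 1 x 3 matrix
oneRowKept <- m[1, , drop = FALSE]
is.matrix(oneRowDropped)
dim(oneRowKept)
```

Keeping drop = FALSE means code that later calls nrow() or indexes by column continues to work even when only one row survives.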

Value

a data frame, invisibly
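An invisible return value is not auto-printed at the console, but it can still be assigned. The sketch below shows the general base R mechanism with a hypothetical function, returnQuietly, that is not part of kutils.

```r
# Hypothetical function illustrating invisible(); not part of kutils
returnQuietly <- function(x) {
    invisible(x)
}

dat <- data.frame(a = 1:3, b = c(NA, 2, 3))
returnQuietly(dat)            # nothing is printed at the console
kept <- returnQuietly(dat)    # the value is still available by assignment
visibility <- withVisible(returnQuietly(dat))$visible
```

In practice this means the result should be assigned to a name, as the Examples do with mymat2 and mydf2.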

Author(s)

Paul Johnson <pauljohn@ku.edu>

Examples

mymat <- matrix(rnorm(10*100), nrow = 10, ncol = 100,
               dimnames = list(1:10, paste0("x", 1:100)))
mymat <- rbind(mymat, c(32, rep(NA, 99)))
mymat2 <- deleteBogusRows(mymat)
mydf <- as.data.frame(mymat)
mydf$someFactor <- factor(sample(c("A", "B"), size = NROW(mydf), replace = TRUE))
mydf2 <- deleteBogusRows(mydf, n = "all")

kutils

Project Management Tools

v1.70
GPL-2
Authors
Paul Johnson [aut, cre], Benjamin Kite [aut], Charles Redmon [aut], Jared Harpole [ctb], Kenna Whitley [ctb], Po-Yi Chen [ctb], Shadi Pirhosseinloo [ctb]
Initial release
2020-04-28
