scutr: undersample_mindist – R documentation

undersample_mindist

Undersample a dataset by iteratively removing the observation with the lowest total distance to its neighbors of the same class.
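
The removal rule can be illustrated with a short base-R sketch. This is not the package's source code: `mindist_sketch` is a hypothetical helper written for illustration. It assumes that "neighbors" means all other remaining observations of the class, that ties are broken by taking the first minimum, and that only the rows of class `cls` are returned.

```r
# Hypothetical helper, not part of scutr: a sketch of the removal rule.
mindist_sketch <- function(data, cls, cls_col, m, dist_calc = "euclidean") {
  # Keep only the rows that belong to the class being undersampled.
  subset <- data[data[[cls_col]] == cls, , drop = FALSE]

  # Pairwise distances over the numeric columns (class column excluded).
  feats <- subset[, names(subset) != cls_col, drop = FALSE]
  d <- as.matrix(dist(feats, method = dist_calc))

  keep <- seq_len(nrow(subset))
  while (length(keep) > m) {
    # Total distance from each remaining row to every other remaining row.
    totals <- rowSums(d[keep, keep, drop = FALSE])
    # Drop the row with the smallest total (assumption: first one on ties).
    keep <- keep[-which.min(totals)]
  }
  subset[keep, , drop = FALSE]
}

reduced <- mindist_sketch(iris, "setosa", "Species", 25)
nrow(reduced)
```

Rows with a small total distance sit in the densest part of the class, so removing them first thins out near-duplicates while keeping the observations that are more spread out.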

undersample_mindist(data, cls, cls_col, m, dist_calc = "euclidean")

`data`	Dataset to undersample. All columns other than `cls_col` must be numeric.
`cls`	Class to be undersampled.
`cls_col`	Column containing class information.
`m`	Desired number of observations after undersampling.
`dist_calc`	Method for distance calculation. See `dist()`.
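
To show what the choice of `dist_calc` changes, the sketch below calls base R's `dist()` directly with two of its documented methods. The two points are made up for illustration, and it is an assumption that `undersample_mindist` passes `dist_calc` on to `dist()` as its `method` argument.

```r
# Two made-up points, named a and b.
pts <- rbind(a = c(0, 0), b = c(3, 4))

euc <- dist(pts, method = "euclidean")  # straight-line distance
man <- dist(pts, method = "manhattan")  # sum of absolute coordinate differences

as.numeric(euc)  # 5
as.numeric(man)  # 7
```

`dist()` also accepts `"maximum"`, `"canberra"`, `"binary"`, and `"minkowski"`. Because the method changes the pairwise distances, it can also change which observation has the lowest total and is removed first.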

An undersampled dataframe.

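library(scutr)
# Start from the setosa rows of iris.
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)
# Request fewer rows than the class contains so that observations are actually removed.
undersamp <- undersample_mindist(setosa, "setosa", "Species", 25)
nrow(undersamp)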

Balancing Multiclass Datasets for Classification Tasks

v0.1.2

MIT + file LICENSE

Authors

Keenan Ganz [aut, cre]

Initial release