undersample_mindist

Undersample a dataset by iteratively removing the observation with the lowest total distance to its neighbors of the same class.


Description

Undersample one class of a dataset by iteratively removing the observation with the lowest total distance to its neighbors of the same class, until m observations remain.
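The iterative removal described above can be sketched in base R. This is a minimal illustration, not the scutr source: the name mindist_sketch is hypothetical, and the sketch assumes that "neighbors" means all other remaining observations of the same class, that ties go to the first minimum, and that only rows of the target class are returned.

```r
# Hypothetical helper for illustration only; not the scutr implementation.
mindist_sketch <- function(data, cls, cls_col, m, dist_calc = "euclidean") {
  # Keep only the rows belonging to the class being undersampled.
  subset <- data[data[[cls_col]] == cls, , drop = FALSE]

  # Pairwise distances, computed once on the numeric (non-class) columns.
  features <- subset[, names(subset) != cls_col, drop = FALSE]
  d <- as.matrix(dist(features, method = dist_calc))

  keep <- seq_len(nrow(subset))
  while (length(keep) > m) {
    # Total distance from each remaining row to all other remaining rows.
    totals <- rowSums(d[keep, keep, drop = FALSE])
    # Remove the row with the lowest total (assumption: first one on ties).
    keep <- keep[-which.min(totals)]
  }

  subset[keep, , drop = FALSE]
}

setosa <- iris[iris$Species == "setosa", ]
reduced <- mindist_sketch(setosa, "setosa", "Species", 25)
nrow(reduced)  # 25
```

Removing the row with the lowest total distance tends to thin out the densest part of the class first, so the rows that remain are more spread out than a random subsample would be.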

Usage

undersample_mindist(data, cls, cls_col, m, dist_calc = "euclidean")

Arguments

data

Dataset to undersample. All columns other than cls_col must be numeric.

cls

Class to be undersampled.

cls_col

Name of the column containing class information.

m

Desired number of observations after undersampling.

dist_calc

Method for distance calculation. Defaults to "euclidean". See dist() for the available methods.
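To illustrate how the choice of distance method changes the result, the following sketch calls stats::dist() directly on two made-up points. It assumes dist_calc is forwarded as the method argument of dist(), which the reference to dist() suggests but this page does not state outright.

```r
# Two illustrative points: a = (0, 0) and b = (3, 4).
pts <- rbind(a = c(0, 0), b = c(3, 4))

# Straight-line distance: sqrt(3^2 + 4^2).
d_euclidean <- dist(pts, method = "euclidean")
as.numeric(d_euclidean)  # 5

# Sum of absolute coordinate differences: |3| + |4|.
d_manhattan <- dist(pts, method = "manhattan")
as.numeric(d_manhattan)  # 7
```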

Value

An undersampled data frame.

Examples

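library(scutr)

# Start from the 50 setosa observations in iris.
setosa <- iris[iris$Species == "setosa", ]
nrow(setosa)
# Request fewer rows than the class has, so observations are actually removed.
undersamp <- undersample_mindist(setosa, "setosa", "Species", 25)
nrow(undersamp)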

scutr

Balancing Multiclass Datasets for Classification Tasks

v0.1.2
MIT + file LICENSE
Authors
Keenan Ganz [aut, cre]
Initial release
