undersample_hclust

Undersample a dataset by hierarchical clustering.


Description

Undersample a dataset by hierarchical clustering.

Usage

undersample_hclust(
  data,
  cls,
  cls_col,
  m,
  k = 5,
  h = NA,
  dist_calc = "euclidean"
)

Arguments

data

Dataset to be undersampled.

cls

Majority class that will be undersampled.

cls_col

Column in data containing class memberships.

m

Number of samples in undersampled dataset.

k

Number of clusters to derive from clustering.

h

Height at which to cut the clustering tree. Only used when k is NA.

dist_calc

Distance calculation method. See dist().
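
The k and h arguments read like the two ways of cutting a tree with cutree() in base R. The snippet below is a sketch of that distinction using only base R functions; it assumes, but does not confirm, that scutr relies on hclust() and cutree() in the same way.

```r
# Base R only; this illustrates cutree(), not scutr internals.
setosa <- iris[iris$Species == "setosa", names(iris) != "Species"]
tree <- hclust(dist(setosa, method = "euclidean"))

# Cut by number of clusters (analogous to k = 5, h = NA).
by_k <- cutree(tree, k = 5)
length(unique(by_k))

# Cut by height (analogous to k = NA, h = 1); the number of
# clusters then depends on the tree rather than being fixed.
by_h <- cutree(tree, h = 1)
table(by_h)
```

Cutting by k fixes the number of clusters in advance, while cutting by h lets the structure of the data decide how many clusters result.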

Value

Undersampled data frame containing only rows of class cls.

Examples

table(iris$Species)
undersamp <- undersample_hclust(iris, "setosa", "Species", 15)
nrow(undersamp)
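
For intuition, here is a rough base R sketch of undersampling by hierarchical clustering. The helper undersample_hclust_sketch is hypothetical and is not the scutr implementation; in particular, the round-robin selection across clusters is an assumption chosen to spread the kept rows over the clusters.

```r
# Hypothetical helper for illustration; not part of scutr.
undersample_hclust_sketch <- function(data, cls, cls_col, m, k = 5,
                                      dist_calc = "euclidean") {
  # Keep only rows of the class being undersampled.
  subset <- data[data[[cls_col]] == cls, , drop = FALSE]
  features <- subset[, names(subset) != cls_col, drop = FALSE]

  # Cluster the rows and assign each one to one of k clusters.
  tree <- hclust(dist(features, method = dist_calc))
  clusters <- cutree(tree, k = k)

  # Shuffle the rows, rank each row within its cluster, then take rows
  # in rank order so the first m picks are spread across clusters.
  shuffled <- sample(nrow(subset))
  rank_in_cluster <- ave(seq_along(shuffled), clusters[shuffled],
                         FUN = seq_along)
  picked <- shuffled[order(rank_in_cluster)][seq_len(min(m, nrow(subset)))]
  subset[picked, , drop = FALSE]
}

set.seed(1)
sketch <- undersample_hclust_sketch(iris, "setosa", "Species", 15)
nrow(sketch)  # 15
```

Drawing from every cluster, rather than sampling the class uniformly, is meant to keep the reduced sample representative of the different regions the majority class occupies.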

scutr

Balancing Multiclass Datasets for Classification Tasks

v0.1.2
MIT + file LICENSE
Authors
Keenan Ganz [aut, cre]
Initial release