scutr: undersample_hclust – R documentation

undersample_hclust

Undersample a dataset by hierarchical clustering.

undersample_hclust(
  data,
  cls,
  cls_col,
  m,
  k = 5,
  h = NA,
  dist_calc = "euclidean"
)

`data`	Dataset to be undersampled.
`cls`	Majority class that will be undersampled.
`cls_col`	Column in `data` containing class memberships.
`m`	Number of samples in the undersampled dataset.
`k`	Number of clusters to derive from clustering.
`h`	Height at which to cut the clustering tree. `k` must be `NA` for this to be used.
`dist_calc`	Distance calculation method. See `dist()`.
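
The difference between cutting by `k` and cutting by `h` can be illustrated with base R alone. The sketch below does not use scutr and is not its implementation. It assumes that `dist_calc` corresponds to the `method` argument of `dist()` and that `k` and `h` behave like the arguments of the same name in `cutree()`. The variable names are illustrative.

```r
# Illustrative only: base R (stats package), not scutr internals.
# Take the feature columns of one class.
setosa <- iris[iris$Species == "setosa", names(iris) != "Species"]

# Assumption: `dist_calc` plays the role of `method` in dist().
d <- dist(setosa, method = "euclidean")
tree <- hclust(d)

# Cut into a fixed number of clusters (analogous to `k = 5`).
by_k <- cutree(tree, k = 5)
table(by_k)

# Cut at a fixed height instead (analogous to `k = NA, h = 1`).
# The number of clusters then depends on the data.
by_h <- cutree(tree, h = 1)
table(by_h)
```

With `k`, the number of clusters is fixed in advance. With `h`, it follows from the structure of the tree.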

Undersampled data frame containing only rows of class `cls`.
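
To make the shape of the return value concrete, here is a minimal base R sketch of one way cluster-based undersampling could work. The helper `sketch_undersample` is hypothetical and not part of scutr, and the strategy of drawing rows from each cluster in turn is an assumption, not a description of the package's actual sampling scheme.

```r
# Hypothetical helper, not part of scutr: undersample one class by
# drawing rows from each hierarchical cluster in turn.
sketch_undersample <- function(data, cls, cls_col, m, k = 5) {
  # Keep only the rows of the class to be undersampled.
  subset <- data[data[[cls_col]] == cls, ]
  features <- subset[, names(subset) != cls_col, drop = FALSE]

  # Cluster the rows of that class and cut the tree into k clusters.
  clusters <- cutree(hclust(dist(features)), k = k)

  # Shuffle row indices within each cluster.
  by_cluster <- lapply(
    split(seq_len(nrow(subset)), clusters),
    function(i) i[sample.int(length(i))]
  )

  # Interleave clusters so the first m picks are spread across
  # clusters as evenly as the cluster sizes allow.
  rank_in_cluster <- unlist(lapply(by_cluster, seq_along))
  picks <- unlist(by_cluster)[order(rank_in_cluster)]

  subset[head(picks, m), ]
}

set.seed(1)
out <- sketch_undersample(iris, "setosa", "Species", 15)
nrow(out)
unique(as.character(out$Species))
```

The result has the same columns as the input and contains only rows whose `cls_col` value equals `cls`.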

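library(scutr)
# Class counts before undersampling: 50 rows per species.
table(iris$Species)
# Reduce the "setosa" class to 15 rows.
undersamp <- undersample_hclust(iris, "setosa", "Species", 15)
nrow(undersamp)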

Balancing Multiclass Datasets for Classification Tasks

v0.1.2

MIT + file LICENSE

Authors

Keenan Ganz [aut, cre]

Initial release