superml: smoothMean – R documentation


smoothMean

smoothMean Calculator

Description

Calculates target (mean) encodings for a categorical variable using a smoothing parameter and the number of observations in each category. This approach is more robust to target leakage and overfitting than using raw per-category means.
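A minimal base-R sketch of smoothed mean encoding may help make the idea concrete. It assumes the sigmoid-weighted blend used in common Kaggle target-encoding kernels, which share the `min_samples_leaf`, `smoothing` and `noise_level` names. superml's actual implementation may differ. The function name `smooth_mean_sketch` is hypothetical. Encoding categories that never appear in the training data with the prior (the overall target mean) is also an assumption.

```r
# Hypothetical sketch, not superml's source: smoothed mean encoding of one
# categorical column, assuming a sigmoid weight and a prior fallback.
smooth_mean_sketch <- function(train_df, test_df, colname, target,
                               min_samples_leaf = 1, smoothing = 1,
                               noise_level = 0) {
  y <- train_df[[target]]
  x <- as.character(train_df[[colname]])
  prior     <- mean(y)                     # global mean of the target
  cat_mean  <- tapply(y, x, mean)          # per-category mean
  cat_count <- tapply(y, x, length)        # per-category number of rows
  # weight in (0, 1): categories with more rows get a weight closer to 1
  w   <- 1 / (1 + exp(-(cat_count - min_samples_leaf) / smoothing))
  enc <- prior * (1 - w) + cat_mean * w    # blend category mean with prior
  enc <- setNames(as.numeric(enc), names(cat_mean))  # plain named vector
  encode <- function(values) {
    out <- unname(enc[as.character(values)])
    out[is.na(out)] <- prior               # assumed: unseen category -> prior
    # assumed multiplicative Gaussian noise, skipped when noise_level = 0
    if (noise_level > 0) out <- out * (1 + noise_level * rnorm(length(out)))
    out
  }
  list(train = encode(train_df[[colname]]), test = encode(test_df[[colname]]))
}
```

Each encoding is a weighted average of the category's own mean and the prior. Rare categories are therefore shrunk toward the prior instead of relying on a mean computed from only one or two rows.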

Usage

smoothMean(
  train_df,
  test_df,
  colname,
  target,
  min_samples_leaf = 1,
  smoothing = 1,
  noise_level = 0
)

Arguments

`train_df`	training dataset
`test_df`	test dataset
`colname`	name of the categorical column to encode
`target`	name of the target column
`min_samples_leaf`	minimum number of samples needed before the category average is taken into account
`smoothing`	controls how strongly the category average is balanced against the prior
`noise_level`	amount of random noise to add (optional)
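The interaction between the arguments is easiest to see through the blending weight. The sketch assumes the common sigmoid formulation w = 1 / (1 + exp(-(n - min_samples_leaf) / smoothing)), where n is the number of rows in a category. This formula has not been confirmed against superml's source. The sketch tabulates w for a few counts. The helpers `blend_weight` and `add_noise` are hypothetical, and the multiplicative Gaussian form of the noise is also an assumption.

```r
# Hypothetical: assumed sigmoid weight that decides how much a category's
# own mean counts versus the prior.
blend_weight <- function(n, min_samples_leaf = 1, smoothing = 1) {
  1 / (1 + exp(-(n - min_samples_leaf) / smoothing))
}

counts <- 1:5
# rows: parameter settings; columns: number of rows in the category
w_table <- rbind(
  default     = blend_weight(counts),                        # leaf = 1, smoothing = 1
  larger_leaf = blend_weight(counts, min_samples_leaf = 3),  # rare levels lean on prior
  smoother    = blend_weight(counts, smoothing = 5)          # gentler transition
)
colnames(w_table) <- paste0("n=", counts)
print(round(w_table, 3))

# Hypothetical noise step: multiplicative Gaussian jitter; noise_level = 0
# returns the encodings unchanged.
add_noise <- function(s, noise_level) s * (1 + noise_level * rnorm(length(s)))
set.seed(1)
print(add_noise(c(0.25, 0.75), noise_level = 0.1))  # values jitter around the inputs
```

Raising `min_samples_leaf` shifts the curve to the right, so categories with few rows rely more heavily on the prior. Raising `smoothing` flattens the curve, so the shift from prior to category mean happens more gradually.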

Value

a list with elements `train` and `test`: data tables containing the mean encodings of the target for the given categorical variable

Examples

library(superml)

train <- data.frame(region=c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,1,0,1))
test <- data.frame(region=c('rcb','csk','rcb','del','guj','pune','csk','kol'))

# calculate encodings
all_means <- smoothMean(train_df = train,
                         test_df = test,
                         colname = 'region',
                         target = 'win')
train_mean <- all_means$train
test_mean <- all_means$test
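The effect of smoothing on this example can be worked out by hand for the `del` category. The arithmetic below assumes the sigmoid-weighted blend with the default `min_samples_leaf = 1` and `smoothing = 1`. It also assumes that categories absent from the training data (here `kol`) fall back to the prior. Both are assumptions, and superml's exact output may differ, especially with a non-zero `noise_level`.

```r
# Worked arithmetic for the example data under the assumed formula.
win    <- c(0, 1, 1, 0, 0, 1, 0, 1)
region <- c('del','csk','rcb','del','csk','pune','guj','del')

prior    <- mean(win)                       # 4 wins out of 8 -> 0.5
del_mean <- mean(win[region == 'del'])      # 1 win out of 3  -> 1/3
del_n    <- sum(region == 'del')            # 3 rows
w        <- 1 / (1 + exp(-(del_n - 1) / 1)) # about 0.881
del_enc  <- prior * (1 - w) + del_mean * w  # about 0.353, pulled toward the prior

# 'kol' appears only in the test data, so under this assumption it gets the prior
kol_enc <- prior
print(round(c(del = del_enc, kol = kol_enc), 3))
```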

superml

Build Machine Learning Models Like Using Python's Scikit-Learn Library in R

v0.5.3

GPL-3 | file LICENSE

Authors

Manish Saraswat [aut, cre]
