smoothMean Calculator
Calculates target encodings for a categorical variable using a smoothing parameter and the number of observations in each category. This approach is more robust to target leakage and helps avoid overfitting.
smoothMean( train_df, test_df, colname, target, min_samples_leaf = 1, smoothing = 1, noise_level = 0 )
train_df | train dataset
test_df | test dataset
colname | name of categorical column
target | name of target column
min_samples_leaf | minimum samples to take category average into account
smoothing | smoothing effect to balance categorical average vs prior
noise_level | random noise to add, optional
A train and a test data table with mean encodings of the target for the given categorical variable, returned together and accessible as $train and $test.
train <- data.frame(region=c('del','csk','rcb','del','csk','pune','guj','del'),
                    win = c(0,1,1,0,0,1,0,1))
test <- data.frame(region=c('rcb','csk','rcb','del','guj','pune','csk','kol'))

# calculate encodings
all_means <- smoothMean(train_df = train,
                        test_df = test,
                        colname = 'region',
                        target = 'win')
train_mean <- all_means$train
test_mean <- all_means$test
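To give a feel for how min_samples_leaf and smoothing might balance a category's average against the prior, here is a minimal sketch of one common formulation of smoothed target encoding, in which a sigmoid of the category count weights the category mean against the global mean. The helper smooth_mean_sketch is hypothetical and is not part of this package; whether smoothMean uses exactly this formula is an assumption, and noise and unseen test levels are not handled here.

```r
# Hypothetical helper (not the package's implementation): blends each
# category's target mean with the global mean, weighted by category size.
smooth_mean_sketch <- function(values, groups,
                               min_samples_leaf = 1, smoothing = 1) {
  prior    <- mean(values)                   # global target mean (the prior)
  cat_mean <- tapply(values, groups, mean)   # per-category target mean
  cat_n    <- tapply(values, groups, length) # per-category observation count
  # Assumed weighting: rises from 0 toward 1 as the count exceeds
  # min_samples_leaf; a larger smoothing value makes the transition slower.
  weight <- 1 / (1 + exp(-(cat_n - min_samples_leaf) / smoothing))
  prior * (1 - weight) + cat_mean * weight
}

region <- c('del','csk','rcb','del','csk','pune','guj','del')
win    <- c(0,1,1,0,0,1,0,1)
enc    <- smooth_mean_sketch(win, region)
print(enc)
```

Under this formulation, a category seen only once (such as 'guj') is pulled halfway toward the global mean, while categories with more observations stay closer to their own average.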