One Hot Encoding
Applies one-hot encoding to categorical variables and replaces missing values. This step is not needed when building a standard scorecard model, but it is required for models that do not apply a WOE transformation.
one_hot(dt, var_skip = NULL, var_encode = NULL, nacol_rm = FALSE, ...)
dt
A data frame.
var_skip
Names of categorical variables to skip when one-hot encoding. Defaults to NULL.
var_encode
Names of categorical variables to one-hot encode. Defaults to NULL, in which case all categorical variables except those in var_skip are encoded.
nacol_rm
Logical. When a categorical variable contains missing values, one-hot encoding generates an extra column that indicates the presence of NAs; this argument controls whether that column is removed. Defaults to FALSE.
...
Additional parameters.
Returns a data frame with the selected categorical variables one-hot encoded.
# load germancredit data
data(germancredit)
library(data.table)

dat = rbind(
  setDT(germancredit)[, c(sample(20,3),21)],
  data.table(creditability=sample(c("good","bad"),10,replace=TRUE)),
  fill=TRUE)

# one hot encoding
## keep na columns from categorical variable
dat_onehot1 = one_hot(dat, var_skip = 'creditability', nacol_rm = FALSE) # default
str(dat_onehot1)

## remove na columns from categorical variable
dat_onehot2 = one_hot(dat, var_skip = 'creditability', nacol_rm = TRUE)
str(dat_onehot2)
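To make the effect of nacol_rm easier to picture, the following base R sketch encodes a single character vector by hand. It is only an illustration of the idea, not the package's implementation: the helper one_hot_sketch, the toy vector purpose, and the column naming scheme (variable name, underscore, level) are all assumptions made for this example.

```r
# Hypothetical helper, NOT part of the scorecard package: one-hot encode one
# vector and optionally keep an indicator column for missing values.
one_hot_sketch <- function(x, name, nacol_rm = FALSE) {
  x <- as.character(x)
  lv <- sort(unique(x[!is.na(x)]))          # observed non-missing levels

  # One 0/1 column per level; missing values get 0 in every level column.
  out <- lapply(lv, function(l) as.integer(!is.na(x) & x == l))
  names(out) <- paste(name, lv, sep = "_")  # assumed naming scheme

  # Keep the NA indicator column unless the caller asks to drop it.
  if (!nacol_rm && anyNA(x)) {
    out[[paste(name, "NA", sep = "_")]] <- as.integer(is.na(x))
  }
  data.frame(out, check.names = FALSE)
}

purpose <- c("car", "education", NA, "car")  # toy data

keep_na <- one_hot_sketch(purpose, "purpose")                   # with NA column
drop_na <- one_hot_sketch(purpose, "purpose", nacol_rm = TRUE)  # without it

str(keep_na)
str(drop_na)
```

In this sketch the only difference between the two results is the extra indicator column for missing values, which mirrors the choice that nacol_rm describes above.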