Costs function for poor classification
classification_cost() calculates the cost of a poor prediction based on user-defined costs. The costs are multiplied by the estimated class probabilities and the mean cost is returned.
Usage

classification_cost(data, ...)

## S3 method for class 'data.frame'
classification_cost(
  data,
  truth,
  ...,
  costs = NULL,
  na_rm = TRUE,
  event_level = yardstick_event_level()
)

classification_cost_vec(
  truth,
  estimate,
  costs = NULL,
  na_rm = TRUE,
  event_level = yardstick_event_level(),
  ...
)
Arguments

data: A data.frame containing the columns specified by truth and ....

...: A set of unquoted column names or one or more tidyselect selector functions to choose which variables contain the class probabilities. If truth is binary, only one column should be selected, and it should correspond to the value of event_level. Otherwise, there should be as many columns as factor levels of truth, in the same order as those levels.

truth: The column identifier for the true class results (that is a factor). This should be an unquoted column name. For classification_cost_vec(), a factor vector.

costs: A data frame with columns "truth", "estimate", and "cost". "truth" and "estimate" should be character columns containing levels of the truth factor, and "cost" should be a numeric column giving the cost to apply when "estimate" is predicted but the true value is "truth". It is often the case that when truth == estimate, the cost is zero (no penalty for correct predictions). If any combinations of the levels of truth are missing from costs, their costs are assumed to be zero. If NULL, equal costs are used, applying a cost of 0 to correct predictions and a cost of 1 to incorrect predictions.

na_rm: A logical value indicating whether NA values should be stripped before the computation proceeds.

event_level: A single string, either "first" or "second", specifying which level of truth to consider as the "event". Only applicable for binary outcomes.

estimate: If truth is binary, a numeric vector of class probabilities corresponding to the "event" class. Otherwise, a matrix with as many columns as factor levels of truth, assumed to be in the same order as those levels (used by classification_cost_vec()).
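The costs argument can also be made explicit. Below is a minimal sketch of the default described above, assuming that costs = NULL applies a cost of 0 to correct predictions and 1 to incorrect ones; default_costs is an illustrative name, not part of the package:

library(yardstick)
library(tibble)

data(two_class_example)

# Explicit cost structure assumed to be equivalent to the `costs = NULL`
# default. Correct-prediction rows are omitted: missing combinations are
# assumed to have a cost of zero.
default_costs <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  1,
  "Class2", "Class1",  1
)

# These two calls should agree if the assumption above holds
classification_cost(two_class_example, truth, Class1)
classification_cost(two_class_example, truth, Class1, costs = default_costs)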
Details

As an example, suppose that there are three classes: "A", "B", and "C". Suppose there is a truly "A" observation with class probabilities A = 0.3 / B = 0.3 / C = 0.4. Suppose that, when the true result is class "A", the costs for each class are A = 0 / B = 5 / C = 10, penalizing the probability of incorrectly predicting "C" more than predicting "B". The cost for this prediction would be 0.3 * 0 + 0.3 * 5 + 0.4 * 10 = 5.5. This calculation is done for each sample, and the individual costs are averaged.
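As a rough check of the arithmetic above, the sketch below reproduces the single-observation cost by hand and then through classification_cost_vec(). The names truth_vec, probs, and costs_a are illustrative, and the probability matrix is assumed to have its columns in the same order as the factor levels, as described for estimate:

library(yardstick)
library(tibble)

# One truly "A" observation with class probabilities A = 0.3, B = 0.3, C = 0.4
truth_vec <- factor("A", levels = c("A", "B", "C"))
probs <- matrix(c(0.3, 0.3, 0.4), nrow = 1,
                dimnames = list(NULL, c("A", "B", "C")))

# Costs when the true class is "A"; other combinations default to zero
costs_a <- tribble(
  ~truth, ~estimate, ~cost,
  "A",    "A",        0,
  "A",    "B",        5,
  "A",    "C",       10
)

# By hand: 0.3 * 0 + 0.3 * 5 + 0.4 * 10
sum(c(0.3, 0.3, 0.4) * c(0, 5, 10))
#> [1] 5.5

# With a single observation the mean cost equals the per-observation cost,
# so this should also give 5.5
classification_cost_vec(truth_vec, probs, costs = costs_a)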
Value

A tibble with columns .metric, .estimator, and .estimate, and 1 row of values. For grouped data frames, the number of rows returned will be the same as the number of groups.

For classification_cost_vec(), a single numeric value (or NA).
Author

Max Kuhn
See also

Other class probability metrics: average_precision(), gain_capture(), mn_log_loss(), pr_auc(), roc_auc(), roc_aunp(), roc_aunu()
Examples

library(dplyr)

# ---------------------------------------------------------------------------
# Two class example

data(two_class_example)

# Assuming `Class1` is our "event", this penalizes false positives heavily
costs1 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  1,
  "Class2", "Class1",  2
)

# Assuming `Class1` is our "event", this penalizes false negatives heavily
costs2 <- tribble(
  ~truth,   ~estimate, ~cost,
  "Class1", "Class2",  2,
  "Class2", "Class1",  1
)

classification_cost(two_class_example, truth, Class1, costs = costs1)

classification_cost(two_class_example, truth, Class1, costs = costs2)

# ---------------------------------------------------------------------------
# Multiclass

data(hpc_cv)

# Define cost matrix from Kuhn and Johnson (2013)
hpc_costs <- tribble(
  ~estimate, ~truth, ~cost,
  "VF",      "VF",    0,
  "VF",      "F",     1,
  "VF",      "M",     5,
  "VF",      "L",    10,
  "F",       "VF",    1,
  "F",       "F",     0,
  "F",       "M",     5,
  "F",       "L",     5,
  "M",       "VF",    1,
  "M",       "F",     1,
  "M",       "M",     0,
  "M",       "L",     1,
  "L",       "VF",    1,
  "L",       "F",     1,
  "L",       "M",     1,
  "L",       "L",     0
)

# You can use the col1:colN tidyselect syntax
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  classification_cost(obs, VF:L, costs = hpc_costs)

# Groups are respected
hpc_cv %>%
  group_by(Resample) %>%
  classification_cost(obs, VF:L, costs = hpc_costs)