Lift curve
lift_curve() constructs the full lift curve and returns a tibble. See gain_curve() for a closely related concept.
```r
lift_curve(data, ...)

## S3 method for class 'data.frame'
lift_curve(
  data,
  truth,
  ...,
  na_rm = TRUE,
  event_level = yardstick_event_level()
)

autoplot.lift_df(object, ...)
```
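| Argument | Description |
|---|---|
| data | A data.frame containing the columns specified by truth and .... |
| ... | A set of unquoted column names or one or more dplyr selector functions to choose which variables contain the class probabilities. If truth is binary, only one column should be selected. Otherwise, there should be as many columns as factor levels of truth. |
| truth | The column identifier for the true class results (that is a factor). |
| na_rm | A logical value indicating whether NA values should be stripped before the computation proceeds. |
| event_level | A single string. Either "first" or "second" to specify which level of truth to consider as the "event". |
| object | The lift_df data frame returned from lift_curve(). |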
There is a ggplot2::autoplot() method for quickly visualizing the curve. This works for binary and multiclass output, and also works with grouped data (e.g. from resamples). See the examples.
A tibble with class lift_df or lift_grouped_df having columns:

- .n - The index of the current sample.
- .n_events - The index of the current unique sample. Values with repeated estimate values are given identical indices in this column.
- .percent_tested - The cumulative percentage of values tested.
- .lift - First calculate the cumulative percentage of true results relative to the total number of true results. Then divide that by .percent_tested.
Cumulative gain and lift charts are a visual method for determining the effectiveness of a model compared with the results one might expect without a model. As an example, without a model, if you were to advertise to a random 10% of your customer base, you might expect to capture 10% of the total number of positive responses you would have received had you advertised to your entire customer base. Given a model that predicts which customers are more likely to respond, the hope is that you can more accurately target 10% of your customer base and capture more than 10% of the total number of positive responses.
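The targeting scenario above reduces to a single ratio. The sketch below works through it with made-up campaign numbers (the customer counts and the model's 30 captured responders are purely hypothetical): the lift is the percentage of responders found divided by the percentage of customers contacted.

```r
# Hypothetical numbers, chosen only to illustrate the arithmetic.
customers  <- 1000  # size of the entire customer base
responders <- 100   # positive responses if every customer were contacted
contacted  <- 100   # we contact 10% of the customer base

# Cumulative percentage tested: share of customers contacted.
percent_tested <- 100 * contacted / customers                     # 10

# Without a model, a random 10% sample captures about 10% of responders.
random_found <- 10
random_lift  <- (100 * random_found / responders) / percent_tested  # 1

# Suppose (hypothetically) the model's top-scored 10% captures 30 responders.
model_found <- 30
model_lift  <- (100 * model_found / responders) / percent_tested    # 3
```

A lift of 1 is what random targeting achieves; the hypothetical model's lift of 3 means it finds responders three times as often as random selection at this depth.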
The calculation to construct lift curves is as follows:

1. truth and estimate are placed in descending order by the estimate values (estimate here is a single column supplied in ...).
2. The cumulative number of samples with true results, relative to the total number of true results, is found. This is the cumulative percentage found.
3. The cumulative percentage found is divided by the cumulative percentage tested to construct the lift value. This ratio represents the factor of improvement over an uninformed model. Values greater than 1 indicate that the model finds true results at a higher rate than random selection would. This is the y-axis of the lift chart.
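The three steps above can be sketched in base R. This is a minimal illustration under simplifying assumptions, not yardstick's implementation: lift_sketch() is a hypothetical helper, the toy data are invented, and tied estimates, missing values, and grouping are ignored.

```r
# Hypothetical helper illustrating the lift calculation; not part of yardstick.
lift_sketch <- function(truth, estimate, event = levels(truth)[1]) {
  # Step 1: order observations by descending estimate.
  ord <- order(estimate, decreasing = TRUE)
  is_event <- truth[ord] == event

  n <- seq_along(is_event)
  # Cumulative percentage of all observations tested so far.
  percent_tested <- 100 * n / length(is_event)
  # Step 2: cumulative percentage of all events found so far.
  percent_found <- 100 * cumsum(is_event) / sum(is_event)

  # Step 3: lift is the ratio of the two percentages.
  data.frame(
    .n = n,
    .percent_tested = percent_tested,
    .lift = percent_found / percent_tested
  )
}

# Invented toy data: "yes" is the first level, so it is the event.
truth <- factor(c("yes", "no", "yes", "no", "no", "yes", "no", "no"),
                levels = c("yes", "no"))
estimate <- c(0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2)

res <- lift_sketch(truth, estimate)
res
```

The first row has a lift of 8/3 (one third of the events found after testing one eighth of the data), and the last row always has a lift of 1, because once every observation is tested, 100% of the events have been found.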
If a multiclass truth column is provided, a one-vs-all approach will be taken to calculate multiple curves, one per level. In this case, there will be an additional column, .level, identifying the "one" column in the one-vs-all calculation.
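A rough base-R sketch of the one-vs-all idea: each level in turn is treated as the event, every other level as a non-event, and that level's probability column is used as the estimate. The data and the one_vs_all_lift() helper are hypothetical, and ties and missing values are ignored; the output carries a .level column in the spirit of the one described above.

```r
# Hypothetical 3-class data; each probability column is named after a level.
truth <- factor(c("a", "b", "c", "a", "b", "c"), levels = c("a", "b", "c"))
probs <- data.frame(
  a = c(0.7, 0.2, 0.1, 0.5, 0.3, 0.2),
  b = c(0.2, 0.6, 0.3, 0.3, 0.4, 0.2),
  c = c(0.1, 0.2, 0.6, 0.2, 0.3, 0.6)
)

# Hypothetical helper: one lift curve per level, stacked into one data frame.
one_vs_all_lift <- function(truth, probs) {
  curves <- lapply(levels(truth), function(lvl) {
    # "One" = the current level; "all" = every other level.
    ord <- order(probs[[lvl]], decreasing = TRUE)
    is_event <- truth[ord] == lvl
    n <- seq_along(is_event)
    percent_tested <- 100 * n / length(is_event)
    percent_found <- 100 * cumsum(is_event) / sum(is_event)
    data.frame(.level = lvl, .n = n, .lift = percent_found / percent_tested)
  })
  do.call(rbind, curves)
}

res <- one_vs_all_lift(truth, probs)
res
```

Each of the three curves has six rows, so the stacked result has 18 rows, and every curve ends at a lift of 1.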
There is no common convention on which factor level should automatically be considered the "event" or "positive" result when computing binary classification metrics. In yardstick, the default is to use the first level. To alter this, change the argument event_level to "second" to consider the last level of the factor the level of interest. For multiclass extensions involving one-vs-all comparisons (such as macro averaging), this option is ignored and the "one" level is always the relevant result.
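The event-level rule for a binary factor can be expressed in a few lines. pick_event() below is a hypothetical helper written only to illustrate the rule described above; it is not a yardstick function.

```r
# Hypothetical helper: "first" selects the first factor level as the event,
# "second" selects the second (last) level of a two-level factor.
pick_event <- function(truth, event_level = c("first", "second")) {
  event_level <- match.arg(event_level)
  lvls <- levels(truth)
  if (length(lvls) != 2) stop("`truth` must have exactly two levels.")
  if (event_level == "first") lvls[1] else lvls[2]
}

truth <- factor(c("Class1", "Class2", "Class1"),
                levels = c("Class1", "Class2"))
pick_event(truth)                          # "Class1"
pick_event(truth, event_level = "second")  # "Class2"
```

With yardstick itself, switching the event also means passing the matching probability column, for example lift_curve(two_class_example, truth, Class2, event_level = "second").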
Author: Max Kuhn
Other curve metrics: gain_curve(), pr_curve(), roc_curve()
```r
library(yardstick)

# ---------------------------------------------------------------------------
# Two class example

# `truth` is a 2 level factor. The first level is `"Class1"`, which is the
# "event of interest" by default in yardstick. See the discussion of
# `event_level` above.
data(two_class_example)

# Binary metrics using class probabilities take a factor `truth` column,
# and a single class probability column containing the probabilities of
# the event of interest. Here, since `"Class1"` is the first level of
# `"truth"`, it is the event of interest and we pass in probabilities for it.
lift_curve(two_class_example, truth, Class1)

# ---------------------------------------------------------------------------
# `autoplot()`

library(ggplot2)
library(dplyr)

# Use autoplot to visualize
autoplot(lift_curve(two_class_example, truth, Class1))

# Multiclass one-vs-all approach
# One curve per level
hpc_cv %>%
  filter(Resample == "Fold01") %>%
  lift_curve(obs, VF:L) %>%
  autoplot()

# Same as above, but with all of the resamples
hpc_cv %>%
  group_by(Resample) %>%
  lift_curve(obs, VF:L) %>%
  autoplot()
```