Workshop Day 2

Package Testing

Use the downloaded data set and write down your feedback on the points stated in the previous section. The table below shows which package you should examine:

Give Feedback, Submit GitHub Issues

Furthermore, write bug reports and suggest improvements that should be implemented in the package (errors, typos, documentation issues). Write at least five issues for the other team on GitHub.

Implement Based on New Specifications

The package you have chosen needs to be extended to support the following functionality. Additionally (as a bonus), we would like to have tests for each function in our package to ensure that they work as expected.

Clean

Since some models do not handle NAs well, we would like to offer mean imputation (replacing missing values with the column average) for numeric variables and give the user a clear warning whenever this happens. This functionality should be optional for the user. How could we handle categorical variables as well?

Create a test case that verifies the functionality described above.
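The following is a minimal sketch of how optional mean imputation with a warning might look, together with a test. The function name impute_mean(), its impute argument, and the toy data frame are assumptions made for illustration; they are not part of any of the workshop packages. The testthat block only runs if testthat is installed.

```r
# Hypothetical helper: replace NAs in numeric columns by the column mean.
# `impute = FALSE` switches the behaviour off, so it stays optional.
impute_mean <- function(data, impute = TRUE) {
  stopifnot(is.data.frame(data))
  if (!impute) {
    return(data)
  }
  for (col in names(data)) {
    x <- data[[col]]
    if (is.numeric(x) && anyNA(x)) {
      n_missing <- sum(is.na(x))
      x[is.na(x)] <- mean(x, na.rm = TRUE)
      data[[col]] <- x
      # One warning per affected column, so the user sees what was changed
      warning(
        sprintf("Column '%s': imputed %d missing value(s) with the mean.",
                col, n_missing),
        call. = FALSE
      )
    }
  }
  data
}

# Made-up example data: one numeric and one categorical column with an NA each
df <- data.frame(age = c(20, NA, 40), group = c("a", NA, "b"))

# A test case in testthat style (skipped if testthat is not available)
if (requireNamespace("testthat", quietly = TRUE)) {
  testthat::test_that("impute_mean replaces numeric NAs and warns", {
    testthat::expect_warning(out <- impute_mean(df), "imputed 1 missing")
    testthat::expect_equal(out$age, c(20, 30, 40))
    # Categorical columns are left untouched in this sketch
    testthat::expect_true(is.na(out$group[2]))
    # Imputation can be switched off
    testthat::expect_identical(impute_mean(df, impute = FALSE), df)
  })
}
```

Note that a column consisting only of NAs would be filled with NaN here; a real implementation would probably want to handle that case explicitly.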

Transform

Some models need numeric variables to be standardized, i.e. rescaled to a mean of zero and a standard deviation of one. This can be achieved as follows:

\[ z = \frac{x - \hat{\mu}}{\hat{\sigma}} \] where \(\hat{\mu}\) is the sample mean, \(x\) is the numeric value in question and \(\hat{\sigma}\) the sample standard deviation.
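As a rough sketch, the formula above could be applied to every numeric column of a data frame as follows. The function name standardize_numeric() and the example data are hypothetical.

```r
# Hypothetical helper: z-transform all numeric columns of a data frame,
# z = (x - mean(x)) / sd(x); non-numeric columns are left unchanged.
standardize_numeric <- function(data) {
  stopifnot(is.data.frame(data))
  is_num <- vapply(data, is.numeric, logical(1))
  if (any(is_num)) {
    data[is_num] <- lapply(data[is_num], function(x) {
      (x - mean(x, na.rm = TRUE)) / sd(x, na.rm = TRUE)
    })
  }
  data
}

# Made-up example data
df <- data.frame(x = c(1, 2, 3, 4, 5), label = c("a", "b", "a", "b", "a"))
z <- standardize_numeric(df)

mean(z$x)  # approximately 0
sd(z$x)    # approximately 1
```

A constant column has a standard deviation of zero and would produce NaN values, so a package function may want to check for this case and warn the user.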

Model

We would now also like to support logistic regression models using glm(..., family = "binomial") in our package. Convert the predicted probabilities from predict() (with type = "response") back to a factor and compare them with the actual classes. Based on the in-sample accuracy of the tree model and the logistic regression model, which one is better?
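The steps for the logistic regression part can be sketched as below. Since the workshop data set is not reproduced here, the example uses the built-in mtcars data with am as a stand-in binary class; the formula and the 0.5 cut-off are assumptions chosen for illustration.

```r
# Stand-in data: mtcars ships with R; `am` serves as the binary class here
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1), labels = c("automatic", "manual"))

# Logistic regression; for a factor response the first level counts as
# "failure" and the second as "success"
fit <- glm(am ~ wt + hp, data = cars, family = "binomial")

# type = "response" returns probabilities of the second factor level
prob <- predict(fit, type = "response")

# Convert probabilities back to a factor with the original levels
# (0.5 is an assumed cut-off)
predicted <- factor(
  ifelse(prob > 0.5, levels(cars$am)[2], levels(cars$am)[1]),
  levels = levels(cars$am)
)

# Confusion matrix and in-sample accuracy
confusion <- table(actual = cars$am, predicted = predicted)
accuracy <- mean(predicted == cars$am)
confusion
accuracy
```

The same accuracy computation can be applied to the class predictions of the tree model, which makes the two models directly comparable on the training data.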

Bonus: Model-Plot

Use tidy() from the broom package to extract the estimates of the logistic regression model. You can then plot the p-values using geom_bar(stat = "identity"). To order the bars, sort the rows with arrange(desc(p.value)), then convert term to a factor with mutate(), using .$term as the levels.
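A possible version of this plot is sketched below. It assumes that the broom, dplyr and ggplot2 packages are installed, and it again uses mtcars as stand-in data rather than the workshop data set.

```r
library(broom)
library(dplyr)
library(ggplot2)

# Stand-in model on built-in data
fit <- glm(am ~ wt + hp, data = mtcars, family = "binomial")

# tidy() returns one row per term, including a p.value column.
# After sorting, the current row order (.$term) becomes the factor levels,
# so ggplot2 draws the bars in that order.
estimates <- tidy(fit) %>%
  arrange(desc(p.value)) %>%
  mutate(term = factor(term, levels = .$term))

# stat = "identity" makes geom_bar() use the p-values as bar heights
p <- ggplot(estimates, aes(x = term, y = p.value)) +
  geom_bar(stat = "identity") +
  labs(x = "Term", y = "p-value")
```

Calling print(p) draws the plot. Without the conversion to a factor, ggplot2 would order the terms alphabetically instead of by p-value.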

Bonus: Model-Performance

In addition to the accuracy estimate, we would also like to compute the sensitivity and specificity of the model; see https://en.wikipedia.org/wiki/Sensitivity_and_specificity.
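As an illustration, sensitivity (true positive rate) and specificity (true negative rate) can be computed from the four cells of the confusion matrix. The function name classification_metrics(), its positive argument, and the example vectors are hypothetical.

```r
# Hypothetical helper: accuracy, sensitivity and specificity for a binary
# classification; `positive` names the class treated as the positive one.
classification_metrics <- function(actual, predicted, positive) {
  stopifnot(length(actual) == length(predicted))
  actual_pos <- actual == positive
  predicted_pos <- predicted == positive

  tp <- sum(actual_pos & predicted_pos)    # true positives
  tn <- sum(!actual_pos & !predicted_pos)  # true negatives
  fp <- sum(!actual_pos & predicted_pos)   # false positives
  fn <- sum(actual_pos & !predicted_pos)   # false negatives

  c(
    accuracy    = (tp + tn) / length(actual),
    sensitivity = tp / (tp + fn),
    specificity = tn / (tn + fp)
  )
}

# Made-up example: 3 actual positives, 5 actual negatives
actual <- factor(c("yes", "yes", "yes", "no", "no", "no", "no", "no"))
predicted <- factor(
  c("yes", "yes", "no", "no", "no", "no", "yes", "no"),
  levels = levels(actual)
)

metrics <- classification_metrics(actual, predicted, positive = "yes")
metrics
# accuracy = 0.75, sensitivity = 2/3, specificity = 0.8
```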

Super-Bonus: Implement a function to plot the ROC curve and compute the AUC.
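One way to approach this in base R is sketched below: the ROC curve is obtained by sweeping the classification threshold over all observed probabilities, and the AUC is approximated with the trapezoidal rule. The function names roc_curve(), auc_trapezoid() and plot_roc() as well as the example data are assumptions for illustration; dedicated packages offer more complete implementations.

```r
# Hypothetical helper: ROC points for a logical vector of actual classes
# (TRUE = positive) and predicted probabilities of the positive class.
roc_curve <- function(actual, prob) {
  stopifnot(is.logical(actual), length(actual) == length(prob))
  # Start at Inf so that the curve begins in the point (0, 0)
  thresholds <- c(Inf, sort(unique(prob), decreasing = TRUE))
  tpr <- vapply(thresholds, function(t) sum(prob >= t & actual) / sum(actual),
                numeric(1))
  fpr <- vapply(thresholds, function(t) sum(prob >= t & !actual) / sum(!actual),
                numeric(1))
  data.frame(threshold = thresholds, fpr = fpr, tpr = tpr)
}

# Area under the curve via the trapezoidal rule
auc_trapezoid <- function(roc) {
  sum(diff(roc$fpr) * (head(roc$tpr, -1) + tail(roc$tpr, -1)) / 2)
}

# Plot the ROC curve together with the diagonal of a random classifier
plot_roc <- function(roc) {
  plot(roc$fpr, roc$tpr, type = "l",
       xlab = "False positive rate (1 - specificity)",
       ylab = "True positive rate (sensitivity)",
       main = sprintf("ROC curve (AUC = %.2f)", auc_trapezoid(roc)))
  abline(a = 0, b = 1, lty = 2)
}

# Made-up example: 3 of the 4 positive/negative pairs are ranked correctly
actual <- c(TRUE, TRUE, FALSE, FALSE)
prob <- c(0.9, 0.3, 0.8, 0.1)

roc <- roc_curve(actual, prob)
auc <- auc_trapezoid(roc)
auc  # 0.75
# plot_roc(roc)  # uncomment to draw the curve
```

For a fitted logistic regression, prob would be the output of predict(fit, type = "response") and actual a logical vector marking the positive class.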

Discussion Round


Discuss finished packages, Homework