Exercise: Modify an existing package

You find a package on GitHub named datacleaner that is supposed to clean data sets and handle NA values appropriately: https://github.com/Quantargo/datacleaner. The package is also supposed to deal with outliers. Currently, it implements the functions windsorize() and meanimpute() for data winsorization and mean imputation.
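If the two techniques are new to you, the base-R sketch below illustrates them. The names winsorize_sketch() and meanimpute_sketch() and the default 5%/95% quantile cut-offs are assumptions chosen for illustration; this is not the code in the datacleaner package.

```r
# Hypothetical illustration only; not the datacleaner implementation.

# Winsorization: values below the lower quantile or above the upper quantile
# are clipped to those quantiles instead of being dropped.
winsorize_sketch <- function(x, probs = c(0.05, 0.95)) {
  bounds <- quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # pmax() raises small values to the lower bound, pmin() caps large ones;
  # NA values stay NA.
  pmin(pmax(x, bounds[1]), bounds[2])
}

# Mean imputation: missing values are replaced by the mean of the observed ones.
meanimpute_sketch <- function(x) {
  x[is.na(x)] <- mean(x, na.rm = TRUE)
  x
}

x <- c(1, 2, 3, 4, 100, NA)
winsorize_sketch(x)   # the outlier 100 is pulled down to the 95% quantile
meanimpute_sketch(x)  # the NA is replaced by the mean of 1, 2, 3, 4, 100
```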

You decide to use the package from GitHub, but unfortunately some functionality seems to be missing and some checks seem to be failing. Specifically, you find the following critical points:

  1. The function windsorize() does not seem to be working correctly.
  2. The function windsorize() should throw an error with a clear message if it is passed either an empty vector or a vector containing only NAs. Hint: Use the function stop() to throw an error when one of these conditions is met.
  3. You need a function transform_log() to log-transform values. The function should throw an error, or ideally hint at workarounds, if numerical problems arise (e.g. zero or negative values, for which log() returns -Inf or NaN).
  4. R CMD check (or devtools::check(), or the Check button in the Build tab in RStudio) fails with errors, warnings and notes.
  5. The function documentation lacks a clear explanation of what the functions do; see e.g. ?windsorize. Function examples are also missing: use, e.g., a vector created with rnorm() to show how windsorize() works, or exp(rnorm()) for transform_log().
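Points 2, 3 and 5 could take roughly the shape sketched below: input checks with stop(), a log transform that rejects non-positive values and suggests a shift, and roxygen2 comments with @examples based on rnorm() and exp(rnorm()). This assumes the package documents its functions with roxygen2; the probs argument, its defaults and the message texts are assumptions, so adapt them to the package's actual windsorize() rather than copying them verbatim.

```r
#' Winsorize a numeric vector
#'
#' Replaces values below the lower quantile and above the upper quantile
#' with those quantiles, which limits the influence of outliers.
#'
#' @param x A numeric vector.
#' @param probs Lower and upper quantile probabilities (assumed defaults).
#' @return A numeric vector of the same length as \code{x}.
#' @examples
#' x <- rnorm(100)
#' summary(windsorize(x))
#' @export
windsorize <- function(x, probs = c(0.05, 0.95)) {
  # Check the length first: all(is.na(numeric(0))) is TRUE, so order matters.
  if (length(x) == 0) {
    stop("`x` is empty; windsorize() needs at least one value.", call. = FALSE)
  }
  if (all(is.na(x))) {
    stop("`x` contains only NA values; nothing to winsorize.", call. = FALSE)
  }
  bounds <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  # Clip to [lower, upper]; NA values stay NA.
  pmin(pmax(x, bounds[1]), bounds[2])
}

#' Log-transform a numeric vector
#'
#' Applies the natural logarithm. Zero or negative values would produce
#' -Inf or NaN, so they raise an error that suggests a workaround.
#'
#' @param x A numeric vector.
#' @return \code{log(x)}.
#' @examples
#' x <- exp(rnorm(100))
#' summary(transform_log(x))
#' @export
transform_log <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.", call. = FALSE)
  }
  if (any(x <= 0, na.rm = TRUE)) {
    stop("`x` contains zero or negative values, so log() would return -Inf or NaN. ",
         "Consider shifting the data first, e.g. ",
         "transform_log(x - min(x, na.rm = TRUE) + 1).",
         call. = FALSE)
  }
  log(x)
}
```

Running devtools::document() would then regenerate the help pages, so ?windsorize shows the description and the example.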

Requirements: This exercise requires a GitHub account. If you do not have one yet, create one as described in the Git chapter.

Reporting Issues

Test the critical points listed in the previous section yourself and give feedback using the following steps:

  1. In RStudio, create a new project from version control to download (clone) the package locally. Hint: Use the repository URL https://github.com/Quantargo/datacleaner.git.
  2. Install the package locally (the Install & Restart button in the Build tab in RStudio).
  3. Test the critical points listed above.
  4. Give the package author feedback by submitting bug reports and feature requests at https://github.com/Quantargo/datacleaner/issues. See also https://help.github.com/en/articles/creating-an-issue for how to create issues on GitHub.
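When testing point 2 for a bug report, it helps to paste the exact calls and their outcomes into the issue. The helper probe_errors() below is hypothetical (it is not part of datacleaner): it calls a function on each element of a named list of inputs and records whether the call raised an error and, if so, its message.

```r
# Hypothetical helper for step 3: call `f` on each input and record
# whether it raised an error and with which message.
probe_errors <- function(f, inputs) {
  rows <- lapply(inputs, function(input) {
    tryCatch(
      {
        f(input)
        data.frame(error = FALSE, message = NA_character_,
                   stringsAsFactors = FALSE)
      },
      error = function(e) {
        data.frame(error = TRUE, message = conditionMessage(e),
                   stringsAsFactors = FALSE)
      }
    )
  })
  # One row per input, labelled with the input's name.
  data.frame(input = names(inputs), do.call(rbind, rows),
             row.names = NULL, stringsAsFactors = FALSE)
}

# Edge cases from point 2, plus an ordinary vector for comparison.
edge_cases <- list(
  empty  = numeric(0),
  all_na = c(NA_real_, NA_real_),
  normal = rnorm(20)
)

# With the cloned package installed, for example:
# probe_errors(datacleaner::windsorize, edge_cases)
```

A row with error = FALSE for empty or all_na is exactly the behavior worth reporting under point 2.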

Implementation

Unfortunately, the package author replies by e-mail that she has no time to deal with the issues, but that pull requests are very welcome. You therefore need to implement the fixes and features yourself and open a pull request afterwards. Follow these steps to accomplish the tasks:

  1. Fork the original Git repository to your own GitHub account using the Fork button; see also https://help.github.com/en/articles/fork-a-repo. Do not forget to also change the remote URL of your local Git repository using git remote set-url origin <URL of your fork>, or make a new clone of your forked repository.
  2. Implement all of the critical points, ideally with one commit per point. Do not forget to reference the corresponding issue numbers with a hash sign in the commit messages (e.g. #1). Push the commits to your forked repository regularly.
  3. Bonus: Implement test cases using the testthat package for all relevant functions (see also the chapter Testing). For example, test whether windsorize() correctly throws errors for vectors containing ONLY NAs or for zero-length vectors (e.g. numeric(0)).
  4. Check your package again and fix all errors, warnings, and notes.
  5. Create a new pull request that contains all fixes; see also https://help.github.com/en/articles/creating-a-pull-request.
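For the bonus step, a testthat file could look like the sketch below. It assumes the error messages contain the words "empty" and "only NA"; adjust the patterns to the messages your windsorize() actually produces. A stand-in windsorize() is defined at the top only so the snippet runs on its own. Inside the package, drop the stand-in and place the tests in a file under tests/testthat/ (the name test-windsorize.R would be a hypothetical choice), where the package's own function is used.

```r
library(testthat)

# Stand-in so the snippet runs standalone; remove it inside the package.
windsorize <- function(x, probs = c(0.05, 0.95)) {
  if (length(x) == 0) stop("`x` is empty", call. = FALSE)
  if (all(is.na(x))) stop("`x` contains only NA values", call. = FALSE)
  bounds <- stats::quantile(x, probs = probs, na.rm = TRUE, names = FALSE)
  pmin(pmax(x, bounds[1]), bounds[2])
}

test_that("windsorize() rejects zero-length input", {
  expect_error(windsorize(numeric(0)), "empty")
})

test_that("windsorize() rejects all-NA input", {
  expect_error(windsorize(c(NA_real_, NA_real_)), "only NA")
})

test_that("windsorize() keeps the length and clips to the quantile bounds", {
  set.seed(1)
  x <- rnorm(100)
  out <- windsorize(x)
  bounds <- stats::quantile(x, c(0.05, 0.95), names = FALSE)
  expect_length(out, length(x))
  expect_true(all(out >= bounds[1] & out <= bounds[2]))
})
```

Matching on part of the message, rather than only checking that some error occurs, also catches the case where the function fails for an unrelated reason.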