Exercise: Modify an existing package
You find a package on GitHub named datacleaner, which is supposed to clean data sets and handle NA values accordingly: https://github.com/Quantargo/datacleaner. Additionally, the package is supposed to deal with outliers. Currently, the package implements the functions windsorize() and meanimpute() to perform winsorization and mean imputation on data sets.
You decide to use the package from GitHub, but unfortunately some functionality seems to be missing and some checks seem to be failing. Specifically, you find the following critical points:
- The function windsorize() does not seem to be working correctly.
- The function windsorize() should give an error with a clear message if either an empty vector or a vector containing only NAs is passed. Hint: Use the function stop() to throw an error when such a condition is met.
- You need a function transform_log() to log-transform values. The function should give an error or, ideally, hints to workarounds if numerical errors arise.
- R CMD check (or devtools::check(), or the Check button in the Build tab in RStudio) fails with errors, warnings and notes.
- The documentation of the functions lacks a clear explanation of what they do, see e.g. ?windsorize. Additionally, function examples are missing; e.g., use a vector created with rnorm() to explain how
windsorize() is working or
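As a rough orientation for the critical points above, the following sketch shows one possible shape for the input checks, the roxygen2 documentation with an rnorm() example, and the log transform. It is not the package's actual code: the signature windsorize(x, p = 0.9), the quantile-based clipping, the wording of the error messages, and the suggested workarounds in transform_log() are all assumptions made for illustration.

```r
#' Winsorize a numeric vector
#'
#' Values below the lower quantile or above the upper quantile are replaced
#' by those quantiles. (Signature and behavior are assumptions of this sketch.)
#'
#' @param x Numeric vector.
#' @param p Two-sided coverage; 0.9 keeps the central 90 percent unchanged.
#' @return Numeric vector of the same length as x.
#' @examples
#' set.seed(1)
#' x <- rnorm(100)
#' range(x)
#' range(windsorize(x, p = 0.9))
#' @export
windsorize <- function(x, p = 0.9) {
  # Check the empty case first: all(is.na(numeric(0))) is TRUE as well.
  if (length(x) == 0) {
    stop("`x` must not be empty.")
  }
  if (all(is.na(x))) {
    stop("`x` must contain at least one non-NA value.")
  }
  tail_prob <- (1 - p) / 2
  limits <- stats::quantile(x, probs = c(tail_prob, 1 - tail_prob), na.rm = TRUE)
  # Clip to [lower, upper]; NA entries of x stay NA.
  pmin(pmax(x, limits[[1]]), limits[[2]])
}

#' Log-transform a numeric vector (hypothetical sketch)
#'
#' @param x Numeric vector of strictly positive values.
#' @return log(x)
#' @export
transform_log <- function(x) {
  if (!is.numeric(x)) {
    stop("`x` must be numeric.")
  }
  if (any(x < 0, na.rm = TRUE)) {
    stop("`x` contains negative values, for which log() is undefined. ",
         "Consider shifting the data first, e.g. log(x - min(x) + 1).")
  }
  if (any(x == 0, na.rm = TRUE)) {
    stop("`x` contains zeros, which would give -Inf. ",
         "Consider log1p(x) instead.")
  }
  log(x)
}
```

Failing early with stop() keeps quiet NaN or -Inf values from propagating into later analysis steps; whether an error or a warning with a fallback is the better choice depends on how the package is meant to be used.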
Requirements: This exercise requires a GitHub account. If you do not have one yet, please create one as described in the Git chapter.
Unfortunately, the package author writes you an e-mail saying that she has no time to deal with the issues, but that pull requests are highly welcome. You therefore need to implement the fixes and features yourself and issue a pull request later. Follow these steps to accomplish the tasks:
- Fork the original Git repository to your own GitHub account using the Fork button, see also https://help.github.com/en/articles/fork-a-repo. Do not forget to also change the remote URL of your local Git repository using git remote set-url, or make a fresh clone of your forked repository.
- Implement all of the critical points and try to use one commit for each point implemented. Do not forget to reference the corresponding issue numbers with a hash sign (e.g. #1) in the commit messages. Push the commits to your forked repository regularly.
- Bonus: Implement test cases using the testthat package for all relevant functions (see also the chapter Testing). Test, e.g., whether windsorize() correctly throws errors for vectors containing only
NAs or zero-length vectors (e.g.
- Check your package and get rid of all errors, warnings, and notes.
- Create a new pull request that contains all fixes, see also https://help.github.com/en/articles/creating-a-pull-request.
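For the bonus step, a testthat test might look roughly like the sketch below. The stand-in windsorize() is defined only so the snippet runs on its own; its signature and error messages are assumptions, and inside the package the tests would exercise the real function instead.

```r
library(testthat)

# Hypothetical stand-in so the sketch is self-contained; in the package,
# the tests would call the package's own windsorize().
windsorize <- function(x, p = 0.9) {
  if (length(x) == 0) stop("`x` must not be empty.")
  if (all(is.na(x))) stop("`x` must contain at least one non-NA value.")
  tail_prob <- (1 - p) / 2
  limits <- stats::quantile(x, probs = c(tail_prob, 1 - tail_prob), na.rm = TRUE)
  pmin(pmax(x, limits[[1]]), limits[[2]])
}

test_that("windsorize() rejects empty and all-NA input", {
  # The second argument is a regular expression matched against the message.
  expect_error(windsorize(numeric(0)), "empty")
  expect_error(windsorize(c(NA_real_, NA_real_)), "non-NA")
})

test_that("windsorize() keeps the length and narrows the range", {
  set.seed(42)
  x <- rnorm(200)
  w <- windsorize(x, p = 0.9)
  expect_length(w, length(x))
  expect_true(max(w) <= max(x))
  expect_true(min(w) >= min(x))
})
```

In a package, such tests would typically live in a file under tests/testthat/ (for instance created with usethis::use_test("windsorize")) and be run with devtools::test() or as part of devtools::check().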