Git Version Control

Introduction

Reading Materials

Introduction to Git for R Users by Hadley Wickham

RStudio and Git

What is Version Control?

A version control system is a tool that manages changes made to the files and directories in a project. Many version control systems exist; the most popular ones are Subversion (SVN) and Git. Git was developed by Linus Torvalds in 2005 for Linux kernel development and supports distributed workflows by design. Git has become the most popular approach to version control in the R package ecosystem, so this course focuses on Git. The graph below shows how Git overtook SVN in Google search volume by 2012:

[Figure: Google search volume for Git compared with SVN over time]

Git’s strengths are:

  • Nothing that is saved to Git is ever lost, so you can always go back to see which results were generated by which versions of your programs.
  • Git automatically notifies you when your work conflicts with someone else’s, so it’s harder (but not impossible) to accidentally overwrite work.
  • Git can synchronize work done by different people on different machines, so it scales as your team does.
  • Have you ever accidentally pressed s instead of Ctrl + S to save your file? It’s very easy to accidentally introduce a mistake that takes a few minutes to track down. Git makes this problem easy to spot because it allows you to see exactly what’s changed and undo any mistakes.

Version control isn’t just for software: books, papers, parameter sets, and anything that changes over time or needs to be shared can and should be stored and shared using something like Git.

Using Git

The version control system git is a command line tool which can be downloaded at https://git-scm.com/downloads for all major operating systems. Most editors also integrate with Git and offer nicer visualizations for changes and the Git history. In the next sections we will introduce the command line tool as well as the Git capabilities of RStudio. In practice, we will use both tools in parallel to have the best user experience in terms of features and usability.

Command Line

The git command line tool implements the entire functionality of Git. This also means that the tool offers a lot of options, which can be quite hard to grasp at the beginning. To get started, have a look at the excellent cheat sheet compiled by GitHub with the most important commands: https://education.github.com/git-cheat-sheet-education.pdf. We recommend printing this sheet and keeping it at hand during the preparation phase and the workshop.

Setup

Most of Git’s settings should be left as they are. However, there are two you should set on every computer you use: your name and your email address. These are recorded in the log every time you commit a change, and are often used to identify the authors of a project’s content in order to give credit (or assign blame, depending on the circumstances).

To change a configuration value for all of your projects on a particular computer, run the following commands with your own name and email address:

git config --global user.name "[firstname lastname]"
git config --global user.email "[valid-email]"

To see what the current settings are, you can use the command git config --list.
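
As a small sketch of how to check individual settings, the snippet below sets and reads back a name and email address. It makes two assumptions that are not part of the text above: the values are placeholders (Jane Doe), and the settings are written to a throwaway repository without --global, so that running the example does not touch your real configuration.

```shell
# Create a throwaway repository so the example leaves real settings alone.
demo_dir="$(mktemp -d)"
git init --quiet "$demo_dir"
cd "$demo_dir"

# Without --global, the values are stored in this repository's .git/config only.
git config user.name "Jane Doe"                # placeholder name
git config user.email "jane.doe@example.com"   # placeholder address

# --get prints the single value that is in effect in this repository.
configured_name="$(git config --get user.name)"
configured_email="$(git config --get user.email)"
echo "name:  $configured_name"
echo "email: $configured_email"

# List only the repository-level settings.
git config --local --list
```

A value set inside a repository takes precedence over the --global value there, which can be handy if one project needs a different email address.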

Where does Git store information?

Each of your Git projects has two parts: the files and directories that you create and edit directly, and the extra information that Git records about the project’s history. The combination of these two things is called a repository.

Git stores all of its extra information in a directory called .git located in the root directory of the repository. Git expects this information to be laid out in a very precise way, so you should never edit or delete anything in .git.
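
To make the two parts of a repository visible, the following sketch creates an empty repository in a temporary directory and only looks at the .git directory without changing anything in it. The temporary location and the variable names are assumptions made for illustration; git init itself is explained in the next section.

```shell
# Create an empty repository in a temporary directory (hypothetical location).
repo_dir="$(mktemp -d)"
git init --quiet "$repo_dir"

# The hidden .git directory sits in the root of the repository.
ls -a "$repo_dir"

# Look at, but do not edit, what Git keeps in there (HEAD, config, objects, refs, ...).
ls "$repo_dir/.git"

# Git can also report the location of its metadata directory itself.
git_dir="$(git -C "$repo_dir" rev-parse --git-dir)"
echo "Git metadata directory: $git_dir"
```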


Setup & Init using the Command Line Interface (CLI)

Experienced Git users instinctively start new projects by creating repositories. If you are new to Git, though, or working with people who are, you will often want to convert existing projects into repositories. Doing so is simple: just run

git init

in the project’s root directory, or pass the project’s path explicitly from any other directory:

git init /path/to/project
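
A rough sketch of converting an existing project could look as follows. The project location and the file analysis.R are hypothetical and only stand in for a project that already contains some work.

```shell
# Simulate an existing project with one file (hypothetical names).
project_dir="$(mktemp -d)/myproject"
mkdir -p "$project_dir"
echo 'print("hello")' > "$project_dir/analysis.R"

# Turn the existing directory into a repository by passing its path.
git init --quiet "$project_dir"

# Existing files are left untouched; Git lists them as untracked (??)
# until they are added to the repository.
status_output="$(git -C "$project_dir" status --short)"
echo "$status_output"
```

Initializing a repository does not record any history yet; the files only become part of the history once they are added and committed.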

Sometimes you will join a project that is already running, inherit a project from someone else, or continue working on one of your own projects on a new machine. In each case, you will clone an existing repository instead of creating a new one. Cloning a repository does exactly what the name suggests: it creates a copy of an existing repository (including all of its history) in a new directory.

To clone a repository, use the command git clone URL, where URL identifies the repository you want to clone. This will normally be something like

https://github.com/myrepo/project.git

and is allowed through most corporate firewalls. Alternatively, Git can also be used over SSH (port 22); the URL then looks as follows:

git@github.com:myrepo/project.git

Although the SSH approach is typically faster, especially for larger repositories, and authenticates with your SSH key pair rather than a password (which is often considered more secure), it frequently causes problems in corporate setups because firewalls tend to block port 22.
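
If a repository was cloned with one kind of URL and the other turns out to work better on your network, the address can be changed later. The sketch below illustrates this with git remote in a scratch repository; it reuses the example URLs from above, assumes the conventional remote name origin, and needs no network access because nothing is fetched.

```shell
# Scratch repository; nothing is downloaded in this sketch.
scratch_repo="$(mktemp -d)"
git init --quiet "$scratch_repo"
cd "$scratch_repo"

# Register the HTTPS address as the remote named "origin".
git remote add origin https://github.com/myrepo/project.git

# Switch the same remote to the SSH form of the address.
git remote set-url origin git@github.com:myrepo/project.git

# Show which address Git will use from now on.
remote_url="$(git remote get-url origin)"
echo "$remote_url"
```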

When you clone a repository, Git uses the name of the existing repository as the name of the clone’s root directory: for example,

git clone /existing/project

will create a new directory called project. If you want to call the clone something else, add the directory name you want to the command:

git clone /existing/project newprojectname
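
The following sketch illustrates that a clone really carries the complete history. It builds a tiny source repository with a single commit and clones it under a different name. The file notes.txt, the commit message, and the author details are placeholders; the identity is passed with -c only for this one commit so that the example does not depend on your global settings.

```shell
# Work inside a temporary directory.
work_dir="$(mktemp -d)"
cd "$work_dir"

# Build a small source repository with one commit (placeholder content).
git init --quiet project
echo "first version" > project/notes.txt
git -C project add notes.txt
git -C project -c user.name="Jane Doe" -c user.email="jane.doe@example.com" \
    commit --quiet -m "Add notes"

# Clone it; the second argument sets the name of the new directory.
git clone --quiet project newprojectname

# Both repositories point at the same latest commit.
original_head="$(git -C project rev-parse HEAD)"
clone_head="$(git -C newprojectname rev-parse HEAD)"
echo "original: $original_head"
echo "clone:    $clone_head"

# The history is available in the clone as well.
git -C newprojectname log --oneline
```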


Setup & Init Git projects using RStudio

RStudio and Git

Within the R(Studio) environment, the initialization/cloning workflow depends on whether you need to modify the project and push your changes to the remote repository later.

If you do not need to push your changes to a remote repository, you can use the git init command to turn your local directory into a Git repository. To fetch a repository from GitHub and install its contents as a package into your local R library, use the remotes package from the R console as follows:

install.packages("remotes")
remotes::install_github("username/packagename")

If you need to continue working on an existing remote repository, we recommend using the File | New Project dialog within RStudio as described in Packages from Git Version Control in the section Creating Packages. This step clones the repository, creates or re-uses the RStudio project file, and sets everything up so that the project/package is ready to use.

Within RStudio you should now see a new tab called Git:

[Screenshot: the Git tab in RStudio]

Files that you change in your Git repository are listed in the Git tab, for example with the status Modified:

[Screenshot: a changed file listed with the status Modified in the Git tab]

Let’s see what the basic Git workflow looks like in the next section.