Git Version Control: Navigating Git

What is a hash?

Every commit to a repository has a unique identifier called a hash, since it is generated by running the commit's contents through a hash function: a deterministic function that maps data of any size to a fixed-length value. The hash is written as a 40-character hexadecimal string like 7c35a3ce607a14953f070f0f83b5d74c2296ef93. In practice, however, you usually only have to give Git the first 6 to 8 characters to identify a commit, as long as that prefix is unique within the repository.

Hashes are what enable Git to share data efficiently between repositories. If two files are the same, their hashes are guaranteed to be the same. Similarly, if two commits contain the same files, have the same ancestors, and carry the same metadata (author, date, and message), their hashes will be the same as well. Git can therefore tell what information needs to be saved where by comparing hashes rather than comparing entire files.
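
As a minimal sketch of this property, the commands below hash three small files with git hash-object, which prints the hash Git would assign to a file's content. The temporary directory and the file names are made up for illustration.

```shell
# Work in a throwaway repository (hypothetical file names).
workdir=$(mktemp -d)
cd "$workdir"
git init -q .

printf 'same content\n' > a.txt
printf 'same content\n' > b.txt
printf 'different content\n' > c.txt

# git hash-object prints the hash Git would assign to each file's content.
hash_a=$(git hash-object a.txt)
hash_b=$(git hash-object b.txt)
hash_c=$(git hash-object c.txt)

echo "a.txt: $hash_a"
echo "b.txt: $hash_b"   # identical to a.txt, because the content is identical
echo "c.txt: $hash_c"   # differs, because the content differs
```

Because a.txt and b.txt hash to the same value, Git only needs to store their content once.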

Exercise

  1. Use git clone to download the repository Quantargo/customxgboost. Hint: Use the Clone or download button on the GitHub page to copy the correct repository URL. In corporate settings, use the https link, since SSH ports are often blocked by firewalls.
  2. Use cd to go into the newly created customxgboost directory and then run git log.

What is a HEAD in Git?

A Git HEAD can be thought of as a pointer to a commit in the current branch you are working in. The current head can be changed either using git checkout (typically to switch to a different branch) or git reset (to undo changes).
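
The following sketch inspects HEAD in a throwaway repository. The repository, the file name, and the identity passed to git config are placeholders chosen for illustration.

```shell
# Create a scratch repository with a single commit.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"        # placeholder identity for the demo
git config user.email "user@example.com"

echo "first" > notes.txt
git add notes.txt
git commit -q -m "First commit"

# HEAD is a symbolic reference: it names the branch you are on ...
head_ref=$(git symbolic-ref HEAD)
echo "HEAD points to: $head_ref"

# ... and that branch in turn points to a commit.
head_commit=$(git rev-parse HEAD)
branch_commit=$(git rev-parse "$head_ref")
echo "HEAD resolves to commit:   $head_commit"
echo "Branch resolves to commit: $branch_commit"
```

Both rev-parse calls print the same hash, since HEAD simply follows the current branch.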

Explanation by Scott Chacon

Head is your current branch. It is a symbolic reference. It is a reference to a branch. You always have HEAD, but HEAD will be pointing to one of these other pointers, to one of the branches that you’re on. It is the parent of your next commit. It is what should be what was last checked-out into your working directory… This is the last known state of what your working directory was.


How can I unstage files / undo changes?

Unstage files

Files that have already been added to the staging area using git add can be unstaged with

git reset HEAD <file>
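
A small sketch of what unstaging does, using a scratch repository and a hypothetical file report.txt. The short status codes come from git status --porcelain: the first column describes the staging area, the second the working directory.

```shell
# Scratch repository with one committed file (names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "draft" > report.txt
git add report.txt
git commit -q -m "Add report"

# Change the file and stage the change.
echo "final version" > report.txt
git add report.txt
status_staged=$(git status --porcelain)
echo "after git add:   '$status_staged'"      # 'M  report.txt' (staged)

# Unstage it again; the edit itself stays in the working directory.
git reset -q HEAD report.txt
status_unstaged=$(git status --porcelain)
content=$(cat report.txt)
echo "after git reset: '$status_unstaged'"    # ' M report.txt' (not staged)
echo "file content:    $content"
```

Note that the file still contains the new text afterwards: git reset HEAD <file> only changes what is staged, not the file itself.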

Undo changes to unstaged files

Suppose you have made changes to a file, then decide you want to undo them. Your text editor may be able to do this, but a more reliable way is to let Git do the work. The command:

git checkout -- <file>

will discard the changes that have not yet been staged. (The double dash -- must be there to separate the git checkout command from the names of the file or files you want to recover.)

Use this command carefully: once you discard changes in this way, they are gone forever.
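
To see the effect in isolation, here is a sketch in a scratch repository. The file name and its contents are invented for the example.

```shell
# Scratch repository with one committed file (names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "committed text" > report.txt
git add report.txt
git commit -q -m "Add report"

# Make an edit that is never staged ...
echo "an edit I want to throw away" > report.txt

# ... and discard it. The file goes back to the last staged/committed state.
git checkout -- report.txt
restored=$(cat report.txt)
status=$(git status --porcelain)

echo "file content: $restored"
echo "status:       '${status}'"   # empty: nothing left to commit
```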

How can I undo changes to staged files?

git reset will unstage files that you previously staged using git add. By combining git reset with git checkout, you can undo changes to a file that you have already staged. The syntax is as follows.

git reset HEAD <file>
git checkout -- <file>

You may be wondering why there are two commands for resetting changes. The answer is that unstaging a file and undoing changes are both special cases of more powerful Git operations that you have not yet seen.

Exercise

You run the command git status in a repository and get the following output:

On branch next
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

    new file:   test.h


Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

    modified:   myfile.h

Untracked files:
  (use "git add <file>..." to include in what will be committed)

    myfile1.cpp
    myfile2.cpp

How do I restore an old version of a file?

You previously saw how to use git checkout to undo the changes that you made since the last commit. This command can also be used to go back even further into a file’s history and restore versions of that file from a commit. In this way, you can think of committing as saving your work, and checking out as loading that saved version.

The syntax for restoring an old version takes two arguments: the hash that identifies the version you want to restore, and the name of the file.

For example, if git log shows this:

commit f54aca90848cfe7c0bb536d6d9f86cb011f5369d (HEAD, origin/master, origin/HEAD, master)
Author: mario <mario.annau@gmail.com>
Date:   Wed Feb 20 00:59:45 2019 +0100

    Deactivate openmp conf

commit ac8dacfb24520bbd74344ac8417232fbc113dff8
Author: pmont <pmontm@gmail.com>
Date:   Sun May 6 15:44:48 2018 +0200

    Chaging package version to easily find the custom version

then git checkout ac8dac configure.ac would replace the current version of configure.ac with the version that was committed on May 6. Notice that this is the same syntax that you used to undo the unstaged changes, except -- has been replaced by a hash.

Restoring a file doesn’t erase any of the repository’s history. Instead, the restored version shows up as a staged change, and once you commit it, the act of restoring the file is saved as another commit, because you might later want to undo your undoing.

One more thing: there’s another feature of git log that will come in handy here. Passing - then a number restricts the output to that many commits. For example, git log -3 configure.ac shows you the last three commits involving configure.ac.
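
The whole round trip can be sketched in a scratch repository. The file config.txt and its contents are stand-ins for illustration and are not taken from the customxgboost repository.

```shell
# Scratch repository with two versions of one file (names are placeholders).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"

echo "version 1" > config.txt
git add config.txt
git commit -q -m "Add config"
old_hash=$(git rev-parse --short HEAD)   # abbreviated hash of the first commit

echo "version 2 with more text" > config.txt
git commit -q -am "Update config"

# Show the last two commits that touched the file.
git log --oneline -2 config.txt

# Restore the file as it was in the first commit.
git checkout "$old_hash" config.txt
restored=$(cat config.txt)
status=$(git status --porcelain)         # 'M  config.txt': restored version is staged

# Committing records the restore as a new commit; history only grows.
git commit -q -m "Restore first version of config"
commit_count=$(git rev-list --count HEAD)
echo "content: $restored, status before commit: '$status', commits: $commit_count"
```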

How can I undo all of the changes I have made?

So far, you have seen how to unstage changes to a single file at a time using git reset HEAD path/to/file. You will sometimes want to undo changes to many files.

One way to do this is to give git reset a directory. For example, git reset HEAD data will unstage any files from the data directory. Even better, if you don’t provide any files or directories, it will unstage everything. Better still, HEAD is the default commit to reset to, so you can simply write git reset to unstage everything.

Similarly, git checkout -- data will then restore the files in the data directory to their previous state. You can’t leave the file argument completely blank, but you can refer to the current directory as . (a single dot). So git checkout -- . will revert all files in the current directory and its subdirectories.
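
The sketch below walks through these commands in a scratch repository. The data directory mirrors the example in the text; the file names inside it are invented.

```shell
# Scratch repository with files in data/ and at the top level (placeholder names).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
mkdir data
echo "a" > data/a.csv
echo "b" > data/b.csv
echo "top" > top.txt
git add .
git commit -q -m "Initial files"

# Change and stage everything.
echo "changed a file" > data/a.csv
echo "changed b file" > data/b.csv
echo "changed top file" > top.txt
git add .

# Unstage everything (HEAD is the default commit).
git reset -q
after_reset=$(git status --porcelain)

# Discard the changes under data/ only.
git checkout -- data
after_data=$(git status --porcelain)

# Discard every remaining change below the current directory.
git checkout -- .
after_all=$(git status --porcelain)

printf 'after git reset:\n%s\n' "$after_reset"
printf 'after git checkout -- data:\n%s\n' "$after_data"
printf 'after git checkout -- .:\n%s\n' "$after_all"
```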

How can I remove unwanted files?

Git can help you clean up files that you have told it you don’t want. The command git clean -n will show you a list of files that are in the repository, but whose history Git is not currently tracking. A similar command git clean -f will then delete those files.

Use this command carefully: git clean only works on untracked files, so by definition, their history has not been saved. If you delete them with git clean -f, they’re gone for good.
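
A cautious workflow is to preview first and delete second, as in this sketch. The repository and the file names keep.txt and scratch.tmp are placeholders.

```shell
# Scratch repository with one tracked and one untracked file.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "tracked" > keep.txt
git add keep.txt
git commit -q -m "Add tracked file"
echo "temporary output" > scratch.tmp    # never added, so untracked

# Dry run: list what would be deleted without touching anything.
preview=$(git clean -n)
echo "$preview"                          # Would remove scratch.tmp
if [ -e scratch.tmp ]; then still_there=yes; else still_there=no; fi

# Actually delete the untracked file.
git clean -f
```

Tracked files such as keep.txt are left alone; only scratch.tmp is removed.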

How can I reset HEAD?

Sometimes it can be helpful to undo one or even several commits. This is done by moving HEAD, and with it the current branch, back to a previous commit. Using the command

git reset HEAD~1

resets to the previous commit. To go back three commits, you can use

git reset HEAD~3

If you also want to discard the changes in your working directory, you can add the option --hard, e.g.

git reset --hard HEAD~1

Be careful with this command, since it discards all staged and unstaged changes to tracked files. Untracked files are left in place; use git clean to remove those.
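
The difference between the two forms can be sketched in a scratch repository with three commits. The file names and contents are made up for the demonstration.

```shell
# Scratch repository with three commits that each append a line to log.txt.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
printf 'line 1\n' > log.txt
git add log.txt
git commit -q -m "Commit 1"
printf 'line 1\nline 2\n' > log.txt
git commit -q -am "Commit 2"
printf 'line 1\nline 2\nline 3\n' > log.txt
git commit -q -am "Commit 3"
echo "scratch" > untracked.txt           # an untracked file

# Plain reset: the branch moves back one commit, but the file keeps all
# three lines, which now show up as an unstaged modification.
git reset -q HEAD~1
after_plain=$(git status --porcelain)
lines_after_plain=$(grep -c '' log.txt)

# Hard reset: the branch moves back again AND tracked files are overwritten.
git reset -q --hard HEAD~1
content_after_hard=$(cat log.txt)
commits_left=$(git rev-list --count HEAD)

printf 'status after plain reset:\n%s\n' "$after_plain"
echo "lines in log.txt after plain reset: $lines_after_plain"
echo "log.txt after hard reset: $content_after_hard"
echo "commits left on the branch: $commits_left"
```

In this sketch untracked.txt is still present after the hard reset.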

You can also use hashes to reset to a specific commit. For example, if git log shows this:

commit f54aca90848cfe7c0bb536d6d9f86cb011f5369d (HEAD, origin/master, origin/HEAD, master)
Author: mario <mario.annau@gmail.com>
Date:   Wed Feb 20 00:59:45 2019 +0100

    Deactivate openmp conf

commit ac8dacfb24520bbd74344ac8417232fbc113dff8
Author: pmont <pmontm@gmail.com>
Date:   Sun May 6 15:44:48 2018 +0200

    Chaging package version to easily find the custom version

we could reset to the previous commit either using

git reset HEAD~1

or

git reset ac8dacf
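
That both forms name the same commit can be checked with git rev-parse, which resolves any revision to its full hash. The sketch below uses a scratch repository with placeholder content rather than the customxgboost history.

```shell
# Scratch repository with two commits (placeholder content).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.name "Example User"
git config user.email "user@example.com"
echo "one" > file.txt
git add file.txt
git commit -q -m "First commit"
echo "two, with more text" > file.txt
git commit -q -am "Second commit"

# The previous commit, named relative to HEAD and by abbreviated hash.
prev_full=$(git rev-parse HEAD~1)
prev_short=$(git rev-parse --short HEAD~1)
resolved=$(git rev-parse "$prev_short")
echo "HEAD~1      -> $prev_full"
echo "$prev_short -> $resolved"

# Resetting to the abbreviated hash lands on the same commit.
git reset -q "$prev_short"
now=$(git rev-parse HEAD)
echo "HEAD is now   $now"
```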

Exercise

The output of git log --format=oneline on branch master of a specific repository is as follows:

f54aca90848cfe7c0bb536d6d9f86cb011f5369d (HEAD) Deactivate openmp conf
ac8dacfb24520bbd74344ac8417232fbc113dff8 Changing package version
6852ad0c2f8b91c2abb92108e62929c78c3827ca Returning to the original name
a1df38ffe4061d76d2f712fe098c918a498add7a Initial commit

What is force push?

In some rare circumstances it might be necessary to do a force push. A force push is performed with the command git push --force and can rewrite the Git history on the remote. This command is EXTREMELY dangerous and should be handled with care. A force push allows us, for example, to remove commits from the remote or even rewrite the entire Git history from scratch. Force pushing to branches or repositories where you are the only contributor is fine. If you really need to force push to a shared branch, however, please send an e-mail to the other contributors and let them know that they need to run git fetch and git reset origin/branchname. Additionally, you should prevent contributors from force pushing, at least to the production branch (typically named master).
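
The sketch below plays this scenario through locally, with a bare repository in a temporary directory standing in for the remote and a second clone standing in for a colleague. All paths, names, and file contents are assumptions for illustration. The colleague's reset uses --hard so that the working directory is updated as well, which discards any local changes in that clone.

```shell
# A bare repository acts as the "remote"; "work" is our own clone.
base=$(mktemp -d)
cd "$base"
git init -q --bare remote.git
git init -q work
cd work
git config user.name "Example User"
git config user.email "user@example.com"
git remote add origin "$base/remote.git"
branch=$(git symbolic-ref --short HEAD)   # whatever the default branch is called

echo "first" > f.txt
git add f.txt
git commit -q -m "First"
echo "second, to be removed" > f.txt
git commit -q -am "Second"
git push -q origin "$branch"

# A colleague clones the repository at this point.
git clone -q "$base/remote.git" "$base/colleague"

# Rewrite local history: drop "Second" and commit a replacement.
git reset -q --hard HEAD~1
echo "replacement" > f.txt
git commit -q -am "Replacement"

# A normal push is rejected, because the remote has a commit we dropped.
if git push -q origin "$branch" 2>/dev/null; then
  plain_push=accepted
else
  plain_push=rejected
fi
echo "plain push: $plain_push"

# A force push overwrites the remote branch with our history.
git push -q --force origin "$branch"
local_head=$(git rev-parse HEAD)
remote_head=$(git --git-dir="$base/remote.git" rev-parse "refs/heads/$branch")

# The colleague has to fetch and reset to pick up the rewritten history.
git -C "$base/colleague" fetch -q origin
git -C "$base/colleague" reset -q --hard "origin/$branch"
colleague_head=$(git -C "$base/colleague" rev-parse HEAD)
echo "local: $local_head, remote: $remote_head, colleague: $colleague_head"
```

If you have to force push to a shared branch, git push --force-with-lease is a somewhat safer variant: it refuses to overwrite the remote branch if someone else has pushed to it since your last fetch.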

Exercise

The output of git log --format=oneline on the local branch master of a specific repository looks as follows:

f54aca90848cfe7c0bb536d6d9f86cb011f5369d (HEAD) Deactivate openmp conf
6852ad0c2f8b91c2abb92108e62929c78c3827ca Returning to the original name
a1df38ffe4061d76d2f712fe098c918a498add7a Initial commit

The output of git log --format=oneline on the remote branch origin/master looks as follows:

ac8dacfb24520bbd74344ac8417232fbc113dff8 (origin/master) Changing package version
6852ad0c2f8b91c2abb92108e62929c78c3827ca Returning to the original name
a1df38ffe4061d76d2f712fe098c918a498add7a Initial commit