blog/content/posts/2020-12-07-git-basics/index.md
Bruno BELANYI 39944ed35d
All checks were successful
ci/woodpecker/push/deploy/2 Pipeline was successful
posts: fix typos
2024-11-08 16:06:59 +00:00

12 KiB

title date draft description tags categories favorite
Git Basics 2020-12-07 18:54:31+0100 false The next step after the basics
git
cli
programming
false

Git is a distributed version control system. Originally written by Linus Torvalds to be used with the development of the Linux kernel, it has now become the go-to way to share work between multiple developers.

In this article I will summarise what I feel to be the next-step basics of git, explaining each notion along the way.

I assume at least passing knowledge of git, and will therefore skip the justifications for using git instead of flinging tarballs at one another. I will also be skipping the explanation for the basic workflow of git add, git commit, and git push. You can consider this guide to be aimed at 3rd year students at EPITA, who have used git for a whole year to submit their project but have not explored some of its more powerful features.

Starting out with branches and references

To me, this is the most essential thing you need to remember when you using git. It is part of what makes it special, and will be used though-out your career.

Why you should use branches in git

What makes git so useful, and so powerful, is the fact that it was conceived from the ground up to operate in a decentralised manner, to accommodate the Linux kernel programming workflow.

That model de facto means that branching must be a lightweight operation, and merging should not be hassle. Indeed, as soon as you start having people work in parallel on a decentralised system, you end up creating "hidden branches": each person's development tree is a branch on its own.

If you try merging branches that do not have any conflict, the operation is basically instantaneous: to take advantage of that fact I encourage you to use branches in your workflow when using git.

Where is my HEAD

The notion of HEAD in git can seem strange. You might first have encountered it when checking out an older commit. git status helpfully tells you that you have been guillotined: HEAD detached at 78f604b.

To make it short, HEAD is a reference pointing to the commit that you are currently working on top of. It usually points to a branch name (e.g: main or master), but can also point to a specific commit (such as in the checkout scenario I just mentioned).

Revisions

Most of the commands I will show you need you to provide them with what git calls a revision. This is usually means a way to specify a commit.

There are multiple ways to specify a revision, you should know at least two of them: refname and describeOutput which loosely correspond to branch names and git tags respectively. Note that @ is a shortcut for referring to HEAD.

You can also specify the sha1 commit hash directly, or relative revisions. A relative revision allows you to select the parent of a specific commit, you can use the following revisions specifiers:

  • ~: select the first-parent commit
  • ^: select the nth-parent commit (useful for merge commits)

You can append numbers to those two specifiers, they differ in how they handle merges. If you are applying them to a merge commit, ~2 will give you the grand-parent of your commit, following the "first parent", whereas ^2 will give you the "second parent" of your commit.

History manipulation

Once you start using git for non-trivial projects, using some of the practices that I aim to teach you, rewriting history will become your secret weapon for productivity.

I have to insist on one point though, which is that re-writing history that was published and used by other people is often seen as a faux-pas, or worse! You should only use it on private branches, making sure to never rewrite published history unless absolutely necessary.

Picking cherries

The easiest way to manipulate history is the cherry-pick command. It allows you to "lift" a commit any other place in history, and plop it down in your current branch.

It's the easiest way to manipulate history, allowing you for example to pick a commit which fixes a bug in another branch and apply it onto yours: simply do git cherry-pick <my-commit-with-the-bugfix>.

It is however most likely not what you want to do if you later intend to merge your branch with the one you lifted the commit from. Both sets of commits will have the exact same change, and git will not be able to resolve the conflict. In those cases, consider merging from a common branch whose purpose is applying the fix. In that case, git will happily merge your branches later on without making a fuss.

All your rebase are belong to us

This is probably the single best command in all of git in my mind. Having the access to git rebase allows you to commit as you work, without caring about atomicity, commit messages, or even having working/compiling code.

Rebasing allows you to make various changes to your branch's history:

  • Rewording a commit's message.
  • Reordering commits
  • Removing commits
  • Squashing: merging a commit into another one

This tool allows you to work on your own, commit early and commit often as you work on your changes, and keep a clean result before merging back into the main branch.

Fixup, a practical example

A specific kind of squashing which I use frequently is the notion of fixups. Say you've committed a change (A), and later on notice that it is missing a part of the changeset. You can decide to commit that missing part (A-bis) and annotate it to mean that it is linked to A.

Let's say you have this history:

42sh$ git log --oneline
* 787dd36 (HEAD -> master) Add README
* 8d08529 Add baz
* 7188fb1 Frobulate bar
* 961d8fb Fix foo

And notice that missed a change that belongs to Add baz. You can add it to your staged changes, and issue commit --fixup @~. This will create a commit named fixup! Add baz.

42sh$ git log --oneline
* 92912ee (HEAD -> master) fixup! Add baz
* 787dd36 Add README
* 8d08529 Add baz
* 7188fb1 Frobulate bar
* 961d8fb Fix foo

If you then rebase using -i --autosquash will result in this interactive rebase screen.

pick 961d8fb Fix foo
pick 7188fb1 Frobulate bar
pick 8d08529 Add baz
fixup 92912ee fixup! Add baz
pick 787dd36 Add README

After applying the rebase, you find yourself with the complete change inside Add baz, which can be confirmed with another git log

* 0174e54 (HEAD -> master) Add README
* b0a47ae Add baz
* 7188fb1 Frobulate bar
* 961d8fb Fix foo

This is especially useful when you want to apply suggestion on a merge request after it was reviewed. You can keep a clean history without those pesky Apply suggestion ... commits being part of your history.

Lost commits and the reflog

When doing this kind of history manipulation, you might end up making a mistake and lose a commit that was very important.

Obviously, git has a way to save us in this situation. If we look at the man page for git reflog, we can read the following sentence:

Reference logs, or "reflogs", record when the tips of branches and other
references were updated in the local repository.

What does this mean exactly? Simply put, you can use it to checkout a previous version of your repository, in the state it was in before you manipulated the history. Let's illustrate with a small example.

Mapping lost commits: a practical example

Let's say you have this repository state at the beginning.

42sh$ git log --oneline
* 524de22 (HEAD -> master) Documentation update
* d60ddb5 USELESS COMMIT
* e81b5fb Remove baz dependency
* 44cea7d VERY IMPORTANT COMMIT
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation

And decide to drop c581d4d (USELESS COMMIT), but inadvertently drop 377921c (VERY IMPORTANT COMMIT) at the same time. For this example, I simply dropped both commits in a rebase operation.

I notice now that I am missing my VERY IMPORTANT COMMIT in my history:

42sh$ git log --oneline
* ec8508b (HEAD -> master) Documentation update
* 3866067 Remove baz dependency
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation

If I now use try to see what happened to my HEAD reference using reflog, I can find the last update I did before starting my rebase to cancel the whole operation.

42sh$ git reflog
ec8508b (HEAD -> master) HEAD@{0}: rebase (finish): returning to refs/heads/master
ec8508b (HEAD -> master) HEAD@{1}: rebase (pick): Documentation update
3866067 HEAD@{2}: rebase (pick): Remove baz dependency
58eb2d9 HEAD@{3}: rebase: fast-forward
dab7792 HEAD@{4}: rebase: fast-forward
612e6f5 HEAD@{5}: rebase (start): checkout 612e6f5a055280aac1d7608af2dd2443aed6875c
524de22 HEAD@{6}: commit: Documentation update
d60ddb5 HEAD@{7}: commit: USELESS COMMIT
e81b5fb HEAD@{8}: commit: Remove baz dependency
44cea7d HEAD@{9}: commit: VERY IMPORTANT COMMIT
58eb2d9 HEAD@{10}: commit: Use foo without bar
dab7792 HEAD@{11}: commit (initial): Simplify frobulation

By reading the reflog, I can see that my rebase started at HEAD@{5} (reads: HEAD's fifth prior value). If I want to return to the state of my repository before starting that rebase, I can simply do git checkout HEAD@6 which will take me back to the state prior to the rebase.

42sh$ git checkout HEAD@{6} # Checkout my `HEAD`'s 6th prior value
42sh$ git log --oneline # Are we back before the rebase?
* 524de22 (HEAD) Documentation update
* d60ddb5 USELESS COMMIT
* e81b5fb Remove baz dependency
* 44cea7d VERY IMPORTANT COMMIT
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation

Now, I want to make sure that I have my master branch back to that state too, and not simply my disembodied HEAD.

42sh$ git branch -f master # Change where `master` is pointing at
42sh$ git checkout master # Checkout `master` branch
42sh$ git log --oneline # Is everything in order?
* 524de22 (HEAD -> master) Documentation update
* d60ddb5 USELESS COMMIT
* e81b5fb Remove baz dependency
* 44cea7d VERY IMPORTANT COMMIT
* 58eb2d9 Use foo without bar
* dab7792 Simplify frobulation

And voila! I can now try my rebase again, and be careful not to lose VERY IMPORTANT COMMIT this time.

Tips and tricks

Here are some basic pieces of knowledge which don't really belong to any other section, which I think needs to be said.

The importance of small commits

You might have noticed that people keep saying that commits should be kept atomic. What does that mean and why should it matter?

Keeping commits atomic means that you should strive to commit your changes in the smallest unit of work possible. Instead of making one commit named WIP: add stuff at the end of the day, you should instead try to cut your work up into small units: add tests for frobulator, account for foo in bar processing, etc...

This way of working has multiple things going for it once you start taking advantage of git's power: you can more easily reason about a line of code by using blame, you can more easily squash bugs using revert, you can more easily review the changes in an MR and keep its scope narrow.

One very useful command you can add to your tool belt is git add -p, which prompts you interactively for each patch in your working directory : you can easily choose which parts of your changes should end up in the same commit.

Miscellaneous commands

Here's a list of commands that you should read-up on, but I won't be presenting further:

  • git bisect
  • git rerere
  • git stash
  • and more...

Going further

I advise you to check out Learn git branching to practice a few of the notions I just wrote about, with a nice visualization of the commit graph to explain what you are doing along the way.

Furthermore, the Pro Git book is available online for free, and contains a lot of great content. You can read it whole, but I especially recommend checking out chapter 7 (Git Tools) and chapter 8 (Git Configuration). If you want to learn about the inner workings of git and how it stores the repository on your hard-drive, checkout chapter 10 (Git Internals).