Commit Granularity

Update: 2014-06-18. Parts of this resource have been moved to their own sections on Best Practices for Teams and Branching.

If you are new to Git, it can be difficult to know when you should create a new branch and when you should commit your work. The way that you work with a version control system will change as you learn to trust both the system and yourself. As novices, I think we are more likely to do the smallest possible amount of work and then show someone while asking, "Like this?". As we mature, we become more confident in our ability to conceptualize a problem and implement a possible solution. The way we present the solution back to our reviewers changes as well. I remember my days as a math student: early on I had to show all of my steps to get full marks; but as I advanced, it was the final solution, and my reasoning why (not how), that mattered most.

Organizing Your Thoughts

In distributed source control systems, such as Git, branching is inexpensive because branches can exist within your local repository and do not need to be shared with a central server. The overwhelming majority of experienced developers will recommend you start a new branch for every new idea that you have. Each branch can be worked on independently, and you can leave a partially completed idea in its branch if you need to make a quick fix to the live site.

Let's take a look at how this might play out. I've decided to build my web site using Sculpin, a static site generator. I download the starter kit and immediately create a new branch for my own "hacks" to the site. This new branch makes it easier to update the Sculpin software independently of the changes that I've made. I'm hacking away on the site and realize I'd like to add a new content type. I'm not sure how to do it (or whether it's a good idea), so I create a third branch to play around in. If I decide I don't want to keep the changes, I can simply return to my main working branch and discard the tinkering. This is the kind of workflow you want to work toward.
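Here is a minimal sketch of that workflow in a throwaway repository. It assumes Git 2.28 or later (for git init -b); the branch names (hacks, content-type-experiment) and the files are hypothetical stand-ins for a real Sculpin site, and the short -m messages only keep the sketch non-interactive.

```shell
#!/bin/sh
# Branch-per-idea workflow in a throwaway repo.
# Branch and file names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

# Stand-in for the downloaded starter kit, committed on main.
echo "starter kit" > README
git add README
git commit -q -m "Import starter kit"

# A long-lived branch for my own changes to the site.
git checkout -q -b hacks
echo "my site tweaks" > site.txt
git add site.txt
git commit -q -m "Add my site tweaks"

# A third, throwaway branch to try out a new content type.
git checkout -q -b content-type-experiment
echo "experimental content type" > content-type.txt
git add content-type.txt
git commit -q -m "Try out a new content type"

# Decided against it: return to the working branch and discard the experiment.
# -D (force delete) is needed because the experiment was never merged.
git checkout -q hacks
git branch -q -D content-type-experiment
```

Because the experiment lived on its own branch, deleting it leaves the hacks branch exactly as it was before the tinkering started.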

I've found that newcomers to Git can get frustrated and stressed out when they try to switch branches while their work isn't committed (Git refuses to switch if the checkout would overwrite those uncommitted changes). I've also found that new developers generally hold one idea's worth of code in their heads at a time, and are less likely to explore several new ideas simultaneously. As a result, I recommend that novices master moving backwards and forwards on a single timeline, rather than moving across timelines.

On one of the early projects I worked on with a distributed version control system, I worked progressively through a few options for a client's web site. Each time I made an incremental change in a single branch, I took a snapshot of my code with both a commit and a tag. I was able to easily show variations to the client by moving forwards or backwards through the commit history. The variations were fairly minor: colour changes, and the image used in the banner, if I remember correctly.
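A sketch of the commit-plus-tag approach, assuming Git 2.28 or later; the tag names and the theme.txt file are hypothetical. Checking out a tag puts you on a detached HEAD, which is fine for showing a variation but not for making new commits.

```shell
#!/bin/sh
# Snapshot each design variation with a commit and a tag, then move
# backwards and forwards to show them. Names and contents are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

# Each variation: change the file, commit it, tag the commit.
for colour in blue green orange; do
  echo "banner-colour: $colour" > theme.txt
  git add theme.txt
  git commit -q -m "Try a $colour banner"
  git tag "variation-$colour"
done

# Step back to the first variation to show it to the client.
git checkout -q variation-blue
cat theme.txt            # banner-colour: blue

# ...and return to the latest work on the branch.
git checkout -q main
cat theme.txt            # banner-colour: orange
```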

Branches are not complicated. With time you will absolutely come to love them! But it can be difficult to know what "deserves" a branch if you're just starting out as a team of one. In teams of more than one, it's easy: anything that's worthy of a ticket should have its own branch in which you make the changes outlined in the ticket. As a team of one, you might not have a ticketing system. If you do, great! Use it as your basis for deciding what gets a new branch. If you don't, take the time to work in a master branch (or perhaps a content branch if you are working with a starter kit for Sculpin, Jekyll, or a similar tool) and learn how to work with checkout, stash, reset, and revert. Learn to write effective tags and commit messages. These skills will make you even more effective when you work on larger projects with more complex branching patterns.
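A minimal sketch of stash, reset, and revert on a single branch, in a throwaway repository (assuming Git 2.28 or later; the file name and messages are hypothetical). Note that reset --hard discards uncommitted work permanently, while revert leaves history intact and adds a new commit that undoes an earlier one.

```shell
#!/bin/sh
# Single-branch safety nets: stash, reset, revert.
# File names and messages are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "v1" > page.txt
git add page.txt
git commit -q -m "Add page"

# stash: set unfinished edits aside, then bring them back.
echo "half-finished edit" > page.txt
git stash push -q          # working tree is back to "v1"
git stash pop -q           # the edit is restored

# reset: throw away uncommitted edits to tracked files (destructive!).
git reset -q --hard HEAD

# revert: undo a committed change by adding a new, inverse commit.
echo "v2 (a mistake)" > page.txt
git add page.txt
git commit -q -m "Change page"
git revert --no-edit HEAD
```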

Commit Granularity

I've seen two basic approaches to commits: demonstrate the thinking process, or present the final solution. When I'm programming, I think in increments, so I commit in increments; I want to be able to track how I thought through a problem. I do not necessarily need to be able to roll back to every single commit in my timeline. This also means that I'm willing to sacrifice git bisect as a debugging tool, because a single commit may not actually be a coherent point in the code base where the build compiles. However, what really matters here is the best-ever published commit (that is, one that is shared with others). If you were to read my commit messages while I code, you would be able to easily unpack my thinking. Commits might show work in increments as small as 15 to 30 minutes. The commit messages are unlikely to explain "why" I've done something, but the initial commit should contain a docblock of code comments outlining what I'm about to do. To see what I actually did, you'll need to read the git diffs.

When I'm working on content, however, I am much more confident, and perhaps more aggressive. I mash things up. I move things around. I don't worry about tracking my changes (perhaps because my progressions would embarrass future me?) and I work until things are presentable. Then, when I feel my work is done, I collect the pages I've created, group them into a logical progression of commits (for example: if I created a new template, I would commit this first) and write detailed commit messages with the highlights of what I've done and why I've done it. Yes, you could still go back and read the git diffs to see exactly what I've done (and you should if you're reviewing my work), but you wouldn't need to read them to get a sense of the changes I made.
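A sketch of committing one burst of content work as a logical series, template first (assuming Git 2.28 or later). The directory and file names are hypothetical, and -m is used twice (summary plus a body paragraph) only so the sketch runs unattended; in practice you would write the message in your editor.

```shell
#!/bin/sh
# Turn one burst of content work into a logical series of commits,
# committing the new template first. Paths are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

# A burst of uncommitted work: a new template and two pages that use it.
mkdir -p templates pages
echo "<article>{{ content }}</article>" > templates/article.html
echo "About us" > pages/about.md
echo "Contact" > pages/contact.md

# Commit the template on its own first...
git add templates/
git commit -q -m "Add article template" -m "The pages in the next commit depend on this layout."

# ...then the pages that use it.
git add pages/
git commit -q -m "Add About and Contact pages" -m "Both pages use the new article template."
```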

I hesitate to call these two approaches the novice and advanced approaches to version control, but I think they do lean in that direction. Different source control management systems have different ways of presenting commits in the history of your project. Git shows you the commit history at a very granular level; as a result, thinking in tiny commit increments makes the history messy and frustrating to work with. This is why we say that as you "mature" with Git, you will be more likely to adopt the second approach.

You don't need to give up your tiny commits, though. You can use an interactive rebase (git rebase -i) to combine many small, unpublished commits into a history that looks more like the second approach. Work the way you want to work; present the history in the way others want to consume it. (Yes, I hate with a screaming passion that Git allows you to rewrite history, but it's sloppy enough in how it presents commits that rewriting becomes necessary. It's okay, you don't need to love it either.)
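A minimal sketch of squashing, assuming Git 2.28 or later and a feature branch that has not been published. Normally you would run git rebase -i main and edit the to-do list by hand; here GIT_SEQUENCE_EDITOR runs sed to turn every "pick" after the first into "fixup", which folds those commits into the first one and discards their messages. (sed -i.bak is used because that form works with both GNU and BSD sed.) Branch and file names are hypothetical.

```shell
#!/bin/sh
# Squash small, unpublished commits with an interactive rebase.
# Branch and file names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "Initial commit"

# Thinking in increments on a feature branch: three tiny commits.
git checkout -q -b feature
for step in 1 2 3; do
  echo "step $step" >> app.txt
  git add app.txt
  git commit -q -m "WIP step $step"
done

# Rebase everything since main. The scripted "editor" keeps the first
# "pick" and changes every later "pick" line to "fixup".
GIT_SEQUENCE_EDITOR='sed -i.bak -e "2,\$ s/^pick/fixup/"' git rebase -q -i main

# Replace the leftover "WIP step 1" message with a real one.
git commit -q --amend -m "Add feature steps 1-3" -m "Explain why the change is needed here."
```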

If you have a culture of showing work in progress, you will need to be careful in how you use git rebase. Rebasing rewrites commits, so each rewritten commit gets a new hash. Because the ID of each commit changes, Git (and anyone who has already fetched your work) sees the rebased commits as completely different commits from the originals. There are ways around this: for example, working only in side branches and giving the rebased branch a new name. If you have an on-site team, it's easy to look over a person's shoulder, but if you work in a distributed team and need to publish your work to share it, rebasing can make sharing work in progress more challenging.
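The hash change is easy to see in a throwaway repository. A minimal sketch, assuming Git 2.28 or later; the branch and file names are hypothetical.

```shell
#!/bin/sh
# Rebasing gives the same change a new commit ID once it sits on a
# different parent. Names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

echo "base" > a.txt
git add a.txt
git commit -q -m "Initial commit"

# Work in progress on a side branch (imagine it has been pushed for review).
git checkout -q -b wip
echo "idea" > b.txt
git add b.txt
git commit -q -m "Work in progress"
before=$(git rev-parse HEAD)

# Meanwhile, main moves on.
git checkout -q main
echo "hotfix" > c.txt
git add c.txt
git commit -q -m "Hotfix"

# Rebase the side branch onto the new main.
git checkout -q wip
git rebase -q main
after=$(git rev-parse HEAD)

echo "before: $before"
echo "after:  $after"   # a different ID for the same change
```

Anyone who fetched the old wip branch still has the commit named by $before, which is why rebasing shared work in progress causes confusion.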

On the opposite side of this coin, you don't need to forgo a granular commit history just because you like to do all of your thinking (and make radical changes) at once. Any single chunk of work can be separated into individual hunks, which can be staged selectively using git add --patch <filename>. This command walks through the changes in your file hunk by hunk and asks whether you would like to include each one in the commit you are building; you can split a hunk into smaller pieces, or edit it by hand, for finer control.

The Best-Ever Published Commit

This is completely subjective, which makes it difficult to work in teams if everyone is using their own definition of the best-ever published commit.

  1. Contains only related code. No scope creep, no "just fixing white space issues too".
  2. Conforms to coding standards for your project, including in-code documentation.
  3. Is just the right size. Perhaps this is 100 lines of code. Or perhaps it's a mega refactoring where a function name changed and 1000 lines of code were affected.
  4. Is described in the best-ever commit message (see the next section).

The Best-Ever Commit Message

My friend Joe Shindelar writes the best-ever commit messages. His general rule is "Whatever it takes to make future me not get pissed off at past me for being lazy."

  1. Use a standard format to make it easy to scan logs.
  2. Answer the question: Why is this change necessary?
  3. Describe, at a high level, how the change addresses the issue identified.
  4. Outline the potential side effects the change may have.
  5. Give a summary of the changes made, so that reading the diff confirms the commit message rather than leaving the reader to guess what changed and why.
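One way to nudge yourself toward this format is Git's commit.template setting, which pre-fills the editor that git commit opens. A sketch, assuming Git 2.28 or later; the template wording and the .gitmessage file name are assumptions, not a standard. Under Git's default cleanup mode, lines beginning with # are stripped from the final message.

```shell
#!/bin/sh
# A commit message template that prompts for the points listed above.
# The wording and the .gitmessage file name are assumptions.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main

cat > .gitmessage <<'EOF'
[#ticket] Short summary of the change

# Why is this change necessary?
# How does it address the issue (high level)?
# What side effects might it have?
# Summary of changes made:
# -
EOF

# git commit (without -m) will now open the editor pre-filled with this text.
# An absolute path avoids any ambiguity about where a relative one resolves.
git config commit.template "$repo/.gitmessage"
```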

Sample Commit Messages

Bad Example: git commit -am "re-wrote entire site in angular.js - it's faster now, I'm sure"

Analysis:

  • By using the -a flag, every modified or deleted tracked file is committed en masse, without considering whether it should be included or not.
  • By using the -m flag, the tendency will always be to write only a terse message, which does not describe why the change is necessary or how it addresses the problem.
  • The commit message does not reference a ticket number, so it's impossible to know which issue(s) are now resolved and can be closed in the ticket tracker.
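A sketch of the alternative: review the changes, stage only the file that belongs in the commit, and write a full message. It reuses the message from the good example that follows (assuming Git 2.28 or later); the CSS and HTML files are hypothetical, and the message is read from standard input (git commit -F -) only so the script runs unattended. Normally you would run plain git commit and write the message in your editor.

```shell
#!/bin/sh
# Instead of: git commit -am "..."
# Review, stage selectively, then write a full message.
# File names and contents are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

printf 'body { margin: 0 }\n.video-meta { overflow: hidden }\n' > style.css
echo "<p>Hello</p>" > index.html
git add style.css index.html
git commit -q -m "Initial commit"

# Two unrelated edits are now sitting in the working tree.
printf 'body { margin: 0 }\n' > style.css    # drop the overflow rule
echo "<p>Hello, world</p>" > index.html      # unrelated copy change

# Review first: which files changed, and by how much?
git status --short
git diff --stat

# Stage only the change that belongs to ticket #321.
git add style.css

# Message from stdin so the sketch runs unattended; normally: git commit
git commit -q -F - <<'EOF'
[#321] Stop clipping trainer meta-data on video nodes at small screen size.

- Removes an unnecessary overflow: hidden that was causing some clipping.

Resolves #321
EOF
```

The index.html edit stays unstaged, ready to go into its own commit.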

Good Example:

[#321] Stop clipping trainer meta-data on video nodes at small screen size.

- Removes an unnecessary overflow: hidden that was causing some clipping.

Resolves #321

Analysis:

  • Includes the ticket number, in square brackets, at the beginning of the terse commit message, making it easier to read the logs later.
  • Terse description (for the short log view) explains the symptom that was seen by site visitors.
  • The detailed body describes the technical implementation that was used to resolve the problem.
  • The final line of the commit message, Resolves #321, will be captured by the ticketing system, which moves the ticket from "open" to "needs review".
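A sketch of why this format pays off when scanning logs, assuming Git 2.28 or later. The second commit, ticket #322, and the file names are hypothetical. git log --oneline gives the same short view with abbreviated hashes in front; --format=%s is used here so the output is predictable.

```shell
#!/bin/sh
# Scanning a log written in the "[#ticket] Summary" format.
# Ticket #322 and the file names are hypothetical.
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q -b main
git config user.name "Example Author"
git config user.email "author@example.com"

# Touch a file, stage it, and commit with the message read from stdin.
commit_with() {
  echo "$1 changed" >> "$1"
  git add "$1"
  git commit -q -F -
}

commit_with style.css <<'EOF'
[#321] Stop clipping trainer meta-data on video nodes at small screen size.

- Removes an unnecessary overflow: hidden that was causing some clipping.

Resolves #321
EOF

commit_with header.html <<'EOF'
[#322] Add a skip-to-content link to the site header.

- Lets keyboard users jump past the navigation.

Resolves #322
EOF

# Short log view: one summary line per commit, newest first.
git log --format=%s
# [#322] Add a skip-to-content link to the site header.
# [#321] Stop clipping trainer meta-data on video nodes at small screen size.

# Which commit resolved ticket #321?
git log --format=%s --grep='Resolves #321'
# [#321] Stop clipping trainer meta-data on video nodes at small screen size.
```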
