Git Commit Creation
This is an article in which I explore the details and thinking that goes into how you should create git commits, and why. I like to think of it as the article I wish existed when I was just starting out over 20 years ago.
I wanted to cover all the things that you should think about at a high level. That way it at least could work as an entry point to deeper exploration of the particular areas if the reader isn’t completely sold or they want to just gain a deeper understanding. While at the same time trying to provide enough details to show why and how these choices are valuable. This is always a tricky balance.
Anyways, I would love any feedback on thoughts on how this could be improved.
Thanks
So this bit confuses me. The article says in the intent and scope section that the entire process of bug fixing, in the included example, is literal bug fixing, clean up toggle, correct lints, correct duplication. That point to linting issues.
The earlier section says that a commit should be ‘buildable’ and ‘testable’. So if there are linting issues, the commit won’t satisfy this criteria right?
What am I missing here?
In the example I provide the project that the PR was made for doesn’t have the linting passing as a build requirement. But that is irrelevant to the point I am trying to make which is to split things out base on those singular intents. Do you think that point was clear?
Maybe I should change the example so that it isn’t based on linting which can be part of the build requirements but doesn’t have to be.
Aah. I assumed linting was part of the build also. My bad. I did understand the idea you were mentioning. Just that assumptions kind of threw me off.
I wanted to ask something related to that. As you mentioned, git takes a snapshot of the repo on every commit. So splitting up the bug fix and other activities means you have 3 or 4 commits instead of one. Let us say we are dealing with a very large repo. This does not look ideal in that context right? So do you think the way you proposed is only suitable for smaller repos?
Actually, having more commits is negligible because of the way that Git stores the snapshots behind the scenes. Specifically, it uses a content addressable key value store. So the storage is bound to the file changes irrespective of the commits.
The commits simply hold the sha of each of the files. Technically, it is a bit more complicated than that. But from an understanding of size implications and what it is bound to that mental model should get you there. It also does additional smart things in packing this key value store to store things more efficiently that also help.
If you want to start understanding more about the internals of kid and how it actually stores stuff. The Pro Git book has a Git Internals section, https://git-scm.com/book/en/v2/Git-Internals-Plumbing-and-Porcelain which is a great place to start.
Good analysis, there are a few things that I think area bit opinionated and there is nothing wrong with that, I just don’t agree with a few things out of context. For example I agree that code on main should be buildable and testable. Code in your own branch should be for yourself and should still have commits. Also lazygit really abstracts a huge chunk of git logic while making it easier to understand.
What is important to me is that when you open a pull request the commits in that pull request, the things you are requesting to be brought into the mainline, are in a good and final form. Meaning they are actually ready to be brought into mainline. That means each of them need to be logically chunked, clearly define and communicate the intent, etc. as I outlined in the article.
Now, prior to that moment in time where you open the pull request. You can have your commits look like whatever horrible mess you want locally. Including if they are in a branch that you want locally that you aren’t going to integrate into mainline or open a pull request for.
To facilitate this refinement of commits prior to this moment of opening a pull request. I personally use interactive rebase a lot as well as use https://git-ps.sh/ to facilitate a patch stack based workflow locally.
I have found that organizing my commits according to the principles outlined in the article is extremely valuable even locally. It seems like most devs aren’t comfortable enough with git interactive rebase, etc. that facilitate refining commits as you go along. This is simply a skill issue that is well worth learning.
If you take the stance in which you say it is my PR. Whatever commits are in there are whatever commits you created, with no logical separation, no clear commit descriptions explaining the intent of each of the changes you made. You then lose the benefits outlined in the article. Tools like git bisect won’t work properly, reverting commits sucks, etc.
If you say, well what if we use GitHub to require that PR be integrated using a Squash & Merge? You still end up losing almost all of the benefits.
So I think the dividing line needs to be that if you are requesting something to be integrated into mainline, or you are going to integrate something into mainline, you should follow the principles in the article. Otherwise, you are just taking the stance that you don’t care about the values outlined in the article.
Which is a stance you can take. But those values being present or not based on your position is not opinion. That is fact, as it is provable by you simply testing out the things. So be careful not to fall into the trap of being like, “Well that is your opinion.”, and missing out on the fact that by having a different opinion you are factually losing out on the benefits.
Yeah a pull request should be formatted corectly, build, documented, pass all tests and have changes covered in tests.
Your own branch? Do whatever you want, will get squashed anyway in the pull request.