Thoughts on Linus Torvalds's Git Talk
December 11th, 2007
At Pivotal Labs last week we watched Linus Torvalds's Google talk about Git, the Source Code Management (SCM) system he wrote and uses to manage the Linux kernel code.
I've watched it twice now and here are some thoughts, based on quotes and themes from the video.
"I Never Care About Just One File"
Linus stated that one of the reasons Git is wonderful for him is that, as a high-level code maintainer, he needs to merge thousands of files at once. In fact, he stated that he never cares about just one file.
Not so for me. As an in-the-trenches developer, my whole life is caring about just one file, over and over again. When I merge, I care about each file because, since I work on small teams and with small codebases, there is a fairly high likelihood that my changes will collide with those from another developer.
"The Repository Must Be Decentralized.... You Must Have a Network of Trust"
Linus made the point that central repositories suck for large projects where the morons must not have commit access -- only the super privileged are allowed to commit code back to the repo. He argues that Git is better because it is a decentralized network of repositories -- there is no central master, only Some Dudes who have repositories. Usually there is Some Dude In Charge, like Linus, and everyone tends to pull code from them. To update the "master" code version, Some Dude In Charge pulls code from the repositories owned by Some Other Wicked Smart Dudes, who have most likely pulled code from Some Other Trusted Dudes (And One Gal), and so on. Thus, rather than limit access to just the hand-selected few, everyone has their own local copy of the repository, and the smart merge from the smart who merge from the smart, resulting in some kind of official or de facto version.
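To make the pull-from-who-you-trust idea concrete, here is a toy model in Python. It is only a sketch of the model as I understood it from the talk, not how Git actually stores or transfers anything: the Repo class, the owner names, and the commit labels are all made up for illustration.

```python
# Hypothetical model: a repository is a set of commit labels plus the list of
# repositories its owner trusts enough to pull from. Not real Git internals.
class Repo:
    def __init__(self, owner, commits=()):
        self.owner = owner
        self.commits = set(commits)
        self.trusted = []  # repositories this owner pulls from

    def trust(self, *repos):
        self.trusted.extend(repos)

    def pull_from_trusted(self, _seen=None):
        """Pull from each trusted repo, after it has pulled from the repos it trusts."""
        seen = _seen if _seen is not None else set()
        if self.owner in seen:  # guard against cycles in the trust graph
            return self.commits
        seen.add(self.owner)
        for peer in self.trusted:
            self.commits |= peer.pull_from_trusted(seen)
        return self.commits


gal = Repo("trusted gal", {"fix-scheduler"})
dude = Repo("trusted dude", {"new-driver"})
lieutenant = Repo("wicked smart dude", {"cleanup"})
stranger = Repo("nobody trusts me", {"broken-hack"})
in_charge = Repo("dude in charge")

lieutenant.trust(gal, dude)   # the lieutenant pulls from two trusted people
in_charge.trust(lieutenant)   # the dude in charge only pulls from the lieutenant
in_charge.pull_from_trusted()
print(sorted(in_charge.commits))  # ['cleanup', 'fix-scheduler', 'new-driver']
```

Nobody revoked the stranger's access to anything; their commit stays out of the de facto official version simply because nobody in the chain chose to pull from them.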
While I like the local copy of the repo idea, Pivotal does not work the way Linus describes... but Pivotal is weird, in a good way. We all have full commit rights. Our network of trust is everyone. The Dude In Charge is named Continuous Integration. CI makes the official versions. CI runs the tests. CI makes sure that the deploy process works. I'm sure that we could coerce Git into working in a centralized-like way, where it merges automatically from the individual developers and runs the builds, but I'm not sure if that would be forcing a square peg into a penguin-shaped hole.
"Some Companies Use Git And Don't Even Know It"
Linus described how developers at some companies use Git on their development machines, committing their changes and merging fellow developers' changes with Git, then pushing those changes to central SVN repos. He rather mocked this, but it actually sounds like a good solution: developers merge, so use the tool that's good at that. CI machines and deploy machines love centralized master repositories, so use that for those jobs.
"It Does Not Matter How Easy It Is To Branch, Only How Easy It Is to Merge"
Well said. I never thought about that before but he is completely right. I could never put my finger on why I never branch in SVN, even though it's practically 'free' and easy to do. Now it's obvious: who cares how easy it is to branch when merging sucks? Git is supposed to make merging incredibly easy because Git is content-aware rather than just file-aware... or something like that. I'll believe it when I see it, but if Git really does make merging highly divergent branches easy then I'll give it a try.
Joe's Take
I'd like to try Git, especially if it makes branching and merging those branches as easy as Linus suggests, but I don't think that Pivotal would get as much benefit out of it as large, distributed open source projects do. A 'really big' project here might have 10 developers, not thousands, and all must have commit rights. Our network of trust goes like this: if you are here, we trust you; if we don't trust you, you have to leave. And the idea of having to merge directly from my fellow developers sounds like a pain in the ass... why would I want to merge from 3 separate pairs when I can pull code from the central repo and be reasonably sure (thanks to CI) that it is clean and green? Hopefully I'll be able to answer those questions soon by using Git on a project.
December 11th, 2007 at 08:38 AM
You could go back to the concept of having a CI machine that you physically go to and pull changes from YOUR machine into the repository. I often find that I miss dependencies like adding a gem to geminstaller.yml and end up breaking the build because I can’t test the commit with a clean checkout.
http://www.jamesshore.com/Blog/Continuous-Integration-is-an-Attitude.html
December 11th, 2007 at 10:14 PM
Hi Wes: I kind of miss that! We used to do that at Evant when we used VisualAge for Java and its SCM based on the ENVY SCM. We would put our name on the integration list and perform our merge on that machine when it was our turn. After merging, we would run the tests on that machine and nervously watch that machine’s monitor from across the room, cringing when JUnit’s progress bar went from green to red.
December 14th, 2007 at 01:22 PM
Just a couple of thoughts on your post…
I have been using Git for all my development for a few months and I’ve really enjoyed it (coming from Subversion/SVK). I highly recommend you give it a try.
Regarding the distributed development model: yes, Linus comes across as believing that this is the only way to do dev now. However, you are not forced into this model at all with Git. It is just as easy to set up a bare repository on a central server and have each developer ‘git push’ to this origin repository. You can have a hook script that runs on the central repository and updates a central clone of it, which you can run your CI tools on.
So now each developer pushes their changes when ready, and they simply do a ‘git pull’ to get all of the changes others have pushed. No peer-to-peer sharing is required by Git; it’s an option that works for some development models.
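As a rough illustration of the hook idea, here is a minimal sketch of a post-receive hook written in Python. Git does feed a post-receive hook one "old-sha new-sha refname" line per updated ref on standard input, but everything else here is an assumption made for the example: the CI_CLONE path, the choice of master as the branch CI builds, and the function names are all hypothetical.

```python
#!/usr/bin/env python3
"""Sketch of a post-receive hook for a central bare repository."""
import os
import subprocess

# Assumptions for illustration only: where the CI clone lives and which
# branch the CI tools build.
CI_CLONE = "/srv/ci/project"
CI_REF = "refs/heads/master"
ZERO_SHA = "0" * 40  # Git reports a deleted ref with an all-zero new value


def parse_updates(lines):
    """Parse the 'old new refname' lines Git passes to post-receive on stdin."""
    updates = []
    for line in lines:
        parts = line.split()
        if len(parts) == 3:
            updates.append(tuple(parts))
    return updates


def needs_ci_update(updates, ref=CI_REF):
    """True if the push moved (rather than deleted) the branch CI builds."""
    return any(name == ref and new != ZERO_SHA for _old, new, name in updates)


def update_ci_clone(clone_dir=CI_CLONE):
    """Run 'git pull' in the CI clone so the CI tools see the pushed commits."""
    env = dict(os.environ)
    env.pop("GIT_DIR", None)  # hooks run with GIT_DIR pointing at the bare repo
    subprocess.run(["git", "pull"], cwd=clone_dir, env=env, check=True)


def main(stdin_lines):
    if needs_ci_update(parse_updates(stdin_lines)):
        update_ci_clone()

# Installed as hooks/post-receive in the bare repository (and made
# executable), the script would finish with:
#     import sys
#     main(sys.stdin)
```

The hook only refreshes the clone; kicking off the build is left to whatever CI tool watches that directory.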
Additionally, you can make this repository available to users from anywhere as long as they have simple ssh access to the repository machine. There is a new tool called ‘gitosis’ which manages this simply and cleverly and does not require creating or maintaining separate ssh access credentials for each user. Check it out:
http://scie.nti.st/2007/11/14/hosting-git-repositories-the-easy-and-secure-way
http://eagain.net/gitweb/?p=gitosis.git
Enjoy.
PS - Git merging really IS that good and that easy.
December 14th, 2007 at 09:42 PM
Glenn –
Thanks for the from-the-field report regarding Git; without a doubt, my thoughts are knee-jerk reactions to Linus’ talk. It’s good to hear that, while it’s available and perhaps superior in certain situations, a team can use Git easily without a networked, pull-from-your-peers model, even though Linus threatened to take away my birthday and shoot my dog if I implemented a centralized Git repository.