The Mythical Linux Month
Jan 22, 2001, 17:22
By Jeff Darcy
Linux 2.2 became available in January of 1999, after being in development for over two years. Many at the time expressed dismay at how long this release had taken and hoped that the 2.4 cycle would be shorter. In early June of 1999 Linus Torvalds himself said he was trying to get 2.4 ready for "this fall". Fall of 1999 came and went, followed by all of 2000. In January of 2001, after nearly a year of being told that 2.4 was just around the corner, the long-delayed release finally occurred. What went wrong?
That question really has two parts. There's a mostly technical question of why the project took so long, and I could write a great many all-but-incomprehensible pages on that subject...but I won't. Instead, I'll focus on a less technical, project-management question: why nobody realized it would take so long. My point is not that it could have been done more quickly, but that we (collectively) should have known it would not. The observable facts never supported the more optimistic time estimates. In the next two sections I'll talk a little about some of the reasons for this, and I'll wind up with a few suggestions about how to do better next time.
The first thing to realize is that the people working on, or, more precisely, in Linux are mostly not doing so on a full-time paid basis. Yes, there is a growing cadre of professional Linux hackers, but they're still greatly outnumbered by the people hacking Linux in their free time, and this has important project-management implications.
For one thing, free time is generally part time. Free time is also unpredictable. It tends to come and go because of day-job or family commitments, other hobbies, other projects, energy levels, and so on. Free-timers don't usually have a lot of equipment; what they have, they must maintain themselves, and in general they lack the sort of "infrastructure" that exists within a company. All of these factors further reduce the time they can spend working full-speed on a project.
In addition to the productivity issues affecting free-timers, there's a turnover issue. People who are doing something in their free time tend to just disappear sometimes, because another project caught their interest, because of personality conflicts, or because they're just tired or uninterested. Certainly these things happen in the commercial world, too, but even the most battle-scarred dot-com HR director would find defection rates in open source alarming. In the open source world, you can't talk to someone's boss to prevent a transfer, you can't offer someone more salary or stock options, you can't do anything. The person leaving doesn't have to go through a lengthy interview process before giving notice; in fact, no notice at all is the norm. As soon as someone wants to leave, or as soon as they no longer actively want to stay, they're just gone.
It might seem that on a large project one could just assume lower per-day productivity and higher turnover rates, and let statistical probabilities take care of the rest. Unfortunately, big projects like an operating system kernel are full of dependencies, so the uncertainty introduced by the factors mentioned above has a more-than-linear effect on the project as a whole: a delay in one piece stalls everything that depends on it. Sooner or later the person who leaves in a huff or falls into a slump will be the one on the critical path, and your whole optimistic schedule will be at risk. On a project of this size, with this number of participants and these levels of variability and uncertainty, you have to assume that you'll suffer at least a few such setbacks, and include them in your schedule estimates.
To start with, the people working on 2.4 didn't know what 2.4 would be. There were some feature lists and such passed around, but nothing like the sort of detailed requirements specification that would be expected in a commercial environment. Even today the 2.4 feature set is undecided. Not long ago Linus was threatening to leave out substantial virtual-memory enhancements, and yet new features in other areas are still being actively considered for inclusion.
The next problem was in the design phase. There wasn't one, at least not anything that would be taken seriously in the commercial sector. Without a rigorous design phase the people working on critical 2.4 subprojects literally didn't know what they were getting into, so how could they possibly have predicted how long it would take?
In the active-development phase, the near-total lack of basic programmer discipline made it even harder to guess how far from completion things were. The number of bugs found in unit tests is probably the best predictor of how many bugs will be found in later stages, but few if any of the 2.4 developers seem to believe in performing unit tests, or regression tests for that matter, and bug tracking in Linux can most charitably be described as informal. Data-corruption bugs that should probably have been nailed in unit testing continue to be found. In one case, a single data-corruption bug was reported fixed about half a dozen times before it really was fixed...or was it? Without regression tests, who knows?
Here are some things I think can be done to ensure that Linux doesn't earn another "Vaporware of the Year" award. It's not entirely a coincidence that, in addition to improving the accuracy of schedule predictions, many of these things will also help improve the speed and quality of Linux kernel development.
It's inevitable that any attempt to add rigor to the Linux development process will cause some people to leave. I don't think that's a bad thing, and I don't mean that in a callous, flippant "good riddance" kind of way. Those people will be missed, but maybe they'll go on to do great things in their own sandboxes. Just as Linux is already displacing older operating systems, Linux itself will be displaced some day, and whatever comes next will be replaced too. Linux is not a green little shoot any more; it's a mature tree that requires a different kind of care. It's time to move it out of the nursery.