Linux Today: Linux News On Internet Time.






The Mythical Linux Month

Jan 22, 2001, 17:22 (61 Talkback[s])
(Other stories by Jeff Darcy)



Linux 2.2 became available in January of 1999, after being in development for over two years. Many at the time expressed dismay at how long this release had taken and hope that the 2.4 cycle would be shorter. In early June of 1999 Linus Torvalds himself said he was trying to get 2.4 ready for "this fall". Fall of 1999 came and went, followed by all of 2000. In January of 2001, after nearly a year of being told that 2.4 was just around the corner, the long-delayed release finally occurred. What went wrong?

To some extent, that question can be read in two ways. There's a mostly technical question of why the project took so long, and I could write a great many all-but-incomprehensible pages on that subject... but I won't. Instead, I'll try to focus on a less technical project-management question: why did nobody realize it would take so long? My point is not that it could have been done more quickly, but that we (collectively) should have known it would not. The observable facts never supported the more optimistic time estimates. In the next two sections I'll talk a little about some of the reasons for this, and I'll wind up with a few suggestions about how to do better next time.

People

The first thing to realize is that the people working on, or, more precisely, in Linux are mostly not doing so on a full-time paid basis. Yes, there is a growing cadre of professional Linux hackers, but they're still greatly outnumbered by the people hacking Linux in their free time, and this has important project-management implications.

For one thing, free time is generally part time. Free time is also unpredictable. It tends to come and go because of day-job or family commitments, other hobbies, other projects, energy levels, and so on. Free-timers don't usually have a lot of equipment; what they have they must maintain themselves, and in general they lack the sort of "infrastructure" that exists within a company. All of these further diminish the amount of time they can spend working full-speed on a project.

In addition to the productivity issues affecting free-timers, there's a turnover issue. People who are doing something in their free time tend to just disappear sometimes, because another project caught their interest or because of personality conflicts or because they're just tired or uninterested. Certainly these things happen in the commercial world, too, but even the most battle-scarred dot-com HR director would find defection rates in open source alarming. In the open source world, you can't talk to someone's boss to prevent a transfer, you can't offer someone more salary or stock options, you can't do anything. The person leaving doesn't have to go through a lengthy interview process before they give notice; in fact, no notice at all is the norm. As soon as someone wants to leave, or as soon as they no longer actively want to stay, they're just gone.

It might seem that on a large project one could just assume lower per-day productivity and higher turnover rates, and let statistical probabilities take care of the rest. Unfortunately, big projects like an operating system kernel are full of dependencies, so the uncertainty introduced by the factors mentioned above has a more than linear effect on the project as a whole. Sooner or later the person who leaves in a huff or falls into a slump will be the one who's on the critical path and your whole optimistic schedule will be at risk. On a project of this size, with this number of participants, with these levels of variability and uncertainty, you have to assume that you'll suffer at least a few such setbacks, and include that in your schedule estimates.
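A rough way to see why averaging does not rescue the schedule: suppose a milestone can only ship once every one of its prerequisite tasks is done, and each task independently has some chance of hitting a setback (a defection, a slump). The sketch below is only an illustration under those assumptions; the 10% per-task figure, the independence assumption, and the function names are hypothetical, not measurements from the 2.4 cycle.

```python
import random


def slip_probability(n_tasks: int, p_setback: float) -> float:
    """Chance that at least one of n_tasks independent tasks hits a setback."""
    return 1.0 - (1.0 - p_setback) ** n_tasks


def simulate_slip_rate(n_tasks: int, p_setback: float,
                       trials: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo check: fraction of trials in which any task hits a setback."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    slipped = 0
    for _ in range(trials):
        if any(rng.random() < p_setback for _ in range(n_tasks)):
            slipped += 1
    return slipped / trials


if __name__ == "__main__":
    # Hypothetical 10% chance of a setback per task.
    for n in (1, 5, 20, 50):
        print(n, round(slip_probability(n, 0.10), 3),
              round(simulate_slip_rate(n, 0.10), 3))
```

Under these assumptions, a setback that is unlikely for any single task becomes the expected case for the milestone once enough tasks feed into it, which is why a schedule has to budget for setbacks rather than hope to dodge them.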

Process

To start with, the people working on 2.4 didn't know what 2.4 would be. There were some feature lists and such passed around, but nothing like the sort of detailed requirements specification that would be expected in a commercial environment. Even today the 2.4 feature set is undecided. Not long ago Linus was threatening to leave out substantial virtual-memory enhancements, and yet new features in other areas are still being actively considered for inclusion.

The next problem was in the design phase. There wasn't one, at least not anything that would be taken seriously in the commercial sector. Without a rigorous design phase the people working on critical 2.4 subprojects literally didn't know what they were getting into, so how could they possibly have predicted how long it would take?

In the active-development phase, the near-total lack of basic programmer discipline made it even harder to guess how far from completion things were. The number of bugs found in unit tests is probably the best predictor of how many bugs will be found in later stages, but few if any of the 2.4 developers seem to believe in performing unit tests, or regression tests for that matter, and bug tracking in Linux can most charitably be described as informal. Data-corruption bugs that should probably have been nailed in unit test continue to be found. In one case, a single data-corruption bug was reported to be fixed about a half-dozen times before it really was fixed...or was it? Without regression tests, who knows?
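For readers unfamiliar with the practice, a regression test is simply a small automated check that pins down a bug once it has been fixed, so that "is it really fixed?" has a repeatable answer. The sketch below uses Python's standard `unittest` module; `copy_blocks` and the bug it guards against are hypothetical stand-ins, not code or defects from the Linux kernel.

```python
import unittest


def copy_blocks(src: bytes, block_size: int) -> bytes:
    """Hypothetical routine under test: copy src one block at a time."""
    out = bytearray()
    for start in range(0, len(src), block_size):
        out += src[start:start + block_size]
    return bytes(out)


class CopyBlocksRegression(unittest.TestCase):
    def test_partial_final_block_is_preserved(self):
        # Pins an imagined earlier bug in which a short final block was dropped.
        data = bytes(range(10))  # 10 bytes: two full 4-byte blocks plus 2 bytes
        self.assertEqual(copy_blocks(data, block_size=4), data)

    def test_empty_input(self):
        self.assertEqual(copy_blocks(b"", block_size=4), b"")


if __name__ == "__main__":
    unittest.main()
```

Once a test like this exists, every later "fix" either keeps it passing or visibly breaks it, and the count of such failures over time gives the kind of data that informal bug tracking cannot.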

Suggestions

Here are some things I think can be done to ensure that Linux doesn't earn another "Vaporware of the Year" award. It's not entirely a coincidence that, in addition to improving the accuracy of schedule predictions, many of these things will also help improve the speed and quality of Linux kernel development.

  1. Be realistic about who and what you have to work with. Some of the people on your project will have less than god-like technical skills. Some will have the skills but be so disruptive you'll wish they didn't. Defections, distractions and slumps will affect practically everyone. Accept it, and factor it into your predictions.
  2. Create a detailed feature list at the beginning and stick to it. Don't be afraid to seem inflexible. Don't worry about leaving stuff out, either; there will be other releases.
  3. Require that major subprojects go through a decent design phase before you even think of giving out any completion dates. Make sure people know what level of detail is expected in a design spec - examples would be great - and when specs are due. There's nothing wrong with hacking and experimentation in their place, but if a design can't be done on time it probably means either the problem or the solution is not well understood, and there's no shame in taking the time to do something right. It sure beats having to redo it later. Both the current release and the next one will be improved if half-baked ideas are deferred for further research/experimentation in parallel with ongoing development.
  4. Be conservative. Don't just give a 51%-confidence estimate, based on an assumption that there will be no rough spots. Assume that Murphy's Law is being strongly enforced on your project, and then give a 90%-confidence (or better) estimate based on that assumption.
  5. Encourage a culture of good software engineering. The familiar model of specification, coding, testing, debugging, more testing, etc. has stood the test of time. It benefits nobody so much as the developers themselves, and there's nothing about it that's specific to commercial programming; it all applies equally well to open source. People who are or have been full-time paid programmers, and particularly those who are now full-time paid Linux programmers, have no excuse for being sloppy or lazy in their Linux work. Does Linux deserve less of you than a commercial product would?
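The difference between a 51%-confidence and a 90%-confidence estimate (suggestion 4) can be made concrete with a small simulation. This is a minimal sketch under invented assumptions: ten sequential tasks of four weeks each, a 20% chance that any task hits an eight-week setback, and hypothetical function names. None of these numbers come from the 2.4 schedule.

```python
import random


def simulate_durations(n_tasks: int = 10, nominal_weeks: float = 4.0,
                       p_setback: float = 0.2, setback_weeks: float = 8.0,
                       trials: int = 10_000, seed: int = 1) -> list:
    """Return sorted total project durations (in weeks) over many trials."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    totals = []
    for _ in range(trials):
        total = 0.0
        for _ in range(n_tasks):
            total += nominal_weeks
            if rng.random() < p_setback:  # Murphy strikes this task
                total += setback_weeks
        totals.append(total)
    return sorted(totals)


def estimate_at_confidence(sorted_totals: list, confidence: float) -> float:
    """A duration within which at least `confidence` of the trials finished."""
    index = min(len(sorted_totals) - 1, int(confidence * len(sorted_totals)))
    return sorted_totals[index]


if __name__ == "__main__":
    totals = simulate_durations()
    print("no rough spots:", 10 * 4.0, "weeks")
    print("51% confidence:", estimate_at_confidence(totals, 0.51), "weeks")
    print("90% confidence:", estimate_at_confidence(totals, 0.90), "weeks")
```

The "no rough spots" figure is the schedule people tend to announce; the 90% figure is the one that survives contact with the setbacks described earlier.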

It's inevitable that any attempt to add rigor to the Linux development process will cause some people to leave. I don't think that's a bad thing, and I don't mean that in a callous, flippant "good riddance" kind of way. Those people will be missed, but maybe they'll go on to do great things in their own sandboxes. Just as Linux is already displacing older operating systems, Linux itself will be displaced as well some day, and whatever comes next will be replaced too. Linux is not a green little shoot any more; it's a mature tree that requires a different kind of care. It's time to move it out of the nursery.