Linux 2.2 became available in January of 1999, after being
in development for over two years. Many at the time expressed
dismay at how long this release had taken and hope that the 2.4
cycle would be shorter. In early June of 1999 Linus Torvalds
himself said he was trying to get 2.4 ready for “this fall”. Fall
of 1999 came and went, followed by all of 2000. In January of 2001,
after nearly a year of being told that 2.4 was just
around the corner, the long-delayed release finally occurred. What
went wrong?
In a sense, there are really two questions hiding in that one.
There’s the mostly technical question of why the project took so
long, and I could write a great many all-but-incomprehensible pages
on that subject…but I won’t. Instead, I’ll try to focus on a less
technical project-management question of why nobody
realized it would take so long. My point is not that it
could have been done more quickly, but that we (collectively)
should have known it would take as long as it did. The observable facts never
supported the more optimistic time estimates. In the next two
sections I’ll talk a little about some of the reasons for this, and
I’ll wind up with a few suggestions about how to do better next
time.
People
The first thing to realize is that the people working on, or,
more precisely, in Linux are mostly not doing so on a full-time
paid basis. Yes, there is a growing cadre of professional Linux
hackers, but they’re still greatly outnumbered by the people
hacking Linux in their free time, and this has important
project-management implications.
For one thing, free time is generally part time. Free time is
also unpredictable. It tends to come and go because of day-job or
family commitments, other hobbies, other projects, energy levels,
and so on. Free-timers don’t usually have a lot of equipment; what
they do have they must maintain themselves, and in general they lack
the sort of “infrastructure” that exists within a company. All of
these further diminish the amount of time they can spend working
full-speed on a project.
In addition to the productivity issues affecting free-timers,
there’s a turnover issue. People who are doing something on their
free time tend to just disappear sometimes, because another project
caught their interest or because of personality conflicts or
because they’re just tired or uninterested. Certainly these things
happen in the commercial world, too, but even the most
battle-scarred dot-com HR director would find defection rates in
open source alarming. In the open source world, you can’t talk to
someone’s boss to prevent a transfer, you can’t offer someone more
salary or stock options, you can’t do anything. The person leaving
doesn’t have to go through a lengthy interview process before they
give notice; in fact, no notice at all is the norm. As soon as
someone wants to leave, or as soon as they no longer actively want
to stay, they’re just gone.
It might seem that on a large project one could just assume
lower per-day productivity and higher turnover rates, and let
statistical probabilities take care of the rest. Unfortunately, big
projects like an operating system kernel are full of dependencies,
so the uncertainty introduced by the factors mentioned above has a
more than linear effect on the project as a whole. Sooner or later
the person who leaves in a huff or falls into a slump will be the
one who’s on the critical path, and your whole optimistic schedule
will be at risk. On a project of this size, with this number of
participants, with these levels of variability and uncertainty, you
have to assume that you’ll suffer at least a few such setbacks, and
include that in your schedule estimates.
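A simplified way to see why such setbacks are close to certain: suppose, purely as an illustrative assumption, that the critical path runs through n people, each of whom independently has probability p of a serious setback (a defection or a slump) during the release cycle. The chance that the schedule escapes every one of them shrinks geometrically with n:

```latex
P(\text{no setback}) = (1 - p)^{n}, \qquad
P(\text{at least one setback}) = 1 - (1 - p)^{n}

% Illustrative values only (assumed, not measured): p = 0.05, n = 20
1 - (0.95)^{20} \approx 1 - 0.36 = 0.64
```

Even with a modest 5% per-person risk, a 20-person critical path is more likely than not to hit at least one setback. Independence is the simple case; it ignores the knock-on effect of dependencies described above, where one stalled piece holds up the work waiting on it, so it likely understates the real risk.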
Process
To start with, the people working on 2.4 didn’t know what 2.4
would be. There were some feature lists and such passed around, but
nothing like the sort of detailed requirements specification that
would be expected in a commercial environment. Even today the 2.4
feature set is undecided. Not long ago Linus was threatening to
leave out substantial virtual-memory enhancements, and yet new
features in other areas are still being actively considered for
inclusion.
The next problem was in the design phase. There wasn’t one, at
least not anything that would be taken seriously in the commercial
sector. Without a rigorous design phase the people working on
critical 2.4 subprojects literally didn’t know what they were
getting into, so how could they possibly have predicted how long it
would take?
In the active-development phase, the near-total lack of basic
programmer discipline made it even harder to guess how far from
completion things were. The number of bugs found in unit tests is
probably the best predictor of how many bugs will be found in later
stages, but few if any of the 2.4 developers seem to believe in
performing unit tests, or regression tests for that matter, and bug
tracking in Linux can most charitably be described as informal.
Data-corruption bugs that should probably have been nailed in unit
testing continue to be found. In one case, a single data-corruption
bug was reported to be fixed about a half-dozen times before it
really was fixed…or was it? Without regression tests, who
knows?
Suggestions
Here are some things I think can be done to ensure that Linux
doesn’t earn another “Vaporware of the Year” award. It’s not
entirely a coincidence that, in addition to improving the accuracy of
schedule predictions, many of these things will also help improve
the speed and quality of Linux kernel development.
- Be realistic about who and what you have to work with. Some of
the people on your project will have less than god-like technical
skills. Some will have the skills but be so disruptive you’ll wish
they didn’t. Defections, distractions and slumps will affect
practically everyone. Accept it, and factor it into your
predictions.
- Create a detailed feature list at the beginning and stick
to it. Don’t be afraid to seem inflexible. Don’t worry about
leaving stuff out, either; there will be other releases.
- Require that major subprojects go through a decent design phase
before you even think of giving out any completion dates. Make sure
people know what level of detail is expected in a design spec –
examples would be great – and when specs are due. There’s nothing
wrong with hacking and experimentation in their place, but if a
design can’t be done on time it probably means either the problem
or the solution is not well understood, and there’s no shame in
taking the time to do something right. It sure beats having to redo
it later. Both the current release and the next one will be
improved if half-baked ideas are deferred for further
research/experimentation in parallel with ongoing development.
- Be conservative. Don’t just give a 51%-confidence estimate
based on an assumption that there will be no rough spots. Assume
that Murphy’s Law is being strongly enforced on your project, and
then give a 90%-confidence (or better) estimate based on that
assumption.
- Encourage a culture of good software engineering. The familiar
model of specification, coding, testing, debugging, more testing,
etc. has stood the test of time. It benefits nobody so much as the
developers themselves, and there’s nothing about it that’s specific
to commercial programming; it all applies equally well to open
source. People who are or have been full-time paid programmers, and
particularly those who are now full-time paid Linux programmers,
have no excuse for being sloppy or lazy in their Linux
work. Does Linux deserve less of you than a commercial product
would?
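The "Be conservative" suggestion can be made concrete with a toy Monte Carlo sketch. It models a hypothetical release as 20 sequential four-week tasks on the critical path, each with an assumed 5% chance of a 12-week setback (someone disappears or falls into a slump). None of these numbers come from the 2.4 cycle, and the names `simulate_release` and `estimate` are illustrative only. The sketch compares the no-rough-spots plan with the durations that roughly 51% and 90% of simulated releases finish within.

```python
import random

# Toy schedule model. All parameters are illustrative assumptions,
# not measurements of any real kernel release.
N_TASKS = 20          # sequential tasks on the critical path
NOMINAL_WEEKS = 4.0   # planned duration of each task
SETBACK_PROB = 0.05   # chance a task's owner defects or stalls
SETBACK_WEEKS = 12.0  # extra delay when that happens


def simulate_release(rng, n_tasks=N_TASKS, setback_prob=SETBACK_PROB):
    """Return the total duration, in weeks, of one simulated release."""
    total = 0.0
    for _ in range(n_tasks):
        # Ordinary variation: the actual time is the nominal time
        # scaled by a lognormal factor centred near 1.
        total += NOMINAL_WEEKS * rng.lognormvariate(0.0, 0.3)
        # Occasional defection or slump on the critical path.
        if rng.random() < setback_prob:
            total += SETBACK_WEEKS
    return total


def estimate(confidence, trials=10_000, seed=1999):
    """Duration that roughly `confidence` of simulated releases finish within."""
    rng = random.Random(seed)
    outcomes = sorted(simulate_release(rng) for _ in range(trials))
    index = min(int(confidence * trials), trials - 1)
    return outcomes[index]


if __name__ == "__main__":
    print(f"no-rough-spots plan: {N_TASKS * NOMINAL_WEEKS:.0f} weeks")
    print(f"51%-confidence:      {estimate(0.51):.0f} weeks")
    print(f"90%-confidence:      {estimate(0.90):.0f} weeks")
```

Under these assumed parameters the 51% figure already lands later than the no-rough-spots plan, and the 90% figure later still. The exact numbers mean nothing; the point is the size of the gap between a coin-flip estimate and one you can actually stand behind.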
It’s inevitable that any attempt to add rigor to the Linux
development process will cause some people to leave. I don’t think
that’s a bad thing, and I don’t mean that in a callous, flippant
“good riddance” kind of way. Those people will be missed, but maybe
they’ll go on to do great things in their own sandboxes. Just as
Linux is already displacing older operating systems, Linux itself
will be displaced as well some day, and whatever comes next will be
replaced too. Linux is not a green little shoot any more; it’s a
mature tree that requires a different kind of care. It’s time to
move it out of the nursery.