Linux Today: Linux News On Internet Time.

Community: Testing Linux in the Real World

Sep 07, 2000, 18:29


By John Telford

I'm delighted to see efforts such as the Linux Test Project and the VA Cerberus Test Control System focusing on testing Linux.

Several years ago I began working with an early Unix symmetric multiprocessing (SMP) vendor. My marching orders included creating a team responsible for testing the porting of BSD 4.2 Unix.


The first order of business was gaining the trust of the world-class Unix gurus doing the porting. Some of them were still reeling from bad experiences with traditional Quality Assurance (QA) groups. Instead of having the porting team throw chunks of code over the wall and the testing team throw it back saying it's dead beef, members of the porting and testing teams worked together.

The testing team is the loyal opposition. Their obsession is creating successful test cases showing the Unix port isn't working as advertised. Once that becomes very difficult, we all win.

When a successful test case surfaces a defect in the Unix port, a member of the porting team and a member of the testing team get together and resolve the problem. The first issue resolved is ensuring the test case itself is working as advertised. Once they agree it's sane, they work together finding the defect in the Unix port.

Successful Test Case

A successful test case shows something isn't working. It's far more valuable than one showing something is working. When creating successful test cases becomes difficult, whatever we're creating just may be ready for the real world.

One way of going about crafting test cases is to start by showing there's life at the "center" of the functional range of whatever we're testing. It soon becomes difficult creating successful test cases at this level of testing.

The next test cases aim just inside the functional boundaries. It's less difficult creating successful test cases at this level of testing than in the "center."

The easiest testing level for creating a successful test case is aiming outside the functional boundaries.
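
The three levels can be sketched in a few lines of Python. This is only an illustration, not the tooling we used: set_priority is a hypothetical function under test that advertises a range of 0 through 19 and carries a deliberate off-by-one defect, and probe_values and run_level are names made up for the example.

```python
def probe_values(low, high):
    """Group test inputs by level for a hypothetical integer parameter
    whose advertised (inclusive) functional range is low..high."""
    center = (low + high) // 2
    return {
        "center": [center],
        "inside_boundary": [low, low + 1, high - 1, high],
        "outside_boundary": [low - 1, high + 1],
    }


def run_level(function_under_test, values, should_accept):
    """Return the inputs whose behavior differs from what is advertised.

    Each returned value is a "successful" test case in the sense used
    above: it shows something isn't working.
    """
    surprises = []
    for value in values:
        try:
            function_under_test(value)
            accepted = True
        except ValueError:
            accepted = False
        if accepted != should_accept:
            surprises.append(value)
    return surprises


def set_priority(value):
    """Hypothetical function under test; advertises 0..19."""
    if not 0 <= value < 19:  # deliberate defect: rejects 19
        raise ValueError(value)
    return value


levels = probe_values(0, 19)
for name, values in levels.items():
    inside = name != "outside_boundary"
    print(name, run_level(set_priority, values, should_accept=inside))
```

In this sketch the center probe finds nothing, while the probes just inside the boundary catch the rejected 19.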

These simple concepts aren't simple to manifest. They're counter-intuitive. Surfacing defects in one's creation just isn't in the mindset of most developers, test creators, or their managers.

Testing SMP

Testing SMP requires something more than running a single instance of test cases. The idea is to surface subtle timing problems, lock defects, etc. Seldom does a single instance do it.

One way of doing it is running multiple instances of several test cases. Vary the number of instances from low to beyond saturation, and randomly vary the timing between each instance. This simple process generates something like an ocean tide.

The testing tide begins gently when it comes in and works up to giving the platform under test a pounding at high tide. It lets up when the tide goes back out. Varying the number and timing of test cases ensures the testing tide is seldom the same. This simple mechanism is remarkable. It sometimes surfaces deeply rooted defects. There is a downside. It's virtually impossible to repeat the exact testing sequences leading up to a defect surfacing.
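
A minimal sketch of such a tide, in Python, might look like the following. Everything here is an assumption made for illustration: the names tide_levels and run_tide, the sine-shaped rise and fall, the use of threads as stand-ins for test instances, and the two toy test cases. On a real SMP platform the instances would more likely be separate processes, and the peak would be set beyond what the machine can comfortably run at once.

```python
import math
import random
import threading
import time


def tide_levels(steps, peak):
    """Instance counts that rise from 1 to peak and fall back again.

    steps must be at least 2.
    """
    return [max(1, round(peak * math.sin(math.pi * i / (steps - 1))))
            for i in range(steps)]


def run_tide(test_cases, steps=5, peak=8, max_delay=0.01, seed=None):
    """Launch waves of test instances with random gaps between launches.

    Returns the launch log as (test case name, delay) pairs.
    """
    rng = random.Random(seed)
    launched = []
    for level in tide_levels(steps, peak):
        threads = []
        for _ in range(level):
            case = rng.choice(test_cases)
            delay = rng.uniform(0, max_delay)
            time.sleep(delay)  # randomly vary the timing between instances
            thread = threading.Thread(target=case)
            thread.start()
            threads.append(thread)
            launched.append((case.__name__, delay))
        for thread in threads:
            thread.join()  # let this wave finish before the next one
    return launched


# Two toy test cases (hypothetical stand-ins for real ones).
counter = 0
counter_lock = threading.Lock()


def bump_counter():
    global counter
    with counter_lock:
        counter += 1


def churn_list():
    sorted(random.sample(range(1000), 100))


log = run_tide([bump_counter, churn_list], steps=5, peak=8,
               max_delay=0.001, seed=2000)
print(len(log), "instances launched")
```

Recording the seed makes it possible to replay the same launch schedule, but not the exact interleaving of the instances once they are running, so the downside described above still applies.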

This testing philosophy works for non-SMP too.

Platform Testing

The Unix porting and testing teams enjoyed success using this testing philosophy. Creating successful test cases was becoming difficult.

Meanwhile, manufacturing was busy assembling parts and sparking life into platforms. They were looking for ways of burning in, stressing, and accelerating infant mortality among their creations. They saw how well testing was going with the Unix porting and testing teams and wondered if it could be adapted to manufacturing needs.

The testing team stepped up to the challenge by expanding its scope to creating test suites for manufacturing.

The core testing philosophy remains the same. The added twist is creating tides and cross currents that flow through every platform data path and orifice.

Once again the simple mechanisms of varying the number and timing of test cases successfully surface deeply rooted defects. Now the defects seeing the light of day are in processors and controller chips, as well as in the kernel, device drivers, and utilities.

Sometimes deeply rooted defects didn't surface until a day or so after testing started. The testing tides had to be just right, sort of like the phase of the moon.

Manufacturing liked the test suites, except for the downside of being unable to exactly repeat the testing sequence leading up to a defect surfacing.

Bottom Line

The testing efforts I'm describing helped ensure we could be proud of the quality of the product we shipped. About the best compliment I heard came from a customer who said she knew how to reboot all the other Unix boxes except ours, because ours just didn't go down.

Maybe the Linux testing efforts and Linux can benefit from our experiences.