Linux Today: Linux News On Internet Time.

Expecting to Fail

Sep 26, 2009, 08:04
(Other stories by Jeremy Zawodny)

"A few days ago, after the latest in a seemingly never ending string of problems that interrupted connectivity between two of our data centers, coworker #1 said something like "why can't we have a network that just works?!" The exasperation in his voice is something we all felt to some degree or another. Moments later coworker #2 piped up and said "if our network was perfect, the software you write wouldn't be nearly as robust."

"There's a lot of truth to that. I find myself writing software a bit differently now than I did seven or eight years ago, even though I was working on high-traffic web sites then and still am now. The single biggest difference is that I try to expect everything to fail. Everything.

"I never really thought about this as a design philosophy or how it affects the way I approach things, but that comment last week made me realize it was something worth talking about. The more I've thought about it, there are seven distinct issues I find myself dealing with over and over when it comes to embracing failure: redundancy, locality, caching, timeouts, logging, and monitoring."

