---

Expecting to Fail

“A few days ago, after the latest in a seemingly never-ending
string of problems that interrupted connectivity between two of our
data centers, coworker #1 said something like ‘why can’t we have a
network that just works?!’ The exasperation in his voice is
something we all felt to some degree or another. Moments later
coworker #2 piped up and said ‘if our network was perfect, the
software you write wouldn’t be nearly as robust.’

“There’s a lot of truth to that. I find myself writing software
a bit differently now than I did seven or eight years ago, even
though I was working on high-traffic web sites then and still am
now. The single biggest difference is that I try to expect
everything to fail. Everything.

“I never really thought about this as a design philosophy or how
it affects the way I approach things, but that comment last week
made me realize it was something worth talking about. The more I’ve
thought about it, there are six distinct issues I find myself
dealing with over and over when it comes to embracing failure:
redundancy, locality, caching, timeouts, logging, and
monitoring.”

