As you may recall from last week, we were starting to troubleshoot
a strange problem with failing or dropped connections between our
Sphinx servers and MySQL slaves. We were in the midst of suspecting
the network, and that is where we pick up the story.
At this point I should note that we weren’t really running
any production services at this new data center. This was all part
of the process of getting it ready to do so. It wasn’t that
unreasonable to think that something might be misconfigured. And,
again, even though it tested well, we were also essentially on a
new hardware platform and a new operating system release.
We tried moving servers around on the network: moving one MySQL
box and one Sphinx box from this set of ports to that set of ports
to see if it was a bad linecard in our switch. No dice. The
problem was still there. We measured network performance between
the hosts and things seemed okay there too.