Today’s downtime

March 13th, 2010

Sorry about the downtime, folks.  The server that runs this web site (and many others) was unavailable for about 90 minutes this evening.

Basically, Apache was tuned incorrectly, a particularly slow dynamic page was getting a lot of requests, and the number of Apache processes grew to the point where the system ran out of RAM.  In an unfortunate coincidence, the MSN, Yahoo, Google, and Cuil crawlers all started hitting the site at almost exactly that moment.  The backlog of connections grew, the system was swapping like mad, and the load average hit triple digits.  Everything came screeching to a halt.
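For anyone curious about the arithmetic behind this kind of tuning, here is a rough sketch in Python.  It assumes the usual rule of thumb for a process-per-connection server: cap the number of worker processes (in Apache's prefork MPM, that is the MaxClients directive) so that their combined memory fits in RAM after setting some aside for everything else.  The function name and the figures in the usage line are made up for illustration; they are not measurements from this server.

```python
def estimate_max_workers(total_ram_mb, reserved_mb, per_process_mb):
    """Return the largest worker count whose combined memory fits in RAM.

    total_ram_mb   -- physical memory on the machine (hypothetical input)
    reserved_mb    -- memory set aside for the OS, database, and so on
    per_process_mb -- typical resident size of one worker process
    """
    if per_process_mb <= 0:
        raise ValueError("per_process_mb must be positive")

    available_mb = total_ram_mb - reserved_mb
    if available_mb <= 0:
        # Nothing left over for workers at all.
        return 0

    # Integer division: a partial worker does not count.
    return available_mb // per_process_mb


# Illustrative numbers only: 4 GB of RAM, 1 GB reserved, 48 MB per worker.
print(estimate_max_workers(4096, 1024, 48))  # prints 64
```

With a cap like that in place, extra requests wait in a queue rather than spawning more processes and pushing the machine into swap.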

A few minutes later, when an automated alert tipped me off that something was amiss, I nearly had a heart attack trying to deal with the horribly unresponsive system.  Thankfully, I got a handle on the situation.

My initial fear of a malicious attacker appears to be unfounded.  I’m doing a thorough analysis to be certain.

Thanks to the people who emailed me about the outage.  Automated monitoring tools are in place, but when it comes to things like this, I’d rather get a few redundant messages than have the problem go undiscovered for hours.  If you happen to notice strange downtime in the future, please don’t hesitate to shoot me an email, IM, or text (preferred).