About an hour ago as of this writing, one of CrankWheel’s EC2 instances spontaneously stopped responding to anything: it wasn’t serving requests and I couldn’t SSH into it.

In the AWS EC2 console, the Status Check column said “Insufficient info” instead of the usual “X/X passed”. Digging a bit deeper, I found that AWS did admit that the “System Reachability Check Failed”.

The funny thing was that the AWS Personal Health Dashboard showed no sign of any issues on the CrankWheel hosting account, even though one of its supposedly running EC2 instances could not be reached by AWS’s own status monitoring.
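The same condition can also be spotted from a script rather than the console. Below is a minimal sketch in Python, assuming the boto3 SDK and configured AWS credentials; the helper names `reachability_report`, `fetch_status` and `needs_host_recovery` are my own, not part of any AWS library. It reads the `SystemStatus` section of the EC2 `describe_instance_status` response, where the detail named `reachability` corresponds to the System Reachability Check. System status checks cover the AWS infrastructure the instance runs on, but the exact rule in `needs_host_recovery` is an assumption.

```python
from typing import Any


def reachability_report(response: dict[str, Any]) -> dict[str, dict[str, str]]:
    """Summarize an EC2 describe_instance_status response, one entry per instance."""
    report: dict[str, dict[str, str]] = {}
    for status in response.get("InstanceStatuses", []):
        system = status.get("SystemStatus", {})
        # Map each check detail name (e.g. "reachability") to its status
        # ("passed", "failed", "insufficient-data" or "initializing").
        details = {d["Name"]: d["Status"] for d in system.get("Details", [])}
        report[status["InstanceId"]] = {
            "system": system.get("Status", "unknown"),
            "instance": status.get("InstanceStatus", {}).get("Status", "unknown"),
            "system_reachability": details.get("reachability", "unknown"),
        }
    return report


def needs_host_recovery(entry: dict[str, str]) -> bool:
    """Assumed heuristic: a failing *system* check points at the underlying host."""
    return entry["system"] == "impaired" or entry["system_reachability"] == "failed"


def fetch_status(instance_id: str, region: str) -> dict[str, Any]:
    """Fetch status checks for one instance (needs boto3 and AWS credentials)."""
    import boto3  # imported here so the parsing helpers above work without boto3

    ec2 = boto3.client("ec2", region_name=region)
    # IncludeAllInstances=True also reports instances that are not "running".
    return ec2.describe_instance_status(
        InstanceIds=[instance_id], IncludeAllInstances=True
    )
```

For example, `reachability_report(fetch_status("i-0123456789abcdef0", "us-east-1"))`, with a placeholder instance ID and region, returns one summary entry per instance that you can pass to `needs_host_recovery`.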

After considering a few possibilities, I tried rebooting the instance, which did not fix it. A bit of googling turned up many different suggested causes, which is why I decided to write this quick post: the fix that should have been obvious for a spontaneous failure was not obvious from the search results.

A spontaneous failure, one with no apparent outside cause such as a software upgrade, is quite likely to be a hardware issue with the physical host underneath the EC2 instance. In fact, a while after solving today’s issue, I recalled that the same thing had happened once before, a few years ago, in a different production system, and the solution was the same as below.

The solution for me was as follows:

  1. Stop (not reboot) the instance using the AWS console.
  2. Wait for it to stop. In my case, after about 10 minutes it was still in the “Stopping” state, so I chose Stop again from the Actions menu and was asked whether I wanted to “Force stop”. Yes please, thank you AWS. Maybe make it more obvious that “Force stop” is a thing?
  3. The force stop took effect within a couple of minutes and the instance entered the “Stopped” state. I then started the instance again, which should make it start up on different underlying hardware.
  4. Everything was working smoothly again, and services came back as usual.
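The stop, force stop and start sequence above could also be scripted. What follows is a minimal sketch in Python, assuming a boto3 EC2 client (or any object with the same `describe_instances`, `stop_instances` and `start_instances` methods); the function names and the 10-minute default timeouts are my own choices. As far as I know, the console’s “Force stop” corresponds to `Force=True` on the StopInstances API. Keep in mind that a stop/start, unlike a reboot, loses any data on instance store volumes and, unless you use an Elastic IP, changes the public IPv4 address.

```python
import time
from typing import Any, Callable


def instance_state(ec2: Any, instance_id: str) -> str:
    """Return the state name, e.g. "running", "stopping" or "stopped"."""
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    return resp["Reservations"][0]["Instances"][0]["State"]["Name"]


def wait_for_state(ec2: Any, instance_id: str, wanted: str, timeout_s: float,
                   poll_s: float, sleep: Callable[[float], None]) -> bool:
    """Poll until the instance reaches `wanted`; return False if time runs out."""
    waited = 0.0
    while waited < timeout_s:
        if instance_state(ec2, instance_id) == wanted:
            return True
        sleep(poll_s)
        waited += poll_s
    return instance_state(ec2, instance_id) == wanted


def stop_and_start(ec2: Any, instance_id: str, stop_timeout_s: float = 600.0,
                   force_timeout_s: float = 600.0, poll_s: float = 15.0,
                   sleep: Callable[[float], None] = time.sleep) -> list[str]:
    """Stop (not reboot) the instance, force stop it if stuck, then start it."""
    actions = []
    ec2.stop_instances(InstanceIds=[instance_id])
    actions.append("stop")
    if not wait_for_state(ec2, instance_id, "stopped", stop_timeout_s, poll_s, sleep):
        # Still "stopping" after the timeout: escalate like the console's "Force stop".
        ec2.stop_instances(InstanceIds=[instance_id], Force=True)
        actions.append("force-stop")
        if not wait_for_state(ec2, instance_id, "stopped", force_timeout_s, poll_s, sleep):
            raise TimeoutError(f"{instance_id} did not stop even after a force stop")
    # Starting from "stopped" is what should place the instance on a different host.
    ec2.start_instances(InstanceIds=[instance_id])
    actions.append("start")
    return actions
```

With real credentials this would be called as `stop_and_start(boto3.client("ec2", region_name="us-east-1"), "i-0123456789abcdef0")`, where the region and instance ID are placeholders. boto3 also ships an `instance_stopped` waiter, but polling by hand makes it simple to decide when to escalate to a force stop.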
