Nov

The "Oh Shit" Moment


You know that point when you realise you forgot something crucial, but it's too late to do anything about it? That's the "Oh shit" moment.

Last week, one of the guys at GitHub experienced just such a moment when he accidentally dropped their production database. A lot of people made fun of GitHub for this (after their data had been restored). We didn't laugh, because we had just done something similar ourselves.

In our downtime between client projects, we've been working on our own web application (more to come on that subject). It's been in private beta for a little while now, and some of our friends have been trying it out. We've been using it extensively ourselves. While trying out a new feature, we accidentally deleted the entire production database. And we didn't have a backup.

What was the cause of this royal screwup? I'll get to that, but first I'd like to apologise to our friends who lost the data they had put in while helping us try out our new service. We fucked up. I'm so, so sorry!

What happened

We host our app on Heroku, and we have two environments there: the production (beta) environment and our staging (testing) environment. We were going to reset our staging database to get some new data in there by running heroku rake db:reset.

When you have multiple app environments on Heroku you specify which one you want to run a command against by appending --app appname to your command. If you leave that out, Heroku tries to guess which one you meant by seeing if any of your environments have the same name as the local directory you're in. In our case the production environment was named in the form of "appname" and the staging environment was "appname-staging". Our local directory was also named in the form "appname", which of course meant that the rake db:reset command ran against our production environment and reset the production database.
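
To make the trap concrete, here is a toy model of the app selection described above. It is only a sketch of the behaviour as we experienced it, not Heroku's actual code; the function name guess_app and the example path are made up for illustration.

```python
import os


def guess_app(args, apps, cwd):
    """Toy model of the app selection described in this post.

    This is NOT Heroku's real implementation, just the behaviour we saw.
    """
    # An explicit --app flag always wins.
    if "--app" in args:
        idx = args.index("--app")
        return args[idx + 1] if idx + 1 < len(args) else None

    # Otherwise, fall back to an app named like the local directory.
    dirname = os.path.basename(os.path.normpath(cwd))
    return dirname if dirname in apps else None


apps = ["appname", "appname-staging"]
cwd = "/home/me/code/appname"  # hypothetical local checkout

# Forgetting the flag silently selects the app named like the directory.
print(guess_app(["rake", "db:reset"], apps, cwd))

# Passing the flag selects the app we actually meant.
print(guess_app(["rake", "db:reset", "--app", "appname-staging"], apps, cwd))
```

With the names we had at the time, the first call picks "appname", which was our production environment.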

Now, we should of course have had a backup. But in our small private beta we just hadn't taken the time to set that up yet. We thought other things were more important. Now we know better.

What we are doing about it

We've taken two important steps to make sure that this doesn't happen again.

  1. We now have backups that get run automatically.
  2. We've changed the name of our production environment to the form of "appname-production" to make sure that we don't accidentally run commands against it when we forget to add the --app flag.
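
A rename helps, but a small guard in front of destructive commands would help too. What follows is a minimal sketch of that idea, not something we actually run: the check_command function, the DESTRUCTIVE list and the "-production" suffix rule are all assumptions for illustration.

```python
# Hypothetical list of rake tasks we consider dangerous.
DESTRUCTIVE = {"db:reset", "db:drop"}


def check_command(args, protected_suffix="-production"):
    """Decide whether a command line is safe to run.

    Returns (ok, reason). A sketch under our own naming assumptions:
    production apps are assumed to end with protected_suffix.
    """
    # Harmless commands pass straight through.
    if not any(arg in DESTRUCTIVE for arg in args):
        return True, "not destructive"

    # Find the explicit --app value, if there is one.
    app = None
    if "--app" in args:
        idx = args.index("--app")
        if idx + 1 < len(args):
            app = args[idx + 1]

    # Never let a destructive command rely on a guessed app.
    if app is None:
        return False, "destructive command without --app"

    # Refuse destructive commands aimed at production.
    if app.endswith(protected_suffix):
        return False, "destructive command against production"

    return True, "explicit non-production app"


print(check_command(["rake", "db:reset"]))
print(check_command(["rake", "db:reset", "--app", "appname-staging"]))
print(check_command(["rake", "db:reset", "--app", "appname-production"]))
```

A wrapper script could run this check first and only hand the arguments on to the real command when it returns ok. The point is that forgetting the --app flag becomes a refusal rather than a guess.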

If you were affected by our mistake, please accept my apologies. I am truly, very sorry.

Oh shit!