Joe's Nerd Party
Motivations

I develop and run a website called http://www.tanga.com.

Last week, my website was offline for more than 36 hours because of the Amazon Web Services outage. See http://highscalability.com/blog/2011/4/25/the-big-list-of-articles-on-the-amazon-outage.html for more details.

There were several reasons why our website was so vulnerable to a glitch in Amazon's service. This blog is an attempt to investigate and fix those weaknesses. I hope the information will be useful to other people.

Let’s start with some assumptions:

  • We’re running a web service or website: something HTTP-based.
  • The dataset involved is a “reasonable” size. We aren’t dealing with Google-, Reddit-, or Amazon-sized loads. Most online discussion of high availability and scalability assumes huge amounts of traffic and data. If you follow recommendations geared toward large sites, you will waste effort that would be better spent elsewhere.
  • You want your service to be up as much as possible, while keeping costs and complexity down.  You don’t want an army of operations people, you don’t want a byzantine system, you don’t want overly-complex code.
  • You know the basics of *nix, databases, filesystems, HTTP, etc.
  • You run on some *nix.  I use Ubuntu server (10.04).

The next post will deal with database backups! You don’t think your database server will last forever, do you…?
