Of load balancing, traffic and bad code

Nothing survives bad code.  Absolutely nothing.

For a month before CAT we spent a lot of time managing and scaling the hardware infrastructure at PG.  We hit all the usual roadblocks of scaling MySQL and Apache.  Working around them was both fun and enlightening.

We ended up running a multi-server setup: a master/slave configuration for our databases, plus a few front end machines with a load balancer distributing the front end traffic across them.  For those in the know, Apache is not exactly the coolest webserver for serving images. It stays inefficient at that job even if you tweak it to the very limits, so we had to find ways around it. We did it in two ways:

a) We offloaded most of our image serving to Amazon S3. Last month we served over 70 million image requests through S3.

b) On the load balanced machines, to ensure Apache is not given the job of serving images, we put nginx in front of it as a reverse proxy.  It is the cool webserver from Russia that is probably the most efficient at serving images. The nginx + Apache combo can handle a ton more traffic than Apache alone.
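
For the curious, the split in (b) boils down to one decision per request: is this an image, or does it need Apache? Here is a minimal sketch of that decision in Python. It is only an illustration, not our actual setup: in practice the rule lives in the nginx configuration (location blocks plus proxy_pass), and the extension list and the handler names below are assumptions made up for the example.

```python
from urllib.parse import urlsplit

# Assumed list of image extensions; a real setup would tune this.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".ico")


def choose_handler(url: str) -> str:
    """Decide who should answer a request.

    Returns "nginx-static" for image files and "apache-backend" for
    everything else. Both labels are hypothetical names for this sketch.
    """
    # Look only at the path, so a query string such as ?v=2 is ignored.
    path = urlsplit(url).path.lower()
    if path.endswith(IMAGE_EXTENSIONS):
        return "nginx-static"
    return "apache-backend"


if __name__ == "__main__":
    for url in ("/images/logo.PNG?v=2", "/scorecard.php", "/view.php?file=a.png"):
        print(url, "->", choose_handler(url))
```

The point of matching on the path alone is that a dynamic page which merely mentions an image name in its query string still goes to Apache, while the plain image files never touch it.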

We felt we were fairly well set for the traffic onslaught and we were right.  Until we released a new product on CAT day. The scorecard app which we put out had a bug that put processes into a loop. We crashed. Twice. Like a virus spreading across multiple servers, the bug in the app got replicated across all our front end servers and pulled them down.

We had to pull the application down to keep the site up.

It was a reminder, a rather stark reminder, that nothing survives bad code.  Absolutely nothing.

One Comment

  1. Amlan Mitra says:

    Using Load Runner could have been a good idea for alpha testing.
    The enthusiasm of the volunteer distributing flyers for your scorecard webpage at the gate of my CAT centre amazed me.
    Looking at the site crash with ugly Apache PHP errors saddened me more than anything. Okay, maybe not as much as my performance that day, but still.
