Ynet on AWS. Let’s hope we don’t have to test their limits.

In Israel, more than in most places, no news is good news. Ynet, one of the largest news sites in Israel, recently posted a case study (at the bottom of this article) on handling large loads by moving their notification services to AWS.

“We used EC2, Elastic Load Balancers, and EBS… Us as an enterprise, we need something stable…”

In my opinion, they are contradicting themselves. EBS and Elastic Load Balancers (ELB) are the two AWS services that fail the most and fail the hardest, with multiple outages spanning multiple days each.

EBS: Conceptually flawed, prone to cascading failures

EBS, a virtual block storage service, is conceptually flawed and prone to severe cascading failures. In recent years, Amazon has improved reliability somewhat, mainly by providing such a low level of service on standard EBS that customers default to paying extra for provisioned IOPS and SSD-backed EBS volumes.

Many cloud providers avoid the problematic nature of virtual block storage entirely, preferring compute nodes based on local, direct-attached storage.

ELB: Too slow to adapt, silently drops your traffic

In my experience, ELBs are too slow to adapt to spikes in traffic. About a year ago, I was called to investigate availability issues with one of our advertising services. The problems were intermittent and extremely hard to pin down. Luckily, because ours is a B2B service, our partners noticed the problems. Our customers would have happily ignored the blank advertising space.

Suspecting some sort of capacity problem, I ran some synthetic load tests and compared the results with logs on our servers. Multiple iterations of these tests with and without ELB in the path confirmed a gruesome and silent loss of 40% of our requests when traffic via Elastic Load Balancers grew suddenly.
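The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the tooling we actually used: it assumes each synthetic request carries a unique marker in a hypothetical `probe_id` query parameter, and that the backend access logs record the full request URL. The function names are made up for the example.

```python
import re
import uuid

# Assumption: each synthetic request carries ?probe_id=<32 hex chars> in its URL.
PROBE_RE = re.compile(r"[?&]probe_id=([0-9a-f]{32})")


def new_probe_ids(count):
    """Generate unique IDs to attach to the synthetic requests."""
    return [uuid.uuid4().hex for _ in range(count)]


def logged_probe_ids(log_lines):
    """Collect every probe ID that shows up in the backend access logs."""
    found = set()
    for line in log_lines:
        match = PROBE_RE.search(line)
        if match:
            found.add(match.group(1))
    return found


def loss_rate(sent_ids, log_lines):
    """Fraction of sent probes that never reached a backend server."""
    if not sent_ids:
        return 0.0
    arrived = logged_probe_ids(log_lines)
    missing = [pid for pid in sent_ids if pid not in arrived]
    return len(missing) / len(sent_ids)
```

Running the same batch once through the ELB and once directly against the servers, then comparing the two loss rates, is what separates losses at the load balancer from losses on the servers themselves.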

The Elastic Load Balancers gave us no indication that they were dropping requests and, although they would theoretically support the load once Amazon’s algorithms picked up on the new traffic, they just didn’t scale up fast enough. We wasted tons of money on bought media that couldn’t show our ads.

Amazon will prepare your ELBs for more traffic if you give them two weeks’ notice and they’re in a good mood, but who has the luxury of knowing when a spike in traffic will come?

Recommendations

I recommend staying away from EC2, EBS, and ELB if you care about performance and availability. There are better, more reliable providers like Joyent. Rackspace, minus their cloud block storage (basically the same as EBS, with the same flaws), would be my second choice.

If you must use EC2, try to use load balancing AMIs from companies like Riverbed or F5 instead of ELB.

If you must use ELB, run synthetic load tests at random intervals to verify that Amazon isn’t dropping your traffic.
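One way to pick the random intervals and decide when to raise an alarm is sketched below. It is only an illustration under stated assumptions: the exponential spacing, the 1% tolerance, and both function names are choices made for the example, not anything Amazon or ELB prescribes.

```python
import random


def random_delays(count, mean_seconds, rng=None):
    """Draw exponentially distributed waits so tests never fall on a fixed schedule."""
    rng = rng or random.Random()
    return [rng.expovariate(1.0 / mean_seconds) for _ in range(count)]


def is_dropping(sent, received, tolerance=0.01):
    """True when the share of lost requests exceeds the tolerated fraction."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return (sent - received) / sent > tolerance
```

For example, `random_delays(24, 3600)` yields 24 waits averaging an hour each, and `is_dropping(1000, 600)` flags the kind of 40% loss described earlier.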

Conclusion

In conclusion, let us hope that we have no reason to test the limits of Ynet’s new services, and if we do, may it only be good news.
