
Save Money on Private SSL CDN and Improve Performance at the Same Time


SSL support on your site, and therefore in your CDN, is critical, but it is also incredibly expensive. In this article, I’ll show you how to save a fortune while improving the performance and security of your site.

It used to be that sites only encrypted the most sensitive traffic with their customers, e.g. registration, login, and checkout. In 2010, the Firesheep extension made it very clear that this was not enough.

Since then, many other attacks on partially encrypted sites have been devised, and so major sites like Google, Twitter, Facebook, and others have all switched to HTTPS by default, aka Always On SSL. Last year, Google called for HTTPS everywhere and later announced that secured sites would get a boost in PageRank.

Shared SSL from the CDN is cheap.

Shared SSL from the CDN is better than nothing, but it isn’t great. It’s better than nothing because it helps avoid mixed content warnings. It’s not great because it isn’t on the same domain as your site, so it can cause same-origin policy problems.

Why is Private SSL CDN so expensive?

That’s a great question and I’m not sure there is a good answer. Some of the common excuses are:

  1. Encrypted traffic is heavier than unencrypted traffic, so it should cost more. This is only slightly true, and recent work in this area has made it even less so.
  2. The logistics of deploying SSL certificates across all the CDN edge nodes are expensive. It’s honestly hard to believe that companies whose entire business is managing countless edge nodes have real trouble deploying certificates to them.
  3. It’s only worth it for the CDN company to assume the security risk of HTTPS traffic if they charge you more money. This is probably the closest to the truth.

Regardless of the real reason, CDNs often charge both exorbitant one-time setup fees and high monthly fees for each SSL-enabled configuration.

Common mistakes that are costing you money

Companies often make several mistakes when ordering SSL services, costing them thousands of extra dollars each month:

  1. Ordering too many domains
  2. Ordering the wrong type of SSL service
  3. Not negotiating the fees

Ordering too many domains

Companies often spread their sites over multiple subdomains, e.g. www.yahu.com, mail.yahu.com, shopping.yahu.com, etc. Each of these sites has its own content, and many companies understandably set up separate CDN configurations for each origin, e.g. cdn.www.yahu.com and cdn.mail.yahu.com.

Ordering SSL from the CDN for each of those domains will cost a fortune. Instead, create a single CDN configuration for each root domain, e.g. cdn.yahu.com and cdn.giggle.com. In each configuration, use the CDN provider’s built-in URL rewriting and origin rewriting features to direct requests to the appropriate origin.

For example, configure cdn.yahu.com/www/ to cache content from www.yahu.com while cdn.yahu.com/shopping/ caches content from shopping.yahu.com. Now you have drastically cut down the number of SSL slots you need to order.
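
Conceptually, the mapping looks something like the sketch below (illustration only; a real deployment uses the CDN provider’s own rewrite and origin rules rather than code, and the resolve_origin function and path prefixes here are hypothetical):

# Route a path prefix on the consolidated CDN hostname to the origin
# that should serve it. Real CDNs do this with their rewrite features.
PATH_TO_ORIGIN = {
    "/www/":      "www.yahu.com",
    "/mail/":     "mail.yahu.com",
    "/shopping/": "shopping.yahu.com",
}

def resolve_origin(request_path):
    """Map a request path on cdn.yahu.com to the origin that should serve it."""
    for prefix, origin in PATH_TO_ORIGIN.items():
        if request_path.startswith(prefix):
            # Strip the routing prefix so the origin sees its own URL space
            return origin, "/" + request_path[len(prefix):]
    raise ValueError("No origin configured for path: " + request_path)

# resolve_origin("/shopping/cart.css") -> ("shopping.yahu.com", "/cart.css")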

Ordering the wrong type of SSL service

Different CDN providers offer different types of SSL service. Some provide standard single-domain certificates, some use a multi-domain (SAN) certificate, and some offer wildcard certificates.

Now that you have minimized the number of domains you need to protect in the CDN, you can re-evaluate the type of SSL certificate you need. In the example above, we might have thought we needed an expensive wildcard certificate to protect all the subdomains of yahu.com, whereas now we can choose a less expensive single-domain certificate.

If we can’t use URL rewriting to save on SSL-enabled CDN domains, it may be less expensive to get one wildcard certificate than to get several domains on a multi-domain certificate.

Not negotiating the fees

CDN fees are always negotiable. Usually, the more traffic you commit to each month, the lower the price you get. SSL slots are also open to volume discounts. If you need multiple SSL slots, try to order them at the same time and ask for a discount on the setup fees. The logic: even if they install each certificate manually, they can install all of your certificates at the same time.

You should also be able to get a discount on the monthly fees simply because you are committing to pay more each month. An even better deal to ask for is to over-commit on your monthly traffic in exchange for a discount on the SSL monthly fees.

For example, say you have a $5K monthly traffic commitment and you are adding five SSL slots at a $750/month list price. The CDN offers to come down to $700/month per SSL slot because you are adding five (a total commitment of $8.5K/month). Counter with an offer to commit to $6K/month in traffic plus $500/month per SSL slot. This is better because even if your traffic grows by 20%, you will still be paying $8.5K/month, and you are getting the SSL service at a lower price.
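
To make the arithmetic concrete (assuming traffic beyond the commit is billed at roughly the committed rate):

Their offer:    $5,000 traffic + 5 x $700 SSL = $8,500/month
Counter offer:  $6,000 traffic + 5 x $500 SSL = $8,500/month

After 20% traffic growth:
Their offer:    $6,000 traffic + 5 x $700 SSL = $9,500/month
Counter offer:  $6,000 traffic + 5 x $500 SSL = $8,500/month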

How does this improve performance?

There is an added, hidden benefit in consolidating your CDN hostnames.

Now, when a customer first reaches one of your sites, e.g. www.yahu.com, their browser looks up and connects to your consolidated CDN domain, cdn.yahu.com. When they browse your site and switch to a new subdomain, for example by clicking through to shopping.yahu.com, they are already connected to cdn.yahu.com. They save on additional DNS lookups and additional TCP and SSL handshakes (the most expensive part of SSL traffic).

You can push this even further by sharing resources between your origins using a special CDN location, for example by storing common CSS files at cdn.yahu.com/shared/. In this case, shopping.yahu.com will be able to use the CSS files already loaded and cached by the browser during its initial visit to www.yahu.com.

Summary

I’ve used these SSL consolidation techniques with both Akamai and CDNetworks, realizing significant cost and performance savings. Although I’ve worked with at least five other CDN providers, I haven’t tried these techniques elsewhere.

If you’re interested in optimizing your CDN deployment for best cost/performance, feel free to contact me via LinkedIn or via https://donatemyfee.org/.

When 99.999% Isn’t Good Enough

When discussing the availability of a service, it is common to hear the term “Five Nines,” referring to a service being available 99.999% of the time. But “Five Nines” is relative: if your time frame is a week, then your service can be unavailable for 6.05 seconds, whereas a time frame of a year allows for a very respectable 5.26 minutes.
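
The arithmetic behind those numbers:

1 week: 604,800 seconds x 0.00001 = 6.05 seconds of downtime allowed
1 year: 525,600 minutes x 0.00001 = 5.26 minutes of downtime allowed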

In reality, none of those calculations are relevant, because no one cares if a service is unavailable for 10 hours as long as they aren’t trying to use it. On the other hand, if you’re handling 50,000 transactions per second, 6.05 seconds of unavailability could cost you 302,500 transactions, and then no one cares that you met your SLA.
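
That works out to:

50,000 transactions/second x 6.05 seconds = 302,500 missed transactions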

This is a problem I’ve come up against a number of times in the past, and even more so recently. The issue is one of orders of magnitude in IT: the larger the volume of business you handle, the less relevant the Five Nines become.

Google became famous years ago for its novel approach to hardware availability. They were using servers and disks on such a scale that they could no longer prevent the failures and they decided not to even try. Instead, they planned to sustain lots of failures and made a business of knowing when to expect problems and where. As much as we would like to be able to take Google’s approach to things, I think most of our IT budgets aren’t up for it.

Another good example is EMC2, which boasts 99.999% availability for its Clariion line of storage systems. I want to start by saying that I use EMC storage and I’m happy with it. Regardless, their claim of 99.999% availability doesn’t give me any comfort, for the following reasons.

According to a whitepaper from 2007 (maybe they have changed things since then), EMC has a team which calculates availability for every Clariion in the field on a weekly basis. Assuming there were 2000 Clariion systems in the field in a given week (the example given in the whitepaper), with 1.5 hours of downtime across all of them, then:

2000 systems x 7 days x 24 hours   =  336,000 total hours of runtime
336,000 hours - 1.5 hours downtime =  335,998.5 hours of uptime
335,998.5 / 336,000                =  99.9996% uptime

That is great, at least that is what EMC wants you to think. I look at this and understand something totally different. According to this guy, as of the beginning of 2009 there were 300,000 Clariions sold, not 2000. That is two orders of magnitude different, which means the same 99.999% uptime rate allows for 504 hours of downtime across the fleet:

300,000 systems x 7 days x 24 hours   =  50,400,000 total hours of runtime
50,400,000 hours - 504 hours downtime =  50,399,496 hours of uptime
50,399,496 / 50,400,000               =  99.999% uptime

Granted, that is a lot of uptime, but 504 hours of downtime is still 21 full days of downtime for someone. If it were possible to fit 21 full days of downtime into one week, they could all be yours and EMC would still be able to claim 99.999% availability according to their calculations. By the same token, three EMC customers each week could theoretically have no availability the entire week, and one of those customers could be me.
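
That works out because a week is 168 hours:

504 hours of downtime / 168 hours per week = 3 customer-weeks of total outage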

Since storage failures can cause so many complications, I figure it is much more likely that EMC downtime comes in days, as opposed to minutes or hours. Either way, Five Nines is lost in the scale of things in this case as well.

Content Delivery Networks provide another availability vs. scale problem. Akamai announced record-breaking amounts of traffic on their network in January 2009: they passed 2 terabits per second and 12,000,000 requests per second. (I don’t use Akamai, but I think it is amazing that they delivered over 2 terabits/second of traffic.) With that level of traffic, even if Akamai provided a 99.999% availability SLA, they could have had 120 failed requests per second, 7,200 failed requests per minute, etc.
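
At that rate:

12,000,000 requests/second x 0.00001 = 120 failed requests/second
120 failed requests/second x 60      = 7,200 failed requests/minute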

Sometimes complaints relating to our CDN cross my desk, and while I have no idea how much traffic our CDN handles worldwide, I know that we can easily send it 20,000,000 requests per day. Assuming 99.999% availability, I expect (learning from Google) to have 200 failed requests per day. Knowing IT as I do, I also expect that all 200 failed requests will be in the same country, probably due to an issue with one of their cache servers which, because of global traffic management (GTM), will primarily affect the people directed to that server. Unfortunately, the issue of scale is lost on our partners who didn’t get their content.
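
The same arithmetic again:

20,000,000 requests/day x 0.00001 = 200 failed requests/day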

Availability is not the only case where scale is forgotten. I was recently asked to help debug the performance of an application server which could handle a large number of requests per second when queried directly, but only 80% as many requests per second when sitting behind a load balancer.

Of course, we started by trying to find a reason why the load balancer would be causing a 20% performance hit. After deep investigation, the answer I found (not necessarily the correct answer) was that all the load balancing configurations were correct and, on average, having the load balancer in the path added 1 millisecond to the response time of each request. Unfortunately, the response time without the load balancer was an average of 4 milliseconds, so the additional 1 millisecond reduced the overall performance by 20%.
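
To see why a single extra millisecond costs 20%, consider a simplified model in which requests on a connection are handled serially (an assumption, but it captures the effect):

Without the load balancer: 1,000 ms / 4 ms per request = 250 requests/second
With the load balancer:    1,000 ms / 5 ms per request = 200 requests/second
200 / 250 = 80% of the original throughput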

In short, everything is relative and 99.999% isn’t good enough.