Tag Archive for DNS

How To Set Up a Multi-Platform Website With Cedexis


With the recent and ongoing DDoS attacks against GitHub, many sites hosted on GitHub Pages have been scrambling to find alternative hosting. I began writing this tutorial long before the attacks, but the configuration you find here is exactly what you need to serve static content from multiple providers like GitHub Pages, Divshot, Amazon S3, etc.

In my previous article, I introduced the major components of Cedexis and how they fit together to build great multi-provider solutions. If you haven’t read it, I suggest taking a quick look to get familiar with the concepts.

In this article, I’ll show you, step-by-step, how to build a robust multi-provider, multi-platform website using the following Cedexis components:

  • Radar – provides real-time user measurements of provider performance and availability.
  • Sonar – provides lightweight, efficient service availability health-checks.
  • OpenMix – DNS traffic routing based on the data from Radar and Sonar.
  • Portal – UI for configuration and reporting.

I’ll also briefly cover some of the reports available in the portal.

Basic Architecture

OpenMix applications all have the same basic architecture and the same basic building blocks. Here is a diagram for reference:

Preparations

For the purpose of the tutorial, I’ve built a demo site using the Amazon S3 and Divshot static web hosting services. I’ve already uploaded my content to both providers and made sure that they are responding.

Both of these services provide us a DNS hostname with which to access the sites.

Amazon S3 is part of the standard community Radar measurements, but as a recently launched service, Divshot hasn’t made the list yet. By adding test objects to Divshot, we can enable private Radar measurements instead.

Download the test objects from here:

I’ve uploaded them to a cedexis directory inside my web root on Divshot.

Configuring the Platforms

Platforms are, essentially, named data sets. Inside the OpenMix Application, we assign them to a service endpoint and the data they contain influences how Cedexis routes traffic. In addition, the platforms become dimensions by which we slice and dice data in the reports.

We need to define platforms for S3 and Divshot in Cedexis and connect each platform to its relevant data sources (Radar and Sonar).

Log in to the Cedexis portal here and click on the Add Platform button in the Platforms section.

We’ll find the community platform for Amazon S3 under the Cloud Storage category. This means that S3 performance will be monitored automatically by Cedexis’ community Radar. You can leave the defaults on this screen:

After clicking next, we’ll get the opportunity to set up custom Radar and Sonar settings for this platform. We want to enable Sonar to make sure there are no problems with our specific S3 bucket which community Radar might not catch.

We’ll enable Sonar polls every 60 seconds (the default) and, for the test URL, I’ve put the homepage of the site.

We’ll save the platform and create another:

Divshot is somewhere in between Cloud Storage and Cloud Compute. It’s really only hosting static content so I’ve chosen the Cloud Storage category, but there is no real difference from Cedexis’ perspective. If they eventually add Divshot to their community metrics, it might end up in a different category.

Since Divshot isn’t one of the pre-configured Cloud Storage platforms, choose the platform “Other”.

The report name is what will show up in Cedexis charts when you want to analyze the data from this platform.

The OpenMix alias is how OpenMix applications will refer to this platform. Notice that I’ve called it divshot_production. That is because Divshot provides multiple environments for development, staging, and QA. In the future, we may define platforms for other environments as well.

Since there are no community Radar measurements for Divshot, we prepare private measurements of our own in the next step.

We are going to add three types of probes using the test objects which we downloaded above.

First, the HTTP Response Time URL probe (if you are serving your content over SSL, as you should, choose the HTTPS Response Time URL probe instead). The HTTP Response Time probe should use the Small Javascript Timing Object.

Click Add probe at the bottom left of the dialog to add the next probes.

In addition to the response time probe, we will add the Cold Start and Throughput probes to cover all our bases.

The Cold Start probe should also use the small test object.

The Throughput probe needs the large test object.

Make sure the test objects are correct in the summary page before setting up Sonar.

Configure the Sonar settings for Divshot similarly to those from S3 with the exception of using the homepage from Divshot for the test URL. Then click ‘Complete’ to save the platform and we are done setting up platforms for now.

A nice thing about platforms is that they are decoupled from the OpenMix applications. That means you can re-use a platform across multiple OpenMix applications with completely different logic. It also means you can slice and dice your data using application and platform as separate dimensions.

For example, if we had applications running in multiple datacenters, we would be interested to know about the performance and availability of each data center across all our applications. Conversely, we would also want to know if a specific application performs better in one data center than another. Cedexis hands us this data on a silver platter.

Our First OpenMix Application

Open the Application Configuration option under the OpenMix tab and click on the plus in the upper right corner to add a new application.

We’re going to select the Optimal RTT quick app for the application type. This app will send traffic to the platform with the best response time according to the information Cedexis has on the user making the request.

Define the fallback host. Note that this does not have to be one of the defined platforms. This host will be used in case the application logic fails or there is a system issue within Cedexis. In this case, I trust S3 slightly more than Divshot so I’ll configure it as the fallback host.

In the second configuration step, I’ve left the default TTL of 20 seconds. This means that resolvers will re-check every 20 seconds to see whether a different provider should serve requests. Once Cedexis detects a problem, the maximum time until users are directed to a different provider should be approximately this value.

In my experience, 20 seconds is a good value to use. It is long enough that users can browse one or two pages of a site before doing any additional DNS lookups and it is short enough to react to shorter downtimes.

Increasing this value will result in fewer requests to Cedexis. To save money, consider automatically changing TTLs via RESTful Web Services. Use lower TTLs during peaks, where performance could be more erratic, and use longer TTLs during low traffic periods to save request costs.
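
As a rough illustration, an automated TTL change could be scripted like this. The API host, path, payload shape, and auth scheme below are placeholders, not the documented Cedexis API, so check the current API documentation for the real values.

```javascript
// Hypothetical sketch: raise the TTL of an OpenMix application during a
// low-traffic window. The endpoint, payload, and auth are placeholders.
const https = require('https');

const payload = JSON.stringify({ ttl: 60 }); // longer TTL = fewer DNS requests billed

const req = https.request({
  hostname: 'api.cedexis.com',                 // placeholder host
  path: '/api/v2/config/applications/123',     // placeholder path and application id
  method: 'PUT',
  headers: {
    'Authorization': 'Bearer ' + process.env.CEDEXIS_TOKEN,
    'Content-Type': 'application/json',
    'Content-Length': Buffer.byteLength(payload)
  }
}, res => console.log('TTL update returned HTTP', res.statusCode));

req.on('error', console.error);
req.write(payload);
req.end();
```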

On the third configuration step, I’ve left all the defaults. The Optimal RTT quick app will filter out any platforms which have less than 80% availability before considering them as choices for sending traffic.

Depending on the quality of your services, you may decide to lower this number, but you probably do not want it any higher. Why not eliminate any platform that isn’t reporting 100% availability? The answer is that RUM measurements rely on the sometimes poor-quality home networks of your users and, as a result, they can be extremely finicky. Expecting 100% availability from a high traffic service is unrealistic, and leaving a threshold of 80% will help reduce thrashing and unwanted use of the fallback host.

Regarding eDNS, you pretty much always want this enabled since many people have switched to using public DNS resolvers like Google DNS instead of the resolvers provided by their ISPs.

Shared resolvers break assumptions made by traffic optimization solutions and the eDNS standard works around this problem, passing information about the request origin to the authoritative DNS servers. Cedexis has supported eDNS from the beginning but many services still don’t.

In the final step, we will configure the service endpoints for each of the platforms we defined.

In our case, we are just associating the hostname aliases that Amazon and Divshot gave us with the correct platform and its Radar/Sonar data.

In a more complicated setup, you might have a platform per region of a cloud and service endpoints with different aliases or CNAMEs across each region.

Note that each platform in the application has an “Enabled” checkbox. This makes it easy to go into an application and temporarily stop sending traffic to a specific platform. It is very useful for avoiding downtime during maintenance windows, migrations, or intermittent problems with a provider.

Choose the Add Platform button on the bottom left corner of the dialog to add the second platform, not the Complete button on the bottom right.

Define the Divshot platform like we did for S3 and click Complete.

You should get a success message with the CNAME for the Cedexis application alias. Click “Publish” to activate the OpenMix application right away.

Alternatively, clicking “Done” would leave the application configured but inactive. When editing applications, you will get a similar dialog. Leaving changes saved but unpublished can be a useful way to stage changes to be activated later with the push of a button.

Building a Custom OpenMix Application

The application we just created will work, but it doesn’t take advantage of the Sonar data that we configured. To consider the Sonar metrics, we will create a custom OpenMix app and, by custom, I mean copy and paste the code from Cedexis’ GitHub repository. If you’re squeamish about code, talk to your account manager and I’m sure they’ll be able to help you.

The specific code we are going to adapt can be found here (Note: I’ve fixed the link to a specific revision of the repo to make sure the instructions match, but you might choose to take the latest revision.)

We only need to modify definitions at the beginning of the script:
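
As a rough sketch (not the exact contents of the repository file), the definitions block looks something like the following. The provider aliases are the ones we created above, the CNAMEs are examples of the hostnames Amazon and Divshot hand out, and the exact field names may differ in the revision you copy:

```javascript
// Sketch of the settings passed to the custom OpenMix application.
// Aliases must match the platforms defined in the portal; CNAMEs and
// field names here are illustrative and may differ in the real script.
var handler = new OpenmixApplication({
    providers: {
        'amazon_s3': {
            cname: 'example-bucket.s3-website-eu-west-1.amazonaws.com'
        },
        'divshot_production': {
            cname: 'example.divshot.io'
        }
    },
    default_provider: 'amazon_s3',   // used when no platform qualifies
    default_ttl: 20,                 // seconds, matches the quick app's TTL
    availability_threshold: 80       // ignore platforms reporting below 80%
});
```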

Let’s create a new application using the new script. Then we can switch back and forth between them if we want. We’ll start by duplicating the original app. Expand the app’s row and click Duplicate.

Switch the application type to Custom Javascript app, modify the name and description to reflect that we will use Sonar data, and click next.

Leave the fallback and TTL as is on the next screen.

In the third configuration step, we’ll be asked to upload our custom application.
Choose the edited version of the script and click through to complete the process.

As before, publish the application to activate it.

Adding Radar Support to our Service

At this point, Cedexis is already gathering availability data on both our platforms via Sonar. Since we used the community platform for S3, we also have performance data for that. To finish implementing private Radar for Divshot, we must include the Radar tag in our pages so our users start reporting on their performance.

We get our custom JavaScript tag from the portal (Note: If you want to get really advanced, you can go into the tag configuration section and set some custom parameters to control the behavior of the Radar tag, for example, how many tests to run on each page load, etc.)
Copy the tag to your clipboard and add it in your pages on all platforms.
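
If you prefer to inject the tag yourself rather than paste the portal snippet verbatim, something like the following keeps it from ever blocking rendering. The script URL and IDs are placeholders; use the exact tag generated for your account:

```javascript
// Load the Radar tag after the page finishes loading so it never delays
// rendering. The IDs in the URL are placeholders for your account's tag.
window.addEventListener('load', function () {
  var radar = document.createElement('script');
  radar.src = '//radar.cedexis.com/1/12345/radar.js'; // placeholder URL
  radar.async = true;
  document.body.appendChild(radar);
});
```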

Going Live

Before we go live, we should really test out the application with some manual DNS requests, disabling and enabling platforms, to see that the responses change, etc.
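
For example, here is one way to script those checks with Node’s built-in dns module; the hostname and resolver address are placeholders, and the expected answer would be your OpenMix application’s CNAME:

```javascript
// Resolve the site's CNAME against a public resolver and print the chain.
const dns = require('dns');

dns.setServers(['8.8.8.8']); // query Google DNS instead of the local resolver

dns.resolveCname('www.example.com', (err, records) => {  // placeholder hostname
  if (err) throw err;
  console.log(records); // expect the Cedexis OpenMix application CNAME here
});
```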

Once we’re satisfied, the last step is to change the DNS records to point at the CNAME of the OpenMix application that we want to use. I’ll set the DNS to point at our Sonar enabled application.

A useful service to check how our applications are working is https://www.whatsmydns.net/. This will show how our application CNAMEs resolve from locations around the world. For example, if I check the CNAME resolution for the application we just created, I get the following results:

By and large, the Western Hemisphere prefers Divshot while the Eastern Hemisphere prefers Amazon S3 in Europe. This is completely understandable. Interestingly, there are exceptions in both directions. For example, in this test, OpenMix refers the TEANET resolver from Italy to Divshot while the Level 3 resolver in New York is referred to Amazon S3 in Europe. If you refresh the test every so often, you will see that the routings change.

Since this demo site isn’t getting any live traffic, I’ve generated traffic to show off the reports. First the dashboard which gives you a quick overview of your Cedexis traffic on login. Here we show that the majority of the traffic came from North America, and a fair amount came from Europe as well. We also see that, for the most part, the traffic was split evenly between the two platforms.

To examine how Cedexis is routing our traffic, we look at the OpenMix Decision Report. I’ve added a secondary dimension of ‘Platform’ to see how the decisions have been distributed. You see that sometimes Amazon S3 is preferred and other times Divshot.

To figure out why requests were routed one way or the other, we can drill down into the data using the other reports. First, let’s check the availability stats in the Radar Performance Report. For the sake of demonstration, I’ve drilled down into availability per continent. In Asia, we see shoddy availability from Divshot but Amazon S3 isn’t perfect either. Since we didn’t see much traffic from Asia, this probably didn’t affect the traffic distribution. Theoretically, a burst of Asian traffic would result in more traffic going to Amazon.

In Europe, Divshot generally showed better availability than Amazon, reporting 100% except for a brief outage.

In North America, we see a similar graph. As is to be expected, the availability of Amazon S3 in Europe is lower and less stable when measured from North America. Divshot shows 100% availability, which is also expected.

It’s important to note that the statistics here are skewed because we are comparing our private platform, measured only by our users, to the community data from S3. The community platform collects many more data points than our private platform, and it’s almost impossible for it to show 100% availability. This is also why we chose an 80% availability threshold when we built the OpenMix application.

Next let’s look at the performance reports for response times of each platform. With very little traffic from Asia, the private performance measurements for Divshot are pretty erratic. With more traffic, the graph should stabilize into something more meaningful.

The graph for Europe behaves as expected showing Amazon S3 outperforming Divshot consistently.

The graph for North America also behaves as expected with Divshot consistently outperforming Amazon S3.

So we’ve seen some basic data on how our traffic performs. Cedexis doesn’t stop there. We can also take a look at how our traffic could perform if we add a new provider. Let’s see how we could improve performance in North America by adding other community platforms to our graph.

I’ve added Amazon S3 in US-East, which shows almost a 30ms advantage over Divshot, though our private measurements still need to be taken lightly with so little traffic behind them. Even better performance comes from Joyent US-East. Using Joyent would require us to do more server management, but if we really care about performance, Cedexis shows that it would provide a major improvement.

Summary

To recap, in this tutorial I’ve demonstrated how to set up a basic OpenMix application to balance traffic between two providers. Balancing between multiple CDNs or implementing a hybrid datacenter + CDN architecture is just as easy.

Cedexis is a great solution for setting up a multi-provider infrastructure. With the ever-growing Radar community generating performance metrics from around the globe, Cedexis provides both real-time situational awareness for your services and operational intelligence, so you can make well-informed decisions for your business.

Unpacking Cedexis: Creating Multi-CDN and Multi-Cloud Applications


Technology is incredibly complex and, at the same time, incredibly unreliable. As a result, we build backup measures into the DNA of everything around us. Our laptops switch to battery when the power goes out. Our cellphones switch to cellular data when they lose connection to WiFi.

At the heart of this resilience, technology is constantly choosing between the available providers of a resource with the idea that if one provider becomes unavailable, another provider will take its place to give you what you want.

When the consumers are tightly coupled with the providers, for example, a laptop consuming power at Layer 1 or a server consuming LAN connectivity at Layer 2, the choices are limited and the objective, when choosing a provider, is primarily one of availability.

Multiple providers for more than availability.

As multi-provider solutions make their way up the stack, however, additional data and time to make decisions enable choosing a provider based on other objectives like performance. Routing protocols, such as BGP, operate at Layer 3. They use path selection logic, not only to work around broken WAN connections but also to prefer paths with higher stability and lower latency.

As pervasive and successful as the multi-provider pattern is, many services fail to adopt a full stack multi-provider strategy. Cedexis is an amazing service which has come to change that by making it trivial to bring the power of intelligent, real-time provider selection to your application.

I first implemented Multi-CDN using Cedexis about 2 years ago. It was a no-brainer to go from Multi-CDN to Multi-Cloud. The additional performance, availability, and flexibility for the business became more and more obvious over time. Having a good multi-provider solution is key in cloud-based architectures and so I set out to write up a quick how-to on setting up a Multi-Cloud solution with Cedexis; but first you need to understand a bit about how Cedexis works.

Cedexis Unboxed

Cedexis has a number of components:

  1. Radar
  2. OpenMix
  3. Sonar
  4. Fusion

OpenMix

OpenMix is the brain of Cedexis. It looks like a DNS server to your users but, in fact, it is a multi-provider logic controller. In order to set up multi-provider solutions for our sites, we build OpenMix applications. Cedexis comes with the most common applications pre-built, but the possibilities are pretty endless if you want something custom. As long as you can get the data you want into OpenMix, you can make your decisions based on that data in real time.

Radar

Radar is where Cedexis really turned the industry on its head. Radar uses a JavaScript tag to crowdsource billions of Real User Monitoring (RUM) metrics in real time. Each time a user visits a page with the Radar tag, they take a small number of random performance measurements and send the data back to Cedexis for processing.

The measurements are non-intrusive. They only happen several seconds after your page has loaded and you can control various aspects of what and how much gets tested by configuring the JS tag in the portal.

It’s important to note that Radar has two types of measurements:

  1. Community
  2. Private

Community Radar

Community measurements are made against shared endpoints within each service provider. All Cedexis users that implement the Radar tag and allow community measurements get free access to the community Radar statistics. The community includes statistics for the major Cloud compute, Cloud storage, and CDNs making Radar the first place I go to research developments and trends in the Cloud/CDN markets.

Community Radar is the fastest and easiest way to use Cedexis out of the box and the community measurements also have the most volume so they are very accurate all the way up to the “door” of your service provider. They do have some disadvantages though.

The community data doesn’t account for performance changes specific to each of the provider’s tenants. For example, community Radar for Amazon S3 will gather RUM data for accessing a test bucket in the specified region. This data assumes that within the S3 region all the buckets perform equally.

Additionally, there are providers which may opt out of community measurements, so you might not have community data for some providers at all. In that case, I suggest you try to connect your account managers and get the provider included. Sometimes it is just a question of demand.

Private Radar

Cedexis has the ability to configure custom Radar measurements as well. These measurements will only be taken by your users, the ones using your JS tag.

Private Radar lets you monitor dedicated and other platforms which aren’t included in the community metrics. If you have enough traffic, private Radar measurements have the added bonus of being specific to your user base and of measuring your specific application so the data can be even more accurate than the community data.

The major disadvantage of private Radar is that low volume metrics may not produce the best decisions. With that in mind, you will want to supplement your data with other data sources. I’ll show you how to set that up.

Situational Awareness

More than just a research tool, Radar makes all of these live metrics available for decision-making inside OpenMix. That means we can make much more intelligent choices than we could with less precise technologies like Geo-targeting and Anycast.

Most people using Geo-targeting assume that being geographically close to a network destination also means being close from the networking point of view. In reality, network latency depends on many factors like available bandwidth, number of hops, etc. Anycast can pick a destination with lower latency, but it’s stuck way down in Layer 3 of the stack with no idea about application performance or availability.

With Radar, you get real-time performance comparisons of the providers you use, from your users’ perspective. You know that people on ISP Alice are having better performance from the East coast DC while people on ISP Bob are having better performance from the Midwest DC, even if both these ISPs are serving the same geography.

Sonar

Whether you are using community or low volume private Radar measurements, you ideally want to try and get more application specific data into OpenMix. One way to do this is with Sonar.

Sonar is a synthetic polling tool which will poll any URL you give it from multiple locations and store the results as availability data for your platforms. For the simplest implementation, you need only an address that responds with an OK if everything is working properly.

If you want to get more bang for your buck, you can make that URL an intelligent endpoint so that if your platform is nearing capacity, you can pretend to be unavailable for a short time to throttle traffic away before your location really has issues.

You can also use the Sonar endpoints as a convenient way to automate diverting traffic for maintenance windows – no splash pages required.
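
A minimal sketch of such an endpoint in Node.js, assuming a getLoad() helper that reports utilization and an environment flag your deploy tooling flips before maintenance (both placeholders for whatever signals your platform really exposes):

```javascript
// Health-check endpoint for Sonar: answer 200 when we want traffic,
// 503 when we are near capacity or draining for maintenance.
const http = require('http');

const MAINTENANCE = process.env.MAINTENANCE === 'true'; // flipped by deploy tooling

function getLoad() {
  // Placeholder: return current utilization as a fraction between 0 and 1.
  return 0.5;
}

http.createServer((req, res) => {
  if (req.url === '/health') {
    if (MAINTENANCE || getLoad() > 0.9) {
      res.writeHead(503);       // Sonar records the platform as unavailable
      res.end('Unavailable');
    } else {
      res.writeHead(200);
      res.end('OK');
    }
  } else {
    res.writeHead(404);
    res.end();
  }
}).listen(8080);
```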

Fusion

Fusion is really another amazing piece of work from Cedexis. As you might guess from its name, Fusion is where Cedexis glues services together and it comes in two flavors:

  1. Global Purge
  2. Data Feeds

Global Purge

By nature, one of the most apropos uses of Cedexis is to marry multiple CDN providers for better performance and stability. Every CDN has countries where they are better and countries where they are worse. In addition, maintenance windows in a CDN provider can be devastating for performance even though they usually won’t cause downtime.

The downside of a Multi-CDN approach is the overhead involved in managing each of the CDNs and most often that means purging content from the cache. Fusion allows you to connect to multiple supported CDN providers (a lot of them) and purge content from all of them from one interface inside Cedexis.

While this is a great feature, I have to add that you shouldn’t be using it. Purging content from a CDN is very Y2K and you should be using versioned resources with far futures expiry headers to get the best performance out of your sites and so you never have to purge content from a CDN ever again.
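
For instance, here is a hedged sketch with Express: versioned filenames plus a far-future Cache-Control header mean the CDN and browsers can cache forever and nothing ever needs purging (paths are examples, and the immutable option assumes a reasonably recent Express):

```javascript
// Serve versioned assets (e.g. app.3f9c2b.css) with a one-year cache lifetime.
// Because the filename changes on every release, stale content is impossible
// and no CDN purge is ever required.
const express = require('express');
const app = express();

app.use('/assets', express.static('build/assets', {
  maxAge: '365d',   // far-future expiry
  immutable: true   // tell browsers the file will never change under this name
}));

app.listen(3000);
```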

Data Feeds

This is the really great part. Fusion lets you import data from basically anywhere to use in your OpenMix decision making process. Built in, you will find connections to various CDN and monitoring services, but you can also work with Cedexis to setup custom Fusion integrations so the sky’s the limit.

With Radar and Sonar, we have very good data on performance and availability (time and quality) both from the real user perspective and a supplemental synthetic perspective. To really optimize our traffic we need to account for all three corners of the Time, Cost, Quality triangle.

With Fusion, we can introduce cost as a factor in our decisions. Consider a company using multiple CDN providers, each with a minimum monthly commitment of traffic. If we would direct traffic based on performance alone, we might not meet the monthly commitment on one provider but be required to pay for traffic we didn’t actually send. Fusion provides usage statistics for each CDN and allows OpenMix to divert traffic so that we optimize our spending.

Looking Forward

With all the logic we can build into our infrastructure using Cedexis, it could almost be a fire and forget solution. That would, however, be a huge waste. The Internet is always evolving. Providers come and go. Bandwidth changes hands.

Cedexis reports provide operational intelligence on alternative providers without any of the hassle involved in a POC. Just plot the performance of the provider you’re interested in against the performance of your current providers and make an informed decision to further improve your services. When something better comes along, you’ll know it.

The Nitty Gritty

Keep an eye out for the next article where I’ll do a step by step walk-through on setting up a Multi-Cloud solution using Cedexis. I’ll cover almost everything mentioned here, including Private and Community Radar, Sonar, Standard and Custom OpenMix Applications, and Cedexis Reports.

Wrangling Elephants in the Cloud


You know the elephant in the room, the one no one wants to talk about. Well it turns out there was a whole herd of them hiding in my cloud. There’s a herd of them hiding in your cloud too. I’m sure of it. Here is my story and how I learned to wrangle the elephants in the cloud.

Like many of you, my boss walked into my office about three years ago and said “We need to move everything to the cloud.” At the time, I wasn’t convinced that moving to the cloud had technical merit. The business, on the other hand, had decided that, for whatever reason, it was absolutely necessary.

As I began planning the move, selecting a cloud provider, picking tools with which to manage the deployment, I knew that I wasn’t going to be able to provide the same quality of service in a cloud as I had in our server farm. There were too many unknowns.

The cloud providers don’t like to give too many details on their setups, nor do they like to provide many meaningful SLAs. I have very little idea what hardware I’m running on. I have almost no idea how it’s connected. How many disks am I running on? What RAID configuration? How many IOPS can I count on? Is a disk failing? Is it being replaced? What will happen if the power supply blows? Do I have redundant network connections?

Whatever it was that made the business decide to move, it trumped all these unknowns. In the beginning, I focused on getting what we had from one place to the other, following whichever tried and true best practices were still relevant.

Since then, I’ve come up with these guiding principles for working around the unknowns in the cloud.

  • Beginners:
    • Develop in the cloud
    • Develop for failure
    • Automate deployment to the cloud
    • Distribute deployments across regions
  • Advanced:
    • Monitor everything
    • Use multiple providers
    • Mix and match private cloud

Wrangling elephants for beginners:

Develop in the cloud.

Developers invariably want to work locally. It’s more comfortable. It’s faster. It’s why you bought them a crazy expensive MacBook Pro. It is also nothing like production and nothing developed that way ever really works the same in real life.

If you want to run with the IOPS limitations of standard Amazon EBS, or you want to rely on Amazon ELBs to distribute traffic under sudden load, you need to have those limitations in development as well. I’ve seen developers cry when their MongoDB was deployed to EBS, and I’ve seen ELBs drop 40% of the traffic from a huge media campaign.

Develop for failure.

Cloud providers will fail. It is cheaper for them to fail and in the worst case, credit your account for some machine hours, than it is for them to buy high quality hardware and setup highly available networks. In many cases, the failure is not even a complete and total failure (that would be too easy). Instead, it could just be some incredibly high response times which your application may not know how to deal with.

You need to develop your application with these possibilities in mind. Chaos Monkey by Netflix is a classic, if not over-achieving example.

Automate deployment to the cloud.

I’m not even talking about more complicated, possibly over complicated, auto-scaling solutions. I’m talking about when it’s 3am and your customers are switching over to your competitors. Your cloud provider just lost a rack of machines including half of your service. You need to redeploy those machines ASAP, possibly to a completely different data center.

If you’ve automated your deployments and there aren’t any other hiccups, it will hopefully take less than 30 minutes to get back up. If not, well, it will take what it takes. There are many other advantages to automating your deployments but this is the one that will let you sleep at night.

Distribute deployments across regions.

A pet peeve of mine is the mess that Amazon has made with their “availability zones.” While the concept is an easy-to-implement solution (from Amazon’s point of view) to the logistical problems involved in running a cloud service, it is a constantly overlooked source of unreliability for beginners choosing Amazon AWS. Even running a multi-availability zone deployment in Amazon only marginally increases reliability, whereas deploying to multiple regions can be much more beneficial with a similar amount of complexity.

Whether you use Amazon or another provider, it is best to build your service from the ground up to run in multiple regions, even only in an active/passive capacity. Aside from the standard benefits of a distributed deployment (mitigation of DDOS attacks and uplink provider issues, lower latency to customers, disaster recovery, etc.), running in multiple regions will protect you against regional problems caused by hardware failure, regional maintenance, or human error.

Advanced elephant wrangling:

The four principles before this are really about being prepared for the worst. If you’re prepared for the worst, then you’ve managed 80% of the problem. You may be wasting resources or you may be susceptible to provider level failures, but your services should be up all of the time.

Monitor Everything.

It is very hard to get reliable information about system resource usage in a cloud. It really isn’t in the cloud provider’s interest to give you that information, after all, they are making money by overbooking resources on their hardware. No, you shouldn’t rely on Amazon to monitor your Amazon performance, at least not entirely.

Even when they give you system metrics, it might not be the information you need to solve your problem. I highly recommend reading the book Systems Performance – Enterprise and the Cloud by Brendan Gregg.

Some clouds are better than others at providing system metrics. If you can choose them, great! Otherwise, you need to start finding other strategies for monitoring your systems. It could be to monitor your services higher up in the stack by adding more metric points to your code. It could be to audit your request logs. It could be to install an APM agent.

Aside from monitoring your services, you need to monitor your providers. Make sure they are doing their jobs. Trust me, sometimes they aren’t.

I highly recommend monitoring your services from multiple points of view so you can corroborate the data from multiple observers. This happens to fit in well with the next principle.

Use multiple providers.

There is no way around it. Using one provider for any third party service is putting all your eggs in one basket. You should use multiple providers for everything in your critical path, especially the following four:

  • DNS
  • Cloud
  • CDN
  • Monitoring

Regarding DNS, there are some great providers out there. CloudFlare is a great option for the budget conscious. Route53 is not free but not expensive. DNSMadeEasy is a little bit pricier but will give you some more advanced DNS features. Some of the nastiest downtimes in the past year were due to DNS provider outages.

Regarding Cloud, using multiple providers requires very good automation and configuration management. If you can find multiple providers which run the same underlying platform (for example, Joyent licenses out their cloud platform to various other public cloud vendors), then you can save some work. In any case, using multiple cloud providers can save you from some downtime, bad cloud maintenance or worse.

CDNs also have their ups and downs. The Internet is a fluid space and one CDN may be faster one day and slower the next. A good Multi-CDN solution will save you from the bad days, and make every day a little better at the same time.

Monitoring is great, but who’s monitoring the monitor? It’s a classic problem. Instead of trying to make sure every monitoring solution you use is perfect, use multiple providers from multiple points of view (application performance, system monitoring, synthetic polling).

These perspectives all overlap to some degree backing each other up. If multiple providers start alerting, you know there is a real actionable problem and from how they alert, you can sometimes home in on the root cause much more quickly.

If your APM solution starts crying about CPU utilization but your system monitoring solution is silent, you know that you may have a problem that needs to be verified. Is the APM system misreading the situation or has your system monitoring agent failed to warn you of a serious issue?

Mix and match private cloud

Regardless of all the above steps you can take to mitigate the risks of working in environments not completely in your control, really important business should remain in-house. You can keep the paradigm of software defined infrastructure by building a private cloud.

Joyent licenses their cloud platform out to companies for building private clouds with enterprise support. This makes mixing and matching between public and private very easy. In addition, they have open sourced the entire cloud platform, so if you want to install it without support, you are free to do so.

Summary

When a herd of elephants is stampeding, there is no hope of stopping them in their tracks. The best you can hope for is to point them in the right direction. Similarly, in the cloud, we will never get back the depth of visibility and control that we have with private deployments. What’s important is to learn how to steer the herd so we are prepared for the occasional stampede while still delivering high quality systems.

Save Money on Private SSL CDN and Improve Performance at the Same Time


SSL support in your site, and therefore, in your CDN is critical but it is also incredibly expensive. In this article, I’ll show you how to save a fortune while improving the performance and security of your site.

It used to be that sites only encrypted the most sensitive traffic with their customers, i.e. registration, login, checkout, etc. In 2010, the Firesheep extension made it very clear that this is not enough.

Since then, many other attacks on partially encrypted sites have been devised and so major sites like Google, Twitter, Facebook, and others have all switched to HTTPS by default, aka Always On SSL. Last year, Google called for HTTPS everywhere and later, announced that secured sites will get boosts in PageRank.

Shared SSL from the CDN is cheap.

Shared SSL in the CDN is better than nothing but it isn’t great. It’s better than nothing because it will help with mixed content warnings. It’s not great because it isn’t on the same domain as your site so it can cause same origin policy problems.

Why is Private SSL CDN so expensive?

That’s a great question and I’m not sure there is a good answer. Some of the common excuses are:

  1. Encrypted traffic is heavier than unencrypted traffic so it should cost more. This is only very slightly true and recent work in this area has made this even less true.
  2. The logistics of deploying the SSL certificates across all the CDN edge nodes is expensive. It’s honestly hard to believe that companies whose entire business is managing uncountable edge nodes have real trouble deploying certificates.
  3. It’s only worth it for the CDN company to assume the security risk of HTTPS traffic if they charge you more money. This is probably the closest answer to the truth.

Regardless of the real reason, CDNs often charge both exorbitant one-time setup fees and high monthly fees for each SSL enabled configuration.

Common mistakes which are costing you money.

Companies often make several mistakes when ordering SSL services, costing them thousands of extra dollars each month:

  1. Ordering too many domains
  2. Ordering the wrong type of SSL service
  3. Not negotiating the fees

Ordering too many domains

Companies often spread their sites over multiple subdomains, i.e. www.yahu.com, mail.yahu.com, shopping.yahu.com, etc. Each of these sites will have its own content, and many companies understandably set up separate CDN configurations for each origin, i.e. cdn.www.yahu.com, cdn.mail.yahu.com.

Ordering SSL from the CDN for each of those domains will cost a fortune. Instead, create a single CDN configuration for each root domain, i.e. cdn.yahu.com, cdn.giggle.com. In each configuration, use the CDN provider’s built in URL rewriting and origin rewriting features to direct the requests to the appropriate origin.

For example, configure cdn.yahu.com/www/ to cache content from www.yahu.com while cdn.yahu.com/shopping/ caches content from shopping.yahu.com. Now you have drastically cut down the number of SSL slots you need to order.
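
Conceptually, the rewrite rules implement a mapping like the one sketched below. In practice you configure this in the CDN provider’s rewrite/origin settings rather than in your own code; the domains are the article’s example domains:

```javascript
// Conceptual path-to-origin mapping performed by the CDN configuration.
const originByPrefix = {
  '/www/':      'www.yahu.com',
  '/shopping/': 'shopping.yahu.com',
  '/mail/':     'mail.yahu.com'
};

// e.g. '/shopping/css/site.css' -> { origin: 'shopping.yahu.com', path: '/css/site.css' }
function resolveOrigin(requestPath) {
  for (const prefix of Object.keys(originByPrefix)) {
    if (requestPath.startsWith(prefix)) {
      return { origin: originByPrefix[prefix], path: '/' + requestPath.slice(prefix.length) };
    }
  }
  return null; // no match: fall through to the default origin
}
```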

Ordering the wrong type of SSL service

Different CDN providers offer different types of SSL service. Some provide standard single domain certificates. Some use a multi-domain certificate. Some offer wildcard certificates.

Now that you have minimized the number of domains you need to protect in the CDN, you can re-evaluate the type of SSL certificate you need. In the example above, we might have thought to order an expensive wildcard certificate to protect all the subdomains of yahu.com, whereas now we can choose a less expensive single domain certificate.

If we can’t use URL rewriting to save on SSL enabled CDN domains, it may be less expensive to get one wildcard certificate than to get several domains on a multi-domain certificate.

Not negotiating the fees

CDN fees are always negotiable. Usually, the more traffic you commit to each month, the lower a price you get. SSL slots are also open to volume discounts. If you need multiple SSL slots, try to order them at the same time and ask for a discount on the setup fees. The logic: even if they install each certificate manually, they can install all your certs at the same time.

You should also be able to get a discount on the monthly fees just because you are committing to pay them more each month. A better deal to ask for is to offer to over-commit on your monthly traffic in exchange for a discount on the SSL monthly fees.

For example, say you have a monthly $5K traffic commitment and you are adding five SSL slots at a $750/month list price. They offer to go down to $700 per month per SSL because you are adding five, for a total commitment of $8.5K per month ($5K traffic + $3.5K SSL). Counter with an offer to commit to $6K in monthly traffic plus $500 per month per SSL. The total is the same $8.5K per month ($6K + $2.5K), but it is the better deal: even if your traffic grows by 20%, you stay within the commitment and keep paying $8.5K/month while still getting the SSL service.

How does this improve performance?

There is an added, hidden benefit in consolidating your CDN hostnames.

Now, when a customer first reaches one of your sites, i.e. www.yahu.com, their browser looks up and connects to your consolidated CDN domain – cdn.yahu.com. When they browse your site and switch to a new subdomain, for example if they clicked on shopping.yahu.com, they are already connected to cdn.yahu.com. They save on additional DNS lookups and additional TCP handshakes (the most expensive part of SSL traffic).

You can push this even further by sharing resources between your origins using a special CDN location, for example by storing common CSS files at cdn.yahu.com/shared/. In this case, shopping.yahu.com will be able to use the pre-loaded and cached CSS files from the browser’s initial visit to www.yahu.com.

Summary

I’ve used these SSL consolidation techniques in both Akamai and CDNetworks realizing significant cost and performance savings. Although I’ve worked with at least five other CDN providers, I haven’t tried these techniques elsewhere.

If you’re interested in optimizing your CDN deployment for best cost/performance, feel free to contact me via LinkedIn or via https://donatemyfee.org/.

How to Host a Screaming Fast Site for $0.03/Month


I had an idea. That’s always how it starts. Before I know it, I’ve purchased the domain name and I’m futzing around with some HTML but where am I going to host it and how much is this going to end up costing me?

That’s where I was when I came up with #DonateMyFee. “This is a site that is only going to cost me money”, I thought to myself (the whole point is for people to donate money rather than paying me). I really didn’t want to start shelling out big (or small) bucks on hosting.

Long story short, here is the recipe for a screaming fast website on a low budget:

Amazon S3

I’m not a huge fan of Amazon AWS, but S3 is useful enough to make it into my good graces. S3 is Amazon’s storage service. You upload static files into “buckets” and S3 can hold on to them, version them, and most importantly serve them via http. When configured to serve a bucket as a static website, S3 can be used to replace the load balancing and web serving infrastructure needed to serve a static website.
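
For reference, the same configuration can also be done programmatically; here is a sketch using the AWS SDK for JavaScript, with the bucket name, region, and document keys as examples:

```javascript
// Enable static website hosting on an existing S3 bucket.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'eu-west-1' }); // example region

s3.putBucketWebsite({
  Bucket: 'donatemyfee.com',                     // example bucket
  WebsiteConfiguration: {
    IndexDocument: { Suffix: 'index.html' },
    ErrorDocument: { Key: '404.html' }
  }
}).promise()
  .then(() => console.log('Static website hosting enabled'))
  .catch(console.error);
```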

There are only two problems with this approach.

  1. You pay for S3 by the amount of traffic pulled from your bucket.
  2. Your “website” will be called something crazy ugly like donatemyfee.com.s3-website-eu-west-1.amazonaws.com

Regarding the price, S3 tries to get you three ways. They charge for the volume of the data being stored, for the number of requests made, and for the volume of the request throughput in GB. That said, the prices are very reasonable if we can keep the number of requests low. For that reason, a CDN is an absolute must. The CDN will also solve our second problem – the unfriendly S3 website name.

Often S3 is paired with Amazon’s CDN, Cloudfront, but I don’t recommend it. Cloudfront is expensive as CDNs go and we’re on a budget. Even if we wanted to pay for the CDN, there are better performing options for less. CloudFlare is a great alternative with a free plan that will do us wonders.

CloudFlare

CloudFlare is one of several CDN-by-proxy + web application firewall solutions that cropped up several years ago. Since the beginning, they have had a free plan and they have proven to be both innovative and competitive.

To use CloudFlare, we need to set their servers as your domain’s DNS name servers, which can be a deal breaker in some cases. Once that’s set up, we create a CNAME record in CloudFlare which points to the ugly S3 website name. CloudFlare has a new CNAME flattening technique which will allow us to configure this even for the root domain (without the www). This technique breaks some rules so I wouldn’t recommend it in every case, but in ours, it’s just what we need.

CloudFlare will cache all of our static content from S3 saving us from paying for the majority of the visits to the site. CloudFlare will also compress and optimize our content so it takes less time to reach the browser. Depending on what kind of traffic your site attracts, CloudFlare’s security settings can also protect you from all kinds of resource abuse, malicious traffic, hotlinking, etc.

Note: S3 will not properly identify the mime types for every file which means that some files might not be compressed properly by CloudFlare. You can fix this by changing the metadata for the files in S3. Specifically .ttf, .eot, and other typography related files are a problem.
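
One way to fix that metadata is to copy each affected object onto itself with a corrected ContentType. A sketch with the AWS SDK for JavaScript (bucket, key, and type are examples):

```javascript
// Rewrite an object's ContentType in place by copying it onto itself.
const AWS = require('aws-sdk');
const s3 = new AWS.S3({ region: 'eu-west-1' });

s3.copyObject({
  Bucket: 'donatemyfee.com',                       // example bucket
  CopySource: '/donatemyfee.com/fonts/icons.ttf',  // same bucket/key as the target
  Key: 'fonts/icons.ttf',
  ContentType: 'font/ttf',                         // the corrected mime type
  MetadataDirective: 'REPLACE'                     // required so the new ContentType sticks
}).promise()
  .then(() => console.log('ContentType updated'))
  .catch(console.error);
```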

Frugal Functionality

Having a cheaply hosted static website is nice, but static can also be pretty useless. In order to get some functionality out of the site, you could go all jQuery on it, but I think that is a road too often traveled these days. I’ve seen too many people include all of jQuery instead of writing 3 lines of JavaScript.

If we want this site to be fast we need to work frugally. If you take a look at http://donatemyfee.com, you will see some examples of what I call “frugal functionality”.

The social share buttons are static links, not huge JavaScript widgets included from various social networks. Including external scripts is always a bad idea and they always hurt the performance of your site no matter what anyone tells you. Also, the icons and hover animations are CSS typography tricks. No JavaScript and no icon images downloaded.

The site is designed using responsive web design techniques, which is a “buzzword” for using a bunch of crafty CSS to make the same thing look decent on different sized screens. If we were a large company, I would say “Responsive web is for lazy companies and people without a budget to develop good looking, device targeted sites.” Since we’re on a budget, I’ll say it’s frugal 🙂

Last but not least, we have skimped on all the normal infrastructure that goes behind a website so our options for actually generating leads are a bit thin. We could go very old school with mailto links but in these days where webmail reigns supreme, they are getting pretty useless. Enter Google Forms.

Google Forms

If you haven’t been asked to fill out a Google Form yet, here’s your chance. Google lets you create fairly elaborate forms for free. The forms collect the answers and store them automatically in a Google Drive spreadsheet. There are more sophisticated options for processing the answers, and an entire extension ecosystem being built around the process. For us, the basic solution is more than enough.

Note: You can link to the form or embed it in an iframe. The form will take a bite out of your page load performance (iframes are a huge performance no-no). They will also annoy you with endless warnings, all of which you can do nothing about, if you test your site performance with any of the free online services (Webpagetest, Websitetest, GTmetrix, PageSpeed, etc.). In this case, I used some simple (read: jQuery-free) JavaScript to load the embedded iframe only if it’s requested. This has the added benefit of keeping the user on-site to fill out the form and eliminating the page load time performance hit.
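
A minimal sketch of that jQuery-free loader, assuming a button and a container element with the IDs shown below (the form URL is a placeholder for the one Google generates):

```javascript
// Inject the Google Form iframe only when the visitor asks for it,
// keeping the initial page load light. Element IDs and the form URL
// are placeholders.
document.getElementById('donate-button').addEventListener('click', function () {
  var iframe = document.createElement('iframe');
  iframe.src = 'https://docs.google.com/forms/d/e/FORM_ID/viewform?embedded=true';
  iframe.width = '100%';
  iframe.height = '800';
  iframe.frameBorder = '0';
  document.getElementById('form-container').appendChild(iframe);
}, { once: true });
```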

Less is more

Finally, the most important advice about web performance is always “Less is more”. There is no better way to ensure that a page loads quickly than to make it smaller in every way possible. Use less and smaller pictures. Combine, compress and minify everything. Reduce the number of requests.

If you’re interested in getting my help with your site, contact me via LinkedIn or #DonateMyFee. All consulting fees go directly from you to a tax deductible charity in your/your company’s name.