Category: Performance

Triton Bare Metal Containers FTW!


If you haven’t heard of Docker, I’m not sure what cave you’ve been living in but here’s the short story:

Hardware-level virtualization, like you are used to (VMware, VirtualBox, KVM, Xen, Amazon, Rackspace, Azure, etc.), is slow and awful. Virtualization at the OS level, where processes share isolated access to a single operating system kernel, is much more efficient and therefore awesome. Another name for OS-level virtualization is “containers,” which have existed in FreeBSD and Solaris for over a decade.

Docker is hacking together a whole slew of technologies (cgroups, union filesystems, iptables, etc.) to finally bring the concept of containers to the Linux masses. Along the way, they’ve also managed to evolve the concept a little, adding the idea of a container as a very portable unit which runs as a process.

Instead of managing dependencies across multiple environments and platforms, an ideal Docker container encapsulates all the runtime dependencies for a service or process. Instead of including a full root file system, the ideal Docker container could be as small as a single binary. Hand in hand with the growing interest in developing reusable microservices, we have an amazing tech revolution on our hands.
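As an illustration of that ideal (the binary name here is hypothetical), a statically linked service can ship in an image with no root filesystem at all:

```dockerfile
# Start from an empty image: no OS userland, no package manager.
FROM scratch

# Copy in a single statically linked binary (hypothetical name).
COPY myservice /myservice

# The container is just this one process.
ENTRYPOINT ["/myservice"]
```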

That is not to say that everything in the Docker world is all roses. I did say that Docker is hacking together a slew of technologies, so while marketing and demos portray:

You are more likely to start with something like this:

Either way, tons of containers per host is great until you realize you are lugging them around on a huge, slow, whale of a cargo ship.

  1. Currently, Docker’s level of isolation between containers is not that great, so security and noisy neighbors are issues.
  2. Containers on the same host can’t easily run on the same ports so you may have to do some spaghetti networking.
  3. On top of that, if you are running Docker in Amazon, Google, Azure, etc. you are missing the whole point which was to escape the HW level virtualization.

Joyent to the rescue!

Joyent is the only container-based cloud provider that I’m aware of. They have been running the vast majority of my cloud instances (possibly yours as well) on OS-level virtualization for years now (years before Docker was a twinkle in Shamu’s eye). As such, they are possibly the most experienced and qualified technical leaders on the subject.

They run a customized version of Illumos, an OpenSolaris derivative, with extremely efficient zone technology for their containers. In January (Linux and Solaris are Converging but Not the Way you Think), I wrote about the strides Joyent made allowing Linux binaries to run inside Illumos zones.

Triton May Qualify as Witchcraft

The love child of that work, announced as GA last week, was Triton: a (mostly) Docker-API-compliant service running zone-based Docker containers on bare metal in the cloud. If running on bare metal weren’t enough of an advantage, Joyent completely abstracted away the notion of the Docker host (i.e. the cargo ship). Instead, your Docker client speaks to an API endpoint which schedules your bare metal containers transparently across the entire cloud.

Addressing each of the points I mentioned above:

  1. Zones in Illumos/Joyent provide complete isolation, as opposed to Linux-based containers, so there are no security or noisy neighbor problems.
  2. Every container gets its own public IP address with a full range of ports, so no spaghetti networking.
  3. There is no Docker host and no hardware virtualization, so every container runs at full speed on bare metal.

Going back to the boat analogy, if Docker containers on Linux in most clouds looks like this:

Docker containers as zones on bare metal in Joyent look like this:

Enough of the FUD

I’m not a big fan of marketing FUD so I’ve been kicking the tires on the beta version of Triton for a while. Now with the GA, here are the pros and cons.

Pros:

  1. Better container isolation
  2. Better networking support
  3. Better performance
  4. No overhead managing a Docker host
  5. Great pricing (per-minute billing and much lower prices)
  6. User friendly tooling in the portal, including log support and running commands on containers using docker exec.

Cons:

  1. The API still isn’t fully supported, so things like build and push don’t work. You can mostly work around this using a Docker registry.
  2. Lack of a Docker Host precludes using some of the patterns that have emerged for logging, monitoring, and sharing data between containers.

Summary

Docker is a game changer but it is far from ready for prime time. Triton is the best choice available today for running container-based workloads in general, and production Docker workloads specifically.

At Codefresh, we’re working to give you the benefits of containers in your development workflow without the headache of actually keeping the ship afloat yourselves. Sign up for our free Beta service to see how simple we can make it for you to run your code and/or Docker containers in a clean environment. Take a look at my getting started walkthrough or contact me if you have any questions.

How To Setup a Multi-Platform Website With Cedexis


With the recent and ongoing DDoS attacks against GitHub, many sites hosted as GitHub Pages have been scrambling to find alternative hosting. I began writing this tutorial long before the attacks, but the configuration you find here is exactly what you need to serve static content from multiple providers like GitHub Pages, Divshot, Amazon S3, etc.

In my previous article, I introduced the major components of Cedexis and how they fit together to build great multi-provider solutions. If you haven’t read it, I suggest taking a quick look to get familiar with the concepts.

In this article, I’ll show you, step-by-step, how to build a robust multi-provider, multi-platform website using the following Cedexis components:

  • Radar: provides real-time user measurements of provider performance and availability.
  • Sonar: provides lightweight, efficient service availability health-checks.
  • OpenMix: DNS traffic routing based on the data from Radar and Sonar.
  • Portal: UI for configuration and reporting.

I’ll also briefly cover some of the reports available in the portal.

Basic Architecture

OpenMix applications all have the same basic architecture and the same basic building blocks. Here is a diagram for reference:

Preparations

For the purpose of the tutorial, I’ve built a demo site using the Amazon S3 and Divshot static web hosting services. I’ve already uploaded my content to both providers and made sure that they are responding.

Both of these services provide us a DNS hostname with which to access the sites.

Amazon S3 is part of the standard community Radar measurements, but as a recently launched service, Divshot hasn’t made the list yet. By adding test objects, we can eventually enable private Radar measurements instead.

Download the test objects from here:

I’ve uploaded them to a cedexis directory inside my web root on Divshot.

Configuring the Platforms

Platforms are, essentially, named data sets. Inside the OpenMix Application, we assign them to a service endpoint and the data they contain influences how Cedexis routes traffic. In addition, the platforms become dimensions by which we slice and dice data in the reports.

We need to define platforms for S3 and Divshot in Cedexis and connect each platform to their relevant data sources (Radar and Sonar).

Log in to the Cedexis portal here and click on the Add Platform button in the Platforms section.

We’ll find the community platform for Amazon S3 under the Cloud Storage category. It means that S3 performance will be monitored automatically by Cedexis’ Community Radar. You can leave the defaults on this screen:

After clicking next, we’ll get the opportunity to set up custom Radar and Sonar settings for this platform. We want to enable Sonar to make sure there are no problems with our specific S3 bucket which community Radar might not catch.

We’ll enable Sonar polls every 60 seconds (default) and for the test URL, I’ve put the homepage of the site.

We’ll save the platform and create another:

Divshot is somewhere in between Cloud Storage and Cloud Compute. It’s really only hosting static content so I’ve chosen the Cloud Storage category, but there is no real difference from Cedexis’ perspective. If they eventually add Divshot to their community metrics, it might end up in a different category.

Since Divshot isn’t one of the pre-configured Cloud Storage platforms, choose the platform “Other”.

The report name is what will show up in Cedexis charts when you want to analyze the data from this platform.

The OpenMix alias is how OpenMix applications will refer to this platform. Notice that I’ve called it divshot_production. That is because Divshot provides multiple environments for development, staging, and QA. In the future, we may define platforms for other environments as well.

Since there are no community Radar measurements for Divshot, we prepare private measurements of our own in the next step.

We are going to add three types of probes using the test objects which we downloaded above.

First, the HTTP Response Time URL probe (if you are serving your content over SSL, as you should, choose the HTTPS Response Time URL probe instead). The HTTP Response Time probe should use the Small JavaScript Timing Object.

Click Add probe at the bottom left of the dialog to add the next probes.

In addition to the response time probe, we will add the Cold Start and Throughput probes to cover all our bases.

The Cold Start probe should also use the small test object.

The Throughput probe needs the large test object.

Make sure the test objects are correct in the summary page before setting up Sonar.

Configure the Sonar settings for Divshot similarly to those from S3 with the exception of using the homepage from Divshot for the test URL. Then click ‘Complete’ to save the platform and we are done setting up platforms for now.

A nice thing about platforms is that they are decoupled from the OpenMix applications. That means you can re-use a platform across multiple OpenMix applications with completely different logic. It also means you can slice and dice your data using application and platform as separate dimensions.

For example, if we had applications running in multiple datacenters, we would be interested to know about the performance and availability of each data center across all our applications. Conversely, we would also want to know if a specific application performs better in one data center than another. Cedexis hands us this data on a silver platter.

Our First OpenMix Application

Open the Application Configuration option under the OpenMix tab and click on the plus in the upper right corner to add a new application.

We’re going to select the Optimal RTT quick app for the application type. This app will send traffic to the platform with the best response time according to the information Cedexis has on the user making the request.

Define the fallback host. Note that this does not have to be one of the defined platforms. This host will be used in case the application logic fails or there is a system issue within Cedexis. In this case, I trust S3 slightly more than Divshot so I’ll configure it as the fallback host.

In the second configuration step, I’ve left the default TTL of 20 seconds. This means that resolvers should check back every 20 seconds to see if a different provider should be used to answer requests. Once Cedexis detects a problem, the maximum time for users to be directed to a different provider should be approximately this value.

In my experience, 20 seconds is a good value to use. It is long enough that users can browse one or two pages of a site before doing any additional DNS lookups and it is short enough to react to shorter downtimes.

Increasing this value will result in fewer requests to Cedexis. To save money, consider automatically changing TTLs via RESTful Web Services. Use lower TTLs during peaks, where performance could be more erratic, and use longer TTLs during low traffic periods to save request costs.
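As a sketch of that scheduling idea (the hours, TTL values, and the notion of pushing the result over HTTP are all assumptions of mine; the real calls are in Cedexis’ web-services documentation):

```python
# Sketch: choose a DNS TTL by time of day. All numbers here are
# illustrative assumptions, not Cedexis recommendations.
from datetime import datetime, timezone

PEAK_HOURS = range(8, 22)   # assumed busy window
PEAK_TTL = 20               # seconds: react quickly while traffic is volatile
OFF_PEAK_TTL = 120          # seconds: fewer billable DNS requests overnight

def choose_ttl(hour: int) -> int:
    """Return the TTL (in seconds) to publish for the given hour."""
    return PEAK_TTL if hour in PEAK_HOURS else OFF_PEAK_TTL

ttl = choose_ttl(datetime.now(timezone.utc).hour)
# Pushing `ttl` to Cedexis would go here: an authenticated HTTP call
# against their configuration API (endpoint and payload omitted;
# consult the Cedexis web-services documentation for the real calls).
```

Run from cron or a scheduler, this keeps the TTL low during peaks and high overnight without anyone touching the portal.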

On the third configuration step, I’ve left all the defaults. The Optimal RTT quick app will filter out any platforms which have less than 80% availability before considering them as choices for sending traffic.

Depending on the quality of your services, you may decide to lower this number, but you probably do not want it any higher. Why not eliminate any platform that isn’t reporting 100% available? The answer is that RUM measurements rely on the (sometimes poor quality) home networks of your users and, as a result, they can be extremely finicky. Expecting 100% availability from a high traffic service is unrealistic, and leaving a threshold of 80% will help reduce thrashing and unwanted use of the fallback host.

Regarding eDNS, you pretty much always want this enabled since many people have switched to using public DNS resolvers like Google DNS instead of the resolvers provided by their ISPs.

Shared resolvers break assumptions made by traffic optimization solutions and the eDNS standard works around this problem, passing information about the request origin to the authoritative DNS servers. Cedexis has supported eDNS from the beginning but many services still don’t.

In the final step, we will configure the service endpoints for each of the platforms we defined.

In our case, we are just associating the hostname aliases that Amazon and Divshot gave us with the correct platform and its Radar/Sonar data.

In a more complicated setup, you might have a platform per region of a cloud and service endpoints with different aliases or CNAMEs across each region.

Note that each platform in the application has an “Enabled” checkbox. This makes it easy to go into an application and temporarily stop sending traffic to a specific platform. It is very useful for avoiding downtime during maintenance windows, migrations, or intermittent problems with a provider.

Choose the Add Platform button on the bottom left corner of the dialog to add the second platform, not the Complete button on the bottom right.

Define the Divshot platform like we did for S3 and click Complete.

You should get a success message with the CNAME for the Cedexis application alias. Click “Publish” to activate the OpenMix application right away.

Alternatively, clicking “Done” would leave the application configured but inactive. When editing applications, you will get a similar dialog. Leaving changes saved but unpublished can be a useful way to stage changes to be activated later with the push of a button.

Building a Custom OpenMix Application

The application we just created will work, but it doesn’t take advantage of the Sonar data that we configured. To consider the Sonar metrics, we will create a custom OpenMix app, and by custom, I mean copy and paste the code from Cedexis’ GitHub repository. If you’re squeamish about code, talk to your account manager and I’m sure they’ll be able to help you.

The specific code we are going to adapt can be found here (Note: I’ve fixed the link to a specific revision of the repo to make sure the instructions match, but you might choose to take the latest revision.)

We only need to modify definitions at the beginning of the script:

Let’s create a new application using the new script. Then we can switch back and forth between them if we want. We’ll start by duplicating the original app. Expand the app’s row and click Duplicate.

Switch the application type to Custom JavaScript, modify the name and description to reflect that we will use Sonar data, and click next.

Leave the fallback and TTL as is on the next screen.

In the third configuration step, we’ll be asked to upload our custom application.
Choose the edited version of the script and click through to complete the process.

As before, publish the application to activate it.

Adding Radar Support to our Service

At this point, Cedexis is already gathering availability data on both our platforms via Sonar. Since we used the community platform for S3, we also have performance data for that. To finish implementing private Radar for Divshot, we must include the Radar tag in our pages so our users start reporting on their performance.

We get our custom JavaScript tag from the portal. (Note: if you want to get really advanced, you can go into the tag configuration section and set custom parameters to control the behavior of the Radar tag, for example, how many tests to run on each page load.)
Copy the tag to your clipboard and add it in your pages on all platforms.

Going Live

Before we go live, we should really test out the application with some manual DNS requests, disabling and enabling platforms, to see that the responses change, etc.

Once we’re satisfied, the last step is to change the DNS records to point at the CNAME of the OpenMix application that we want to use. I’ll set the DNS to point at our Sonar enabled application.

A useful service to check how our applications are working is https://www.whatsmydns.net/. This will show how our application CNAMEs resolve from locations around the world. For example, if I check the CNAME resolution for the application we just created, I get the following results:

By and large, the Western Hemisphere prefers Divshot while the Eastern Hemisphere prefers Amazon S3 in Europe. This is completely understandable. Interestingly, there are exceptions in both directions. For example, in this test, OpenMix refers the TEANET resolver from Italy to Divshot while the Level 3 resolver in New York is referred to Amazon S3 in Europe. If you refresh the test every so often, you will see that the routings change.

Since this demo site isn’t getting any live traffic, I’ve generated traffic to show off the reports. First the dashboard which gives you a quick overview of your Cedexis traffic on login. Here we show that the majority of the traffic came from North America, and a fair amount came from Europe as well. We also see that, for the most part, the traffic was split evenly between the two platforms.

To examine how Cedexis is routing our traffic, we look at the OpenMix Decision Report. I’ve added a secondary dimension of ‘Platform’ to see how the decisions have been distributed. You see that sometimes Amazon S3 is preferred and other times Divshot.

To figure out why requests were routed one way or the other, we can drill down into the data using the other reports. First, let’s check the availability stats in the Radar Performance Report. For the sake of demonstration, I’ve drilled down into availability per continent. In Asia, we see shoddy availability from Divshot but Amazon S3 isn’t perfect either. Since we didn’t see much traffic from Asia, this probably didn’t affect the traffic distribution. Theoretically, a burst of Asian traffic would result in more traffic going to Amazon.

In Europe, Divshot generally showed better availability than Amazon, reporting 100% except for a brief outage.

In North America, we see a similar graph. As to be expected, the availability of Amazon S3 in Europe is lower and less stable in North America. Divshot shows 100% availability which is also expected.

It’s important to note that the statistics here are skewed because we are comparing our private platform, measured only by our users, with the community data from S3. The community platform collects many more data points than our private platform, and it’s almost impossible for it to show 100% availability. This is also why we chose an 80% availability threshold when we built the OpenMix application.

Next let’s look at the performance reports for response times of each platform. With very little traffic from Asia, the private performance measurements for Divshot are pretty erratic. With more traffic, the graph should stabilize into something more meaningful.

The graph for Europe behaves as expected showing Amazon S3 outperforming Divshot consistently.

The graph for North America also behaves as expected, with Divshot consistently outperforming Amazon S3.

So we’ve seen some basic data on how our traffic performs. Cedexis doesn’t stop there. We can also take a look at how our traffic could perform if we add a new provider. Let’s see how we could improve performance in North America by adding other community platforms to our graph.

I’ve added Amazon S3 in US-East, which shows almost a 30ms advantage over Divshot, though our private measurements still need to be taken lightly with so little traffic behind them. Even better performance comes from Joyent US-East. Using Joyent would require us to do more server management, but if we really care about performance, Cedexis shows that it would provide a major improvement.

Summary

To recap: in this tutorial, I’ve demonstrated how to set up a basic OpenMix application to balance traffic between two providers. Balancing between multiple CDNs or implementing a hybrid datacenter+CDN architecture is just as easy.

Cedexis is a great solution for setting up a multi-provider infrastructure. With the ever-growing Radar community generating performance metrics from around the globe, Cedexis provides both real-time situational awareness for your services and operational intelligence, so you can make well-informed decisions for your business.

Unpacking Cedexis; Creating Multi-CDN and Multi-Cloud applications


Technology is incredibly complex and, at the same time, incredibly unreliable. As a result, we build backup measures into the DNA of everything around us. Our laptops switch to battery when the power goes out. Our cellphones switch to cellular data when they lose connection to WiFi.

At the heart of this resilience, technology is constantly choosing between the available providers of a resource with the idea that if one provider becomes unavailable, another provider will take its place to give you what you want.

When the consumers are tightly coupled with the providers, for example, a laptop consuming power at Layer 1 or a server consuming LAN connectivity at Layer 2, the choices are limited and the objective, when choosing a provider, is primarily one of availability.

Multiple providers for more than availability.

As multi-provider solutions make their way up the stack, however, additional data and time to make decisions enable choosing a provider based on other objectives like performance. Routing protocols, such as BGP, operate at Layer 3. They use path selection logic, not only to work around broken WAN connections but also to prefer paths with higher stability and lower latency.

As pervasive and successful as the multi-provider pattern is, many services fail to adopt a full-stack multi-provider strategy. Cedexis is an amazing service which has come to change that by making it trivial to bring the power of intelligent, real-time provider selection to your application.

I first implemented Multi-CDN using Cedexis about 2 years ago. It was a no-brainer to go from Multi-CDN to Multi-Cloud. The additional performance, availability, and flexibility for the business became more and more obvious over time. Having a good multi-provider solution is key in cloud-based architectures and so I set out to write up a quick how-to on setting up a Multi-Cloud solution with Cedexis; but first you need to understand a bit about how Cedexis works.

Cedexis Unboxed

Cedexis has a number of components:

  1. Radar
  2. OpenMix
  3. Sonar
  4. Fusion

OpenMix

OpenMix is the brain of Cedexis. It looks like a DNS server to your users but, in fact, it is a multi-provider logic controller. In order to set up multi-provider solutions for our sites, we build OpenMix applications. Cedexis comes with the most common applications pre-built, but the possibilities are pretty endless if you want something custom. As long as you can get the data you want into OpenMix, you can make your decisions based on that data in real time.

Radar

Radar is where Cedexis really turned the industry on its head. Radar uses a JavaScript tag to crowdsource billions of Real User Monitoring (RUM) metrics in real time. Each time a user visits a page with the Radar tag, they take a small number of random performance measurements and send the data back to Cedexis for processing.

The measurements are non-intrusive. They only happen several seconds after your page has loaded and you can control various aspects of what and how much gets tested by configuring the JS tag in the portal.

It’s important to note that Radar has two types of measurements:

  1. Community
  2. Private

Community Radar

Community measurements are made against shared endpoints within each service provider. All Cedexis users that implement the Radar tag and allow community measurements get free access to the community Radar statistics. The community includes statistics for the major cloud compute, cloud storage, and CDN providers, making Radar the first place I go to research developments and trends in the cloud/CDN markets.

Community Radar is the fastest and easiest way to use Cedexis out of the box and the community measurements also have the most volume so they are very accurate all the way up to the “door” of your service provider. They do have some disadvantages though.

The community data doesn’t account for performance changes specific to each of the provider’s tenants. For example, community Radar for Amazon S3 will gather RUM data for accessing a test bucket in the specified region. This data assumes that within the S3 region all the buckets perform equally.

Additionally, some providers may opt out of community measurements, so you might not have community data for some providers at all. In that case, I suggest putting your account managers in touch to get them included. Sometimes it is just a question of demand.

Private Radar

Cedexis has the ability to configure custom Radar measurements as well. These measurements will only be taken by your users, the ones using your JS tag.

Private Radar lets you monitor dedicated and other platforms which aren’t included in the community metrics. If you have enough traffic, private Radar measurements have the added bonus of being specific to your user base and of measuring your specific application so the data can be even more accurate than the community data.

The major disadvantage of private Radar is that low volume metrics may not produce the best decisions. With that in mind, you will want to supplement your data with other data sources. I’ll show you how to set that up.

Situational Awareness

More than just a research tool, Radar makes all of these live metrics available for decision-making inside OpenMix. That means we can make much more intelligent choices than we could with less precise technologies like Geo-targeting and Anycast.

Most people using Geo-targeting assume that being geographically close to a network destination is also closer from the networking point of view. In reality, network latency depends on many factors like available bandwidth, number of hops, etc. Anycast can pick a destination with lower latency, but it’s stuck way down in Layer 3 of the stack with no idea about application performance or availability.

With Radar, you get real-time performance comparisons of the providers you use, from your user’s perspectives. You know that people on ISP Alice are having better performance from the East coast DC while people on ISP Bob are having better performance from the Midwest DC even if both these ISPs are serving the same geography.

Sonar

Whether you are using community or low volume private Radar measurements, you ideally want to try and get more application specific data into OpenMix. One way to do this is with Sonar.

Sonar is a synthetic polling tool which will poll any URL you give it from multiple locations and store the results as availability data for your platforms. For the simplest implementation, you need only an address that responds with an OK if everything is working properly.

If you want to get more bang for your buck, you can make that URL an intelligent endpoint so that if your platform is nearing capacity, you can pretend to be unavailable for a short time to throttle traffic away before your location really has issues.

You can also use the Sonar endpoints as a convenient way to automate diverting traffic for maintenance windows; no splash pages required.
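A minimal sketch of such an intelligent endpoint (the load source, threshold, and port are illustrative assumptions, not Cedexis requirements): it answers 200 OK while healthy, and 503 when the node is near capacity or flagged for maintenance, which Sonar will record as unavailable.

```python
# Sketch of an "intelligent" Sonar health endpoint.
from http.server import BaseHTTPRequestHandler, HTTPServer

MAINTENANCE = False        # flip to True to drain traffic for a window
CAPACITY_THRESHOLD = 0.85  # assumed "near capacity" cutoff

def current_load() -> float:
    """Placeholder: return utilization in [0, 1] from your real metrics."""
    return 0.42

def health_status(load: float, maintenance: bool) -> int:
    """200 while healthy; 503 so Sonar marks the platform unavailable."""
    if maintenance or load >= CAPACITY_THRESHOLD:
        return 503
    return 200

class SonarHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        code = health_status(current_load(), MAINTENANCE)
        body = b"OK" if code == 200 else b"DRAINING"
        self.send_response(code)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

# To serve it: HTTPServer(("", 8080), SonarHandler).serve_forever()
```

Point the Sonar test URL at this endpoint and traffic drains away before the platform actually falls over.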

Fusion

Fusion is really another amazing piece of work from Cedexis. As you might guess from its name, Fusion is where Cedexis glues services together and it comes in two flavors:

  1. Global Purge
  2. Data Feeds

Global Purge

By nature, one of the most apropos uses of Cedexis is to marry multiple CDN providers for better performance and stability. Every CDN has countries where they are better and countries where they are worse. In addition, maintenance windows in a CDN provider can be devastating for performance even though they usually won’t cause downtime.

The downside of a Multi-CDN approach is the overhead involved in managing each of the CDNs and most often that means purging content from the cache. Fusion allows you to connect to multiple supported CDN providers (a lot of them) and purge content from all of them from one interface inside Cedexis.

While this is a great feature, I have to add that you shouldn’t be using it. Purging content from a CDN is very Y2K; you should be using versioned resources with far-future expiry headers to get the best performance out of your sites, so that you never have to purge content from a CDN ever again.
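Versioned resources are easy to generate at build time. In this sketch (the helper name and header value are my own choices, not tied to any particular CDN), the asset’s URL changes whenever its content changes, so caches can hold it forever and nothing ever needs purging:

```python
# Sketch: content-hashed asset names plus a far-future cache header.
# Any change to the file produces a new URL, so caches never go stale.
import hashlib
from pathlib import PurePosixPath

FAR_FUTURE = "public, max-age=31536000, immutable"  # one year

def versioned_name(filename: str, content: bytes) -> str:
    """e.g. app.css + its bytes -> app.<8-hex-chars>.css"""
    digest = hashlib.sha256(content).hexdigest()[:8]
    p = PurePosixPath(filename)
    return f"{p.stem}.{digest}{p.suffix}"

css = b"body { color: #333; }"
print(versioned_name("app.css", css))  # something like app.xxxxxxxx.css
print(FAR_FUTURE)                      # Cache-Control value to serve with it
```

Deploy the renamed file alongside the old one, serve both with the far-future header, and update the reference in your HTML; the HTML itself stays on a short TTL.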

Data Feeds

This is the really great part. Fusion lets you import data from basically anywhere to use in your OpenMix decision making process. Built in, you will find connections to various CDN and monitoring services, but you can also work with Cedexis to setup custom Fusion integrations so the sky’s the limit.

With Radar and Sonar, we have very good data on performance and availability (time and quality) both from the real user perspective and a supplemental synthetic perspective. To really optimize our traffic we need to account for all three corners of the Time, Cost, Quality triangle.

With Fusion, we can introduce cost as a factor in our decisions. Consider a company using multiple CDN providers, each with a minimum monthly commitment of traffic. If we would direct traffic based on performance alone, we might not meet the monthly commitment on one provider but be required to pay for traffic we didn’t actually send. Fusion provides usage statistics for each CDN and allows OpenMix to divert traffic so that we optimize our spending.

Looking Forward

With all the logic we can build into our infrastructure using Cedexis, it could almost be a fire and forget solution. That would, however, be a huge waste. The Internet is always evolving. Providers come and go. Bandwidth changes hands.

Cedexis reports provide operational intelligence on alternative providers without any of the hassle involved in a POC. Just plot the performance of the provider you’re interested in against the performance of your current providers and make an informed decision to further improve your services. When something better comes along, you’ll know it.

The Nitty Gritty

Keep an eye out for the next article where I’ll do a step by step walk-through on setting up a Multi-Cloud solution using Cedexis. I’ll cover almost everything mentioned here, including Private and Community Radar, Sonar, Standard and Custom OpenMix Applications, and Cedexis Reports.

Your Graphic Designer is Tanking Your Site!


Your graphic designer is an artist, a trained expert in aesthetics, a master at conveying messages via images and feelings via fonts. He may also be slowing your site down so much that nobody is seeing it.

Artists tend to be heavy on the quality and lighter on the practicality of what they deliver. It’s not entirely their fault. Even the most conscientious and experienced designer needs to sell his work and quality sells. Do marketing departments want to see their company advertised in 4K glory or on their mom’s 19″ LCD?

Quality isn’t worth the cost.

The reality of the Internet is that too much quality costs more than it’s worth. It costs bandwidth and it costs customers who aren’t willing to wait for heavy sites to load.

Is there a subliminal value to high quality graphics? The answer is yes but only if someone sees them.

How much do the images make a difference?

Here is a quick experiment you can do to visualize some of the improvement you could see. Go into your browser settings and disable all images. Then go back and visit your website to see how it loads without the bloat.

I just tried this on the homepage of a major news network. With images the site was over 4MB, and you don’t even see most of the pictures that were loaded. Without images the site was under 2MB (still very high, to be honest). That basically means that, according to the laws of physics and in the best case scenario, the site with images will take at least twice as long to load.
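To put numbers on that "at least twice as long" claim, here is the back-of-the-envelope arithmetic; the 10 Mbit/s connection speed is an assumption chosen purely for illustration:

```python
# Minimum possible transfer time: page size alone divides out, so halving
# the payload halves the best-case load time at any given bandwidth.

def min_transfer_seconds(size_mb, bandwidth_mbit=10):
    # megabytes -> megabits, divided by link speed in Mbit/s
    return size_mb * 8 / bandwidth_mbit

with_images = min_transfer_seconds(4)     # 3.2 s
without_images = min_transfer_seconds(2)  # 1.6 s
print(with_images / without_images)       # 2.0
```

Real load times are worse than this floor (latency, rendering, connection setup), but the ratio between the two page weights holds regardless of the bandwidth you plug in.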

You might say they are a huge media site. They know what they’re doing. They need the images. The sad truth is that they are wasting your bandwidth for no good reason as I’ll demonstrate shortly.

How to fix it?

I won’t tell you to get rid of all the images on your site, though if that can fit with your needs, less is always faster. You do, however, need to optimize your images for the web.

As a performance engineer, I have examined the performance of literally hundreds of websites, and the number one problem is always images that haven’t been optimized. Just optimizing your images properly can cut the download time of a website in half, possibly more.

As a continuation of our experiment above I took this picture from their page and tried optimizing it to see what kind of savings I could get. Here is the original 41KB version:

Here is the optimized 15KB version:

The result is about a third of the size, and I don’t think you can tell the difference. Would you notice any difference if this was just another image on a website? You too could get a 50-60% performance boost, just by optimizing your images.

If you are using some type of automated deployment for your sites, there are tools which will optimize your images automatically. They are OK for basic optimizations.

To really optimize your images, you need a person with a realistic eye (graphic designers are always biased towards heavier, higher-quality images) and you need to follow these basic rules:

Use the right images for the job.

In general, properly compressed, progressive JPEG files are the best choice, but they don’t support transparency. After carefully weighing whether you need a transparent image or not, use progressive JPEG files for any image that has more colors than you can count on one hand and doesn’t require transparency. Otherwise use a PNG file.
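That rule of thumb is mechanical enough to write down as code. A minimal sketch (the `choose_format` helper and the five-color threshold are my own framing of the rule above):

```python
def choose_format(color_count, needs_transparency):
    """Pick an image format per the rule of thumb: progressive JPEG for
    colorful images without transparency, PNG for everything else."""
    if needs_transparency or color_count <= 5:
        return "png"
    return "progressive-jpeg"

print(choose_format(10_000, needs_transparency=False))  # progressive-jpeg
print(choose_format(10_000, needs_transparency=True))   # png
print(choose_format(3, needs_transparency=False))       # png
```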

Optimize the images.

Optimizing JPEG files

First, manually reduce the file’s quality level until you reach the setting with acceptable levels of pixelation in the sharp edges and gradient surfaces of the image.

Note that you should always start optimizing from the original image at its best quality. Repeatedly editing a JPEG file will degrade the quality of the image with each generation.

After you have reached the optimal quality level, run them through a tool like ImageOptim which will remove any thumbnails or metadata adding weight to the image.
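If you want to script the quality sweep rather than eyeball it in an editor, the Pillow library can re-encode from the original at several quality levels so you can compare sizes. This sketch assumes Pillow is installed and uses a generated gradient as a stand-in for your source image:

```python
from io import BytesIO
from PIL import Image  # assumes the Pillow library is available

def jpeg_sizes(img, qualities=(85, 75, 60)):
    """Re-encode from the ORIGINAL image at each quality level, so
    generational loss never accumulates, and report the byte sizes."""
    sizes = {}
    for q in qualities:
        buf = BytesIO()
        img.save(buf, "JPEG", quality=q, progressive=True, optimize=True)
        sizes[q] = buf.tell()
    return sizes

img = Image.radial_gradient("L").convert("RGB")  # stand-in source image
print(jpeg_sizes(img))
```

Pick the lowest quality in the sweep whose pixelation you can live with, then hand that file to ImageOptim for the metadata stripping pass.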

Optimizing PNG files

Optimize PNG files first by running them through a tool like ImageAlpha or TinyPNG. These tools use lossy compression techniques to reduce the number of colors in your image, fitting them into a smaller bitmap and resulting in better compression than a PNG would normally have.

Note: ImageAlpha gives you more control over the process letting you decide how many colors should be used in the resulting image. This is very useful for transparent PNGs with very few colors.

Then run the images through ImageOptim or similar tool to squeeze some extra space out of them.
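The lossy color-reduction step that ImageAlpha and TinyPNG perform can be approximated with Pillow's palette quantization, which is handy in a build pipeline. A sketch, again assuming Pillow and using a generated image as the stand-in:

```python
from io import BytesIO
from PIL import Image  # assumes the Pillow library is available

def png_bytes(img):
    """Encode an image as an optimized PNG and return its size in bytes."""
    buf = BytesIO()
    img.save(buf, "PNG", optimize=True)
    return buf.tell()

original = Image.radial_gradient("L").convert("RGBA")
# The lossy step: collapse the image to a small palette, like ImageAlpha's
# color-count slider. Fewer colors -> smaller bitmap -> better compression.
quantized = original.quantize(colors=32)

print(png_bytes(original), png_bytes(quantized))
```

As with ImageAlpha, the `colors` count is the knob to turn: transparent PNGs with very few colors can often drop to 16 or even 8 palette entries with no visible change.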

Once you have mastered the above, your site should be much lighter. If you want to optimize further (and you should), look into the following techniques:

Don’t use images where you don’t have to.

Many of the most common icons used on websites have been implemented as fonts. Once a font is loaded, each icon is a single character and making the icon larger or smaller is as simple as using a larger font size.

Note: There is some overhead in loading and using the fonts (additional CSS) so use wisely.

Combine small images.

Similar to the idea of using fonts is the CSS sprite. These combine multiple small images into a single transparent PNG file and then show only one image at a time on your site using CSS tricks.

The advantage of this technique is that it saves requests to the web server. Each request to a web server has latency, sometimes as much as 200ms, which can’t be eliminated and for small images, this request latency can sometimes account for more time than downloading the image itself. By combining the smaller images in your site, you save that overhead.

There are tools which will generate the combined images and the corresponding CSS for you.
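To make the mechanism concrete, here is a minimal sprite-sheet generator in Pillow (assumed installed); the icons are generated solid squares standing in for real icon files, and the CSS comment shows how one icon would be selected:

```python
from PIL import Image  # assumes the Pillow library is available

def make_sprite(icons):
    """Paste equally sized icons side by side into one transparent sheet.
    Icon i is then shown in CSS with:  background-position: -(i*w)px 0;"""
    w, h = icons[0].size
    sheet = Image.new("RGBA", (w * len(icons), h), (0, 0, 0, 0))
    for i, icon in enumerate(icons):
        sheet.paste(icon, (i * w, 0))
    return sheet

# Stand-in 16x16 icons; in practice you would Image.open() real files.
icons = [Image.new("RGBA", (16, 16), c) for c in ("red", "green", "blue")]
sprite = make_sprite(icons)
print(sprite.size)  # (48, 16)
```

One sheet means one HTTP request instead of three, which is exactly the latency saving described above.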

Summary

I’ve used these image optimization techniques in hundreds of websites resulting in significant savings of bandwidth and increased performance. Most importantly, though, these techniques result in better visitor engagement.

If you’re interested in optimizing your site for best cost/performance, feel free to contact me via LinkedIn or via https://donatemyfee.org/.

Couchbase is Simply Awesome

couchbase

Here are five things that make Couchbase a go-to service in any architecture.

Couchbase is simple to setup.

Keep It Simple. It’s one of the axioms of system administration. Couchbase, though complicated under the hood, makes it very simple to setup even complicated clusters spanning multiple data centers.

Every node comes with a very user friendly web interface, including the ability to monitor performance across all the nodes in that machine’s cluster.

Adding nodes to a cluster is as simple as plugging in the address of the new node after which, all the data in the cluster is automatically rebalanced between the nodes. The same is true when removing nodes.

Couchbase is built to never require downtime which makes it a pleasure to work with.

If you are into automation à la Chef, etc., Couchbase supports configuration via its REST API. There are cookbooks available. I’m not sure about other configuration management tools, but they probably have the relevant code bits as well.
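As a taste of that REST API, here is a sketch of building the call that joins a new node to a cluster (the `/controller/addNode` endpoint and port 8091 are from the Couchbase documentation; the host names and credentials are placeholders, and the request is constructed but deliberately not sent):

```python
from urllib.parse import urlencode
from urllib.request import Request

def add_node_request(cluster_host, new_node, user, password):
    """Build the POST that joins new_node to the cluster. A separate POST
    to /controller/rebalance is still required before data moves."""
    body = urlencode({"hostname": new_node, "user": user, "password": password})
    return Request(
        f"http://{cluster_host}:8091/controller/addNode",
        data=body.encode(),
        method="POST",
    )

req = add_node_request("cb1.example.com", "cb4.example.com", "admin", "secret")
print(req.full_url)
# urllib.request.urlopen(req)  # would actually send it to a live cluster
```

A Chef cookbook is essentially a wrapper around calls like this one plus the follow-up rebalance.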

Couchbase replaces Memcached

Even if you have no need for a more advanced NoSQL solution, there is a good chance you are using Memcached. Couchbase is the original Memcached on steroids.

Unlike traditional Memcached, Couchbase supports clustering, replication, and persistence of data. Using the Moxi Memcached proxy that comes with Couchbase, your apps can talk Memcached protocol to a cluster of Couchbase servers and get the benefits of automatic sharding and failover. If you want, Couchbase can also persist the Memcached data to disk turning your Memcached into a persistent, highly available key value store.

Couchbase is also a schema-less NoSQL DB

Aside from support for simple Memcached key/value storage, Couchbase is a highly available, easy to scale, JSON based DB with auto-sharding and built in map reduce.

Traditionally, Couchbase uses a system called views to perform complicated queries on the JSON data but they are also working on a new query language called N1QL which brings tremendous additional ad hoc query capabilities.
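To show the difference in flavor between the two query styles, here is a side-by-side sketch (the bucket and field names are invented; the view body is the JavaScript map function Couchbase expects, shown here as a string):

```python
# A view is a JavaScript map function defined on the server ahead of time;
# it emits index entries that you can then query by key.
view_map = """
function (doc, meta) {
  if (doc.type == "user") { emit(doc.country, null); }
}
"""

# N1QL expresses the same question ad hoc, in SQL-like syntax, with no
# pre-defined view required.
n1ql = 'SELECT name FROM users WHERE country = "IL" AND age > 30'

print(n1ql)
```

Views reward queries you know about in advance; N1QL is what makes exploratory, ad hoc querying practical.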

Couchbase also supports connectivity to Elastic Search, Hadoop, and Talend.

Couchbase is all about global scale out

Adding and removing nodes is simple and every node in a Couchbase cluster is read and write capable all the time. If you need more performance, you just add more nodes.

When one data center isn’t enough, Couchbase has a feature called cross data center replication (XDCR), letting you easily set up unidirectional or bidirectional replication between multiple Couchbase clusters over WAN. You can even set up full mesh replication, though it isn’t clearly described in their documentation.

Unlike MongoDB, which can only have one master, Couchbase using XDCR allows apps in any data center to write to their local Couchbase cluster and that data will be replicated to all the other data centers.

I recently set up a system using five Couchbase clusters across the US and Europe, all connected with each other in a full mesh. In my experience, data written in any of the data centers propagated across the globe within 1-2 seconds.

Couchbase is only getting better

Having used Couchbase built from source (read: community support only) since version 2.1 (Couchbase is now at 3.0.2), I can say that it is only getting better. They have made amazing progress with XDCR, added security functionality, and introduced the N1QL language.

The Couchbase community is great. Check out the IRC channel if you need help.