Category: Architecture

Ynet on AWS. Let’s hope we don’t have to test their limits.


In Israel, more than in most places, no news is good news. Ynet, one of the largest news sites in Israel, recently posted a case study (at the bottom of this article) on handling large loads by moving their notification services to AWS.

“We used EC2, Elastic Load Balancers, and EBS… Us as an enterprise, we need something stable…”

In my opinion, they are contradicting themselves. EBS and Elastic Load Balancers (ELB) are the two AWS services that fail the most and fail the hardest, with multiple outages, each spanning several days.

EBS: Conceptually flawed, prone to cascading failures

EBS, a virtual block storage service, is conceptually flawed and prone to severe cascading failures. In recent years, Amazon has improved reliability somewhat, mainly by providing such a low level of service on standard EBS that customers default to paying extra for provisioned IOPS and SSD-backed EBS volumes.

Many cloud providers avoid the problematic nature of virtual block storage entirely, preferring compute nodes based on local, direct-attached storage.

ELB: Too slow to adapt, silently drops your traffic

In my experience, ELBs are too slow to adapt to spikes in traffic. About a year ago, I was called in to investigate availability issues with one of our advertising services. The problems were intermittent and extremely hard to pin down. Luckily, since this was a B2B service, our partners noticed the problems; our customers would have happily ignored the blank advertising space.

Suspecting some sort of capacity problem, I ran some synthetic load tests and compared the results with logs on our servers. Multiple iterations of these tests with and without ELB in the path confirmed a gruesome and silent loss of 40% of our requests when traffic via Elastic Load Balancers grew suddenly.

The Elastic Load Balancers gave us no indication that they were dropping requests and, although they would theoretically support the load once Amazon’s algorithms picked up on the new traffic, they just didn’t scale up fast enough. We wasted tons of money on bought media that couldn’t show our ads.

Amazon will prepare your ELBs for more traffic if you give them two weeks’ notice and they’re in a good mood, but who has the luxury of knowing when a spike in traffic will come?

Recommendations

I recommend staying away from EC2, EBS, and ELB if you care about performance and availability. There are better, more reliable providers like Joyent. Rackspace without using their cloud block storage (basically the same as EBS with the same flaws) would be my second choice.

If you must use EC2, try to use load balancing AMIs from companies like Riverbed or F5 instead of ELB.

If you must use ELB, run synthetic load tests at random intervals to verify that Amazon isn’t dropping your traffic.
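
To make that concrete, here is a minimal sketch of the kind of test I mean (the endpoint, burst size, and thread count are placeholders, not the numbers from our actual tests): fire a sudden burst of concurrent requests at a URL behind the ELB, count what comes back, and compare the sent count against your backend access logs for the same window.

```python
# A minimal sketch of a burst test against an ELB-fronted endpoint.
# The URL and sizes are hypothetical; tune them to your own traffic patterns.
import concurrent.futures
import urllib.error
import urllib.request

ENDPOINT = "https://elb-fronted-service.example.com/health"  # placeholder URL
BURST_SIZE = 2000        # requests in one sudden burst
TIMEOUT_SECONDS = 10

def hit_endpoint(_):
    try:
        with urllib.request.urlopen(ENDPOINT, timeout=TIMEOUT_SECONDS) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

def run_burst():
    with concurrent.futures.ThreadPoolExecutor(max_workers=200) as pool:
        results = list(pool.map(hit_endpoint, range(BURST_SIZE)))
    ok = sum(results)
    print(f"sent={BURST_SIZE} ok={ok} failed_or_dropped={BURST_SIZE - ok}")
    # Now check your backend access logs for the same window; if the backends
    # saw far fewer requests than "sent", the load balancer ate the difference.

if __name__ == "__main__":
    run_burst()
```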

Conclusion

In conclusion, let us hope that we have no reasons to test the limits of Ynet’s new services, and if we do, may it only be good news.

Linux and Solaris are Converging but Not the Way You Imagined


In case you haven’t been paying attention, Linux is in a mad dash to copy everything that made Solaris 10 amazing when it launched in 2005. Everyone has recognized the power of Zones, ZFS, and DTrace, but licensing issues and the sheer effort required to implement the technologies have made it a long process.

ZFS

ZFS is probably the most advanced file system in the world. The creators of ZFS realized, before anyone else, that file systems weren’t built to handle the amounts of data that the future would bring.

Work to port ZFS to Linux began in 2008, and a stable port of ZFS from Illumos was announced in 2013. That said, even two years later, the latest release still hasn’t reached feature parity with ZFS on Illumos. With developers preferring to develop OpenZFS on Illumos, and licensing issues preventing OpenZFS from being distributed as part of the Linux kernel, it seems like ZFS on Linux (ZOL) may be doomed to playing second fiddle.

DTrace

DTrace is the most advanced tool in the world for debugging and monitoring live systems. Originally designed to help troubleshoot performance and other bugs in a live Solaris kernel, it quickly became extremely useful for debugging userland programs and runtimes.

Oracle has been porting DTrace to Linux since at least 2011, and although they own the original and have prioritized the most widely used features, they still haven’t caught up to it.

Zones

Solaris Zones are operating-system-level virtual machines. They are completely isolated from each other but all run on the same kernel, so there is only one operating system in memory. Zones have great integration with ZFS, DTrace, and all the standard system monitoring tools, which makes it very easy to support and manage servers with hundreds of Zones running on them. Zones also natively support a mechanism called branding, which allows the kernel to provide different interfaces to the guest zone. In Oracle Solaris, this is used to support running zones from older versions of Solaris on a machine running a newer OS.

Linux containers of one type or another have been around for a while, but they haven’t gotten nearly as mature as Zones. Recently, the continued failure of traditional hypervisors to provide bare-metal performance in the cloud, coupled with the uptake of Docker, has finally gotten the world to realize the tremendous benefits of container-based virtualization like Zones.

The current state of containers in Linux is extremely fractured, with at least five competing projects that I know of. LXC, initially released in 2008, seems to be the favorite; it has historically had serious privilege separation issues, but it has gotten a little better if you can meet all the system requirements.

Joyent has been waiting at the finish line.

While Linux users wait and wait for mature container solutions, full OS and application visibility, and a reliable and high performance file system, Joyent has been waiting to make things a whole lot easier.

About a year ago, David Mackay showed some interest in Linux Branded Zones, work which had been abandoned in Illumos. In the spring of 2014, Joyent started work on resurrecting lx-zones, and in September, they presented their work. They already have working support for 32-bit and some 64-bit Linux binaries in Linux-branded SmartOS Zones. As part of the process, they are porting some of the main Linux libraries and facilities to native SmartOS, which will make porting Linux code to SmartOS much easier.

The upshot is that you can already get ZFS, DTrace, and Linux apps inside a fully isolated, high-performance SmartOS zone. With only nine months or so of work behind it, there are still some missing pieces in the Linux support, but, considering how long Linux has been waiting, I’m pretty sure SmartOS will reach feature parity with Linux a lot faster than Linux will reach feature parity with SmartOS.

SmartDataCenter, the Open Cloud Platform that Actually Already Works


For years enterprises have tried to make OpenStack work and failed miserably. Considering how many heads have broken against OpenStack, maybe they should have called it OpenBrick.

Before I dive into the details, I’ll cut to the chase. You don’t have to break your head over cloud anymore. Joyent has open sourced (as in, get it on GitHub) their cloud management platform.

It’s free if you want (install it on your laptop, install it on a server). It’s supported if you want. Best of all, it actually works outside of a lab or CI test suite. It’s what Joyent runs in production for all their public cloud customers (I admit to being one of the satisfied ones). It’s also something they have been licensing out to other cloud providers for years.

Now for the deep dive.

What’s wrong with OpenStack?

First off, it isn’t a cloud in a box, which is what most people think it is. In 2013, Gartner called out OpenStack for consciously misrepresenting what OpenStack actually provides:

no one in three years stood up to clarify what OpenStack can and cannot do for an enterprise.

In case you’re wondering, the analyst also quoted eBay’s chief engineer on the true nature of OpenStack:

… an instance of an OpenStack installation does not make a cloud. As an operator you will be dealing with many additional activities not all of which users see. These include infra onboarding, bootstrapping, remediation, config management, patching, packaging, upgrades, high availability, monitoring, metrics, user support, capacity forecasting and management, billing or chargeback, reclamation, security, firewalls, DNS, integration with other internal infrastructure and tools, and on and on and on. These activities are bound to consume a significant amount of time and effort. OpenStack gives some very key ingredients to build a cloud, but it is not cloud in a box.

The analyst made it clear that:

vendors get this difference, trust me.

Other insiders put the situation into similar terms:

OpenStack has some success stories, but dead projects tell no tales. I have seen no less than 100 Million USD spent on bad OpenStack implementations that will return little or have net negative value. Some of that has to be put on the ignorance and arrogance of some of the organizations spending that money, but OpenStack’s core competency, above all else, has been marketing and if not culpable, OpenStack has at least been complicit.

The motive behind the deception is clear. OpenStack is like giving someone a free Ferrari, pink slip and all, but keeping the keys. You get pieces of a cloud but no way to run it. Once you have put all your effort into installing OpenStack and you realize what’s missing, you are welcome to turn to any one of the vendors backing OpenStack for one of their packaged cloud platforms.

OpenStack is a foot in the door. It’s a classic bait and switch, but even after years, no one is admitting it. Instead, blue chip companies fight to steer OpenStack in the direction that suits them and their corporate offerings.

What’s great about SmartDataCenter?

It works.

The keys are in the ignition. You should probably stop reading this article and install it already. You are likely to get a promotion for listening to me 😉

Great Technology

SmartDataCenter was built on really great technologies like SmartOS (fork of Solaris), Zones, ZFS, and DTrace. Most of these technologies are slowly being ported to Linux but they are already 10 years mature in SDC.

  • Being based on a fork of Solaris brings you baked-in, enterprise-ready features like IPsec, IPF, RBAC, SMF, resource management and capping, system auditing, filesystem monitoring, etc.
  • Zones are the big daddy of container technology, guaranteeing you the best on-metal performance for your cloud instances. If you are running a native SmartOS guest, you get the added benefit of CPU bursting and live machine resizing (no reboot or machine pause necessary).
  • ZFS is the most reliable, high-performance file system in the world and is constantly improving.
  • DTrace is the secret to low-level visibility with little to no overhead. In cloud deployments, where visibility is usually close to zero, this is an amazing feature. It’s even more amazing for the cloud operator.

Focus

SDC was built for one thing by one company: to replace the data centers of the past. It says so in the name. With one purpose, SDC has been built to be very opinionated about what it does and how it does it. This gives SDC a tremendous amount of focus, something sorely lacking from would-be competition like OpenStack.

Lastly, it works.

Couchbase is Simply Awesome


Here are five things that make Couchbase a go-to service in any architecture.

Couchbase is simple to set up.

Keep It Simple. It’s one of the axioms of system administration. Couchbase, though complicated under the hood, makes it very simple to set up even complicated clusters spanning multiple data centers.

Every node comes with a very user-friendly web interface, including the ability to monitor performance across all the nodes in that machine’s cluster.

Adding nodes to a cluster is as simple as plugging in the address of the new node, after which all the data in the cluster is automatically rebalanced between the nodes. The same is true when removing nodes.

Couchbase is built to never require downtime which makes it a pleasure to work with.

If you are into automation à la Chef, etc., Couchbase supports configuration via a REST API. There are cookbooks available. I’m not sure about other configuration management tools, but they probably have the relevant code bits as well.
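
To give a feel for that REST API, here is a rough sketch of adding a node and triggering a rebalance. The hostnames and credentials are placeholders, and the endpoints are the ones I remember from the 2.x/3.x admin API, so check the documentation for your version.

```python
# A hedged sketch of driving Couchbase's admin REST API (port 8091).
import requests

CLUSTER = "http://cb-node1.example.com:8091"   # any existing cluster member
ADMIN = ("Administrator", "password")          # cluster admin credentials

# Ask the cluster to add a new node.
requests.post(
    f"{CLUSTER}/controller/addNode",
    auth=ADMIN,
    data={
        "hostname": "cb-node4.example.com",    # the node being added
        "user": "Administrator",
        "password": "password",
    },
).raise_for_status()

# The new node only takes traffic after a rebalance. The rebalance endpoint
# wants the internal (otpNode) names of all nodes, which /pools/default returns.
nodes = requests.get(f"{CLUSTER}/pools/default", auth=ADMIN).json()["nodes"]
requests.post(
    f"{CLUSTER}/controller/rebalance",
    auth=ADMIN,
    data={"knownNodes": ",".join(n["otpNode"] for n in nodes), "ejectedNodes": ""},
).raise_for_status()
```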

Couchbase replaces Memcached

Even if you have no need for a more advanced NoSQL solution, there is a good chance you are using Memcached. Couchbase is the original Memcached on steroids.

Unlike traditional Memcached, Couchbase supports clustering, replication, and persistence of data. Using the Moxi Memcached proxy that comes with Couchbase, your apps can talk Memcached protocol to a cluster of Couchbase servers and get the benefits of automatic sharding and failover. If you want, Couchbase can also persist the Memcached data to disk, turning your Memcached into a persistent, highly available key-value store.
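
In practice, the application side barely changes. Here is a minimal sketch, assuming Moxi is listening on the standard Memcached port (11211) on one of the nodes; the hostname is a placeholder.

```python
# Speaking plain Memcached protocol to Couchbase through the bundled Moxi proxy.
import memcache  # pip install python-memcached

mc = memcache.Client(["cb-node1.example.com:11211"])  # Moxi endpoint (placeholder)

# The same calls your existing Memcached code already makes, except the data is
# now sharded, replicated, and (optionally) persisted by the Couchbase cluster.
mc.set("session:1234", {"user": "alice", "cart": ["sku-1", "sku-2"]}, time=3600)
print(mc.get("session:1234"))
```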

Couchbase is also a schema-less NoSQL DB

Aside from supporting simple Memcached key/value storage, Couchbase is a highly available, easy-to-scale, JSON-based DB with auto-sharding and built-in map reduce.

Traditionally, Couchbase uses a system called views to perform complicated queries on the JSON data, but they are also working on a new query language called N1QL, which brings tremendous additional ad hoc query capabilities.
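
As a taste of N1QL, here is a hedged sketch of posting an ad hoc query to the query service. N1QL was still a developer preview at the time of writing, so the port and endpoint path may differ in your release, and the bucket and fields are made up.

```python
# Firing an ad hoc N1QL query at the (developer preview) query service.
import requests

QUERY_URL = "http://cb-node1.example.com:8093/query/service"  # port/path may vary
statement = """
    SELECT u.name, u.email
    FROM users u
    WHERE u.country = 'IL' AND u.last_login > '2015-01-01'
    LIMIT 10
"""  # "users" is a hypothetical bucket

resp = requests.post(QUERY_URL, data={"statement": statement})
resp.raise_for_status()
for row in resp.json().get("results", []):
    print(row)
```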

Couchbase also supports connectivity to Elasticsearch, Hadoop, and Talend.

Couchbase is all about global scale out

Adding and removing nodes is simple and every node in a Couchbase cluster is read and write capable all the time. If you need more performance, you just add more nodes.

When one data center isn’t enough, Couchbase has a feature called cross data center replication (XDCR), letting you easily set up unidirectional or bidirectional replication between multiple Couchbase clusters over the WAN. You can even set up full mesh replication, though it isn’t clearly described in their documentation.
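
Setting up one direction of XDCR is also just a couple of REST calls. Below is a sketch; the hostnames, bucket names, and credentials are placeholders, the endpoints are from the 2.x/3.x documentation, and you would run the mirror-image calls on the other cluster to make the replication bidirectional.

```python
# A hedged sketch of configuring one direction of XDCR via the admin REST API.
import requests

LOCAL = "http://cb-eu-node1.example.com:8091"   # a node in the local cluster
ADMIN = ("Administrator", "password")

# 1. Tell the local cluster about the remote one.
requests.post(
    f"{LOCAL}/pools/default/remoteClusters",
    auth=ADMIN,
    data={
        "name": "us-east",
        "hostname": "cb-us-node1.example.com:8091",
        "username": "Administrator",
        "password": "password",
    },
).raise_for_status()

# 2. Start continuous replication of a bucket to that remote cluster.
requests.post(
    f"{LOCAL}/controller/createReplication",
    auth=ADMIN,
    data={
        "fromBucket": "sessions",
        "toCluster": "us-east",
        "toBucket": "sessions",
        "replicationType": "continuous",
    },
).raise_for_status()
```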

Unlike MongoDB, which can only have one master, Couchbase with XDCR allows apps in any data center to write to their local Couchbase cluster, and that data will be replicated to all the other data centers.

I recently set up a system using five Couchbase clusters across the US and Europe, all connected in a full mesh with each other. In my experience, data written in any of the data centers was updated across the globe within one to two seconds at most.

Couchbase is only getting better

Having used Couchbase built from source (read: community support only) since version 2.1 (Couchbase is now at 3.0.2), I can say that it is only getting better. They have made amazing progress with XDCR, added security functionality, and introduced the N1QL language.

The Couchbase community is great. Check out the IRC channel if you need help.

Wrangling Elephants in the Cloud


You know the elephant in the room, the one no one wants to talk about. Well, it turns out there was a whole herd of them hiding in my cloud. There’s a herd of them hiding in your cloud too. I’m sure of it. Here is my story and how I learned to wrangle the elephants in the cloud.

Like many of you, my boss walked into my office about three years ago and said “We need to move everything to the cloud.” At the time, I wasn’t convinced that moving to the cloud had technical merit. The business, on the other hand, had decided that, for whatever reason, it was absolutely necessary.

As I began planning the move, selecting a cloud provider, and picking tools with which to manage the deployment, I knew that I wasn’t going to be able to provide the same quality of service in a cloud as I had in our server farm. There were too many unknowns.

The cloud providers don’t like to give too many details on their setups, nor do they like to provide many meaningful SLAs. I have very little idea what hardware I’m running on. I have almost no idea how it’s connected. How many disks am I running on? What RAID configuration? How many IOPS can I count on? Is a disk failing? Is it being replaced? What will happen if the power supply blows? Do I have redundant network connections?

Whatever it was that made the business decide to move, it trumped all these unknowns. In the beginning, I focused on getting what we had from one place to the other, following whichever tried and true best practices were still relevant.

Since then, I’ve come up with these guiding principles for working around the unknowns in the cloud.

  • Beginners:
    • Develop in the cloud
    • Develop for failure
    • Automate deployment to the cloud
    • Distribute deployments across regions
  • Advanced:
    • Monitor everything
    • Use multiple providers
    • Mix and match private cloud

Wrangling elephants for beginners:

Develop in the cloud.

Developers invariably want to work locally. It’s more comfortable. It’s faster. It’s why you bought them a crazy expensive MacBook Pro. It is also nothing like production, and nothing developed that way ever really works the same in real life.

If you want to run with the IOPS limitations of standard Amazon EBS, or you want to rely on Amazon ELBs to distribute traffic under sudden load, you need to have those limitations in development as well. I’ve seen developers cry when their MongoDB was deployed to EBS, and I’ve seen ELBs silently drop 40% of a huge media campaign.

Develop for failure.

Cloud providers will fail. It is cheaper for them to fail and, in the worst case, credit your account for some machine hours than it is for them to buy high-quality hardware and set up highly available networks. In many cases, the failure is not even a complete and total failure (that would be too easy). Instead, it could just be some incredibly high response times which your application may not know how to deal with.

You need to develop your application with these possibilities in mind. Chaos Monkey by Netflix is a classic, if somewhat over-achieving, example.
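
On a much smaller scale, it can be as simple as never calling a dependency without a timeout, a bounded retry, and a fallback. A minimal sketch (the URL is hypothetical):

```python
# Never assume a cloud dependency will answer quickly, or at all.
import time
import urllib.error
import urllib.request

def fetch_with_fallback(url, timeout=2.0, retries=2, fallback=b"{}"):
    for attempt in range(retries + 1):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                return resp.read()
        except (urllib.error.URLError, OSError):
            # Dependencies fail partially and slowly; back off briefly and retry
            # instead of piling up threads behind a dead service.
            time.sleep(0.2 * (attempt + 1))
    return fallback  # degrade gracefully rather than failing the whole request

data = fetch_with_fallback("https://ads.internal.example.com/slot/42")
```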

Automate deployment to the cloud.

I’m not even talking about more complicated, possibly over-complicated, auto-scaling solutions. I’m talking about when it’s 3am and your customers are switching over to your competitors. Your cloud provider just lost a rack of machines, including half of your service. You need to redeploy those machines ASAP, possibly to a completely different data center.

If you’ve automated your deployments and there aren’t any other hiccups, it will hopefully take less than 30 minutes to get back up. If not, well, it will take what it takes. There are many other advantages to automating your deployments but this is the one that will let you sleep at night.
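
What that automation looks like depends on your tooling, but even a thin script over the provider’s API beats clicking through a console at 3am. Here is a hedged sketch of relaunching a service from pre-baked images in another region; all identifiers are placeholders, and a real deployment would pull them from configuration management rather than hard-coding them.

```python
# Relaunch a service from a pre-baked image in whichever region still works.
import boto3  # pip install boto3

REGION_AMIS = {
    "us-east-1": "ami-11111111",   # hypothetical pre-baked AMIs per region
    "eu-west-1": "ami-22222222",
}

def redeploy(region, count):
    ec2 = boto3.client("ec2", region_name=region)
    resp = ec2.run_instances(
        ImageId=REGION_AMIS[region],
        InstanceType="m3.large",
        MinCount=count,
        MaxCount=count,
        KeyName="ops-key",                          # placeholder key pair
        SecurityGroupIds=["sg-0123456789abcdef0"],  # placeholder security group
    )
    return [i["InstanceId"] for i in resp["Instances"]]

# us-east-1 just lost a rack with half the service? Bring it up in eu-west-1.
print(redeploy("eu-west-1", count=4))
```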

Distribute deployments across regions.

A pet peeve of mine is the mess that Amazon has made with their “availability zones.” While the concept is an easy-to-implement solution (from Amazon’s point of view) to the logistical problems involved in running a cloud service, it is a constantly overlooked source of unreliability for beginners choosing Amazon AWS. Even running a multi-availability-zone deployment in Amazon only marginally increases reliability, whereas deploying to multiple regions can be much more beneficial with a similar amount of complexity.

Whether you use Amazon or another provider, it is best to build your service from the ground up to run in multiple regions, even if only in an active/passive capacity. Aside from the standard benefits of a distributed deployment (mitigation of DDoS attacks and uplink provider issues, lower latency to customers, disaster recovery, etc.), running in multiple regions will protect you against regional problems caused by hardware failure, regional maintenance, or human error.

Advanced elephant wrangling:

The four principles above are really about being prepared for the worst. If you’re prepared for the worst, then you’ve managed 80% of the problem. You may be wasting resources or you may be susceptible to provider-level failures, but your services should be up all of the time.

Monitor Everything.

It is very hard to get reliable information about system resource usage in a cloud. It really isn’t in the cloud provider’s interest to give you that information; after all, they are making money by overbooking resources on their hardware. No, you shouldn’t rely on Amazon to monitor your Amazon performance, at least not entirely.

Even when they give you system metrics, it might not be the information you need to solve your problem. I highly recommend reading the book Systems Performance: Enterprise and the Cloud by Brendan Gregg.

Some clouds are better than others at providing system metrics. If you can choose them, great! Otherwise, you need to start finding other strategies for monitoring your systems. It could be to monitor your services higher up in the stack by adding more metric points to your code. It could be to audit your request logs. It could be to install an APM agent.
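
For example, adding metric points to your own code can be as small as timing the operations you care about and shipping the numbers into a log or metrics pipeline that you control. A minimal, stdlib-only sketch:

```python
# Time the operations you care about yourself instead of trusting the provider.
import functools
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("metrics")

def timed(name):
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            start = time.monotonic()
            try:
                return func(*args, **kwargs)
            finally:
                elapsed_ms = (time.monotonic() - start) * 1000
                # Swap this log line for a statsd/graphite call if you run one;
                # the point is that you own the measurement.
                log.info("metric=%s elapsed_ms=%.1f", name, elapsed_ms)
        return wrapper
    return decorator

@timed("db.save_order")
def save_order(order):
    time.sleep(0.05)  # stand-in for the real work

save_order({"id": 1})
```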

Aside from monitoring your services, you need to monitor your providers. Make sure they are doing their jobs. Trust me, sometimes they aren’t.

I highly recommend monitoring your services from multiple points of view so you can corroborate the data from multiple observers. This happens to fit in well with the next principle.

Use multiple providers.

There is no way around it. Using one provider for any third party service is putting all your eggs in one basket. You should use multiple providers for everything in your critical path, especially the following four:

  • DNS
  • Cloud
  • CDN
  • Monitoring

Regarding DNS, there are some great providers out there. CloudFlare is a great option for the budget conscious. Route53 is not free, but not expensive. DNSMadeEasy is a little bit pricier but will give you some more advanced DNS features. Some of the nastiest downtimes in the past year were due to DNS provider failures.

Regarding Cloud, using multiple providers requires very good automation and configuration management. If you can find multiple providers which run the same underlying platform (for example, Joyent licenses out their cloud platform to various other public cloud vendors), then you can save some work. In any case, using multiple cloud providers can save you from some downtime, bad cloud maintenance or worse.

CDNs also have their ups and downs. The Internet is a fluid space and one CDN may be faster one day and slower the next. A good Multi-CDN solution will save you from the bad days, and make every day a little better at the same time.

Monitoring is great, but who’s monitoring the monitor? It’s a classic problem. Instead of trying to make sure every monitoring solution you use is perfect, use multiple providers from multiple points of view (application performance, system monitoring, synthetic polling).

These perspectives all overlap to some degree, backing each other up. If multiple providers start alerting, you know there is a real, actionable problem, and from how they alert, you can sometimes home in on the root cause much more quickly.

If your APM solution starts crying about CPU utilization but your system monitoring solution is silent, you know that you may have a problem that needs to be verified. Is the APM system misreading the situation or has your system monitoring agent failed to warn you of a serious issue?

Mix and match private cloud

Regardless of all the above steps you can take to mitigate the risks of working in environments not completely in your control, really important business should remain in-house. You can keep the paradigm of software defined infrastructure by building a private cloud.

Joyent licenses their cloud platform out to companies for building private clouds with enterprise support. This makes mixing and matching between public and private very easy. In addition, they have open sourced the entire cloud platform, so if you want to install it without support, you are free to do so.

Summary

When a herd of elephants is stampeding, there is no hope of stopping them in their tracks. The best you can hope for is to point them in the right direction. Similarly, in the cloud, we will never get back the depth of visibility and control that we have with private deployments. What’s important is to learn how to steer the herd so we are prepared for the occasional stampede while still delivering high quality systems.