
Docker 2015 Recap – The good, the bad and the ugly.


Year in review

There is no question in my mind that Docker was, by far, the most disruptive technology of 2015. What barely registered on the radar for many in 2014 became something shiny in Q1, advanced to thinking material in Q2, reached “you have to be crazy to run that in production” by Q3, and is now on everyone’s three-year plan.

To make a long story short:

  1. Containers were a great idea 12 years ago; Docker has brought them to Linux and managed to improve on the concept.
  2. That said, the actual implementation of containers in Linux is a disaster. If you can’t use Triton or another container technology, be prepared to step on a lot of mines.
  3. In proper hands and under proper circumstances (dev, qa, limited production use), Docker is already useful so use it.
  4. Though it’s production capable, it’s not production ready. Parental supervision required.

Getting some perspective

Sun released Solaris 10 and Zones around 2005. The performance advantages of OS level virtualization over hardware level virtualization were and are obvious, but the real innovation came from the combination of Zones with complementary technologies like ZFS. On the simplest level, quota support, delegated datasets, LOFI mounts, and snapshotting capabilities added layers upon layers of functionality to containers.

By 2008, the only systems I had in dev, staging, or production that didn’t run on top of Zones were Oracle RAC nodes. Every piece of code and every application server was deployed by sending an incremental ZFS snapshot of the container in staging to the servers in production. Effectively, every server in production was running block-level clones of the containers in staging.
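
For those who never worked that way, a minimal sketch of the pattern (pool, dataset, and host names are made up for illustration):

    # snapshot the staging zone's dataset after a release
    zfs snapshot zones/appzone@release-42

    # ship only the blocks that changed since the previous release to production
    zfs send -i zones/appzone@release-41 zones/appzone@release-42 | \
        ssh prod-host zfs receive -F zones/appzone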

It shouldn’t surprise anyone in the tech industry that we’ve come full circle as technology loves to do. Docker brings almost this exact pattern of shipping containers based on incremental file system changes to Linux but with some compelling twists and, yes, some troubling flaws.

The good

Container as an artifact

Developers rarely keep a clean house and by that I mean a clean working environment. Working on multitudes of projects, they’ll often have multiple versions of a runtime installed, not to mention any number of libraries. When software is written in a dirty environment, dependencies can be forgotten or met unintentionally, and that leads to “worked on my laptop” syndrome. One of the major advantages of containers is the filesystem level isolation of the processes. When developers work properly with containers, their software, runtimes, and dependencies are all isolated from any other project. The same code and dependencies that run in dev go to staging, and then possibly even to production.

Container as a process

One of the most brilliant conceptual changes Docker has, in contrast to what Sun had made possible with Zones and ZFS, is the idea of a container as a process. I don’t know if this was originally intended by design or a happy consequence of building containerization on top of cgroups, but it’s groundbreaking for several reasons:

  1. Running a single process per container has significantly lower resource requirements than running a traditional fat container like a Solaris Zone.
  2. Running a single process per container exposes significantly fewer attack vectors than running a traditional fat container.
  3. The paradigm of a container accepting STDIN and returning STDOUT/STDERR like a normal process brings the UNIX philosophy to new heights, allowing not just software, but even very complex processes, to be packaged as containers and piped together (sketched below).
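
A minimal sketch of what that looks like in practice, using the stock alpine image and a made-up log file just for illustration:

    # filter and count lines using two throwaway containers chained with a
    # pipe, exactly as if they were two local processes
    cat access.log | docker run -i --rm alpine grep " 500 " | docker run -i --rm alpine wc -l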

API

While the lean, process model container is one of the best innovations Docker brings to the table, it is not without its downsides. Docker containers are poorly suited for services that require multiple running processes, involve complex configurations, and/or work with many files/output channels. Mounting configuration, data, or log files from the Docker host can work around some of these issues, but that strays from the general goal of having a self-contained, self-reliant image which can be run anywhere.

The alternative is to pass some unwieldy combination of environment variables to the container at runtime. On the command line, this gets infuriating after a while, but when managed via the API it becomes much more tractable. The API also makes it fairly straightforward to replace cloud resources with Docker resources, so configuration management tools like Chef, Puppet, Ansible, and Vagrant have already been able to provide reasonably mature support despite the still growing adoption levels.
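
As a rough illustration, the same configuration that becomes a wall of flags on the command line is just a structured JSON document when you hit the Remote API directly (the image name and variables are made up):

    # on the command line, every setting is yet another flag
    docker run -d -e DB_HOST=db1 -e DB_PORT=5432 -e DB_USER=app \
        -e CACHE_HOST=redis1 myorg/myservice

    # via the Remote API over the local socket, the same thing is structured
    # data (this only creates the container, it doesn't start it)
    curl --unix-socket /var/run/docker.sock \
        -H "Content-Type: application/json" \
        -d '{"Image": "myorg/myservice", "Env": ["DB_HOST=db1", "DB_PORT=5432", "DB_USER=app", "CACHE_HOST=redis1"]}' \
        http://localhost/containers/create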

That said, the most innovative use of the Docker API has been Joyent’s Triton offering. By creating a virtual Docker endpoint as a gateway to the Joyent public cloud, they have become the only service that I’m aware of which runs Docker containers on bare metal without maintaining dedicated Docker Host machines. Having built and run a production SaaS based on Docker, I can’t stress enough how much of a game changer that is in terms of optimal resource utilization, and consequently in terms of price.

Layers

While ZFS technically enabled Zones to be built from incremental changes to a file system, there was really very little, if any, integration there. I suppose you could name snapshots, but I don’t think there was any way to get a history of the layers involved in creating the container like there is with Docker.

With ZFS+deduplication you could also achieve something along the lines of Docker’s caching and reuse of layers. To be honest, in most cases I preferred not to enable deduplication and to have another copy of my data in production. In development, however, if you want to run 100 containers based on Ubuntu, you are probably more than happy to reuse the Ubuntu layer.

Another compelling pattern for layer reuse is the data volume container. Consider a service that relies on a database: you could store the initial database in an image and then run any number of data volume containers based on that image. Each container will hold a private version of the data but only store the changes from the original image, which is excellent for development, automated testing, etc.
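
A minimal sketch of the pattern, assuming a hypothetical myorg/seed-data image that declares a volume holding the initial database files:

    # one data container and one application container per environment; each
    # pair gets its own working copy of the seed data
    docker create --name data1 myorg/seed-data
    docker run -d --name app1 --volumes-from data1 myorg/myservice

    docker create --name data2 myorg/seed-data
    docker run -d --name app2 --volumes-from data2 myorg/myservice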

With a properly constructed Dockerfile, layer caching can also significantly reduce the time spent downloading images whose layers are already present locally, updating containers with new changes, etc. That comes with several caveats: layer caching is not infallible, and you may easily miss updating critical pieces of your software if you do it wrong; for examples, see here and here.
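
The classic way to get it wrong is apt caching: if "apt-get update" sits in its own instruction, its cached layer can go stale and a later install will silently use old package lists. A sketch of the safer form (the base image and package are just examples):

    cat > Dockerfile <<'EOF'
    FROM ubuntu:14.04

    # Risky: splitting these into separate RUN lines lets the "update" layer
    # be served from a stale cache:
    #   RUN apt-get update
    #   RUN apt-get install -y nginx

    # Safer: update and install in one instruction so they are cached, and
    # invalidated, together
    RUN apt-get update && apt-get install -y nginx
    EOF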

…the bad, and the ugly

The ugly truth is that the Linux container technology underlying Docker is incapable of providing true containerization. Unlike Solaris Zones, which were designed from the beginning to provide complete isolation between containers, Linux containers are really just a combination of loosely coupled kernel features doing things they can do, but weren’t necessarily meant to do, cobbled together with a bunch of proverbial duct tape.

Once, after a service upgrade, we experienced transient failures that crippled our system. It took about six hours to find a rogue instance of the previous application that had remained running even though the “container” it ran in had been deleted. Since there is no real container entity in Linux, you will often end up with orphaned file system mounts and, less often, broken iptables rules and rogue processes.
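
If you hit the same kind of leak, a few minutes of old fashioned digging on the Docker host usually turns it up; a rough sketch (the service name is a placeholder):

    # filesystem mounts left behind under Docker's data directory
    grep /var/lib/docker /proc/mounts

    # the containers Docker still knows about...
    docker ps --no-trunc -q

    # ...versus what is actually running on the host
    ps -eo pid,ppid,cmd | grep '[m]yservice'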

Another major issue is the lack of “C4I” solutions/patterns in the Docker space. In a fat host/VM environment, services rely on any number of base services provided by the host, including DNS resolution, email, logging, time synchronization, firewall, etc. In the lean environment of process model containers, there is nothing to rely on. In many cases, organizations fall back to providing non-containerized solutions from the host. In other cases, privileged agent containers are deployed alongside the application containers. Many times a combination of both is necessary, but none of these solutions is optimal.
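
The agent container approach tends to look something like the sketch below, here using the gliderlabs/logspout image as one example of a per-host logging agent (the syslog endpoint is made up):

    # one agent per host, reading the Docker socket and shipping every
    # container's stdout/stderr to a central collector
    docker run -d --name logspout \
        -v /var/run/docker.sock:/var/run/docker.sock \
        gliderlabs/logspout \
        syslog://logs.internal.example.com:514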

Most importantly, Docker, when used improperly, can add numerous attack surfaces to any system. You must be extremely careful with access to the Docker daemon, whether over TCP or via the Unix socket. Helpers like docker-machine are so focused on providing the easiest out-of-box experience that it’s too easy to do something wrong. You must be vigilant regarding the trustworthiness and the maintenance of container contents. With Docker Hub and Docker, the instant gratification of pulling and running a basic instance of any software can quickly erase any common sense regarding the trustworthiness of the software inside. Has the software been updated, or am I opening myself to known exploits? Has the software been back-doored? Will the container be mining bitcoins for someone else using my resources?
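
At a bare minimum, if the daemon must listen on TCP at all, both sides should require TLS verification; a sketch with illustrative certificate paths (the flags themselves are the standard ones):

    # daemon side: only accept clients presenting certificates signed by your CA
    docker daemon -H tcp://0.0.0.0:2376 --tlsverify \
        --tlscacert=/etc/docker/ca.pem \
        --tlscert=/etc/docker/server-cert.pem \
        --tlskey=/etc/docker/server-key.pem

    # client side: verify the server and present your own certificate
    docker --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem \
        -H tcp://docker-host.example.com:2376 ps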

In summary

Docker has both good and bad points but the situation is surely improving with time. If Linux containers continue to be developed as a disconnected combination of resources and their controls, I wouldn’t expect a mature Linux based solution for at least 10 years.

In the long run, a better and faster alternative would be to keep Docker’s conceptual innovations and API while replacing the underlying technology with adaptations of mature container solutions like Zones. Joyent is already playing a visionary role on that front. Their focus on cloud, however, means that there are no plans in sight to support a more standard zone-based Docker Host. I wish someone in the space would pick up the gauntlet there.

In short, if you can run on Triton/Solaris derivatives with Zones, that would be my first pick. If you have to run on Linux, tread carefully and resist the temptation for instant gratification; Docker on Linux can already be useful in many cases.

Triton Bare Metal Containers FTW!


If you haven’t heard of Docker, I’m not sure what cave you’ve been living in but here’s the short story:

Hardware level virtualization, like you are used to (VMware, VirtualBox, KVM, Xen, Amazon, Rackspace, Azure, etc.), is slow and awful. Virtualization at the OS level, where processes share isolated access to a single operating system kernel, is much more efficient and therefore awesome. Another way of saying OS level virtualization is “containers,” such as have existed in FreeBSD and Solaris for over a decade.

Docker is hacking together a whole slew of technologies (cgroups, union filesystems, iptables, etc.) to finally bring the concept of containers to the Linux masses. Along the way, they’ve also managed to evolve the concept a little, adding the idea of a container as a very portable unit which runs as a process.

Instead of managing dependencies across multiple environments and platforms, an ideal Docker container encapsulates all the runtime dependencies for a service or process. Instead of including a full root file system, the ideal Docker container could be as small as a single binary. Taking that hand in hand with the growing interest in developing reusable microservices, we have an amazing tech revolution on our hands.
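
In the extreme case that really is the whole image; a sketch assuming a statically linked binary called myservice that was built elsewhere:

    cat > Dockerfile <<'EOF'
    FROM scratch
    COPY myservice /myservice
    ENTRYPOINT ["/myservice"]
    EOF

    docker build -t myorg/myservice .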

That is not to say that everything in the Docker world is all roses. I did say that Docker is hacking together a slew of technologies so while marketing and demos portray:

You are more likely to start with something like this:

Either way, tons of containers per host is great until you realize you are lugging them around on a huge, slow, whale of a cargo ship.

  1. Currently, Docker’s level of isolation between containers is not that great, so security and noisy neighbors are issues.
  2. Containers on the same host can’t easily expose the same ports, so you may have to do some spaghetti networking (see the port mapping sketch below).
  3. On top of that, if you are running Docker in Amazon, Google, Azure, etc., you are missing the whole point, which was to escape the HW level virtualization.
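
On the second point, here is roughly what the port mapping looks like once you need several copies of the same service on one host (ports and image picked arbitrarily); something upstream now has to keep track of which instance lives where:

    docker run -d -p 8080:80 myorg/web
    docker run -d -p 8081:80 myorg/web
    docker run -d -p 8082:80 myorg/web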

Joyent to the rescue!

Joyent is the only container based cloud provider that I’m aware of. They have been running the vast majority of my cloud instances (possibly yours as well) on OS level virtualization for years now (years before Docker was a twinkle in Shamu’s eye). As such, they are possibly the most experienced and qualified technical leaders on the subject.

They run a customized version of Illumos, an OpenSolaris derivative, with extremely efficient zone technology for their containers. In January (Linux and Solaris are Converging but Not the Way you Think), I wrote about the strides Joyent made allowing Linux binaries to run inside Illumos zones.

Triton May Qualify as Witchcraft

The love child of that work, announced as GA last week, was Triton: a Docker API compliant (for the most part) service running zone-based Docker containers on bare metal in the cloud. If running on bare metal weren’t enough of an advantage, Joyent completely abstracted away the notion of the Docker host (i.e. the cargo ship). Instead, your Docker client speaks to an API endpoint which schedules your bare metal containers transparently across the entire cloud.
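
In practice, "speaks to an API endpoint" just means pointing the standard docker client at Joyent instead of at a local daemon; the hostname and certificate path below are placeholders (Joyent's setup script fills in the real values):

    export DOCKER_HOST=tcp://us-east-1.docker.example.com:2376
    export DOCKER_TLS_VERIFY=1
    export DOCKER_CERT_PATH=~/.sdc/docker/<account>

    # no Docker host to manage; the container lands on bare metal somewhere
    # in the datacenter
    docker run -d myorg/web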

Addressing each of the points I mentioned above:

  1. Zones in Illumos/Joyent provide complete isolation, as opposed to Linux based containers, so no security or noisy neighbor problems.
  2. Every container gets its own public IP address with a full range of ports, so no spaghetti networking.
  3. No Docker host and no HW virtualization, so every container is running full speed on bare metal.

Going back to the boat analogy, if Docker containers on Linux in most clouds looks like this:

Docker containers as zones on bare metal in Joyent look like this:

Enough of the FUD

I’m not a big fan of marketing FUD so I’ve been kicking the tires on the beta version of Triton for a while. Now with the GA, here are the pros and cons.

Pros:

  1. Better container isolation
  2. Better networking support
  3. Better performance
  4. No overhead managing a Docker host
  5. Great pricing (per-minute billing and much lower prices)
  6. User friendly tooling in the portal, including log support and running commands on containers using docker exec.

Cons:

  1. The API still isn’t fully supported, so things like build and push don’t work. You can mostly work around this using a Docker registry (sketched below).
  2. Lack of a Docker Host precludes using some of the patterns that have emerged for logging, monitoring, and sharing data between containers.
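
The registry workaround for the first point is straightforward, if a bit of a detour: build anywhere you have a full Docker daemon, push to a registry, and let Triton pull from there (the registry address and tag are illustrative):

    # build and push from a laptop or CI box...
    docker build -t registry.example.com/myorg/myservice:1.0 .
    docker push registry.example.com/myorg/myservice:1.0

    # ...then point the docker client at Triton and just run it
    docker run -d registry.example.com/myorg/myservice:1.0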

Summary

Docker is a game changer but it is far from ready for prime time. Triton is the best choice available today for running container based workloads in general, and for production Docker workloads specifically.

At Codefresh, we’re working to give you the benefits of containers in your development workflow without the headache of actually keeping the ship afloat yourselves.  Sign up for our free Beta service to see how simple we can make it for you to run your code and/or Docker containers in a clean environment. Take a look at my getting started walkthrough or contact me if you have any questions.