Category: Architecture

The Ball is in the Net. Goal or No Goal?

The ball hit the net, but from which side? Can you tell? Over the past three years, companies have pushed themselves to the cloud for many reasons, but have they landed on the wrong side of the net?

Many companies have mistaken moving to the cloud for a goal to be achieved and it is natural to make that mistake. Companies see the bottom line, that building services in PAAS or IAAS clouds lowers the costs of bootstrapping risky projects, speeds up time to market and enables greater flexibility. They naturally make moving everything to the cloud a business target.

What they miss is that these benefits are driven by the way automation and infrastructure as a service force the modernization and industrialization of a company’s IT teams and processes. Even if a company isn’t using any modern software driven deployment techniques, it is the industrialization of infrastructure on the provider’s side that allows a “machine” to be spec’ed, purchased, racked, cabled, and installed at the push of a button or the call of an API. It is this change in the way that IT works that is improving the bottom line, speeding time to market and increasing business agility.

Companies that make this distinction realize that hosting your servers in a cloud, private or public, isn’t an end, in and of itself. If it’s the automation and software defined infrastructure that is helping business, then that has to be the focus.

In reality, IAAS is still very immature. There is no provider today that can provide public IAAS which meets the standards of a high quality private deployment, let alone enterprise grade.

Visionary companies like Netflix have built vast frameworks to compensate for some of the problems with public cloud. In 2013, Netflix’s director of cloud solutions Ariel Tseitlin is quoted as having said “We’re far from being in a commoditized cloud market. It really isn’t a utility like we feel someday it is going to become. If you look at how much infrastructure we built, the huge amount of extra glue and services and tooling we’ve invested in, that gives you an indication of what could be offered in the future.”

Others, like Zynga, went the hybrid cloud route because “While the public cloud is exceptional at providing a wealth of services for various computing needs, we’re an outlier and not a traditional IT workload. The performance and availability required to operate social games on the scale that we do, requires the ability to fine tune infrastructure… We learned to understand our workload, look into the black box of cloud computing, and built what we affectionately call zCloud, our own private cloud infrastructure. zCloud looks, feels, and operates similar to the way we use the public cloud, but allows for greater performance, scale and reliability.”

Between these two well-known cloud consumers, the common denominator is the drive to change the way infrastructure is consumed by the business without sacrificing the quality or reliability of the services. That is why companies should be striving to modernize mainstream IT whether on private, public, or hybrid infrastructure.

Note: Jason Hoffman, Founder and former CTO of Joyent, now Head of Cloud Technology at Ericsson, really put this into perspective for me with this video segment (from which I definitely cannibalized some jargon). After living and breathing “cloud” for the past three years, I think he’s really hit the nail on the head and it will be interesting to see what Ericsson can do to bring forth the next iteration of IAAS.

How to Host a Screaming Fast Site for $0.03/Month

I had an idea. That’s always how it starts. Before I know it, I’ve purchased the domain name and I’m futzing around with some HTML but where am I going to host it and how much is this going to end up costing me?

That’s where I was when I came up with #DonateMyFee. “This is a site that is only going to cost me money”, I thought to myself (the whole point is for people to donate money rather than paying me). I really didn’t want to start shelling out big (or small) bucks on hosting.

Long story short, here is the recipe for a screaming fast website on a low budget:

Amazon S3

I’m not a huge fan of Amazon AWS, but S3 is useful enough to make it into my good graces. S3 is Amazon’s storage service. You upload static files into “buckets” and S3 can hold on to them, version them, and most importantly serve them via http. When configured to serve a bucket as a static website, S3 can be used to replace the load balancing and web serving infrastructure needed to serve a static website.

There are only two problems with that.

  1. You pay for S3 by the amount of traffic pulled from your bucket.
  2. Your “website” will be called something crazy ugly like donatemyfee.com.s3-website-eu-west-1.amazonaws.com

Regarding the price, S3 tries to get you three ways. They charge for the volume of the data being stored, for the number of requests made, and for the volume of the request throughput in GB. That said, the prices are very reasonable if we can keep the number of requests low. For that reason, a CDN is an absolute must. The CDN will also solve our second problem – the unfriendly S3 website name.
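To see how the three charges add up, here is a rough back-of-the-envelope sketch. The function name and the rates in it are placeholders made up for illustration, not Amazon's price list, so plug in the current prices for your region before trusting the result.

```python
def estimate_monthly_cost(storage_gb, requests, transfer_gb, rates):
    """Sum the three S3 charges: storage, request count, and data transfer."""
    storage = storage_gb * rates["per_gb_stored"]
    request = (requests / 1000) * rates["per_1000_requests"]
    transfer = transfer_gb * rates["per_gb_transferred"]
    return storage + request + transfer


# Placeholder rates for illustration only; these are NOT real AWS prices.
example_rates = {
    "per_gb_stored": 0.03,
    "per_1000_requests": 0.004,
    "per_gb_transferred": 0.10,
}

# A tiny static site where the CDN absorbs most hits, so few requests reach S3.
cost = estimate_monthly_cost(
    storage_gb=0.01, requests=2000, transfer_gb=0.1, rates=example_rates
)
print(f"${cost:.4f} per month")
```

The point of the exercise is that the request and transfer terms are the ones that grow with traffic, which is exactly what a cache in front of the bucket keeps small.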

Often S3 is paired with Amazon’s CDN, Cloudfront, but I don’t recommend it. Cloudfront is expensive as CDNs go and we’re on a budget. Even if we wanted to pay for the CDN, there are better performing options for less. CloudFlare is a great alternative with a free plan that will do us wonders.

CloudFlare

CloudFlare is one of several CDN by proxy + Webapp Firewall solutions that cropped up several years ago. Since the beginning, they have had a free plan and they have proven to be both innovative and competitive.

To use CloudFlare, we need to set their servers as our domain’s DNS name servers, which can be a deal breaker in some cases. Once that’s set up, we create a CNAME record in CloudFlare which points to the ugly S3 website name. CloudFlare has a new CNAME flattening technique which will allow us to configure this even for the root domain (without the www). This technique breaks some rules so I wouldn’t recommend it in every case, but in ours, it’s just what we need.
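The CNAME target is simply the ugly S3 website name from earlier. As a small sketch, this is how that name is put together from the bucket and region. The helper names and the dictionary layout are hypothetical, not CloudFlare's API, and the dash style of endpoint is assumed from the eu-west-1 example above; some regions use a different separator, so check the endpoint S3 shows for your bucket.

```python
def s3_website_endpoint(bucket, region):
    """Build the website endpoint in the dash style shown for eu-west-1."""
    return f"{bucket}.s3-website-{region}.amazonaws.com"


def cname_record(name, target):
    """Describe the DNS record to create (hypothetical layout, for illustration)."""
    return {"type": "CNAME", "name": name, "content": target}


target = s3_website_endpoint("donatemyfee.com", "eu-west-1")
record = cname_record("donatemyfee.com", target)
print(record)
```

Note that the bucket has to carry the same name as the site for S3 website hosting to answer for that host name.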

CloudFlare will cache all of our static content from S3 saving us from paying for the majority of the visits to the site. CloudFlare will also compress and optimize our content so it takes less time to reach the browser. Depending on what kind of traffic your site attracts, CloudFlare’s security settings can also protect you from all kinds of resource abuse, malicious traffic, hotlinking, etc.

Note: S3 will not properly identify the mime types for every file which means that some files might not be compressed properly by CloudFlare. You can fix this by changing the metadata for the files in S3. Specifically .ttf, .eot, and other typography related files are a problem.
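If you upload files with a script, one way to avoid the problem is to decide the Content-Type yourself instead of relying on detection. The sketch below only picks the type; the function name is made up for illustration, and how you pass the value along depends on your upload tool. Which types CloudFlare actually compresses is something to confirm in their documentation.

```python
import mimetypes

# Explicit types for font files, which are often misidentified.
FONT_TYPES = {
    ".ttf": "font/ttf",
    ".otf": "font/otf",
    ".woff": "font/woff",
    ".woff2": "font/woff2",
    ".eot": "application/vnd.ms-fontobject",
}


def content_type_for(filename):
    """Return the Content-Type to store in the S3 object's metadata."""
    lowered = filename.lower()
    for extension, mime in FONT_TYPES.items():
        if lowered.endswith(extension):
            return mime
    # Fall back to the standard library's guess for everything else.
    guessed, _encoding = mimetypes.guess_type(filename)
    return guessed or "application/octet-stream"


for name in ["fonts/icons.ttf", "fonts/icons.eot", "index.html"]:
    print(name, "->", content_type_for(name))
```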

Frugal Functionality

Having a cheaply hosted static website is nice but static can also be pretty useless. In order to get some functionality out of the site, you could go all jQuery on it but I think that is a road too often traveled these days. I’ve seen too many people include all of jQuery instead of writing 3 lines of JavaScript.

If we want this site to be fast we need to work frugally. If you take a look at http://donatemyfee.com, you will see some examples of what I call “frugal functionality”.

The social share buttons are static links, not huge JavaScript widgets included from various social networks. Including external scripts is always a bad idea and they always hurt the performance of your site no matter what anyone tells you. Also, the icons and hover animations are CSS typography tricks. No JavaScript and no icon images downloaded.

The site is designed using responsive web design techniques which is “buzzword” for using a bunch of crafty CSS to make the same thing look decent on different sized screens. If we were a large company, I would say “Responsive web is for lazy companies and people without a budget to develop good looking, device targeted sites.” Since we’re on a budget, I’ll say it’s frugal 🙂

Last but not least, we have skimped on all the normal infrastructure that goes behind a website so our options for actually generating leads are a bit thin. We could go very old school with mailto links but in these days where webmail reigns supreme, they are getting pretty useless. Enter Google Forms.

Google Forms

If you haven’t been asked to fill out a Google Form yet, here’s your chance. Google lets you create fairly elaborate forms for free. The forms collect the answers and store them automatically in a Google Drive spreadsheet. There are more sophisticated options for processing the answers, and an entire extension ecosystem being built around the process. For us, the basic solution is more than enough.

Note: You can link to the form or embed it in an iframe. The form will take a bite out of your page load performance (iframes are a huge performance no-no). It will also annoy you with endless warnings, none of which you can do anything about, if you test your site performance with any of the free online services (Webpagetest, Websitetest, GTmetrix, PageSpeed, etc.). In this case, I used some simple (read jQuery-free) JavaScript to load the embedded iframe only when it’s requested. This has the added benefit of keeping the user on-site to fill out the form and eliminating the page load time performance hit.

Less is more

Finally, the most important advice about web performance is always “Less is more”. There is no better way to ensure that a page loads quickly than to make it smaller in every way possible. Use fewer and smaller pictures. Combine, compress and minify everything. Reduce the number of requests.
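To get a feel for what compression alone buys you, here is a quick sketch that gzips a chunk of markup and compares sizes. The sample markup is invented and deliberately repetitive, so it compresses better than a real page would; treat the ratio as an upper bound, not a promise.

```python
import gzip

# Invented, repetitive sample markup; real pages will compress less well.
html = ('<li class="item"><a href="#">Donate</a></li>\n' * 200).encode("utf-8")

compressed = gzip.compress(html)
print(len(html), "bytes raw,", len(compressed), "bytes gzipped")
```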

If you’re interested in getting my help with your site, contact me via LinkedIn or #DonateMyFee. All consulting fees go directly from you to a tax deductible charity in your/your company’s name.

Sun SPARC T3 Servers

Oracle announced their new line of Sun SPARC T3 powered servers at Oracle Openworld 2010. The SPARC T3 processor includes several improvements on T2 and T2+ processors including:

T2 / T2+                          | T3
65 nm manufacturing process       | 40 nm manufacturing process
4MB L2 Cache                      | 6MB L2 Cache
8 Cores (8 threads/core)          | 16 Cores (8 threads/core)
8 Crypto Accelerators (1/core)    | 16 Crypto Accelerators (1/core)
DDR2 FB-DIMMs                     | DDR3
1 On Board PCIe x8 v1 Port        | 2 On Board PCIe x8 v2 Ports

It is interesting to note that the T2 processor was only used in single socket systems. The T2+ processor removed the T2’s on board 10 GbE ports and other components to make room for the SMP glue. With the T3 processors, the 10 GbE ports have returned and the chip has built in glueless support for 4 way servers.

All in all they have packed more T-Series goodness in a smaller package but I’m not making goo-goo eyes yet.

For one, the smallest T3 based server, the T3-1, has the same number of threads as the T5140 but takes twice as many rack units. Although the T3-1 supports more PCIe cards and more internal hard disks, I would rather have a 1RU server or else have it support twice as much RAM.

The T3-2 server supports 256 threads. Compared to the T5440, it is actually smaller at 3RU and uses less power which sounds like a step in the right direction. Unfortunately, the T3-2 is also light on RAM supporting a maximum of 256GB compared to the T5440’s 512GB.

In short, the T3 series is a little off course for me at the moment. As a platform for consolidating tens of smaller applications, the RAM available per thread is too low, making it hard to get 100% utilization out of these servers. With the T3-4 servers loading even more processing power into a single machine, the thread to machine ratio is too high as well. This is good if you are running a few really huge applications but if you are consolidating many smaller applications, you will not want to put this many eggs in one basket.
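The RAM complaint is easy to put into numbers. This small sketch uses the maximum RAM figures quoted above; the T5440 thread count is my own arithmetic from its four T2+ sockets at 8 cores and 8 threads per core.

```python
servers = {
    # name: (hardware threads, maximum RAM in GB)
    "T5440": (4 * 8 * 8, 512),  # four T2+ sockets, 8 cores x 8 threads each
    "T3-2": (256, 256),
}

# GB of RAM available per hardware thread at maximum memory configuration.
ram_per_thread = {
    name: max_ram_gb / threads for name, (threads, max_ram_gb) in servers.items()
}

for name, ratio in ram_per_thread.items():
    print(f"{name}: {ratio:.1f} GB of RAM per thread")
```

At full memory the T5440 offers 2 GB per thread while the T3-2 offers 1 GB, which is why a box full of small applications runs out of RAM long before it runs out of threads.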

Vendor Lock-In or One Stop Shop

I was recently discussing load balancers with someone. I said I was much happier with F5 than I was with Cisco and he countered that although he preferred F5 head to head, going with Cisco for all the network was better for them in the long run.

The situation with storage is similar. EMC makes a great SAN but a pretty bad NAS. Is it worth getting EMC’s NAS for the One Stop Shop factor?

Since Oracle’s acquisition of Sun, I’ve been looking forward to the success of their “One Stop Shop” philosophy. Successfully bringing all their offerings under one roof promises better and faster support all around.

Unfortunately, it has been almost a year and Oracle is still not sure how they are going to unify the customer support systems. New support contracts don’t work in either system. To make things a little less clear, Oracle recently announced that everything will be migrated to “My Oracle Support” but they don’t know when. Very reassuring.

A simple pattern emerges. One Stop Shop is a dream for IT people. Support is hard enough to get when you’ve isolated a problem to a specific vendor. It is even harder when your problems are between two vendors and each points the finger at the other.

When does the One Stop Shop strategy become a rationalization for Vendor Lock-In? It is a delicate balance between how much better your IT could be with Best of Breed and how much worse it will be once you have to integrate all the different pieces of the puzzle.

Regarding Cisco vs. F5, I’m also pretty happy letting Cisco handle everything Layer 3 and under and I don’t worry too much about the integration. I’m also optimistic regarding Sun and Oracle. I think they’ll have the wrinkles ironed out by the second half of 2011. If they don’t, it will be a serious let down.

EMC Fully Automated Storage Tiering

Storage Tiering is nothing new. We use fast 15K RPM disks for high performance applications, slower 10K RPM disks for less demanding applications, and 7.2K RPM SATA disks for archive storage. Recently, solid state disks (SSDs) have also become more common for really high performance needs. The trick is managing it all.

Two or three years ago, if you wanted to implement automatic storage tiering, I would have pointed you in the direction of Sun’s Storage and Archive Manager (SAM) and QFS, Sun’s tightly integrated shared file system. SAM-QFS automatically moves files from one storage tier to another based on the SAM policy and transparently retrieves the files when requested. With tape still the least expensive storage available, this is still a great solution for archiving petabytes of documents/files.

Unfortunately, SAM works at the file level so it will not help our databases run faster. What will help us is ZFS. ZFS is still making some fairly big waves in the storage community with its Hybrid Storage Pool feature. In a standard configuration, ZFS uses RAM for a Layer 1 read cache (ARC). In advanced configurations, the zpool can be configured to use a Layer 2 cache (L2ARC) on faster disks, i.e. SSDs compared to SAS compared to SATA, etc. The zpool can also be configured to use separate, possibly faster disks for the ZFS Intent Log (ZIL) which is basically a write cache (without getting into why it is more than a write cache). Even without faster disks, the ability to store the read/write cache on a separate device can increase performance just by dedicating more IOPS to the cause.
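To make the read path concrete, here is a toy model of a two-level read cache: a small fast level standing in for the ARC, a larger second level standing in for the L2ARC, and a dictionary standing in for the disks. This is only a sketch of the idea. The class is made up for illustration, it uses plain least-recently-used eviction (the real ARC is smarter than that), and it ignores the ZIL entirely.

```python
from collections import OrderedDict


class TwoLevelReadCache:
    """Toy two-level read cache; not how ZFS is actually implemented."""

    def __init__(self, l1_size, l2_size):
        self.l1 = OrderedDict()  # stands in for RAM (ARC)
        self.l2 = OrderedDict()  # stands in for the SSD cache device (L2ARC)
        self.l1_size = l1_size
        self.l2_size = l2_size

    def read(self, block, disk):
        """Return (data, source) where source is 'l1', 'l2' or 'disk'."""
        if block in self.l1:
            self.l1.move_to_end(block)  # mark as most recently used
            return self.l1[block], "l1"
        if block in self.l2:
            data, source = self.l2.pop(block), "l2"
        else:
            data, source = disk[block], "disk"
        self._insert_l1(block, data)
        return data, source

    def _insert_l1(self, block, data):
        self.l1[block] = data
        if len(self.l1) > self.l1_size:
            # Evict the least recently used block and demote it to level 2.
            old_block, old_data = self.l1.popitem(last=False)
            self.l2[old_block] = old_data
            if len(self.l2) > self.l2_size:
                self.l2.popitem(last=False)


disk = {n: f"block-{n}" for n in range(10)}
cache = TwoLevelReadCache(l1_size=2, l2_size=4)
sources = [cache.read(n, disk)[1] for n in [0, 1, 2, 0, 2]]
print(sources)
```

In this run the second read of block 0 is served from the second level rather than from disk, which is the whole point of putting an SSD between RAM and the spinning disks.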

Oracle/Sun’s 7000 series storage builds on the success of the ZFS Hybrid Storage Pool, using Logzilla devices for the ZIL and Readzilla devices for the L2ARC. With the powerful flash acceleration in the storage pool, even 7.2K RPM disks can give performance equal to that of higher speed 15K RPM disks.

Although ZFS does great things for performance by utilizing multiple tiers of storage devices, all the data is still physically stored on the same tier of storage in addition to having the hot data stored again in the caches. This is arguably a waste of capacity but can also lead to performance issues in some cases. For example, a cold L2ARC cache after reboot could give slower performance until fully warmed up. Oracle will probably fix this at some point by allowing the L2ARC to persist if stored on a non-volatile device (bug_id=6662467).

In the meantime, EMC recently announced an interesting new feature called FAST, short for Fully Automated Storage Tiering. FAST is available from FLARE version 04.30.000.5.004. FAST allows you to define a pool in the array composed of multiple RAID Groups, and then define a LUN on the pool as opposed to defining a LUN on the RAID Groups themselves. Once the LUN begins filling with data, the EMC will begin transparently migrating data between the tiers of the pool in 1GB chunks, storing the hottest data on the fastest tier and the coldest data on the slowest tier.
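The placement idea can be sketched in a few lines: rank the chunks by how often they are accessed and fill the tiers from fastest to slowest. This is an illustration of the concept only, not EMC's algorithm; the function name, the access counters and the tier sizes are all invented.

```python
def place_chunks(access_counts, tier_capacities):
    """Assign chunks to tiers, hottest chunks to the fastest tier first.

    access_counts: {chunk_id: number of recent accesses}
    tier_capacities: list of (tier_name, capacity_in_chunks), fastest first
    """
    hottest_first = sorted(access_counts, key=access_counts.get, reverse=True)
    placement = {}
    position = 0
    for tier_name, capacity in tier_capacities:
        for chunk in hottest_first[position:position + capacity]:
            placement[chunk] = tier_name
        position += capacity
    return placement


# Invented example: four 1GB chunks and three tiers of different sizes.
counts = {"chunk-a": 900, "chunk-b": 5, "chunk-c": 120, "chunk-d": 0}
tiers = [("ssd", 1), ("15k", 2), ("sata", 10)]
placement = place_chunks(counts, tiers)
print(placement)
```

A real array also has to decide when to re-rank and how much migration traffic it can afford, which this sketch leaves out.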

FAST sounds like a dream come true. No more complicated storage configurations for the database. No more packages and processes to move historical data to slower disk groups. On the other hand, I am skeptical as to whether or not this technology is really mature. Do all EMC products (SnapView, Replication Manager, etc.) treat FAST LUNs the same as traditional LUNs? Also, are the ramifications of disk failures for a FAST LUN the same, or does failure of a Tier 1 disk in a FAST pool mean a lot more high performance eggs in one basket? Time will tell.