
When 99.999% Isn’t Good Enough

When discussing the availability of a service, it is common to hear the term “Five Nines”, referring to a service being available 99.999% of the time, but Five Nines is relative. If your time frame is a week, your service can be unavailable for 6.05 seconds, whereas a time frame of a year allows for a very respectable 5.26 minutes.

In reality, none of those calculations are relevant, because no one cares if a service is unavailable for 10 hours as long as they aren’t trying to use it. On the other hand, if you’re handling 50,000 transactions per second, 6.05 seconds of unavailability could cost you 302,500 transactions, and no one cares whether you met your SLA.
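
The arithmetic behind those numbers is simple:

604,800 seconds/week x 0.001% downtime =  6.05 seconds of allowed downtime
6.05 seconds x 50,000 transactions/sec =  302,500 missed transactions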

This is a problem I’ve come up against a number of times in the past, and even more so recently, and the issue is one of orders of magnitude in IT. The larger the volume of business you handle, the less relevant the Five Nines become.

Google became famous years ago for its novel approach to hardware availability. They were using servers and disks on such a scale that they could no longer prevent the failures, and they decided not to even try. Instead, they planned to sustain lots of failures and made a business of knowing when to expect problems and where. As much as we would like to take Google’s approach to things, I think most of our IT budgets aren’t up to it.

Another good example is EMC2, who boast 99.999% availability for their Clariion line of storage systems. I want to start by saying that I use EMC storage and I’m happy with them. Regardless, their claim of 99.999% availability doesn’t give me any comfort, for the following reasons.

According to a whitepaper from 2007 (maybe they have changed things since then), EMC has a team which calculates availability for every Clariion in the field on a weekly basis. Assuming there were 2000 Clariion systems in the field in a given week (the example given in the whitepaper), and across all of them there was a total of 1.5 hours of downtime, then:

2000 systems x 7 days x 24 hours   =  336,000 total hours of runtime
336,000 hours - 1.5 hours downtime =  335,998.5 hours of uptime
335,998.5 / 336,000                =  99.9996% uptime

That is great, or at least that is what EMC wants you to think. I look at this and understand something totally different. According to this guy, as of the beginning of 2009 there were 300,000 Clariions sold, not 2000. That is two orders of magnitude different, which means the installed base could absorb up to 504 hours of downtime in a single week (0.001% of the total runtime) and EMC could still claim Five Nines:

300,000 systems x 7 days x 24 hours   =  50,400,000 total hours of runtime
50,400,000 hours - 504 hours downtime =  50,399,496 hours of uptime
50,399,496 / 50,400,000               =  99.999% uptime

Granted, that is a lot of uptime, but 504 hours of downtime is still 21 full days of downtime for someone. If it were possible for 21 full days of downtime to fit in one week, they could all be yours and EMC would still be able to claim 99.999% availability according to their calculations. By the same token, three EMC customers each week could theoretically have no availability for the entire week, and one of those customers could be me.

Since storage failures can cause so many complications, I figure it is much more likely that EMC downtime comes in days as opposed to minutes or hours. Either way, Five Nines is lost in the scale of things in this case as well.

Content Delivery Networks provide another availability vs. scale problem. Akamai announced record-breaking amounts of traffic on their network in January 2009. They passed 2 terabits per second and 12,000,000 requests per second. (I don’t use Akamai, but I think it is amazing that they delivered over 2 terabits/second of traffic.) With that level of traffic, even if Akamai provided a 99.999% availability SLA, they could have had 120 failed requests per second, 7,200 failed requests per minute, etc.

Sometimes complaints relating to our CDN cross my desk, and while I have no idea how much traffic our CDN handles worldwide, I know that we can easily send it 20,000,000 requests per day. Assuming 99.999% availability, I expect (learning from Google) to have 200 failed requests per day. Knowing IT as I do, I also expect that all 200 failed requests will be in the same country, probably because of an issue with one of their cache servers which, due to GTM, will primarily affect the people directed to that server. Unfortunately, the issue of scale is lost on our partners who didn’t get their content.

Availability is not the only case where scale is forgotten. I was recently asked to help debug the performance of an application server which could handle a large number of requests per second when queried directly, but only 80% as many requests per second when sitting behind a load balancer.

Of course we started by trying to find a reason why the load balancer would be causing a 20% performance hit. After deep investigation, the answer I found (not necessarily the correct answer) was that all the load balancing configurations were correct, and that on average having the load balancer in the path added 1 millisecond to the response time of each request. Unfortunately, the response time without the load balancer was an average of 4 milliseconds, so the additional 1 millisecond reduced the overall performance by 20%.
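
Assuming each client issues its requests serially, the arithmetic behind that 20% works out like this:

1 request / 4 ms  =  250 requests per second without the load balancer
1 request / 5 ms  =  200 requests per second behind the load balancer
200 / 250         =  80% of the original throughput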

In short, everything is relative and 99.999% isn’t good enough.

Sun’s Predicament

I’ve been working with Unix for a fairly long time now, about 13 years.

I’ll admit that I started with Linux and thought it was light years ahead of SunOS 4.x running on those old SPARC machines. I mean, who had heard of SPARC processors? I remember my boss trying to explain to me that even an older SPARC processor was more powerful than a newer Intel Pentium processor. I didn’t really believe him. In time, I convinced them to get rid of most of their SPARC/Solaris machines in favor of the hip, free, and cheap Intel/Linux combination.

Now I see that I couldn’t have been more wrong. I realize that SunOS 4.x probably still has features which I don’t know how to use properly. When I look at Solaris 10, ZFS, Zones, LDOMs, DTrace, etc., I’m not really sure you could pay me to work with Linux (that would be so depressing). That isn’t even mentioning the SPARC hardware it runs on. Can any Intel server compare to a T5140???

That’s why the current situation with Sun absolutely SUCKS (pardon my French)! I’m sure there are a lot of admins out there who feel the same way. If this Oracle deal doesn’t go through and Sun disappears because of it, it will be our loss. We’ll be stuck with mediocre operating systems and commodity hardware, and I really hope it doesn’t happen.

That said, I’d like to say thanks to all the people at Sun who are still turning out crazy cool technologies despite the problems.

Listing ZFS Clones using the origin property

Recently I created my first ZFS clones, but I quickly realized that there was no simple way to tell the clones apart from the regular filesystems. My first instinct was to run ‘zfs list -t clone’, similar to ‘zfs list -t snapshot’, but this didn’t work. Maybe it works in newer versions of ZFS.

After some poking around I found the ‘origin’ property, which sets the clones apart, so running something like:

zfs list -o origin,name,used,avail,refer,mountpoint | \
grep -v ^- |awk '{print $2"\t"$3"\t"$4"\t"$5}'

will get you what you are looking for.

If you haven’t played with ZFS clones yet, basically they are writable snapshots of a file system.

They are great if you want to copy a lot of data to the side, modify it, and possibly replace the original data, without taking a lot of time or disk space. ZFS clones take seconds to create, since they don’t actually copy any data, and they only store the blocks which have changed since their creation. If you want to replace the original data, you can then transparently promote the clone to be the master filesystem and turn the master into a clone.
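
As a minimal sketch of that workflow (the pool and filesystem names here are made up for the example):

# snapshot the original filesystem - instant, no data is copied
zfs snapshot tank/data@baseline

# create a writable clone of that snapshot and modify it at will
zfs clone tank/data@baseline tank/data-work

# happy with the changes? promote the clone to be the master filesystem
zfs promote tank/data-work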

The downside of clones is that they are always dependent on the snapshot from which they were created. You cannot destroy a snapshot on which a clone is based without destroying the clone.

For the sake of simplicity, and since I don’t usually have disk space issues, I usually prefer to make full copies using ZFS send/receive, but I have definite plans to make more use of ZFS clones in the future.
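
For comparison, the send/receive approach looks roughly like this (same made-up names); it really does copy every block, but the copy has no dependency on the original:

zfs snapshot tank/data@copy
zfs send tank/data@copy | zfs receive tank/data-copy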

Sun Webstack 1.4 – Packages on Crack

I am a huge fan of Sun Microsystems.
I love Solaris 10.
I love ZFS.
I love RBAC.
I love zones.
I really love T2/T2+ processors.
I especially love the T5140 and X4450 servers.

One thing I cannot figure out, though, is why Sun lets obviously delirious cocaine addicts package their software. Maybe I’m exaggerating, but I think many will agree that Sun’s packages leave much to be desired in general. On top of that, Sun seems to have a constant need to move software around and invent new paths, to boldly go where no sysadmin has gone before????

Our journey begins with the mythic /usr/ucb/ directory, a true treasure chest for those making the adjustment from Linux. We’ll continue to /usr/local/ a la Sun Freeware (actually the most normal place we will visit, but not actually supported by Sun) and then arrive at the more recent /usr/sfw.

On your right, we’ll be passing the Coolstack project (not officially supported by Sun), located reasonably in /opt/coolstack. Notice the configuration files in /opt/coolstack/etc, apache located comfortably in /opt/coolstack/apache2, and mysql located in /opt/coolstack/mysql. Can anyone guess where the SMF manifests are? My first guess would have been /opt/coolstack/var/svc… similar to the native manifests, but I would be wrong because that would make too much sense or be too easy. Anyway, they are hiding in /opt/coolstack/lib/svc…

Wait, what’s that ahead? Coolstack is falling into disrepair, no longer to be updated. Instead, there will be a new neighborhood called Webstack, and it WILL be officially supported by Sun. Time to get high. Can’t figure out where anything is? I’ll give you some hints:

Looking for configuration files? Don’t try /etc or /opt/webstack/etc. You should be looking in /etc/opt/webstack ??!?! Since when does that directory even exist?

Looking for your MySQL data directory? Don’t try /opt/webstack/mysql/data (similar to the existing structure in coolstack). Bet you wouldn’t have guessed /var/opt/webstack/mysql/5.0/data – /var/opt ??!?! What is that? Maybe for the 1.5 release they could put it in /usr/ucb/opt/usr/local/var/spool/sfw/webstack/mysql/5.0/data?

How about your default DocumentRoot for Apache? You must have guessed it by now: /var/opt/webstack/apache2/2.2/htdocs
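
If, like me, you get tired of guessing, Solaris does at least keep a record of where every package dropped its files. Something along these lines will dig it out (the package name below is a guess; check pkginfo for the real Webstack package names on your system):

# find the webstack packages that are actually installed
pkginfo | grep -i webstack

# list every path a given package installed (package name is a guess)
pkgchk -l sun-apache22 | grep Pathname

# or search the global contents database for a path you half remember
grep htdocs /var/sadm/install/contents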

Anyone here running Webstack on Linux? In that case, all the directories are different. I guess Sun wanted to make it difficult to run their stack in heterogeneous environments?

Seriously, I really hope Sun wises up and fixes this before they push for widespread adoption of the 1.5 release.

Network Interface Utilization in Solaris

A friend asked me how he could see the network utilization in Solaris. It seems like a fairly simple request, but for some reason the answer is not a simple command line away.

In Linux I would instinctively go straight to iptraf. I don’t know if iptraf is the tool of choice these days but I’m pretty sure it is an apt-get away if not already installed.

If you are a DTrace wizard, you could whip something up. Maybe you could get the information from one of the DTraceToolkit scripts, if they’re installed. The DTraceToolkit scripts I’ve seen seem to give too much information, as most of them concentrate not only on telling you whether the network is loaded but also on what is loading it.

For the sake of practice I wrote the following script:

#!/usr/bin/perl -w
# Ask which interface to watch (e.g. bge0, e1000g0)
print "Interface: ";
$if=<>;
chomp($if);
# Pull the link speed (Mbit/s) out of dladm and convert it to bytes per second
$max=`dladm show-dev -p $if | awk -F= '{print \$3}' | awk '{print \$1*1024*1024/8}'`;
chomp($max);
print "Max speed: ",$max," bytes/sec\n";
# Split the interface name into its kstat module and instance (e1000g0 -> e1000g, 0)
$if=~m/^([a-z0-9]+)(\d+)$/;
($module,$instance)=($1,$2);
$last_rbytes=0;
$last_obytes=0;
while(1){
    # Grab the current rbytes/obytes counters for this interface from kstat
    @kstat=`kstat ${module}:${instance}:mac:/[or]bytes\$/ | awk '{print \$2}'`;
    chomp(@kstat);
    if($last_rbytes!=0){
        # Utilization = (received delta + sent delta) / max bytes per second
        printf("%02d%%\n",
            (($kstat[$#kstat-1]-$last_rbytes)+
             ($kstat[$#kstat-2]-$last_obytes))/$max*100);
    }
    $last_rbytes=$kstat[$#kstat-1];
    $last_obytes=$kstat[$#kstat-2];
    sleep 1;
};

This script will ask you which interface you want to watch and then print the utilization percentage on a new line roughly every second.
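
If you just want to eyeball the raw counters the script is polling, the underlying kstat query looks like this (e1000g0 is only an example; substitute your own driver module and instance, and note that the script assumes the counters live under the ‘mac’ kstat name):

# print the receive/transmit byte counters for e1000g instance 0, once per second
kstat -p e1000g:0:mac:rbytes e1000g:0:mac:obytes 1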

On a side note, it seems strange to me that the received bytes are stored in kstat as rbytes while the transmitted bytes are stored in obytes. The only answer I can come up with is that if they had chosen ibytes (in bytes) instead of rbytes, then the ‘i’ and ‘o’ might get interchanged in typos since they are next to each other on the keyboard. If they had chosen tbytes (transmitted bytes), the same situation occurs: ‘r’ is next to ‘t’. Still, as a friend pointed out, they could have used sbytes (sent bytes), which makes more sense than obytes.