Tag Archive for USD

Listing ZFS Clones using the origin property

Recently I created my first ZFS clones but quickly realized that there was no simple way to tell the clones apart from the regular filesystems. My first instinct was to run ‘zfs list -t clone’, similar to ‘zfs list -t snapshot’, but this didn’t work. Maybe it works in newer versions of ZFS.

After some poking around I found the ‘origin’ property, which sets the clones apart, so running something like-

zfs list -o origin,name,used,avail,refer,mountpoint | \
grep -v ^- |awk '{print $2"\t"$3"\t"$4"\t"$5}'

will get you what you are looking for.

If you haven’t played with ZFS clones yet, basically they are writable snapshots of a file system.

They are great if you want to copy a lot of data to the side, modify it, and possibly replace the original data, without taking a lot of time or disk space. The ZFS clones take seconds to create, since they don’t actually copy any data, and they will only store the blocks which have changed since their creation. If you want to replace the original data, you can then transparently promote the clone to be the master filesystem and turn the master into a clone.
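If you haven’t tried them yet, a minimal round trip looks something like this (the pool and filesystem names are made up):

zfs snapshot tank/data@before                 # point-in-time, read-only snapshot
zfs clone tank/data@before tank/data-work     # writable clone, takes seconds
# ...modify tank/data-work as needed...
zfs promote tank/data-work                    # the clone becomes the master;
                                              # tank/data becomes the dependent clone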

The downside of clones is that they are always dependent on the snapshot from which they were created. You cannot destroy a snapshot on which a clone is based without destroying the clone.

For the sake of simplicity, and since I don’t usually have disk space issues, I usually prefer to make full copies using ZFS send/receive, but I have definite plans to make more use of ZFS clones in the future.
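For reference, that full-copy alternative is a one-liner (same made-up names as above):

zfs send tank/data@before | zfs receive tank/data-copy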

RAID 10 vs RAID 5: Performance, Cost, Space, and HA

DISCLAIMER: I am not a SAN storage expert but I have spent a lot of time looking into SAN storage systems from the business side and I thought I’d share some of my conclusions.

It seems that the proverbial question is how to balance the performance, cost, usable space, and availability of a storage solution. Any DBA will ask you to give him RAID 10 on small fast disks. Anyone paying the bills will ask “Why can’t I use half the disks I bought?”

I took a couple of hours with your friendly neighborhood spreadsheet and did the math. I based my calculations on EMC Clariion storage and tried to follow the EMC best practices guide as much as possible.

Following the best practices, I started my calculations from a required performance level consisting of total IOPS, read percentage, and write percentage.
Then, using the following formulas, I calculated the actual disk IOPS required to provide the requested performance:

  • RAID 5 (4+1 Groups)
    Disk IOPS = (Read % * Required IOPS) +
                    (Write % * RAID5 write penalty * Required IOPS)
  • RAID 10
    Disk IOPS = (Read % * Required IOPS) +
                    (Write % * RAID10 write penalty * Required IOPS)

The RAID 5 write penalty in a 4+1 RAID group is 4 while the RAID 10 write penalty is 2.
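To make that concrete, take a requirement of 2000 IOPS at 70% read / 30% write:

  RAID 5:  (0.7 * 2000) + (0.3 * 4 * 2000) = 1400 + 2400 = 3800 disk IOPS
  RAID 10: (0.7 * 2000) + (0.3 * 2 * 2000) = 1400 + 1200 = 2600 disk IOPS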
Before you even put this in a spreadsheet you know what it will tell you-

  • In a 100% Read Only environment, RAID 5 and RAID 10 will give the same performance. RAID 5 may use fewer disks to do it, but not necessarily.
  • In a 100% Write Only environment, RAID 5 will require twice as many disk IOPS and almost twice the number of disks.
  • Anywhere in between those two extremes, the more writes required, the fewer RAID 10 disks you will need to achieve the performance.

If we stop there, it doesn’t seem like there is any point in using RAID 5, since even in the best-case scenario there is only a partial chance that it will use fewer disks. That is where the cost and space effectiveness issues come in.

  • Space Effective Storage Allocation

If I want 2000 IOPS, 100% Read Only, I can do that using 15 x 146GB 15k RPM disks in RAID 5 or in RAID 10. In RAID 5 I will get ~1.5TB net space while in RAID 10 I will get ~1TB.
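The rough arithmetic behind that (a 146GB disk formats down to roughly 133GB usable on a Clariion): in RAID 5, 15 disks form three 4+1 groups, leaving 12 disks of data space, and 12 * ~133GB ≈ 1.5TB; in RAID 10, everything is mirrored, leaving about half of that, or ~1TB.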

  • Cost Effective Storage Allocation

So far, we have compared different RAID types using the same size and speed disks, and we saw that theoretically we can use fewer disks to reach the same performance, but at the expense of usable disk space.

If we use bigger disks for the RAID 10, does it make up for the lost space? And what effect does using fewer, larger disks in RAID 10, as opposed to many smaller disks in RAID 5, have on the cost of the solution?

That brings us back to the spreadsheet. Using the required disk IOPS we can figure out the required number of physical disks of each type. For the sake of comparison I use the following information which I found on the Internet (your mileage may vary):

  • 146GB 4GbFC 15k RPM, 140 IOPS, $1256
  • 300GB 4GbFC 10k RPM, 120 IOPS, $1348
  • 1TB 4Gb SATA II 7.2k RPM, 80 IOPS, $2088

For each of these I calculate the minimum number of physical disks required to reach the required IOPS with the required read/write profile for both RAID 10 and RAID 5. Then I figure in the RAID group sizes and calculate the usable disk space.
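If you want to play with the numbers yourself, here is a back-of-the-envelope sketch of the spreadsheet logic in shell and awk. The rounding to whole 4+1 groups and mirrored pairs is my own simplification; a real Clariion layout has more constraints.

#!/bin/sh
# Usage: raidcalc.sh <required_IOPS> <read_fraction> <per_disk_IOPS>
# Prints the minimum disk counts for RAID 5 (4+1) and RAID 10.
awk -v iops="$1" -v rd="$2" -v disk="$3" '
function ceil(x) { return (x == int(x)) ? x : int(x) + 1 }
BEGIN {
    wr = 1 - rd
    # disk IOPS needed, using write penalties of 4 (RAID 5) and 2 (RAID 10)
    r5  = (rd * iops + wr * 4 * iops) / disk
    r10 = (rd * iops + wr * 2 * iops) / disk
    # round up to whole disks, then to full 4+1 groups / mirrored pairs
    printf "RAID 5 : %d disks\n", ceil(ceil(r5) / 5) * 5
    printf "RAID 10: %d disks\n", ceil(ceil(r10) / 2) * 2
}'

Running it on the earlier example (sh raidcalc.sh 2000 0.7 140) gives 30 disks for RAID 5 and 20 disks for RAID 10.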

Using the prices above, I calculate the price per TB of disk space in each RAID configuration and find:

  • 146GB, RAID 5 (4+1): $11.91K/TB
  • 300GB, RAID 5 (4+1): $6.35K/TB
  • 1TB, RAID 5 (4+1): $2.87K/TB
  • 146GB, RAID 10: $19.01K/TB
  • 300GB, RAID 10: $10.15K/TB
  • 1TB, RAID 10: $4.59K/TB
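As a sanity check on those numbers: one 4+1 group of 146GB disks costs 5 * $1256 ≈ $6.28K and nets 4 * ~133GB ≈ 0.53TB formatted (assuming the same ~133GB formatted capacity as above), which works out to roughly $11.9K/TB, right in line with the first row.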

What is really interesting here is how close the 300GB RAID 10 is to the 146GB RAID 5! Is this a coincidence?

Looking at the IOPS/TB relationship and $K/IOPS, we find that the ratios are dependent on the read/write profile of the required IOPS. Given the similar price/TB of 300GB RAID 10 and 146GB RAID 5, I looked there for a price/performance/disk space sweet spot.

The following table shows the difference between 146GB RAID 5 IOPS/TB and 300GB RAID 10 IOPS/TB.
Each column represents a different Read percentage (the Write percentage is the inverse).
Negative numbers mean that for this Read percentage and IOPS requirement, RAID 10 gives more IOPS/TB of disk. Positive numbers mean that RAID 5 gives better IOPS/TB.

What you see from this is that for any read workload under 70%, you will get more IOPS/TB from 300GB 10k RPM disks using RAID 10 than you will with RAID 5 on 146GB 15k RPM disks.
Even if you hit 80% reads, RAID 5 will gain less than 100 IOPS over the RAID 10 configuration and you are still better off paying less for your disks; let the cache do its job. Combine all this with our previous conclusion, that the 300GB RAID 10 configuration is ~$1.75K less expensive per TB, and I say you have a winner.

Network Interface Utilization in Solaris

A friend asked me how he could see the network utilization in Solaris. It seems like a fairly simple request but for some reason this is not a simple command line away.

In Linux I would instinctively go straight to iptraf. I don’t know if iptraf is the tool of choice these days but I’m pretty sure it is an apt-get away if not already installed.

If you are a DTrace wizard, you could whip something up. Maybe you could get the information from one of the DTraceToolkit scripts, if they’re installed. The DTraceToolkit scripts I’ve seen give too much information, as most of them concentrate not only on telling you whether the network is loaded but on what is loading it as well.

For the sake of practice I wrote the following script:

#!/usr/bin/perl -w
use strict;

print "Interface: ";
my $if = <STDIN>;
chomp($if);

# Interface speed in bytes/sec, derived from the Mbit/sec figure
# reported by 'dladm show-dev'.
my $max = `dladm show-dev -p $if | awk -F= '{print \$3}' | awk '{print \$1*1024*1024/8}'`;
chomp($max);
print "Max speed: $max\n";

# Split the interface name into the driver module and instance number
# that kstat expects, e.g. 'bge0' -> ('bge', 0).
$if =~ m/([a-z0-9]+?)(\d+)$/;
my ($module, $instance) = ($1, $2);

my ($last_rbytes, $last_obytes) = (0, 0);
while (1) {
    # The received (rbytes) and transmitted (obytes) byte counters are
    # the last counter values in kstat's output.
    my @kstat = `kstat ${module}:${instance}:mac:/[or]bytes\$/ | awk '{print \$2}'`;
    chomp(@kstat);
    if ($last_rbytes != 0) {
        printf("%02d%%\n",
               (($kstat[$#kstat - 1] - $last_rbytes) +
                ($kstat[$#kstat - 2] - $last_obytes)) / $max * 100);
    }
    $last_rbytes = $kstat[$#kstat - 1];
    $last_obytes = $kstat[$#kstat - 2];
    sleep 1;
}

This script will ask you which interface you want to watch and then print the utilization percentage on a new row roughly every second.

On a side note, it seems strange to me that the received bytes are stored in kstat as rbytes while the transmitted bytes are stored in obytes. The only answer I can come up with is that if they had chosen ibytes (in bytes) instead of rbytes, then the ‘i’ and ‘o’ might become interchanged in typos since they are next to each other on the keyboard. If they had chosen tbytes (transmitted bytes), the same situation occurs, ‘r’ being next to ‘t’. Still, as a friend pointed out, they could have used sbytes (sent bytes), which makes more sense than obytes.

Google Analytics fixed but is Google crashing?

Google has finally added Israel to the list of countries in the sign-up process, which is good news.
On the other hand, they got the timezone wrong (Israel is on DST right now and uses GMT+3 till about October), and that’s after spending over a week fixing it.

What’s going on inside Google? Why did it take so long? Why wasn’t Israel on the list to begin with?
I received no explanation from Google, but my guess is that they must have had a bug in the code generating the form fields and the JavaScript behind them. Look at the following code sample:

CC["ID"] = new Array("220|(GMT+07:00) Jakarta","234|(GMT+08:00) Makassar","221|(GMT+09:00) Jayapura");
CC["IR"] = new Array("257|(GMT+03:30) Tehran");
CC["IQ"] = new Array("198|(GMT+03:00) Baghdad");
CC["IE"] = new Array("300|(GMT+00:00) Greenwich Mean Time");
CC["IL"] = new Array("222|(GMT+02:00) Jerusalem");

Before the fix, Jerusalem time was present in the JavaScript, but it was bundled together with other time zones, like the ID entry on the first line above.

That is still no excuse for such a system going live. Google’s quality control should have stepped in.

On the other hand it points to a growing list of technical difficulties within Google.

  1. Since Google’s last update to their algorithms, they’ve been returning pages from sites of mine that haven’t been online in years.
  2. For over a week I’ve been experiencing problems with Gmail timing out.
  3. Blogger is less than responsive as always.
  4. Analytics, in the day that I’ve been using it, has often claimed to be under maintenance one second and fine the next- I guess maintenance means “I’m a tired server, leave me alone please.”

The Register reports that Google is choking on web spam: http://www.theregister.co.uk/2006/05/04/google_bigdaddy_chaos/

Webmasters now report sites not being crawled for weeks, with Google SERPS (search engine results pages) returning old pages, and failing to return results for phrases that used to bear fruitful results.

“Some sites have lost 99 per cent of their indexed pages,” reports one member of the Webmaster World forum. “Many cache dates go back to 2004 January.” Others report long-extinct pages showing up as “Supplemental Results.”

But the new algorithms may not be solely to blame. Google’s chief executive Eric Schmidt has hinted at another reason for the recent chaos. In Google’s earnings conference call last month, Schmidt was frank about the extent of the problem.

“Those machines are full,” he said. “We have a huge machine crisis.”

While here they attempt to save face for Google by putting Schmidt’s comment in context, it’s clear that Google has been having technical problems.

Google continued to make substantial capital investments, mainly in computer servers, networking equipment and its data centers. It spent $345 million on such items in the first quarter, more than double the level of last year. Yahoo, its closest rival, spent $142 million on capital expenses in the first quarter.

Referring to the sheer volume of Web site information, video and e-mail that Google’s servers hold, Schmidt said: “Those machines are full. We have a huge machine crisis.”

Jordan Rohan of RBC Capital Markets called Google’s capital spending “unfathomably high,” noting that it spent the same percentage of its revenue on equipment as a wire-line phone company.

I don’t see how the context makes things any better. The bottom line remains that Google needed a heck of a lot more hardware than it had, and who knows if they bought everything they needed. Those are only the first quarter figures; I would imagine it could take a whole quarter to deploy $345 million of equipment. I wonder what they will spend next quarter?