Tag: zfs

Sun’s Predicament

I’ve been working with Unix for a fairly long time now- about 13 years.

I’ll admit that I started with Linux and thought it was light years ahead of SunOS 4.x running on those old SPARC machines- I mean, who had even heard of SPARC processors? I remember my boss trying to explain to me that even an older SPARC processor was more powerful than a newer Intel Pentium processor. I didn’t really believe him. In time, I convinced them to get rid of most of their SPARC/Solaris systems in favor of the hip, free, and cheap Intel/Linux combination.

Now I see that I couldn’t have been more wrong. I realize that SunOS 4.x probably still has features which I don’t know how to use properly. When I look at Solaris 10, ZFS, Zones, LDOMs, DTrace, etc., I’m not really sure you could pay me to work with Linux (that would be so depressing). And that isn’t even mentioning the SPARC hardware it runs on- can any Intel server compare to a T5140?

That’s why the current situation with Sun absolutely SUCKS (pardon my French)! I’m sure there are a lot of admins out there who feel the same way. If this Oracle deal doesn’t go through and Sun disappears because of it, it will be our loss: we’ll be stuck with mediocre operating systems and commodity hardware. I really hope it doesn’t happen.

That said, I’d like to say thanks to all the people at Sun who are still turning out crazy cool technologies despite the problems.

Listing ZFS Clones using the origin property

Recently I created my first ZFS clones but quickly realized that there was no simple way to tell the clones from the regular filesystems. My first instinct was to run ‘zfs list -t clone’ similar to ‘zfs list -t snapshot’ but this didn’t work. Maybe it works in newer versions of ZFS.

After some poking around I found the ‘origin’ property, which sets the clones apart, so running something like-

zfs list -o origin,name,used,avail,refer,mountpoint | \
grep -v ^- | awk '{print $2"\t"$3"\t"$4"\t"$5"\t"$6}'

will get you what you are looking for.

If you haven’t played with ZFS clones yet, basically they are writable snapshots of a file system.

They are great if you want to copy a lot of data to the side, modify it, and possibly replace the original data, without taking a lot of time or disk space. ZFS clones take seconds to create, since they don’t actually copy any data, and they only store the blocks which have changed since their creation. If you want to replace the original data, you can then transparently promote the clone to be the master filesystem and turn the old master into a clone.
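
A minimal sketch of that workflow, using made-up pool and dataset names:

# hypothetical names: pool 'tank', filesystem 'data'
zfs snapshot tank/data@base            # instant, read-only point-in-time copy
zfs clone tank/data@base tank/work     # writable clone; no data is copied yet
# ...modify files under tank/work...
zfs promote tank/work                  # tank/work becomes the origin and tank/data the dependent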

The downside of clones is that they are always dependent on the snapshot from which they were created. You cannot destroy a snapshot on which a clone is based without destroying the clone.

For the sake of simplicity, and since I don’t usually have disk space issues, I usually prefer to make full copies using ZFS send/receive, but I have definite plans to make more use of ZFS clones in the future.
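
For comparison, the full-copy approach looks something like this (again with made-up names):

zfs snapshot tank/data@copy
zfs send tank/data@copy | zfs receive tank/datacopy    # independent full copy; costs real time and space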

Replace faulty disk in SVM mirror

A disk I have in a production machine went bad:

d4: Mirror
  Submirror 0: d14
    State: Okay
  Submirror 1: d24
    State: Needs maintenance
     Pass: 1
     Read option: roundrobin (default)
     Write option: parallel (default)
     Size: 120850176 blocks (57 GB)
d14: Submirror of d4
     State: Okay
     Size: 120850176 blocks (57 GB)
     Stripe 0:
          Device     Start Block  Dbase        State Reloc Hot Spare
          c1t0d0s4          0     No            Okay   Yes
d24: Submirror of d4
     State: Needs maintenance
     Invoke: metareplace d4 c1t1d0s4
     Size: 120850176 blocks (57 GB)
     Stripe 0:
          Device     Start Block  Dbase        State Reloc Hot Spare
          c1t1d0s4          0     No     Maintenance   Yes

The first thing I did was check iostat to see how bad the situation was:

bash-3.00# iostat -En
...
c1t1d0           Soft Errors: 9 Hard Errors: 98 Transport Errors: 27
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A Serial No: 060133PK2W
Size: 73.40GB <73400057856>
Media Error: 84 Device Not Ready: 0 No Device: 14 Recoverable: 9
Illegal Request: 0 Predictive Failure Analysis: 0...

98 Hard Errors doesn’t look good. (It was probably less the first time I noticed the problem.) Let’s do a surface scan: format -> 1 -> analyze -> read -> y

Without posting the output, suffice it to say that I need to replace the disk. To do this we will have to detach it from the mirror and offline the disk. If your disk is also part of a ZFS pool, you will need to detach it from there as well.

Assuming the bad disk is c1t1d0, this will break the mirror:

# For every submirror on the failing disk (c1t1), derive its parent mirror name by
# replacing the final digit with 0 (e.g. d22 -> d20), then force-detach and clear it.
for a in `metastat -c | grep c1t1 | awk '{print $1}'`; do
     A=`echo $a | sed 's/.$/0/'`
     metadetach -f $A $a
     metaclear -f $a
done

You can use ‘zpool detach poolname device’ to break any basic ZFS mirrors.
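
For example, if a slice of the failing disk belonged to a pool, the detach would look something like this (the pool name and slice are hypothetical):

zpool detach tank c1t1d0s7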

Then delete any metadbs that you have on the bad disk. This can be a little tricky: you want at least three replicas to remain. If you followed Sun’s advice and put two state database replicas on each of the two disks (as on a SunFire V210), then you might want to add some more before you delete the ones on the bad disk. FYI: you cannot add replicas to a slice which already has replicas on it.

Assuming the replicas are on slice 3, ‘metadb -d c1t1d0s3’ will delete them and leave you free to offline the disk.

bash-3.00# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
bash-3.00# cfgadm -c unconfigure c1::dsk/c1t1d0

At this point, a blue LED should light up next to the disk which needs to be replaced (at least it does in a V210, other hardware might be different). Replace the disk and get ready to undo everything we did 😉

bash-3.00# cfgadm -c configure c1::dsk/c1t1d0
bash-3.00# format 
# Label the disk with format if necessary
bash-3.00# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
bash-3.00# metadb 
     flags           first blk       block count
  a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
  a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
  a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
  a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
bash-3.00# metadb -a -c 4 c1t1d0s3
bash-3.00# metadb
     flags           first blk       block count
  a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
  a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
  a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
  a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
  a        u         16              8192            /dev/dsk/c1t1d0s3
  a        u         8208            8192            /dev/dsk/c1t1d0s3
  a        u         16400           8192            /dev/dsk/c1t1d0s3
  a        u         24592           8192            /dev/dsk/c1t1d0s3
bash-3.00# metastat -c
d20              m  4.0GB d21
    d21          s  4.0GB c1t0d0s1
d10              m  4.0GB d11
    d11          s  4.0GB c1t0d0s0
bash-3.00# metainit d22 1 1 c1t1d0s1
d22: Concat/Stripe is setup
bash-3.00# metainit d12 1 1 c1t1d0s0
d12: Concat/Stripe is setup
bash-3.00# metattach d20 d22
d20: submirror d22 is attached
bash-3.00# metattach d10 d12
d10: submirror d12 is attached
bash-3.00# metastat
d20: Mirror
   Submirror 0: d21
    State: Okay
   Submirror 1: d22
    State: Resyncing
  Resync in progress: 8 % done
  Pass: 1
  Read option: roundrobin (default)
  Write option: parallel (default)
  Size: 8392080 blocks (4.0 GB)
d21: Submirror of d20
  State: Okay
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t0d0s1          0     No            Okay   Yes
d22: Submirror of d20
  State: Resyncing
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t1d0s1          0     No            Okay   Yes
d10: Mirror
  Submirror 0: d11
    State: Okay
  Submirror 1: d12
    State: Resyncing
  Resync in progress: 0 % done
  Pass: 1
  Read option: roundrobin (default)
  Write option: parallel (default)
  Size: 8392080 blocks (4.0 GB)
d11: Submirror of d10
  State: Okay
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t0d0s0          0     No            Okay   Yes
d12: Submirror of d10
  State: Resyncing
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t1d0s0          0     No            Okay   Yes
Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,[email protected]_MAW3147NC_______DAA0P7203F0V
c1t0d0   Yes    id1,[email protected]_MAW3147NC_______DAA0P7203F1N

Don’t forget to rebuild your ZFS pool if necessary.
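
If you detached a slice from a pool earlier, something along these lines will reattach it and start a resilver (pool name and slices are hypothetical):

zpool attach tank c1t0d0s7 c1t1d0s7
zpool status tank     # watch the resilver progress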

Sparc Solaris 10 Jumpstart Flar DVD – Part 1

The Solaris Flash installation feature enables you to use a single reference installation of the Solaris OS on a system, which is called the master system. Then, you can replicate that installation on a number of systems, which are called clone systems. You can replicate clone systems with a Solaris Flash initial installation that overwrites all files on the system or with a Solaris Flash update that only includes the differences between two system images. A differential update changes only the files that are specified and is restricted to systems that contain software consistent with the old master image.

By combining Flash installation with Custom Jumpstart, and packaging all of that on a re-mastered Solaris installation DVD, you can create fast, efficient, standalone, automated installation media.

I ran into several issues trying to create such a DVD when following the standard Google results so I thought I’d summarize my experiences. This is a work in progress- I might hit a brick wall at some point, but I hope not.

First, I built the prototype system. I’m running Solaris 10 11/06 with one non-global zone based entirely on a ZFS file system. This will make things challenging, since Solaris Flash Archives are not completely compatible with (or even supported for) these kinds of configurations, and Jumpstart is not ZFS aware.

Creating the Flash Archive

  1. Make sure you have the right packages installed (SUNWinst, SUNWadmc, SUNWadmfw, SUNWbtool). Theoretically, you should install platform support for all possible hardware (I forget the name of the cluster), but if you will only be installing on the same hardware, this isn’t necessary. NOTE: if you try to install packages from inside single user mode with non-global zones, it will give you issues.
  2. Put the prototype system into single user mode
  3. Create a text file, called for example ‘exclude’, with the directories not to include in the flash archive (man flarcreate); there is an example after this list
  4. flarcreate -n system -X exclude -c system.flar
    Full Flash
    Checking integrity...
    Integrity OK.
    Running precreation scripts...
    Precreation scripts done.
    Determining the size of the archive...
    cpio: File size of "etc/mnttab" has decreased by 136
    2259925 blocks
    1 error(s)
    The archive will be approximately 764.41MB.
    Creating the archive...
    2259925 blocks
    Archive creation complete.
    Running postcreation scripts...
    Postcreation scripts done.

    Running pre-exit scripts...
    Pre-exit scripts done.
  5. Verify your archive: flar info -l system.flar
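
For reference, the exclude file from step 3 is just a list of paths, one per line. The entries below are only an illustration- exclude whatever doesn’t belong in your archive:

cat > exclude <<'EOF'
/var/crash
/var/tmp
/export/install
EOF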

More to come…

Howto resize or shrink UFS partitions

A friend of mine asked me the other day if there was such a thing as Partition Magic for Solaris. Apparently, someone had installed a system on a single slice and their security team was requiring a separate partition for the DB.

Here are the givens:

  • Sunfire V210
  • Solaris 8 (otherwise we’d be using ZFS)
  • Two 73GB disks
  • One slice on disk 1
  • Disk 2 is supposed to be a mirror of disk 1 but it isn’t used yet
  • Downtime is allowed
  • Reinstalling is not an option

I personally don’t know of any tool that lets you shrink UFS partitions but that doesn’t mean that we can’t perform some Partition Magic of our own.

NOTE:
I have not tested this procedure. I think it is logical and should work, and it should do no harm since the first disk remains fully intact.

  1. Go into single user mode
  2. Partition the second disk as required.
  3. newfs the partitions on the second disk
  4. Mount the second disk’s partitions (steps 3 and 4 are sketched after this list)
  5. Use ufsdump/ufsrestore to copy the filesystem into its smaller home

    ufsdump 0f - / | ( cd /mnt/newroot; ufsrestore xvf - )

  6. When all the partitions are done, use installboot to make the second disk bootable.

    installboot /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1t2d0s0

  7. Shut down the system, physically swap the disks, and do a reconfiguration reboot.
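
Steps 3 and 4 might look something like this, assuming the new root ends up on c1t2d0s0 as in step 6 (the slice layout is hypothetical):

newfs /dev/rdsk/c1t2d0s0                 # fresh, smaller UFS filesystem
mkdir -p /mnt/newroot
mount /dev/dsk/c1t2d0s0 /mnt/newroot     # mount it so ufsdump/ufsrestore can fill it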

If rebooting goes smoothly, test your new system thoroughly and then build your mirrors.
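
Building the mirrors afterwards is standard SVM. A rough sketch with hypothetical metadevice and disk names (see the metainit and metaroot man pages):

metadb -a -f -c 3 c1t0d0s3    # state database replicas, assuming slice 3 is free for them
metainit -f d1 1 1 c1t0d0s0   # submirror on the running root slice (-f because it is mounted)
metainit d2 1 1 c1t1d0s0      # submirror on the second disk
metainit d0 -m d1             # one-way mirror containing the live root
metaroot d0                   # updates /etc/vfstab and /etc/system for the mirrored root
# reboot onto d0, then attach the second half:
metattach d0 d2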