Replace faulty disk in SVM mirror

A disk I have in a production machine went bad:

d4: Mirror
  Submirror 0: d14
    State: Okay
  Submirror 1: d24
    State: Needs maintenance
     Pass: 1
     Read option: roundrobin (default)
     Write option: parallel (default)
     Size: 120850176 blocks (57 GB)
d14: Submirror of d4
     State: Okay
     Size: 120850176 blocks (57 GB)
     Stripe 0:
          Device     Start Block  Dbase        State Reloc Hot Spare
          c1t0d0s4          0     No            Okay   Yes
d24: Submirror of d4
     State: Needs maintenance
     Invoke: metareplace d4 c1t1d0s4
     Size: 120850176 blocks (57 GB)
     Stripe 0:
          Device     Start Block  Dbase        State Reloc Hot Spare
          c1t1d0s4          0     No     Maintenance   Yes

The first thing I did was check iostat to see how bad the situation was:

bash-3.00# iostat -En
...
c1t1d0           Soft Errors: 9 Hard Errors: 98 Transport Errors: 27
Vendor: SEAGATE  Product: ST373207LSUN72G  Revision: 045A Serial No: 060133PK2W
Size: 73.40GB <73400057856>
Media Error: 84 Device Not Ready: 0 No Device: 14 Recoverable: 9
Illegal Request: 0 Predictive Failure Analysis: 0...

98 Hard Errors doesn’t look good. (It was probably less the first time I noticed the problem.) Let’s do a surface scan: format -> 1 -> analyze -> read -> y

Without posting the output- suffice it to say that I need to replace the disk. To do this we will have to dettach it from the mirror and offline the disk. If your disk is also part of a ZFS pool, you will need to dettach it from there as well.

Assuming the bad disk is c1t1d0, this will break the mirror:

for a in `metastat -c | grep c1t1 | awk '{print $1}'`;
     do A=`echo $a | sed 's/.$/0/'`;
     metadetach -f $A $a;
     metaclear -f $a;
done

You can use zpool detach poolname device to break any basic zfs mirrors.

Then delete any metadb’s that you have on the bad disk. This can be a little tricky. You want at least 3 dbs to remain. If you followed SUN’s advice and put 2 replica state databases on each of the two disks (SunFire v210) then you might want to add some more before you delete the ones on the bad disk. FYI: You cannot add db’s to a slice which already has DB’s on it.

Assuming the metadb’s are on slice 3, metadb -d c1t1d0s3 will delete them and leave you free to offline the disk.

bash-3.00# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
bash-3.00# cfgadm -c unconfigure c1::dsk/c1t1d0

At this point, a blue LED should light up next to the disk which needs to be replaced (at least it does in a V210, other hardware might be different). Replace the disk and get ready to undo everything we did 😉

bash-3.00# cfgadm -c configure c1::dsk/c1t1d0
bash-3.00# format 
# Label the disk with format if necessary
bash-3.00# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
bash-3.00# metadb 
     flags           first blk       block count
  a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
  a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
  a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
  a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
bash-3.00# metadb -a -c 4 c1t1d0s3
bash-3.00# metadb
     flags           first blk       block count
  a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
  a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
  a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
  a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
  a        u         16              8192            /dev/dsk/c1t1d0s3
  a        u         8208            8192            /dev/dsk/c1t1d0s3
  a        u         16400           8192            /dev/dsk/c1t1d0s3
  a        u         24592           8192            /dev/dsk/c1t1d0s3
bash-3.00# metastat -c
d20              m  4.0GB d21  d21
          s  4.0GB c1t0d0s1
d10              m  4.0GB d11  d11
          s  4.0GB c1t0d0s0
bash-3.00# metainit d22 1 1 c1t1d0s1
d22: Concat/Stripe is setup
bash-3.00# metainit d12 1 1 c1t1d0s0
d12: Concat/Stripe is setup
bash-3.00# metattach d20 d22
d20: submirror d22 is attached
bash-3.00# metattach d10 d12
d10: submirror d12 is attached
bash-3.00# metastat
d20: Mirror
   Submirror 0: d21
    State: Okay
   Submirror 1: d22
    State: Resyncing
  Resync in progress: 8 % done
  Pass: 1
  Read option: roundrobin (default)
  Write option: parallel (default)
  Size: 8392080 blocks (4.0 GB)
d21:
Submirror of d20
  State: Okay
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t0d0s1          0     No            Okay   Yes
d22: Submirror of d20
  State: Resyncing
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t1d0s1          0     No            Okay   Yes
d10: Mirror
  Submirror 0: d11
    State: Okay
  Submirror 1: d12
    State: Resyncing
  Resync in progress: 0 % done
  Pass: 1
  Read option: roundrobin (default)
  Write option: parallel (default)
  Size: 8392080 blocks (4.0 GB)
d11: Submirror of d10
  State: Okay
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t0d0s0          0     No            Okay   Yes
d12: Submirror of d10
  State: Resyncing
  Size: 8392080 blocks (4.0 GB)
  Stripe 0:
      Device     Start Block  Dbase        State Reloc Hot Spare
      c1t1d0s0          0     No            Okay   Yes
Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,[email protected]_MAW3147NC_______DAA0P7203F0V
c1t0d0   Yes    id1,[email protected]_MAW3147NC_______DAA0P7203F1N

Don’t forget to rebuild your zfs pool if necessary.

Be Sociable, Share!
  • Twitter
  • Facebook
  • email
  • LinkedIn
  • HackerNews
  • Reddit

2 comments for “Replace faulty disk in SVM mirror

  1. Anonymous
    June 1, 2009 at 7:06 am

    Very useful. Thanks.

  2. Dima
    September 21, 2010 at 6:29 am

    thank’s . worked again and again

Leave a Reply

Your email address will not be published.