Copyright © 2010 Yonah Russ' Journal. All Rights Reserved.
A disk I have in a production machine went bad:
d4: Mirror
    Submirror 0: d14
      State: Okay
    Submirror 1: d24
      State: Needs maintenance
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 120850176 blocks (57 GB)

d14: Submirror of d4
    State: Okay
    Size: 120850176 blocks (57 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s4          0     No            Okay   Yes

d24: Submirror of d4
    State: Needs maintenance
    Invoke: metareplace d4 c1t1d0s4
    Size: 120850176 blocks (57 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s4          0     No     Maintenance   Yes
The first thing I did was check iostat to see how bad the situation was:
bash-3.00# iostat -En
...
c1t1d0           Soft Errors: 9 Hard Errors: 98 Transport Errors: 27
Vendor: SEAGATE  Product: ST373207LSUN72G Revision: 045A Serial No: 060133PK2W
Size: 73.40GB <73400057856>
Media Error: 84 Device Not Ready: 0 No Device: 14 Recoverable: 9
Illegal Request: 0 Predictive Failure Analysis: 0
...
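If you have more than a couple of disks, scanning that output by eye gets old fast. Here is a rough sketch of a filter that reads `iostat -En` output and lists only the devices whose hard error count is above a threshold. The function name `disks_with_hard_errors` is my own invention, and I'm assuming the first line of each device entry looks like the one above. On Solaris 10 you will probably want `nawk` instead of the old `awk`.

```shell
# Hypothetical helper: read `iostat -En` output on stdin and print
# "device hard_error_count" for every device above the threshold.
# Assumes device lines look like:
#   c1t1d0  Soft Errors: 9 Hard Errors: 98 Transport Errors: 27
disks_with_hard_errors() {
    threshold=${1:-0}    # default: report any device with at least 1 hard error
    awk -v max="$threshold" '
        # Only the first line of each device entry has "Soft" as field 2
        # and "Hard" as field 5; field 7 is the hard error count.
        $2 == "Soft" && $5 == "Hard" && $7 + 0 > max + 0 { print $1, $7 }
    '
}

# Usage on the live system (not run here):
#   iostat -En | disks_with_hard_errors 10
```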
98 hard errors doesn’t look good. (It was probably fewer the first time I noticed the problem.) Let’s do a surface scan: format -> 1 -> analyze -> read -> y
Without posting the output, suffice it to say that I need to replace the disk. To do this, we have to detach it from the mirror and offline the disk. If your disk is also part of a ZFS pool, you will need to detach it from there as well.
Assuming the bad disk is c1t1d0, this will break the mirror:
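# Detach and clear every submirror on the bad disk.
# Assumes each mirror is named like its submirror with the last character
# replaced by 0 (d22 -> d20). Adjust if yours differ (e.g. d24 under d4).
for a in $(metastat -c | grep c1t1 | awk '{print $1}'); do
    A=$(echo "$a" | sed 's/.$/0/')    # mirror name derived from submirror name
    metadetach -f "$A" "$a"
    metaclear -f "$a"
done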
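Since metadetach -f and metaclear -f are not commands you want to get wrong on a production box, it may be worth printing the commands first and reading them before running anything. The sketch below does just that: it reads `metastat -c` output and prints what the loop would run. `plan_detach` is a made-up name, and it relies on the same naming assumption as the loop (mirror name = submirror name with the last character replaced by 0), so check the result against your own metastat output.

```shell
# Hypothetical dry-run helper: read `metastat -c` output on stdin and PRINT
# (never run) the detach/clear commands for submirrors on the given disk.
# Assumption: the mirror is named like the submirror with its last
# character replaced by 0 (d22 -> d20), as in the loop above.
plan_detach() {
    disk=$1
    awk -v disk="$disk" '
        index($0, disk) {                 # line mentions a slice on the disk
            sub_md = $1                   # submirror name, e.g. d22
            mirror = substr(sub_md, 1, length(sub_md) - 1) "0"
            print "metadetach -f " mirror " " sub_md
            print "metaclear -f " sub_md
        }
    '
}

# Usage on the live system (not run here):
#   metastat -c | plan_detach c1t1d0
```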
You can use zpool detach poolname device to break any basic ZFS mirrors.
Then delete any metadbs (state database replicas) that you have on the bad disk. This can be a little tricky. You want at least 3 replicas to remain. If you followed Sun’s advice and put 2 state database replicas on each of the two disks (Sun Fire V210), then you might want to add some more before you delete the ones on the bad disk. FYI: You cannot add replicas to a slice which already has replicas on it, so the extra ones have to go on a different slice.
Assuming the metadbs are on slice 3, metadb -d c1t1d0s3 will delete them and leave you free to offline the disk.
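Before running metadb -d, you can count how many replicas would be left once the bad disk's replicas are gone. This is only a sketch: `replicas_left_without` is a name I made up, and it assumes the device path is the last column of the metadb output. The "at least 3" check in the usage comment is just the rule of thumb from above.

```shell
# Hypothetical helper: read `metadb` output on stdin and print how many
# replicas would remain if every replica on the given disk were deleted.
# Assumes the last column of each replica line is the /dev/dsk/... path.
replicas_left_without() {
    disk=$1
    awk -v disk="$disk" '
        # Count replica lines whose device is NOT on the disk being removed.
        $NF ~ /^\/dev\/dsk\// && index($NF, disk) == 0 { n++ }
        END { print n + 0 }
    '
}

# Usage on the live system (not run here):
#   left=$(metadb | replicas_left_without c1t1d0)
#   [ "$left" -ge 3 ] || echo "only $left replicas would remain, add more first"
```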
bash-3.00# cfgadm -al
Ap_Id                          Type         Receptacle   Occupant     Condition
c0                             scsi-bus     connected    configured   unknown
c0::dsk/c0t0d0                 CD-ROM       connected    configured   unknown
c1                             scsi-bus     connected    configured   unknown
c1::dsk/c1t0d0                 disk         connected    configured   unknown
c1::dsk/c1t1d0                 disk         connected    configured   unknown
c2                             scsi-bus     connected    unconfigured unknown
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         unknown      empty        unconfigured ok
bash-3.00# cfgadm -c unconfigure c1::dsk/c1t1d0
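If you are scripting this, the attachment point ID for a disk can be pulled out of the cfgadm -al listing rather than typed by hand. A minimal sketch, assuming the listing has the same columns as above; `ap_id_for_disk` is a hypothetical name, not a Solaris command.

```shell
# Hypothetical helper: read `cfgadm -al` output on stdin and print the
# Ap_Id of the given disk (e.g. c1t1d0 -> c1::dsk/c1t1d0).
# Assumes column 1 is the Ap_Id and column 2 is the Type.
ap_id_for_disk() {
    disk=$1
    awk -v disk="$disk" '
        $2 == "disk" {
            n = split($1, parts, "/")        # "c1::dsk/c1t1d0" -> c1::dsk, c1t1d0
            if (parts[n] == disk) print $1   # match on the device name exactly
        }
    '
}

# Usage on the live system (not run here); check the result before
# passing it to cfgadm -c unconfigure:
#   cfgadm -al | ap_id_for_disk c1t1d0
```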
At this point, a blue LED should light up next to the disk which needs to be replaced (at least it does on a V210; other hardware might be different). Replace the disk and get ready to undo everything we did:
bash-3.00# cfgadm -c configure c1::dsk/c1t1d0
bash-3.00# format # Label the disk with format if necessary
bash-3.00# prtvtoc /dev/rdsk/c1t0d0s2 | fmthard -s - /dev/rdsk/c1t1d0s2
bash-3.00# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
     a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
bash-3.00# metadb -a -c 4 c1t1d0s3
bash-3.00# metadb
        flags           first blk       block count
     a m  p  luo        16              8192            /dev/dsk/c1t0d0s3
     a    p  luo        8208            8192            /dev/dsk/c1t0d0s3
     a    p  luo        16400           8192            /dev/dsk/c1t0d0s3
     a    p  luo        24592           8192            /dev/dsk/c1t0d0s3
     a        u         16              8192            /dev/dsk/c1t1d0s3
     a        u         8208            8192            /dev/dsk/c1t1d0s3
     a        u         16400           8192            /dev/dsk/c1t1d0s3
     a        u         24592           8192            /dev/dsk/c1t1d0s3
bash-3.00# metastat -c
d20              m  4.0GB d21
    d21          s  4.0GB c1t0d0s1
d10              m  4.0GB d11
    d11          s  4.0GB c1t0d0s0
bash-3.00# metainit d22 1 1 c1t1d0s1
d22: Concat/Stripe is setup
bash-3.00# metainit d12 1 1 c1t1d0s0
d12: Concat/Stripe is setup
bash-3.00# metattach d20 d22
d20: submirror d22 is attached
bash-3.00# metattach d10 d12
d10: submirror d12 is attached
bash-3.00# metastat
d20: Mirror
    Submirror 0: d21
      State: Okay
    Submirror 1: d22
      State: Resyncing
    Resync in progress: 8 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8392080 blocks (4.0 GB)

d21: Submirror of d20
    State: Okay
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s1          0     No            Okay   Yes

d22: Submirror of d20
    State: Resyncing
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s1          0     No            Okay   Yes

d10: Mirror
    Submirror 0: d11
      State: Okay
    Submirror 1: d12
      State: Resyncing
    Resync in progress: 0 % done
    Pass: 1
    Read option: roundrobin (default)
    Write option: parallel (default)
    Size: 8392080 blocks (4.0 GB)

d11: Submirror of d10
    State: Okay
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t0d0s0          0     No            Okay   Yes

d12: Submirror of d10
    State: Resyncing
    Size: 8392080 blocks (4.0 GB)
    Stripe 0:
        Device     Start Block  Dbase        State Reloc Hot Spare
        c1t1d0s0          0     No            Okay   Yes

Device Relocation Information:
Device   Reloc  Device ID
c1t1d0   Yes    id1,sd@SFUJITSU_MAW3147NC_______DAA0P7203F0V
c1t0d0   Yes    id1,sd@SFUJITSU_MAW3147NC_______DAA0P7203F1N
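The resync takes a while, and the full metastat output is a lot to read just to see how far along it is. Here is a small sketch that boils it down to one line per resyncing mirror. `resync_progress` is a made-up name, and it assumes the "dNN: Mirror" and "Resync in progress: N % done" lines look like the ones in the metastat output above. Use `nawk` on Solaris 10 if the stock `awk` chokes on it.

```shell
# Hypothetical helper: read `metastat` output on stdin and print
# "mirror percent_done" for each mirror that is currently resyncing.
resync_progress() {
    awk '
        # Remember the most recent mirror header, e.g. "d20: Mirror".
        /^d[0-9]+: Mirror/ { mirror = $1; sub(/:$/, "", mirror) }
        # "Resync in progress: 8 % done" -> the number follows "progress:".
        /Resync in progress:/ {
            for (i = 1; i < NF; i++)
                if ($i == "progress:") print mirror, $(i + 1)
        }
    '
}

# Usage on the live system (not run here):
#   metastat | resync_progress
```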
Don’t forget to rebuild your ZFS pool if necessary.
June 1, 2009 at 7:06 am
Very useful. Thanks.