[UCI-Linux] Death and Resurrection with mdadm

Harry Mangalam harry.mangalam at uci.edu
Fri Dec 14 09:51:45 PST 2007

Hi All,

This is an FYI and a HOWTO on (hopefully) recovering from a
software RAID (mdadm) failure.  The Linux mdadm software RAID
driver is a cheap alternative to hardware RAID controllers.  Its
setup and initialization are so magical that I never thought about
recovery from a failure.  Until, of course, it reared up and bit me on
the nose.  Maybe this will help you in your hour of need.

The machine in question is a P3 Compaq with 4x500GB SATA disks on a
cheap ($60) but so far reliable Promise SATA300/TX4 4-port card, being
used as a (slow) data repository.  It's set up as a RAID5 (one disk's
worth of distributed parity), with no spares.
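
Since RAID5 gives up one disk's worth of space to parity, the usable
capacity should be (n-1) x disk size - a quick back-of-the-envelope
check (shell arithmetic, numbers taken from the setup above):

```shell
# RAID5 usable space = (number of disks - 1) * size of each disk
disks=4         # disks in the array
size_gb=500     # size of each disk, in GB
echo "$(( (disks - 1) * size_gb )) GB usable"   # prints: 1500 GB usable
```

which matches the ~1.5TB that df reports further down.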

This morning, I was transferring a multi-GB file to the machine via
rsync.  Someone else was trying to do some work on the same
filesystem via samba.  While I was helping him, the samba share
started acting peculiar and I checked my large transfer to see if it
was causing trouble.

At about the same time, I got 2 messages from the mdadm daemon saying
that there were problems with the RAID system, and when I checked
dmesg, it looked like disks sdc and sdd had failed within a few
seconds of each other (at least the mdadm mailer works as
advertised).  Since RAID5 can survive only a single failed disk, the
array failed.  Disks sda and sdb still looked to be OK (they responded
to smartctl queries), but sdc and sdd did not.  I commented out the
fstab entry for the RAID and tried to reboot.
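
(Incidentally, those mail alerts come from 'mdadm --monitor' running
as a daemon, which reads its config from mdadm.conf.  A minimal sketch
- the path, address, and ARRAY line here are illustrative, not copied
from the actual machine:)

```
# /etc/mdadm/mdadm.conf (or /etc/mdadm.conf, depending on distro)
DEVICE /dev/sd[abcd]1
# where 'mdadm --monitor' sends failure/degraded-event mail
MAILADDR harry.mangalam@uci.edu
ARRAY /dev/md0 level=raid5 num-devices=4
```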

The instigating reason for this is still unclear - it seems to have
been  electrical in nature, but no other machines in the server room
were affected, so it was probably isolated to the machine.

2 warm reboots hung: the 1st at what appeared to be a failed grub
stage (implying an error with the hda IDE boot disk), the 2nd at what
appeared to be a failed disk controller (the Promise SATA
controller).  As will be seen later, all the disks were actually fine
at the hardware level.

I then powered off the machine for a few minutes.  Powering on made
everything come up as expected, altho when I tried to have the
machine mount the RAID automatically, it failed with:

$ mount /dev/md0
mount: /dev/md0: can't read superblock

Trying to force mdadm to assemble it did not work:

$ mdadm --assemble  /dev/md0   --chunk 16  /dev/sd*1
mdadm: /dev/md0 assembled from 2 drives - not enough to start the array.

It was a 4-disk array, so assembling from only 2 drives is a failure.
However, it did not destroy any data either.

To cut a long story short, I recovered the RAID by first stopping it:

$ mdadm --stop /dev/md0
mdadm: stopped /dev/md0

then creating it with the '--assume-clean' option, which marks the
array as clean and skips the initial resync, so the data already on
the disks is not overwritten (tho answering 'y' to the prompt below
was still fairly nerve-wracking):

$ mdadm --create --assume-clean /dev/md0 -n4 --chunk 16 --level=raid5 /dev/sd*1
mdadm: /dev/sda1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Tue Nov 27 14:17:40 2007
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Tue Nov 27 14:17:40 2007
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Tue Nov 27 14:17:40 2007
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=4 ctime=Tue Nov 27 14:17:40 2007
Continue creating array? y
mdadm: array /dev/md0 started.

and stopping md0 again:
$ mdadm --stop /dev/md0
mdadm: stopped /dev/md0

and then assembling it with the resync option, which forces a
data-integrity check on the entire RAID (this takes a while - about 4
hours on this 1.5TB array, with the process (md0_raid5) consuming
about half the CPU while it does so):

$ mdadm --assemble --update=resync  /dev/md0  /dev/sd*1
mdadm: /dev/md0 has been started with 4 drives.

Immediately afterwards, it could be mounted:
 $ mount -t xfs /dev/md0 /r5

and shows up appropriately with df:
Filesystem           1K-blocks      Used Available Use% Mounted on
/dev/hda1              9614116   1974496   7151248  22% /
/dev/md0             1465020416  77115328 1387905088   6% /r5

The array can be used while the resync is in progress, but if you can
keep it offline, the check will go faster; and if the check turns up
irreparable errors, the data you've read or written in the meantime
may be damaged.
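
As a rough cross-check on that 4-hour figure: the df output above
gives the array size in 1K blocks, so the resync rate works out to
roughly 100 MB/s aggregated across the whole array (shell arithmetic;
the 4 hours is the approximate time quoted above):

```shell
# resync rate ~= array size / resync time
blocks_kb=1465020416            # 1K blocks, from 'df' above
seconds=$(( 4 * 3600 ))         # ~4 hour resync
echo "$(( blocks_kb / 1024 / seconds )) MB/s"   # prints: 99 MB/s
```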

Besides the mdadm manual, there was a useful discussion of this at:
http://www.storageforum.net/forum/archive/index.php/t-5890.html and
the linux software RAID list (linux-raid at vger.kernel.org) is very
helpful (and is where Neil Brown, the author of mdadm, hangs out).

Harry Mangalam - Research Computing, NACS, E2148, Engineering Gateway, 
UC Irvine 92697  949 824 0084(o), 949 285 4487(c)
[Don't be afraid to go out on a limb.  That's where the fruit is.
H. Jackson Browne]
