[UCI-Linux] some comments and crude benchmarks on SW SATA raid5 on linux

Harry Mangalam hjm at tacgi.com
Wed Mar 30 20:53:47 PST 2005


Hi All,

FYI, the machine platform is a 2xOpteron, running ubuntu hoary preview 
(64bit), 4GB RAM, system running off a single IDE drive.

The raid drives are running on the on-board 4-way Silicon Image SATA 
controller.  The drives are identical 250GB WD SATAs (model WDC WD2500JD-00G), 
each partitioned into 232GB on /dev/sdx1 and 1.8GB on /dev/sdx2 (for 
parallel swap partitions).
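
I won't go into the swap setup here, but for the 'parallel swap' to actually 
interleave, the four swap partitions get equal priority - e.g. /etc/fstab 
entries roughly like:

/dev/sda2   none   swap   sw,pri=1   0   0
/dev/sdb2   none   swap   sw,pri=1   0   0
/dev/sdc2   none   swap   sw,pri=1   0   0
/dev/sdd2   none   swap   sw,pri=1   0   0

With equal priorities the kernel stripes swap pages across all four drives.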

I'm using the mdadm suite to set them up and control the raid:
1 - create the raid:
$  mdadm --create --verbose /dev/md0 --level=5 --raid-devices=4 \
   --spare-devices=0 -c128 /dev/sd{a,b,c,d}1
mdadm: layout defaults to left-symmetric
mdadm: /dev/sda1 appears to contain a reiserfs file system
    size = 242220004K
mdadm: /dev/sdb1 appears to contain a reiserfs file system
    size = 242220004K
mdadm: /dev/sdc1 appears to contain a reiserfs file system
    size = 242220004K
mdadm: /dev/sdd1 appears to contain a reiserfs file system
    size = 242220004K
mdadm: size set to 242219904K
Continue creating array? y
mdadm: array /dev/md0 started.
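
At this point (not listed as a step above, but worth doing) you can 
sanity-check the new array with:

$ cat /proc/mdstat
$ mdadm --detail /dev/md0

which show the member disks, the chunk size, and the state of the initial 
rebuild.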

2 - make sure we monitor it.
 $ nohup mdadm --monitor --mail='hjm@tacgi.com' --delay=300 /dev/md0 &
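
As an aside, mdadm --monitor can also pick the destination address up from a 
MAILADDR line in its config file (/etc/mdadm/mdadm.conf on Debian-ish systems, 
/etc/mdadm.conf elsewhere), which is handy if the distro starts the monitor 
for you at boot.  Roughly:

$ mdadm --detail --scan >> /etc/mdadm/mdadm.conf
$ echo 'MAILADDR hjm@tacgi.com' >> /etc/mdadm/mdadm.conf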

3 - make the reiserfs on md0 (a filesystem had been made on the individual 
partitions before, but apparently it needs to be made on the virtual device)
$ mkreiserfs /dev/md0 

4 - # then mount it
$ mount -t reiserfs /dev/md0 /r
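
To have it come back after a reboot (not part of the steps above), the 
matching /etc/fstab entry would be something like:

/dev/md0   /r   reiserfs   defaults   0   0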


5 - # then admire it
 $ df
Filesystem           1K-blocks      Used Available Use% Mounted on
...
/dev/md0             726637532     32840 726604692   1% /r

so for a 4-disk raid5 array, we end up with about 75% of the raw space 
(726637532K usable out of 4 x 242219904K, i.e. (n-1)/n for n=4) - the rest 
goes to the parity info, which is striped across all the disks, giving the 
redundancy.

When the raid initialized, mdadm immediately sent me an email warning of a 
degraded array - this was not welcome news, but it turns out that this is 
normal - in building the parity checksums, it essentially fakes a dead disk 
and rebuilds all the parity info.  This took about 8 hrs for the ~1TB of raw 
disk; however, the array was available and pretty peppy without waiting for it 
to finish.  And the message did confirm that mdadm was actually monitoring the 
array.
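
For the record, /proc/mdstat shows progress and an ETA while the rebuild runs, 
and the kernel throttles the resync so normal IO stays responsive, which is 
part of why it takes hours.  If you want it to finish sooner, the per-device 
floor (in KB/s) can be raised - a sketch, paths as on a stock 2.6 kernel:

$ cat /proc/sys/dev/raid/speed_limit_min
$ echo 25000 > /proc/sys/dev/raid/speed_limit_min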

I immediately tried a few cp's to and from it - and on the 'degraded' array, 
got ~40MB/s to and from the IDE drive on some 100-600MB files.  There was not 
much difference after it finished doing the parity rebuild - possibly it was 
deferring the parity calculations until afterwards?  If anything it's 
slightly slower now that the parity info is complete - maybe 38-40MB/s.  
(This measure includes the sync time - with 4GB of RAM, GB-sized files can be 
buffered in RAM and so appear to be copied in a few seconds.)
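
One way to get a copy figure that includes the sync, with a made-up file name 
standing in for one of the 100-600MB test files:

$ time sh -c 'cp some_test_file /r/ && sync'

The elapsed time divided by the file size gives the effective MB/s.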

On my home 2xPIII system with IDE drives, I only get ~7-8MB/s between drives, 
so 40MB/s sounds pretty good.  Bonnie++ reports a bunch of confusing numbers, 
but they seem to indicate that, depending on CPU utilization, type of IO, and 
file size, disk IO on the SATA raid ranges from about 24MB/s to 80MB/s.
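
For anyone wanting to reproduce the bonnie++ runs: the test file size should 
be at least twice RAM so the page cache can't flatter the disks, so on this 
4GB box an invocation would be roughly:

$ bonnie++ -d /r -s 8192 -u hjm

where -s is in MB and -u (the user to run as) is only needed when starting it 
as root.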

On my old IDE laptop (but with a newer disk), bonnie returns numbers that are 
surprisingly good - about 1/3 to 1/4 the RAID speed.

On the 2xPIII home IDE system, bonnie returns numbers that are not much better 
than the laptop's.

So there you have it - Linux SW SATA raid is pretty easy to set up, can be 
configured to be reasonably informative via email, is pretty cheap (relative 
to the true HW raid cards that go for $300-$400 each), and seems to be pretty 
fast.  About long-term reliability, I can't say yet.

Also note that this is a bare md device without any further wrapping in LVM - 
we just need one huge data space, with little need to administer separate 
volume allocations, etc.
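
For comparison, the LVM layering we skipped would only have been a few more 
steps on top of md0 - a rough sketch, with made-up volume group/volume names 
and an example size:

$ pvcreate /dev/md0
$ vgcreate bigvg /dev/md0
$ lvcreate -L 690G -n data bigvg
$ mkreiserfs /dev/bigvg/data

That buys you resizing and snapshots later, at the cost of one more layer to 
administer.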

Would like to hear others' experiences.

hjm




