[UCI-Linux] RFC on RCS proposal for cheap, reliable storage

Dan Stromberg strombrg at dcs.nac.uci.edu
Thu Jun 15 11:11:42 PDT 2006

On Wed, 2006-06-14 at 12:38 -0700, Harry Mangalam wrote:
> On Wednesday 14 June 2006 11:11, Dan Stromberg wrote:
> > Hi Harry.
> >
> > What do the RAID's look like in your sizings below?  I could probably
> > paper-and-pencil it based on the raw capacity and your figures, but it
> > might be easier to just get it from you :)
> >
> > If it's a single RAID volume, that seems like a pretty aggressive ratio,
> > even for RAID 6 with spares.
> No, it's 2 RAID5's, each with 12 disks.  So the capacity is 10 disks worth 
> plus the parity disk, with one spinning spare.  As you know, I've had plenty  
> of opportunity to check whether the RAID recovery works with the 3wares, and 
> despite other problems, it has worked pretty well.  You can buy 24-port Areca 
> cards, but that's putting too many eggs in one basket for me as well as 
> stressing the IO.

It's probably intended to be a global spare then.

1 to 11 still seems like a pretty aggressive ratio, but you've already
suggested that this is pretty "salt to taste".

As the disks get bigger, and the ratio gets higher, the odds of losing a
second disk in a RAID 5 before a rebuild for the first go up.  But you
probably know that.
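
A rough back-of-the-envelope sketch of that effect, in Python. Everything numeric here is an assumption for illustration only: a constant 3% annual failure rate, independent failures, and a 50 MB/s rebuild speed. It also ignores unrecoverable read errors during the rebuild, which would make things look worse.

```python
import math

def second_failure_probability(surviving_disks, disk_gb,
                               afr=0.03, rebuild_mb_per_s=50.0):
    """Chance that at least one surviving disk fails during the rebuild.

    Assumes independent failures at a constant rate (exponential model).
    The default AFR and rebuild speed are illustrative guesses, not
    measurements from any real array.
    """
    # Rebuild window grows linearly with disk size.
    rebuild_hours = disk_gb * 1000.0 / rebuild_mb_per_s / 3600.0
    # Convert the annual failure rate into a per-hour hazard rate.
    failures_per_hour = -math.log(1.0 - afr) / (365.0 * 24.0)
    return 1.0 - math.exp(-surviving_disks * failures_per_hour * rebuild_hours)

# More surviving disks (higher ratio) and bigger disks both raise the odds.
for disks, size in [(4, 250), (10, 250), (10, 500), (10, 1000)]:
    p = second_failure_probability(disks, size)
    print(f"{disks:2d} surviving disks of {size:4d} GB: {p:.5%}")
```

The absolute numbers depend entirely on the guessed inputs; the point is only that the risk scales with both the number of surviving disks and the size of each one.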

> > If not, then there can be significant performance and/or storage
> > penalties - EG, a RAID 5 of RAID 5's will slow down a good bit, and a
> > RAID 10 will provide about half the raw storage.  A stripe or
> > concatenation of RAID 5's might be pretty comfortable though...
> no - my base system is a single level raid5 that can be managed/sliced/diced 
> with EVMS, but you could easily reconfig it into RAID10 if you needed the 
> speed.  My sense is that most people want /reasonably/ reliable storage for 
> reasonable prices.

Wait - single level?  Or two single level RAID 5's striped/catenated?
Or two distinct RAID 5's with a filesystem in each?
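
To make the storage side of those layouts concrete, here is a small Python sketch that counts usable capacity in units of "disks' worth". The function names are made up for this example, and the 4 x 6 split for the RAID 5 of RAID 5's is just one hypothetical way to carve up 24 disks.

```python
def raid5_usable(disks, spares=0):
    """Usable disks in one RAID 5: active members minus one for parity."""
    active = disks - spares
    if active < 3:
        raise ValueError("RAID 5 needs at least 3 active disks")
    return active - 1

def striped_raid5_usable(groups, disks_per_group, spares_per_group=0):
    """Stripe or concatenation of RAID 5's: capacities simply add up."""
    return groups * raid5_usable(disks_per_group, spares_per_group)

def raid5_of_raid5_usable(groups, disks_per_group):
    """RAID 5 across RAID 5's: one whole inner array's worth goes to parity."""
    if groups < 3:
        raise ValueError("outer RAID 5 needs at least 3 inner arrays")
    return (groups - 1) * raid5_usable(disks_per_group)

def raid10_usable(disks):
    """RAID 10: every disk is mirrored, so half the raw capacity."""
    return disks // 2

print("2 x RAID 5 (12 disks each, 1 spare each):", striped_raid5_usable(2, 12, 1))
print("RAID 5 of 4 x 6-disk RAID 5's:           ", raid5_of_raid5_usable(4, 6))
print("RAID 10 over 24 disks:                   ", raid10_usable(24))
```

With 24 raw disks this gives 20, 15, and 12 disks' worth respectively, which matches the "10 disks worth" per array described above.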

> > I like the idea of doing this sort of stuff with commodity hardware.
> > One needs to be especially careful though, in such a scenario, that
> > things are as reliable as "needed" (each use has its own definition) if
> > forgoing vendor support.
> yes - this is a key point - you'll have to decide what your comfort level is 
> here.  One thing that started me on this tho was the problems we had with [2 
> top-tier vendors whose names I will not utter].  Paying top prices does not 
> always map to top support or performance.  But it does buy you someone to 
> blame. :)

Yup.  :)  Although in one of the cases you're likely thinking of, we got
a bargain basement storage system from a big name vendor that sold us
storage assembled from a bunch of smaller companies - but if we'd gotten
something that vendor normally sells and had had good experiences with,
we probably would've been happy with the storage, and our wallets
would've been quite a bit lighter.

> > Then there's the theoretical problem with RAID 5 (and I assume, RAID 6
> > as well) that rewriting a data block + a suitable checksum isn't atomic
> > at the block level.  Supposedly Sun's new "RAID Z" has such atomicity,
> > but it seems to me it should be possible to journal the writes to get
> > around this in RAID 5.  But I haven't heard of anyone actually doing it
> > with RAID 5, and I suppose it's possible that a journaled filesystem
> > overtop of the RAID might help with this.  Then again, it might not....
> > In fact, it "feels" like it wouldn't.
> But most RAID controllers have battery-backed caches, so if the write isn't 
> completed, it will be maintained in-queue for the next write.

That's good to know.  Does it show up in controller feature lists much?
You seem to be saying that they're journaling to battery backed (or NV?)
cache?
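
Here is a toy Python model of the atomicity problem I was describing, and of how logging the intended write first would get around it. This is only a sketch under my own assumptions: the stripe is three tiny data blocks plus XOR parity, the "journal" is a plain Python list, and none of it reflects how any particular controller or the Linux md driver actually works.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def torn_update(data, parity, index, new_block, journal=None):
    """Write new_block, then 'crash' before the parity write reaches disk.

    If a journal (a plain list, hypothetical) is given, the intended data
    and parity are logged before anything else is touched.
    """
    data = list(data)
    new_parity = xor_blocks([parity, data[index], new_block])
    if journal is not None:
        journal.append((index, new_block, new_parity))
    data[index] = new_block
    return data, parity  # parity is returned unchanged, i.e. stale

def replay(data, parity, journal):
    """After the crash, reapply every logged write so parity matches data."""
    data = list(data)
    for index, block, new_parity in journal:
        data[index] = block
        parity = new_parity
    return data, parity

stripe = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(stripe)

# No journal: disk 1 dies later and is rebuilt from stale parity.
data, stale = torn_update(stripe, parity, 0, b"ZZZZ")
bad = xor_blocks([data[0], data[2], stale])

# With a journal: replaying the log repairs parity before the rebuild.
log = []
data, stale = torn_update(stripe, parity, 0, b"ZZZZ", journal=log)
data, fixed = replay(data, stale, log)
good = xor_blocks([data[0], data[2], fixed])

print(bad == b"BBBB", good == b"BBBB")  # False True
```

Note that the block that comes back wrong was never the one being written, which is why a journaled filesystem on top of the RAID "feels" like it wouldn't help: it has no idea the untouched neighbor is at risk.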

> > You might find this of interest - it's just something I bookmarked a
> > while back, not something I've looked at much:
> > http://slashdot.org/palm/21/06/05/30/1453254_1.shtml .  It's based on
> > FreeBSD.
> That's very much like what I'm aiming for, except with a little more emphasis 
> on the bigger end.  


> > Supposedly xfs isn't good at ensuring integrity of metadata - I believe
> > it was Ted Ts'o who suggested only using xfs for valuable data with a
> > battery backup.  Recent versions of reiserfs have a decidedly theoretical
> > flavor to them, and IINM, it's proving difficult to get a newer Reiser
> > into the mainline linux kernel.  However, I'm doing my home backups with
> > an oldish Reiser, and it's been working well - I don't miss the fsck'ing
> > with ext3 a bit.
> I've heard this as well, but it's never bitten me (sand's raid is sitting on 
> XFS and it is MUCH faster on large files than reiserfs or ext3, as well as
> being incredibly fast to mkfs).  I personally haven't had a problem with
> reiser but I have heard of others who have. 

Interesting.  I'd've expected XFS to perform better on lots of tiny
files, more than on one large one.  Maybe it performs well on both :)

> > Supposedly EVMS and LVM both sit over top of the Linux Kernel's "Device
> > Mapper" subsystem, and you can use EVMS to manage something configured
> > with LVM, and vice versa, -but- supposedly EVMS may set some options
> > that'll close the door to supporting LVM again on that set of volumes.
> I'm not sure this is an issue, but I'll look into that in more detail - any 
> URLs to this?


...about a quarter of the way down the page.

> > I agree that BackupPC looks interesting.  I also see some promise in
> > http://dcs.nac.uci.edu/~strombrg/plucker-urls/ . 
> foo!?

Oops.  I copied the URL of the enclosing frame by accident.  I meant:


> > Also unadorned rsync 
> > has an option for creating distinct trees with hardlinks, which creates
> > something that feels like both a fullsave and an incremental, but has
> > the storage space of a single fullsave and n incrementals, and when you
> > remove "the fullsave", you could think of one of the incrementals as
> > "becoming" the new fullsave - not sure how BackupPC relates to that, but
> > rdiffweb appears to be more storage efficient than that.  IINM, rdiffweb
> > stores full changes using rsync's binary diff'ing algorithm, while
> > BackupPC lets rsync do the diff'ing for data in transit - but that's a
> > pretty wild guess.  :)
> I looked at rdiff-backup, but while technically sweet (some of the same 
> advantages of the DataDomain server), the whole package isn't usable enough 
> for a widely rolled out package.  Some backuppc users were talking about 
> subsuming this technology into backuppc tho.

Sounds plausible.
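
For reference, the rsync option I had in mind for the hardlinked trees is presumably --link-dest. Below is a minimal Python sketch of the same idea, not of rsync itself: the snapshot() function and its arguments are made up for illustration, it compares full file contents rather than using rsync's checks, and it ignores symlinks, ownership, and special files.

```python
import filecmp
import os
import shutil

def snapshot(source, dest, previous=None):
    """Copy source into dest, hardlinking files unchanged since previous.

    Each snapshot directory looks like a full save, but unchanged files
    share an inode with the previous snapshot and cost no extra space.
    """
    for root, _dirs, files in os.walk(source):
        rel = os.path.relpath(root, source)
        os.makedirs(os.path.normpath(os.path.join(dest, rel)), exist_ok=True)
        for name in files:
            src = os.path.join(root, name)
            new = os.path.normpath(os.path.join(dest, rel, name))
            old = os.path.join(previous, rel, name) if previous else None
            if old and os.path.isfile(old) and filecmp.cmp(src, old, shallow=False):
                os.link(old, new)       # unchanged: share the existing inode
            else:
                shutil.copy2(src, new)  # new or changed: store a fresh copy
```

Calling snapshot("/home", "/backup/day2", previous="/backup/day1") (paths hypothetical) would give a day2 tree that can be browsed like a full save. Removing day1 later only drops one link to each shared file, so day2 effectively "becomes" the full save.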

