SUMMARY: RAID 5 with 22 disks

Rasana Atreya (atreya@library.ucsf.edu)
Mon, 09 Jun 1997 11:39:23 -0700

Hi Managers,

Thanks to:

From: "Birger A. Wathne" <birger@Vest.Skrivervik.No>
From: vnarayan@haverford.edu (Vasantha Narayanan)
From: patesa@aur.alcatel.com (Sanjay Patel)
From: David Robson <robbo@box.net.au>
From: Jim Harmon <jharmon@telecnnct.com>

I had 22 disks, 2 GB each, with which I wanted to set up RAID 5. But I came
across a previous summary that said:

"Sun advised that more than 6 disks in a RAID 5 stripe was _bad_. Brian Wong's
paper suggested that we can create RAID 5 stripes, each with six disks, and
then _concatenate_ them together to make larger devices!"

In Brian Wong's paper
(http://www.sun.com/sunworldonline/swol-09-1995/swol-09-raid5.html)
I found the answer to my question: why it is wise to limit the
width of a parity RAID volume to no more than 6 disks:

Suppose you have a 30-disk parity RAID volume (RAID 3 or RAID 5) and
one of the disks fails; a single read would then require 29 physical
I/O operations to reconstruct the failed member's data! Writes to such
a volume are also very expensive. He says that most array software
permits the concatenation of multiple RAID volumes if larger-capacity
volumes are required.
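
To put rough numbers on that (my own back-of-the-envelope arithmetic, using
the 3-15 millisecond per-disk service times Wong quotes elsewhere in the
paper): a degraded-mode read on a 30-wide set needs 29 member reads, or
roughly 29 x 10 ms = ~290 ms if serviced one after another, while a 6-wide
set needs only 5 member reads, roughly 50 ms. Keeping the stripe narrow
keeps the degraded-mode penalty bounded.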

Jim Harmon thought that the "limit" the paper talks about is the RAID
controller, since a RAID controller (per channel) can only support 6
drives on one chain. While it is true that the _controller_ can
support only 6 devices, this is not what the paper was talking about
(see above). I had these disks spread across 5 different channels.

Jim went on to give some useful info about controllers:
However, that is FAST NARROW SCSI. In a FAST WIDE SCSI system, it is
theoretically possible to mount 256 drives and, under the right
management program, treat them all as one huge virtual drive.
Typically, a 4-channel or 5-channel RAID controller can easily
control 32 (or more) drives, and under various levels of RAID these can
be configured as separate drives, collections of drives, or one virtual
drive. Mirroring, striping, hot swapping, etc. can all be mixed under
the newer controllers.

David Robson commented: With RAID 5, write access is considerably
slower than normal and causes significant system overhead from
continually recalculating the parity. If a disk dies, the system will
(should) hold up, but performance will degrade further, and then when
you recover onto the replacement disk it will take quite some time! You
should also note that "growing" a RAID 5 metadisk is not recommended,
which means that if you have 4 disks in a RAID 5 device and try to add two
more, performance may be reduced. This means you will have to back your
data off and recreate the entire device! If you can afford the
disks, concatenate and then mirror to gain redundancy (that's what Sun
recommended to me).

He's right about everything except the "write access is considerably
slower than normal and causes significant system overhead from continually
recalculating the parity" part. Brian Wong's paper says that this is
a common misconception. According to him, "This process is commonly and
erroneously thought to be the most expensive part of RAID-5 overhead,
but parity computation consumes less than a millisecond, a figure
dwarfed by the typical 3-15 millisecond service times for I/O to
member disks." In other words, what makes RAID 5 writes slower is the
extra member-disk I/O (a small write typically has to read the old data
and old parity and then write back the new data and new parity), not
the XOR computation itself.

My first question was whether I could create 4 independent RAID 5
metadevices, each with one hot spare, and then mount each of the
metadevices under a different mount point.

The answer is yes, it is indeed possible. I went ahead and set up 4 RAID 5
metadevices, each with 5 disks. But instead of associating each
metadevice with its own hot spare (thanks to help from Sanjay Patel, Birger A.
Wathne and Vasantha Narayanan), I created a hot spare pool with 2 hot
spares in it. I associated the pool with each metadevice by indicating
this in the md.tab file.

Vasantha Narayanan wasn't sure whether we could use a single disk as the
hot spare for multiple metadevices. This is definitely possible. All
you need to do is indicate this in md.tab.
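
For anyone setting up something similar, here is a rough sketch of what the
md.tab entries could look like. The metadevice numbers (d10 through d13), the
pool name (hsp001) and the cXtYdZs0 slices are all invented for illustration;
I made the pool association in md.tab itself, but metaparam is shown below as
another way of tying one pool to each RAID 5 metadevice. Treat this as a
sketch, not a tested configuration:

# md.tab (hypothetical names) - one hot spare pool shared by four RAID 5 sets
hsp001 c5t0d0s0 c5t1d0s0
#
# four RAID 5 metadevices, 5 slices each (slice names are examples only)
d10 -r c1t0d0s0 c1t1d0s0 c1t2d0s0 c1t3d0s0 c1t4d0s0
d11 -r c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0 c2t4d0s0
d12 -r c3t0d0s0 c3t1d0s0 c3t2d0s0 c3t3d0s0 c3t4d0s0
d13 -r c4t0d0s0 c4t1d0s0 c4t2d0s0 c4t3d0s0 c4t4d0s0

and then something along the lines of:

metainit -a                  # initialize everything defined in md.tab
metaparam -h hsp001 d10      # tie the same hot spare pool to each RAID 5 set
metaparam -h hsp001 d11
metaparam -h hsp001 d12
metaparam -h hsp001 d13
newfs /dev/md/rdsk/d10       # then newfs each metadevice and mount it under
                             # its own mount point (repeat for d11-d13)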

Birger A. Wathne pointed out that I did not need one spare for each
RAID set (as was my original plan). He felt that 1 spare disk for all the
RAID sets should be enough, since RAID 5 sets can keep running with a single
disk failure.

He also said: The rule is that you cannot survive two failed disks in
the same RAID 5 set. With one hot spare, the first disk failure puts the
RAID set containing the failed disk in a critical state only for a
limited time. The file system cannot survive another hit while the hot
spare is syncing up, but after that you are ready to take at least two
more blows before you lose any file system.
I have been told to expect 1 to 2% failed disks each year in
big disk farms. My own experience is that the failure rate for new
disks is rather high during the first months, so be very vigilant for the
first 2 months.
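
(For scale: with 22 disks and a 1-2% annual failure rate, that works out to
roughly 22 x 0.01 to 22 x 0.02 = 0.2 to 0.4 expected disk failures per year,
i.e. on the order of one failure every few years on average, though the
early-life failures he mentions can show up much sooner. This arithmetic is
mine, not Birger's.)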

My second question was: If I create RAID 5 stripes, and then concatenate them
together to make larger devices, would this be a good thing to do? If so, how
would I do it?

I ended up not doing any striping (I'm quitting this Friday, and I did
not want to leave behind something I wasn't sure of).

Sanjay Patel's suggestions:

Hot spare pool -> 2 disks
RAID 5 - 1 -> 6 disks
RAID 5 - 2 -> 6 disks
RAID 5 - 3 -> 5 disks
RAID 5 - 4 -> 3 disks

concat/stripe 1 -> contains RAID 5 - [ 1 thru 4 ]

total disk space available (after parity) will equal 16 x disk size,
since each RAID 5 set gives up one disk's worth to parity:
(6-1) + (6-1) + (5-1) + (3-1) = 16.
note: 2.1 GB disks have a formatted capacity of 1.8 GB,
so that is 16 * 1.8 = 28.8 GB

attach the hot spare pool to all raid stripes.

if your disks are hot swappable, then I would only have one hot
spare and place the extra disk with the RAID 5 - 3 stripe.

to create a concatenation/stripe of all the RAID devices in DiskSuite,
simply create an empty concatenation/stripe and place all of the RAID
devices you have previously created into the concatenation/stripe as if
they were normal disks.

a hot-swappable disk is a disk that can be unplugged while the system
is running. most SSAs (110, 112, 114) are not truly hot swappable, since
an entire tray has to be removed to replace a disk. hot-swappable arrays
include Netras, DiskPacks (the new type), and the RSM arrays.

An example of concatenation in md.tab for Solaris 2.5.1 & DiskSuite 4.0:

if you are starting this server from scratch, I would recommend you get
SDS 4.1 (don't forget to download the patches). if you don't have a
copy and you need to use SDS 4.0:

create the RAID 5 devices, then to concatenate them add an md.tab entry like:

/dev/md/dsk/d? 4 1 /dev/md/dsk/d? 1 /dev/md/dsk/d? 1 \
/dev/md/dsk/d? 1 /dev/md/dsk/d?

the first d? is the next available metadevice number;
the 4 is the number of items to concatenate, followed by the devices that
are part of the concatenation (i.e. your RAID stripes)
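
to make that concrete, a hypothetical filled-in version might look like this
(d20 and d10 through d13 are invented metadevice numbers, purely for
illustration; use whatever numbers are free on your system):

/dev/md/dsk/d20 4 1 /dev/md/dsk/d10 1 /dev/md/dsk/d11 1 \
/dev/md/dsk/d12 1 /dev/md/dsk/d13

and then something like:

metainit d20                    # build the concat described in md.tab
newfs /dev/md/rdsk/d20          # put a file system on it
mount /dev/md/dsk/d20 /bigfs    # /bigfs is just an example mount point

(that is only the shape of it; as noted above, I did not actually build the
concatenation myself.)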

in SDS 4.1, it's all GUI, and it's all point-and-click, drag & drop :->

---------------------------------------------------------------------------
Thanks much.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
~ Rasana Atreya Voice: (415) 476-3623 ~
~ System Administrator Fax: (415) 476-4653 ~
~ Library & Ctr for Knowledge Mgmt, Univ. of California at San Francisco ~
~ 530 Parnassus Ave, Box 0840, San Francisco, CA 94143-0840 ~
~ atreya@library.ucsf.edu ~
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~