SUMMARY: DiskSuite/Multi-pack on Solaris 2.5 configuration

Marc S. Gibian (gibian@stars1.hanscom.af.mil)
Tue, 29 Apr 1997 11:20:15 -0400

I asked if people could help me understand the tradeoffs involved in building
filesystems for optimal performance on a new 12-drive Multi-pack. The full
question, along with the responses I received, is attached.

The only considerations raised were:

1. Remember that the fast/wide controller is rated at 20MB/sec, while each of
the 7200 RPM drives is rated in the 3-5MB/sec range. Thus, the configuration
should reflect the maximum number of concurrently active drives the single
fast/wide controller can feed... i.e. 4-6 of these drives at their maximum
transfer rate will consume all available bandwidth on the controller. Since all
12 disks are in the same unit, this has some interesting consequences: 12 x
3-5MB/sec gives a theoretical maximum of 36-60 MB/sec, so everything would be
bottlenecked by the fast/wide SCSI bus/controller. This certainly makes me
wonder just how well balanced the 12-drive 7200 RPM unit is as a storage
product. Since it has no onboard intelligence, all data must travel over the
F/W SCSI bus. A smarter device could clearly make better use of this many
drives if it had multiple or faster internal drive interconnects while
presenting a single external F/W SCSI connection.

It was suggested that 4-drive stripes would be optimal, as anything wider would
be too much for F/W SCSI. Of course, nothing ever runs at its top rated speed,
so it remains to be seen what the actual transfer rates are; wider stripes may
turn out to be faster and make better use of the F/W SCSI bandwidth.
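
For reference, a 4-drive stripe under DiskSuite is a one-liner. A sketch
(controller, target and slice names are placeholders for whatever the new F/W
adapter and multipack end up being, and the interlace is a starting guess to tune):

  # hypothetical names: c2 = the new fast/wide adapter
  metainit d10 1 4 c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0 -i 64k
  newfs /dev/md/rdsk/d10
  # 4 drives x ~5MB/sec is right at the ~20MB/sec F/W SCSI ceiling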

2. I got a strong recommendation to reconsider mirroring in addition to
striping. I suspect that due to budget, space needs, and bandwidth limitations,
I will go without mirroring (other than mirroring my trans-log(s), which is
pretty much a must) and count on good, regular backups. Another budget-driven
solution that is less than optimal but works "good enough."
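
To make the trans-log mirroring concrete: the log slices are tiny, so mirroring
just the log costs almost nothing. A sketch, with hypothetical slice names,
assuming two small slices are carved out on different drives:

  metainit d21 1 1 c2t10d0s1      # first log slice (hypothetical)
  metainit d22 1 1 c2t11d0s1      # second log slice, different spindle
  metainit d20 -m d21             # one-way mirror of the log
  metattach d20 d22               # attach the second submirror
  # d20 then serves as the shared log device for the trans devices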

3. Remember that the larger the individual filesystem, the longer the backup
and restore time; emphasis was placed on restore time. While I agree that a
server crash takes down the entire team, given the budget within which I am
working, I am somewhat less worried about potential disk crashes and recovery
times. I need to emphasize getting enough space to manage the server easily and
the best performance I can, to keep my engineers as productive as possible.
From a mirroring/restore standpoint, 3- or 4-drive stripes were recommended.

4. Finally, it was recommended that if there are multiple "hot spots", I keep
them on separate disks to avoid seek thrashing.

So, it looks like I will be going with multiple full-disk stripes of 3 or 4
drives each. As for swap and tmp, or even the location of my trans-log mirror,
I really didn't get much input. I suspect I will reserve a couple of drives for
experimentation and future expansion of the striped filesystems, probably
putting swap, tmp, and maybe even my trans-log mirror on them; two drives used
exclusively for those tasks would still be underused. The underlying problem
with such use is that it adds to the bandwidth competition on the single F/W
SCSI connection this unit supports.
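
In md.tab form, the layout I'm leaning toward would look roughly like the
following. This is only a sketch; drive names, slice numbers, and interlace
values are placeholders until the hardware arrives:

  # /etc/opt/SUNWmd/md.tab (DiskSuite 4.x), hypothetical layout
  # two 4-drive stripes for the performance-critical filesystems
  d1  1 4  c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0  -i 64k
  d2  1 4  c2t4d0s0 c2t5d0s0 c2t6d0s0 c2t7d0s0  -i 64k
  # a 2-drive stripe on the reserved drives for swap and /tmp
  d3  1 2  c2t8d0s0 c2t9d0s0  -i 64k
  # (remaining two drives left unconfigured for expansion)
  # then: metainit -a ; newfs /dev/md/rdsk/d1 ; swap -a /dev/md/dsk/d3 ; etc.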

All I need now is the completed purchase and delivery of the new equipment and I
can get going on this!

My thanks to:

brion@dia.state.ma.us
dano@nodewarrior.net (Dan Goldman)
birger@Vest.Sdata.No (Birger A. Wathne)

-Marc

Marc S. Gibian
Telos Comsys phone: (617) 377-6350
PRISM/TFS email: gibian@stars1.hanscom.af.mil

Date: Fri, 11 Apr 1997 13:10:41 -0400
From: gibian@stars1.hanscom.af.mil (Marc S. Gibian)
Subject: DiskSuite/Multi-pack on Solaris 2.5 configuration
Sender: sun-managers-relay@ra.mcs.anl.gov
To: sun-managers@ra.mcs.anl.gov
Reply-to: gibian@stars1.hanscom.af.mil (Marc S. Gibian)

It is Friday afternoon, a good time for a system administrator to dream and try
to do a little planning while waiting for the inevitable pre-weekend disaster
that is going to make me work all weekend ;-) ...

My server is currently running Solaris 2.5 and DiskSuite (the version bundled
with 2.5). I have two stripes set up, with the groupings determined by the
characteristics and geometry of the drives. One is constructed from 4 7200 RPM
unipack drives, the second from the classic 1.05GB standard-speed drives. Each
disk in the first set has a small slice partitioned out, and together those
slices comprise the mirror for the trans device log. Almost all of my
filesystems are configured to share that single trans log. The remainder of
those disks forms one high-speed filesystem, and the second stripe forms a less
speedy filesystem for non-time-critical things that need a large filesystem.
Observing the performance of my server, I suspect that placing the trans log
mirror on the same devices as my high-speed stripe is hurting performance.
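
(To make that concrete, the current arrangement amounts to roughly the
following; the metadevice and disk names here are invented purely for
illustration:

  metainit d21 1 1 c1t0d0s1       # small slice on one 7200 RPM drive
  metainit d22 1 1 c1t1d0s1       # matching slice on another
  metainit d20 -m d21             # log mirror
  metattach d20 d22
  metainit d30 -t d1 d20          # trans device: fast stripe d1 + shared log d20
  metainit d31 -t d2 d20          # trans device: slower stripe d2 + same log

so the log mirror lives on the same spindles as the fast stripe d1.)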

Due to the need for more disk space on the server, the 4 7200 RPM unipack
drives are being replaced by a 12-drive 7200 RPM multipack along with a new
fast/wide SCSI adapter dedicated to it. Well, my customer being the U.S. Air
Force, such purchases are never a sure thing until they actually get delivered,
so I am going to assume the order actually makes it all the way through the
purchase process... in time to relieve the space shortage we are starting to
encounter.

This raises a number of questions for me, as I now have to plan how I am going
to organize the configuration of 12 individual drives rather than just my four:

1. Should I combine all of these into one very large stripe / filesystem? What
are the space usage/wastage and performance implications of going that route?

2. Is it better to split things into a few stripes / filesystems? I DO have one
set of data that is very performance critical to the team while the rest of the
data currently on my high performance stripe is less frequently used, though
when used, high performance IS quite desirable. If splitting things up is the
way to go, are there any "best practices" or rules of thumb for finding the
"right" combination?

3. Is there any benefit to splitting the 12 drives into multiple partitions, and
then creating multiple 12 disk/partition stripes/filesystems rather than using
the whole drive and creating a single 12 disk stripe/filesystem?

4. I will still need to find a good place to set up my trans device log mirror,
but if I identically partition all 12 drives, it doesn't make sense to have a
12-replica mirror. The ONE benefit I can see to option 3 is that I could
actually make one of the stripes use fewer than 12 drives, say 9, and then use
the corresponding region on the remaining 3 drives to hold my trans log mirror.

5. With all of this high-speed disk space, I would also like to put all of my
swap and tmp space onto the multipack. Swap currently lives in the default
partition of the boot drive, c0t3d0s1, which is also shared with /tmp. I don't
have much paging going on, as the server has enough RAM to hold the working set
most of the time, and /tmp does not get a huge amount of use either. It just
seems to make sense to place these areas, which when used can quickly bog down
the server, on the faster disks, since I WILL have enough space in the
multipack to do it. Again, is there a preferred way to do this? The
multi-12-drive-stripe approach I mentioned above seems to have the benefit of
making THIS a bit easier.

6. Given the overall picture, would I be better off using, say, 10 drives for
my primary stripe/filesystem(s) (depending on whether a single big one or
multiple smaller ones are "better") and using the remaining two drives for the
trans log, swap, and tmp? Since this is an enormous amount of space for those
roles, I would want to put SOMETHING in the remaining areas of these drives...
say my /opt or /var partitions...?

I realize these are very complex questions, but I am trying to get some feel for
the trade-offs involved. There is administrative ease in reducing the number of
filesystems, but at what cost? A given configuration may offer the greatest
performance, but require many smaller stripes/filesystems and thus more
administration. I think you get the idea. Any advice you can offer would be much
appreciated as making an error and having to reconfigure will carry a very large
time penalty given the amount of storage this involves.

TIA,
Marc

Marc S. Gibian
Telos Comsys phone: (617) 377-6350
PRISM/TFS email: gibian@stars1.hanscom.af.mil

Date: Fri, 11 Apr 1997 14:40:20 -0500
From: brion@dia.state.ma.us
Subject: Re: DiskSuite/Multi-pack on Solaris 2.5 configuration
In-reply-to: <199704111710.NAA16395@hail.tfs.com>
To: gibian@stars1.hanscom.af.mil

Marc,

I am in the process of spec'ing a new server for my agency, so I thought I
would share my thoughts with you. I'm looking to build a configuration to
support an Oracle database in the 6 GB range. I'm using a rule of thumb of
multiple spindles (disks) across multiple controllers to build a fast disk
system.

The Fast/Wide controller has a max rate of 20MB/sec. The 7200 RPM drives run in
the 3-5MB/sec range, so four drives running concurrently will saturate the
controller. I would build the high-performance data set as a four-disk stripe
and build the remainder of the multipack into concatenated drives. The
concatenations still allow high-speed access to their data without being able
to generate more than 5MB/sec of traffic each, which keeps them from slowing
down the four-disk stripe too much.
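
A rough sketch of that split, with purely hypothetical device names:

  # four-disk stripe for the high-performance data set
  metainit d1 1 4 c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0 -i 64k
  # remaining eight drives as a concatenation: a single I/O hits only
  # one drive, so it cannot flood the bus the way a wide stripe can
  metainit d2 8 1 c2t4d0s0 1 c2t5d0s0 1 c2t6d0s0 1 c2t7d0s0 \
                1 c2t8d0s0 1 c2t9d0s0 1 c2t10d0s0 1 c2t11d0s0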

In our case I'm looking at two Fast/Wide controllers and two multipacks with 4
disks each (or 4 single-packs). For the OS and non-database filesystems I'll
pick up a couple of single-packs and hang them off a third controller.

I'm not familiar with the 'trans log', but if it records changes to the
filesystem or a database, I would put it at least on a disk separate from the
data stripe, if not on a separate disk on a separate controller.

Brion Leary <brion@dia.state.ma.us>

Date: Sat, 12 Apr 1997 11:37:07 -0700
From: dano@nodewarrior.net (Dan Goldman)
Subject: Re: DiskSuite/Multi-pack on Solaris 2.5 configuration
In-reply-to: <199704111710.NAA16395@hail.tfs.com>
X-Sender: dano@icarus.nodewarrior.net
To: gibian@stars1.hanscom.af.mil (Marc S. Gibian)

It sounds like you are only striping, not striping and mirroring. If so, I
would recommend striping and mirroring, which of course would require a second
12-pack and an additional SCSI controller. Otherwise, keep in mind that since
you are striping only, any one drive going bad will take down the whole
filesystem. If $$$ is not an issue, mirror across 2 SCSI busses onto two
12-packs while striping.

If you take my recommendation of mirroring, you may want to divide the 12-pack
into smaller stripes so that when and if you need to do a metasync, you don't
have to rebuild all 12 drives. I recommend either a 3-drive or a 4-drive
stripe; worst case you would have to metasync 4 drives (of course this only
applies if you are mirroring). If you are only striping, then you only have to
restore 3 or 4 drives from tape instead of all 12.
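
For example (device names hypothetical, assuming the second 12-pack sits on its
own controller):

  metainit d11 1 4 c2t0d0s0 c2t1d0s0 c2t2d0s0 c2t3d0s0 -i 64k   # submirror, pack 1
  metainit d12 1 4 c3t0d0s0 c3t1d0s0 c3t2d0s0 c3t3d0s0 -i 64k   # submirror, pack 2
  metainit d10 -m d11          # one-way mirror
  metattach d10 d12            # attach second submirror; DiskSuite resyncs it
  # a later "metasync d10" only touches these 4 drives per pack, not all 12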

Regarding swap and tmp, I would not put them on the 12-pack, because if the
12-pack goes down you probably would not be able to boot, since swap and /tmp
would no longer exist. Keep it simple and leave swap and tmp on the internal
drive.

Regarding the trans log, if you have the ability to put it on a separate SCSI
bus, I would do that. Otherwise, at least keep it on a separate disk, and
hopefully mirrored.

dano

<><><><><><><><><><><><><> Cut Here <><><><><><><><><><><><><><><>
Daniel H. Goldman

NodeWarrior Networks, Inc
mailto:dano@nodewarrior.net
http://www.nodewarrior.net
<><><><><><><><><><><><><> Cut Here <><><><><><><><><><><><><><><>

Date: Mon, 14 Apr 1997 09:54:16 +0200
From: birger@Vest.Sdata.No (Birger A. Wathne)
Subject: Re: DiskSuite/Multi-pack on Solaris 2.5 configuration
To: gibian@stars1.hanscom.af.mil

There is one very important issue that some people forget completely when
discussing partition sizes: restore time.
One of my customers had been told (by a different vendor... phew...) that with
RAID5 there was no danger in large file systems. Then they got a file system
corruption on a 60GB file system. RAID5 doesn't help much against software
corruption, so they had to run a 2-day restore.

Check the actual throughput of your backup system, and think about how much
downtime you can afford if you lose the whole file system.
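
As a back-of-the-envelope check (rates assumed, not measured):

  # 60 GB at a sustained 5 MB/s restore rate:
  #   61440 MB / 5 MB/s = 12288 s, about 3.5 hours, best case
  # at 1 MB/s effective (small files, verifies, tape changes):
  #   61440 MB / 1 MB/s = 61440 s, about 17 hours
  # a 2-day restore simply means the effective rate was lower still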

This customer now runs 20GB stripe sets, with most stripes set up as a single
partition. With this setup they have all RAID sets striped across all available
SCSI channels on their RAID controller.

If you want to run stripes over several disks on a single SCSI chain, you
should consider the maximum available throughput of the SCSI chain as well as
the disks' maximum sustained throughput. Having a stripe with a potential
aggregate bandwidth much larger than the bandwidth of the bus may not be
productive when the data is accessed sequentially.

Also, having different file systems with random-access I/O on the same RAID set
can give you a lot of wait time, as they compete for the same set of read/write
heads. If you have several 'hot' areas for random-access I/O that are used
simultaneously, it would be best to have them on separate disks so the
read/write heads can stay within the same area.

Birger
