SUMMARY: Performance of a single job in a multi-CPU environment

Zina Yung (zina@cs.ust.hk)
Thu, 13 Mar 1997 15:39:05 +0800 (HKT)

Hello SUN managers:

******************************************************************
My original posting:

Can one get any speed gain from the dual-CPU config on a single
*ordinary* job? I.e., assuming no attempt has been made by a
programmer to write a parallel program, and you run just one single
job, does the system automatically do any low-level parallelizing
to speed up your single job? If so, what kind of speed gain could
one reasonably hope to get? (I realize that any speed gain would
be highly dependent on the nature of the job, but just want a rough
idea.)

*****************************************************************
My summary:
Thanks to all those who replied! The majority said there is NO gain
unless:
. you use a parallelizing compiler such as Sun's iMPact.
. you use a message-passing system such as PVM (Parallel Virtual
Machine) or MPI. This allows your source to be modified to treat
each processor as a distributed node.
. the program forks or spawns other jobs.

Many thanks to:
Bert N. Shure
Michael R. Zika
Matthew Stier
Rich Kulawiec
Kai O'Yang
James H. McG. Sibley
Greg Price
Peter Bestel
Kevin Sheehan
Alex Finkel
Marc S. Gibian
Jay Lessert

*****************************************************************
Details of the responses:

From: "Bert Shure" <bert@virtual.com>

No luck unless the application is written to take advantage of multiple
CPUs.

The only winning aspect of the two-processor system is that one
processor could use all the memory if nothing else is happening on the
system.

-------------------------------------------------------------------
From: "Michael R. Zika" <zika@oconto.tamu.edu>

No.
None.
If you're looking at parallel applications, I would recommend one of
the following:

o Purchase the iMPact toolkit with the SunSoft compilers. This will
let you put shared-memory parallel directives into your code.
I've used these and had good success with some small real-world
applications. It also provides an auto-parallelization option.

o Install PVM (Parallel Virtual Machine) or MPI (Message Passing
Interface). This will allow your source to be modified to treat
each processor as a distributed node, so you can treat your
shared-memory machine as a distributed-memory machine (if
that's what you want).

In both cases, source code modifications will be required.
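
Zika's first option, parallel directives, can be illustrated in C. The
thread doesn't show iMPact's own directive syntax, so the sketch below
swaps in an OpenMP pragma (a later standard for the same shared-memory
directive style); treat it as an illustration of the idea, not as iMPact
usage. `sum_of_squares` is a hypothetical name.

```c
/* Hypothetical example: sum of i*i for i in [0, n).
 * The OpenMP directive (standing in for iMPact's own syntax) asks the
 * compiler to divide the loop iterations among the available CPUs and to
 * combine the per-thread partial sums via the reduction clause.
 * Without OpenMP support the pragma is ignored and the loop runs serially. */
long long sum_of_squares(long n)
{
    long long total = 0;
    #pragma omp parallel for reduction(+:total)
    for (long i = 0; i < n; i++) {
        total += (long long)i * i;
    }
    return total;
}
```

With gcc, `-fopenmp` enables the directive; without it the pragma is
ignored (at most with a warning) and the function returns the same
result on one CPU.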
-------------------------------------------------------------------
From: "Matthew Stier" <mstier@hotmail.com>

1) There never is just 'one' job running on a computer.

2) If the program does any forking, or spawning of other jobs, you can expect
some improvement.

3) You don't necessarily need multi-threading to get a performance boost from
a multi-processor computer.
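
Point 2 can be sketched with plain POSIX calls, assuming the work splits
into independent halves; `partial_sum` and `forked_sum` are hypothetical
names. The parent forks one child, each process sums half of a range, and
the child hands its result back through a pipe. The program has no
threads, yet the kernel is free to run the two processes on different CPUs.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical CPU-bound work: sum of i over [lo, hi). */
static long long partial_sum(long lo, long hi)
{
    long long s = 0;
    for (long i = lo; i < hi; i++)
        s += i;
    return s;
}

/* Sum [0, n): a forked child does the upper half while the parent does
 * the lower half.  Returns -1 on any error. */
long long forked_sum(long n)
{
    int fds[2];
    if (pipe(fds) == -1)
        return -1;

    long mid = n / 2;
    pid_t pid = fork();
    if (pid == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: upper half */
        close(fds[0]);
        long long s = partial_sum(mid, n);
        ssize_t w = write(fds[1], &s, sizeof s);
        close(fds[1]);
        _exit(w == (ssize_t)sizeof s ? 0 : 1);
    }

    close(fds[1]);                       /* parent: lower half */
    long long lower = partial_sum(0, mid);
    long long upper = 0;
    ssize_t r = read(fds[0], &upper, sizeof upper);
    close(fds[0]);

    /* Reap the child and check that it delivered its result. */
    int status;
    if (waitpid(pid, &status, 0) == -1 || !WIFEXITED(status) ||
        WEXITSTATUS(status) != 0 || r != (ssize_t)sizeof upper)
        return -1;
    return lower + upper;
}
```

For example, `forked_sum(1000)` yields 499500, the same as a serial loop
over 0..999.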

-------------------------------------------------------------------
From: Rich Kulawiec <rsk@itw.com>

The answer to your question is "no". The only thing that having a dual-CPU
config will buy you in that case is the ability to run two jobs on the
same machine at the same time with not much of a performance loss over
running those two jobs on two single-CPU machines at the same time.
(In fact, if they share the same large text (executable) space then
the total amount of RAM you need will be less.)

But unless you explicitly code the threads into your application,
you won't get as much back as you might hope.
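
Explicitly coding threads into the application might look like the
following POSIX threads sketch; `struct chunk`, `sum_chunk`,
`threaded_sum` and the `MAX_THREADS` limit are hypothetical. Each thread
sums a contiguous slice of the range, and on an MP machine the kernel can
run the threads on different CPUs at the same time.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_THREADS 16   /* hypothetical upper bound for this sketch */

/* One slice of work: sum of i over [lo, hi). */
struct chunk {
    long lo, hi;
    long long sum;
};

static void *sum_chunk(void *arg)
{
    struct chunk *c = arg;
    long long s = 0;
    for (long i = c->lo; i < c->hi; i++)
        s += i;
    c->sum = s;          /* each thread writes only its own slot */
    return NULL;
}

/* Sum [0, n) with nthreads POSIX threads (1..MAX_THREADS).
 * Returns -1 on bad arguments or if a thread could not be created. */
long long threaded_sum(long n, int nthreads)
{
    pthread_t tid[MAX_THREADS];
    struct chunk work[MAX_THREADS];
    int started = 0, failed = 0;
    long long total = 0;

    if (nthreads < 1 || nthreads > MAX_THREADS)
        return -1;

    for (int t = 0; t < nthreads; t++) {
        work[t].lo = n * t / nthreads;
        work[t].hi = n * (t + 1) / nthreads;
        work[t].sum = 0;
        if (pthread_create(&tid[t], NULL, sum_chunk, &work[t]) != 0) {
            failed = 1;
            break;
        }
        started++;
    }
    /* Wait for every thread that did start, then combine partial sums. */
    for (int t = 0; t < started; t++) {
        pthread_join(tid[t], NULL);
        total += work[t].sum;
    }
    return failed ? -1 : total;
}
```

Link with the threads library (e.g. `-lpthread`, or `-pthread` with gcc).
`threaded_sum(1000, 2)` returns 499500 regardless of how many threads are
used.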

If, BTW, your project work consists of numerous large-but-similar jobs,
e.g. multiple simulation trials, etc., then it's probably worth your
time to check out Sequent's multiprocessor machines. I've worked with
them on and off since the beta-test days of the original Balance, and
I've always been impressed with the robustness and scalability of their
offerings. It might well be that your needs are better addressed by
one large 16- or 32-processor machine, which would (today) allow you
to combine resources such as disk and RAM and would (tomorrow) give you
a *huge* win when you eventually do re-code your applications to use
multiple execution threads.
-------------------------------------------------------------------
From: oyang@mars.fcit.monash.edu.au (Kai O'Yang)

The simple answer is no. However, if your code uses things like LINPACK,
LAPACK, or BLAS, you can get parallel gains by linking with Sun's
Performance Library.

-------------------------------------------------------------------
From: "James H. McG. Sibley" <jims@chat.freezone.com>

From my experience observing our newly acquired Ultra 2 dual-CPU SPARC
machine, the answer is no. I've run some intensive jobs and what I
notice happening is that the job completely occupies one processor
leaving the other one free. The load goes to 1 but the system is still
50% idle.

I might be totally wrong about this, but this is the behavior I've
observed. If you find out any differently, please let me know.
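
The arithmetic behind this observation: a single-threaded, CPU-bound job
can occupy at most one processor, so with one such job on N CPUs roughly
(N-1)/N of the capacity stays idle, i.e. 50% on a two-CPU Ultra 2. A small
sketch with hypothetical function names; `_SC_NPROCESSORS_ONLN` is a
widely supported `sysconf()` name (Solaris, Linux) but may not be
available everywhere.

```c
#include <unistd.h>

/* Number of processors currently online, or 1 if it cannot be determined. */
long online_cpus(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Rough expected idle fraction when `busy` single-threaded, purely
 * CPU-bound jobs run on `ncpu` processors: each job occupies at most
 * one CPU, and no more than ncpu CPUs can be busy. */
double expected_idle_fraction(long busy, long ncpu)
{
    if (ncpu <= 0)
        return 0.0;
    long used = busy < ncpu ? busy : ncpu;
    if (used < 0)
        used = 0;
    return 1.0 - (double)used / (double)ncpu;
}
```

On a two-CPU machine, `expected_idle_fraction(1, online_cpus())` gives
0.5, matching the "load 1, 50% idle" reading above.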

-------------------------------------------------------------------
From: Greg Price <greg@defcen.gov.au>

You won't see a blinding speedup unless the compiler has some autotasking
tricks built into it. You should see a bit of a gain (depending on what the
app does), because the kernel can run on the other CPU and handle disk I/O
and other work while the app runs...

-------------------------------------------------------------------
From: Peter Bestel <peter.bestel@uniq.com.au>

You can get some performance improvement if the job is truly
CPU-bound on an MP system. Using something like the proctool
utility or the processor_bind(2) system call, you can force
the CPU-bound job onto a single processor. Then use priocntl(2)
to lengthen the time slice for the process's scheduling class or
to change the priority of the process. Another option, as long as
there is _only_ one of these guys on an MP system, is to place the
process into the real-time (RT) scheduling class. You don't want
multiple CPU-bound processes in the RT class: you'll kill off
everything else on the system.
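
The binding step might be sketched as follows for Solaris, using
processor_bind(2) with `P_PID`/`P_MYID` to select the calling process.
`bind_self_to_cpu` is a hypothetical wrapper; on non-Solaris systems this
sketch simply reports `ENOSYS` (Linux has the different
sched_setaffinity(2) interface for the same purpose). The priocntl(2) and
real-time class steps are not shown.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#if defined(__sun)
#include <sys/processor.h>
#include <sys/procset.h>
#endif

/* Bind the calling process to processor `cpu` (Solaris only).
 * Returns 0 on success, -1 with errno set on failure; on other systems
 * it always returns -1 with errno = ENOSYS. */
int bind_self_to_cpu(int cpu)
{
#if defined(__sun)
    /* P_PID + P_MYID selects this process; NULL means we don't need
     * the previous binding reported back. */
    return processor_bind(P_PID, P_MYID, (processorid_t)cpu, NULL);
#else
    (void)cpu;
    errno = ENOSYS;
    return -1;
#endif
}
```

Passing `PBIND_NONE` as the processor ID to processor_bind(2) removes the
binding again.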

The benefits here are that the OS and associated processes continue
to use one processor, ensuring throughput on the system. The CPU-bound
job runs flat-out on the other processor. However, if you
have lots of disk wait cycles associated with the process, the
benefits may not be as great. It really depends on the exact behaviour
of this hefty process. An MP system _could_ be most useful here.
-------------------------------------------------------------------
From: Kevin Sheehan <kevin@uniq.com.au> (Mon Mar 10 07:29:15 1997)

The system doesn't do any optimization of your job per se. However, there
*is* another CPU available for *other* stuff (like the window system,
daemons &c), so your job benefits from that.

One note I will make: if your job is disk I/O intensive, then
multi-threading and using mmap() will make a *huge* difference. mmap() is
vastly more efficient in terms of access and page use, and multi-threading
means that while one thread is waiting for I/O, another thread can happily
be executing.
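
The mmap() half of this suggestion can be sketched like this (the
threading half is not shown); `count_newlines_fd` is a hypothetical
helper. The file is mapped into the address space and its pages are
faulted in on demand, instead of being copied into a user buffer by
read().

```c
#include <stddef.h>
#include <fcntl.h>      /* open() in the usage example */
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Count '\n' bytes in an open file by mapping it read-only.
 * Returns the count, or -1 on error. */
long count_newlines_fd(int fd)
{
    struct stat st;
    if (fstat(fd, &st) == -1)
        return -1;
    if (st.st_size == 0)
        return 0;                       /* mmap() of length 0 fails */

    size_t len = (size_t)st.st_size;
    const char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        return -1;

    long lines = 0;
    for (size_t i = 0; i < len; i++)
        if (p[i] == '\n')
            lines++;

    munmap((void *)p, len);
    return lines;
}
```

Typical use: `int fd = open(path, O_RDONLY);`, then
`count_newlines_fd(fd)`, then `close(fd)`.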

-------------------------------------------------------------------
From: Alex Finkel <afinkel@pfn.com>

If the process has been written multi-threaded, the OS can schedule the
threads individually on any available processor. Since multi-threading is
beneficial to performance even on single-CPU systems, it is likely that the
code is at least split into some number of threads... but then again, I
would not assume so.

You will see some improvement with 2 CPUs, because even if you run only
this one job by itself, it still has to share the processors with the OS.

If you have proctool, you can examine the process while it is running to
see if it employs multiple threads.
-------------------------------------------------------------------
From: gibian@stars1.hanscom.af.mil (Marc S. Gibian)

Yes, you can get performance improvements on a dual CPU machine running Solaris
2.5 or higher even without special coding of your application. This is because
the standard libraries in Solaris are multithreaded. Also, there may be
background activity on your machine and that can run in parallel with your
application rather than interrupting it.

Just how much improvement you will see has a lot to do with the amount of time
spent in the system libraries that are multithreaded and the number of processes
that want to run at the same time.
-------------------------------------------------------------------
From: Jay Lessert <jayl@latticesemi.com>

No. However, you can run two such jobs at the same time without
degrading the performance of either.

-------------------------------------------------------------------

Zina

--------------------------------------------------------------------------
Zina Yung
Computer Science Department
Hong Kong University of Science and Technology
Clear Water Bay,
Hong Kong