SUMMARY: UE 6000 250 MHz CPU failures on boot

GECFS Sun-list (sunlist@bofh.fleet.capital.ge.com)
Tue, 29 Apr 1997 21:50:06 +0000 (GMT)

My original issue was:

> We recently brought in another UE 6000 with 250 MHz CPUs, and began having
> some problems with 1-2 CPUs failing upon boot. I called in SunService,
> who replaced them. Then a couple of different CPUs failed upon boot. The
> FE checked around, found out that a flash-prom patch (103346) needed to be
> applied. I did that yesterday.
>
> Today, upon reboot--twice--the same CPUs failed.
>
> However, the funny thing is that mpstat shows all 8 CPUs functioning.
> prtdiag's (patched to 104595) diagnostic listing shows them as failed,
> although its config listing shows them as there.
>
> I can't believe that--out of the 10 total CPUs this system has had (8 w/
> 2 replacements)--4 have been toast. So, I'm looking for alternative
> explanations, such as an error in PROM which would provide a
> false/misleading error message, or another patch which would be required.
>
> Ideas?
>

Thanks all who replied, specifically:

Jeff Wasilko
James Wendling
Brett Lymn
Casper Dilt
Justin Young

Overall, the answers ranged from the flash-PROM upgrade mentioned above to
the following message from Jeff Wasilko:

////////////////////////////////////
Mark:

I work for Sun in Boston.

The most common cause of what look like 'failed' CPUs is
improper installation of the modules on the board. The Mezcon
connector on the module needs to be installed with a torque
wrench to 6 inch-pounds of torque. The screws also need to be
tightened in a specific order.

Were the CPU modules preinstalled, or did you install them
yourself, or did a Sun FE install them?

The Sun FE should be using a torque wrench during replacement. The
wrenches are stocked in the parts depots and can be ordered along
with the replacement CPU.

If you're getting failures on the same CPU board, it's possible
you have a marginal board.

The patch 103346-05 really doesn't address any problems with
flagging good CPUs as bad (I've never seen this happen, unless
the CPU was loose on the CPU board). Rather, a lot of work was
done to make the POST more robust in dealing with strange failure modes.

Jeff
///////////////////////////////

Between my initial posting and this summary, I received a few error
messages, which I copied and showed the FE when he showed up today. In
examining the errors, he's led to believe that it's actually some bad
memory. We'll be running VTS tomorrow.
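
For anyone else who sees mpstat and prtdiag disagree like this, one rough
cross-check is to count the processors the running kernel reports as
on-line. The sketch below is illustrative only: psrinfo is not mentioned
anywhere in the replies above, the helper name count_online is made up,
and it assumes psrinfo prints one line per processor with the state
("on-line", "off-line", ...) in the second column.

```shell
# Hypothetical helper: count lines whose second field is "on-line".
# Reads psrinfo-style output on stdin; prints 0 if nothing matches.
count_online() {
    awk '$2 == "on-line" { n++ } END { print n + 0 }'
}

# Assumed usage on the affected machine:
#   psrinfo | count_online
```

If the count matches the number of installed modules while prtdiag still
lists failures, that at least suggests the failures were logged at POST
time rather than by the running kernel.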

Once again, thanks.
-----------------------------------------------------------------------------
Mark P. Beckman |"Oh yes. Those are the days when you
Technical Systems Analyst/SysAdmin | want to head back to Iowa, and become
GE Capital Fleet Services | a Pioneer seedcorn salesman."
beckman@bofh.fleet.capital.ge.com | --Dave Durnbaugh, former co-worker
My opinions/comments...Not GE's.