SUMMARY: SS5 panic: asynchronous memory fault

Andy Gay (andy_gay@VNET.IBM.COM)
Wed, 28 May 1997 18:13:44 +0100 (BST)

Question was:

> I'm wondering if this is hardware or software.
> We're seeing several crashes a week recently on an SS5 clone
> (made by Solair), 64M mem, running 2.5, no patches, with:

> panic: asynchronous memory fault: MFSR=81802040 MFAR=8b26180

> in the messages file. I'm intending to get the patches up to
> date but if there's a hardware problem I'd want to get it
> fixed first.

I had a handful of responses; everyone reckons it's hardware.
(I did too, but was hopeful there just might be a software issue.
Getting the hardware fixed is going to be a pain!)

Apparently this is fairly common and due to marginal memory
SIMMs. Scott MacDonald says you can tell which SIMM is at fault
from the numbers in the panic message, but he has lost his
reference sheet. If anyone reading this knows how to decode the
message I'd appreciate a note.
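
In the meantime, the register values can at least be pulled out of
the messages file and compared across crashes. The sketch below does
only that; it does not decode which SIMM is at fault, since I don't
have the mapping. The function name extract_faults and the regular
expression are my own, and the pattern assumes the panic line looks
exactly like the one quoted above. If MFAR is the fault address, as
its name suggests, then seeing it land in the same range every time
would at least be consistent with a single bad SIMM.

```python
import re

# Matches the panic line as quoted in this post. Other releases may
# format the message differently, so treat this pattern as an assumption.
PANIC_RE = re.compile(
    r"asynchronous memory fault:\s*MFSR=([0-9a-fA-F]+)\s+MFAR=([0-9a-fA-F]+)"
)

def extract_faults(lines):
    """Return a list of (mfsr, mfar) integer pairs found in the given lines."""
    faults = []
    for line in lines:
        match = PANIC_RE.search(line)
        if match:
            # Both values are printed in hex without a 0x prefix.
            faults.append((int(match.group(1), 16), int(match.group(2), 16)))
    return faults

if __name__ == "__main__":
    sample = ["panic: asynchronous memory fault: MFSR=81802040 MFAR=8b26180"]
    for mfsr, mfar in extract_faults(sample):
        print(f"MFSR=0x{mfsr:08x} MFAR=0x{mfar:08x}")
```

To run it against a real log, pass the open messages file to
extract_faults() in place of the sample list.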

I'm wondering if this may even be a design flaw in the SS5
systems. Eric S Johnson mentioned an interesting and worrying
anecdote: he sees this occasionally and has tried swapping SIMMs
between systems. This invariably makes the problem go away for a
few weeks, until it returns on the *original* machine!
Makes you wonder...

Several people reminded me how to run the memory tests - thanks.
The problem is that this system is at a remote site, so I'll need
to make a special trip. So it goes.

Regards to y'all -

--
Andy Gay, IBM Global Services (Network Services)  AIX/Unix support
Phone (44) (0)1705 568396, email andy_gay@vnet.ibm.com