SUMMARY: Major Network Trouble

Leo Crombach (lcrombach@tropel.com)
Sat, 14 Jun 1997 09:57:13 -0400

My Original Posting:

This has been a bizarre week so far. First, last week (Friday) our senior
sysadmin quit for a job in another state leaving me (a relative rookie) to
take care of our network. Wouldn't you know I come in Monday morning to two
crashed systems - one a server for our CAD workstations. Then, today, we
have been having major network difficulties. Everything is very slow and
some of the servers are generating the infamous "le0: No carrier - cable
disconnected or hub link test disabled?" message. In addition, on one of our
Sparc 330 (SunOS 4.1.1) servers I have noticed the following messages:

le0: Receive: giant packet from 0:a0:40:3:c6:ca
le0: Receive: STP in rmd cleared

The Ethernet address in the message is not constant. I actually noticed this
message first a few weeks ago, but it did not seem to be a problem at the
time, and it is infrequent. I was able to determine which machine corresponded
to the above Ethernet address (a Macintosh), but removing it from the network
did not help.
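
Because the source address in the giant-packet message changes, it can help to
count how often each address shows up before chasing any one machine. A minimal
Python sketch, assuming the console messages have been saved as lines of text;
the function names are illustrative and not part of any Sun tool:

```python
import re
from collections import Counter

# Matches the le0 driver message quoted above. The driver prints each
# octet without leading zeros (e.g. 0:a0:40:3:c6:ca).
GIANT_RE = re.compile(
    r"le0: Receive: giant packet from "
    r"((?:[0-9a-fA-F]{1,2}:){5}[0-9a-fA-F]{1,2})"
)

def normalize_mac(addr):
    """Pad each octet to two lowercase hex digits."""
    return ":".join(part.lower().zfill(2) for part in addr.split(":"))

def count_giant_sources(lines):
    """Count giant-packet messages per source Ethernet address."""
    counts = Counter()
    for line in lines:
        m = GIANT_RE.search(line)
        if m:
            counts[normalize_mac(m.group(1))] += 1
    return counts

sample = [
    "le0: Receive: giant packet from 0:a0:40:3:c6:ca",
    "le0: Receive: STP in rmd cleared",
]
for mac, n in count_giant_sources(sample).most_common():
    print(mac, n)  # prints "00:a0:40:03:c6:ca 1"
```

If many different addresses turn up with similar counts, a shared component
(cable, transceiver, hub) may be a more likely suspect than any single host.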

Using snoop with no options, I have noticed the following messages appearing
frequently:

hrisc34 -> odin RPC R XID=1 Program unavailable (low=1, high=2)
odin -> hrisc34 RPC R XID=1 Program unavailable (low=2, high=1)
hrisc34 -> odin RPC R XID=1
odin -> hrisc34 RPC R XID=1 Program unavailable (low=1, high=1)
hrisc34 -> odin RPC R XID=1 Program unavailable (low=2, high=2)

The constant here is odin. Also, if I run "snoop -v broadcast" it dies
with either "Segmentation Fault(coredump)" or "Bus Error(coredump)",
depending on which machine I run it on. However, it always dies at the
following point:

RIP: ----- Routing Information Protocol -----
RIP:
RIP: Opcode = 2 (route response)
RIP: Version = 1
RIP:
RIP: Address Port Metric
RIP: 192.10.10.0 metnet 0 1

Bus Error(coredump)
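
The observation that odin is the constant in the "Program unavailable" lines
can be checked over a longer capture by tallying the hosts involved. A rough
Python sketch, assuming the summary-mode snoop output has been saved as text in
the format shown above; the regular expression and function name are my own:

```python
import re
from collections import Counter

# "src -> dst RPC R ... Program unavailable ...", as in the snoop lines above.
LINE_RE = re.compile(
    r"^\s*(\S+)\s+->\s+(\S+)\s+RPC R\b.*Program unavailable"
)

def tally_hosts(lines):
    """Count how many 'Program unavailable' replies each host takes part in."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            counts[m.group(1)] += 1  # sender of the reply
            counts[m.group(2)] += 1  # receiver of the reply
    return counts

sample = [
    "hrisc34 -> odin RPC R XID=1 Program unavailable (low=1, high=2)",
    "odin -> hrisc34 RPC R XID=1 Program unavailable (low=2, high=1)",
    "hrisc34 -> odin RPC R XID=1",
    "odin -> hrisc34 RPC R XID=1 Program unavailable (low=1, high=1)",
    "hrisc34 -> odin RPC R XID=1 Program unavailable (low=2, high=2)",
]
for host, n in tally_hosts(sample).most_common():
    print(host, n)
```

On this short excerpt both hosts appear equally often; over a longer capture
with more hosts, the one present in nearly every exchange should stand out.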

I searched the Sun Managers archives and found many similar postings;
however, the problem seems to have been solved in a different way in almost
every case. I do not have a network sniffer - all I have is snoop. Any
suggestions for tracking this problem down would be greatly appreciated.

********************************************************************************

The general consensus was that there is a problem at the physical layer: a
bad transceiver, Ethernet card, wire/cable, hub, etc. The responses I
received are supported by many previous postings in the Sun Managers
archives (www.latech.edu/sunman.html). Other suggestions were to
contact www.cert.org and to apply patch 101954-07 on the SunOS 4.1.1 machine
to correct the "giant packet" error message.

After everyone went home for the day, I began searching for the culprit by
first removing entire segments from the network and then working my way down
to individual machines. I ended up
replacing one transceiver, two data cables, resetting the main hubs, and
rebooting several servers. Also, I found a reference in the archives about
SQE (Signal Quality Error) switches on the transceivers. We have a few
transceivers from ACSYS with this feature so I checked to verify that the
switch was set correctly. Everything seems to be o.k. right now, but I
really don't know whether any of the above items were responsible. I'm
sure I'll find out if the troublemaker is still out there.
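
For anyone repeating the search, the segment-then-machine elimination described
above can be written down as a procedure. This Python sketch only simulates it:
the segment and machine names are hypothetical, healthy_without() stands in for
"unplug these and see whether the network recovers", and it assumes a single
fault, which may not have been true in my case.

```python
def find_culprit(segments, healthy_without):
    """Narrow a fault down to a segment, then to a machine on it.

    segments: dict mapping segment name -> list of machine names.
    healthy_without: callable taking a set of disconnected machines and
        returning True if the network behaves with them removed.
    Returns (segment, machine); machine is None if no single machine on
    the segment is to blame (e.g. a cable, hub port, or transceiver).
    """
    for seg, machines in segments.items():
        if healthy_without(set(machines)):
            # Removing the whole segment cured it, so look inside.
            for m in machines:
                if healthy_without({m}):
                    return seg, m
            return seg, None
    return None, None

# Hypothetical layout and a simulated fault on machine "cad2".
layout = {"cad": ["cad1", "cad2"], "office": ["pc1", "pc2"]}
bad = "cad2"
print(find_culprit(layout, lambda removed: bad in removed))
# prints ('cad', 'cad2')
```

With more than one bad component, no single removal cures the network, and the
search has to be repeated after each replacement.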

Special thanks to all the following individuals for their suggestions:

David Fetrow
Erwin Fritz
Bruce Cheng
Aggeliki Karabas
Gnuchev Fedor
robin.landis@imail.exim.gov
Bismark Espinoza
Raymond Fagnon
K.Ravi
****************************************************************

Leo Crombach
System Administrator
Tropel Corporation
60 O'Connor Road
Fairport, New York 14450
(716)388-3566
lcrombach@tropel.com

****************************************************************