This has been a bizarre week so far.  First, last week (Friday) our senior
sysadmin quit for a job in another state, leaving me (a relative rookie) to
take care of our network.  Wouldn't you know I come in Monday morning to two
crashed systems - one a server for our CAD workstations.  Then, today, we
have been having major network difficulties.  Everything is very slow and
some of the servers are generating the infamous "le0: No carrier - cable
disconnected or hub link test disabled?" message.  In addition, on one of
our Sparc 330 (SunOS 4.1.1) servers I have noticed the following message:
        le0: Receive: giant packet from 0:a0:40:3:c6:ca
        le0: Receive: STP in rmd cleared
The ethernet address is not constant.  I actually noticed this message first
a few weeks ago but it did not seem to be a problem at the time - and it is
infrequent.  I was able to determine which machine corresponded to the above
ethernet address (a Macintosh), but removing it from the network did not help.
Using snoop with no options I have noticed the following message appearing
frequently:
        hrisc34 -> odin      RPC R XID=1 Program unavailable (low=1, high=2)
        odin -> hrisc34      RPC R XID=1 Program unavailable (low=2, high=1)
        hrisc34 -> odin      RPC R XID=1
        odin -> hrisc34      RPC R XID=1 Program unavailable (low=1, high=1)
        hrisc34 -> odin      RPC R XID=1 Program unavailable (low=2, high=2)
The constant here is odin.  Also, if I run snoop -v broadcast it will die
out with either "Segmentation Fault(coredump)" or "Bus Error(coredump)"
depending on which machine I run it on.  However, it always dies at the
following point:
        RIP:  ----- Routing Information Protocol -----
        RIP:
        RIP:  Opcode = 2 (route response)
        RIP:  Version = 1
        RIP:
        RIP:  Address                        Port   Metric
        RIP:  192.10.10.0     metnet           0     1

        Bus Error(coredump)
I searched the Sun Managers archives and found many similar postings;
however, it seemed that in most cases the problem was solved in different
ways.  I do not have a network sniffer - all I have is snoop.  Any
suggestions for tracking this problem down would be greatly appreciated.
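For anyone who would rather tally snoop summary lines than eyeball them, here
is a minimal sketch in Python.  It assumes the summary lines have been saved
as text and that each one starts with "source -> destination" as in the
output quoted earlier.  The function name tally_hosts and the third sample
line (host "otherhost") are made up for illustration; only the first two
sample lines come from the capture above.

```python
import re
from collections import Counter

# Matches the "src -> dst" prefix of a snoop summary line.
PAIR = re.compile(r"^\s*(\S+)\s+->\s+(\S+)")

def tally_hosts(lines):
    """Count how many summary lines each host appears in (as source or destination)."""
    counts = Counter()
    for line in lines:
        m = PAIR.match(line)
        if m:
            # A set, so a host talking to itself is counted once per line.
            counts.update(set(m.groups()))
    return counts

sample = [
    "hrisc34 -> odin      RPC R XID=1 Program unavailable (low=1, high=2)",
    "odin -> hrisc34      RPC R XID=1 Program unavailable (low=2, high=1)",
    # Hypothetical line, not from the original capture:
    "otherhost -> odin    RPC R XID=1",
]

if __name__ == "__main__":
    for host, n in tally_hosts(sample).most_common():
        print(host, n)
```

The host that shows up in the most lines is the first one worth a closer
look; it does not by itself prove that host is at fault.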
********************************************************************************
The general consensus was that there is a problem at the physical layer: a
bad transceiver, ethernet card, wire/cable, hub, etc.  The responses I
received are supported by many previous postings in the Sun Managers
archives (www.latech.edu/sunman.html).  Some other suggestions were to
contact www.cert.org and to apply patch 101954-07 on the SunOS 4.1.1 machine
to correct the "giant packet" error message.
After everyone went home for the day, I began my search to try to determine
where the culprit was located by first removing entire segments from the
network and then working my way down to individual machines.  I ended up
replacing one transceiver and two data cables, resetting the main hubs, and
rebooting several servers.  Also, I found a reference in the archives about
SQE (Signal Quality Error) switches on the transceivers.  We have a few
transceivers from ACSYS with this feature so I checked to verify that the
switch was set correctly.  Everything seems to be o.k. right now, but I
really don't know whether any of the above items were responsible.  I'm
sure I'll find out if the trouble-maker is still out there.
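The search described above (pull whole segments first, then narrow down to
individual machines) can be thought of as a halving search.  The Python
sketch below is only an illustration of that idea, not what was actually
run.  It assumes exactly one faulty component, which may not hold in
practice, and it relies on a hypothetical network_ok callback standing in
for the manual step of unplugging things and watching whether the trouble
goes away.  The name find_culprit and the segment names are also made up.

```python
def find_culprit(components, network_ok):
    """Halving search for a single faulty component.

    components: list of names (segments or machines); hypothetical.
    network_ok(disconnected): hypothetical callback that returns True when
        the network behaves with the listed components unplugged.
    Assumes exactly one culprit among the components.
    """
    suspects = list(components)
    while len(suspects) > 1:
        mid = len(suspects) // 2
        half = suspects[:mid]
        if network_ok(half):
            # Trouble went away with this half unplugged: culprit is in it.
            suspects = half
        else:
            # Trouble persists: culprit is among the ones still connected.
            suspects = suspects[mid:]
    return suspects[0] if suspects else None

if __name__ == "__main__":
    segments = ["seg1", "seg2", "seg3", "seg4", "seg5"]
    bad = "seg4"  # pretend fault, for demonstration only
    print(find_culprit(segments, lambda unplugged: bad in unplugged))
```

With more than one bad cable or transceiver on the network the single
culprit assumption breaks down, and the search has to be repeated after
each fix.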
Special thanks to all the following individuals for their suggestions:
David Fetrow
Erwin Fritz
Bruce Cheng
Aggeliki Karabas
Gnuchev Fedor
robin.landis@imail.exim.gov
Bismark Espinoza
Raymond Fagnon
K.Ravi
****************************************************************
Leo Crombach
System Administrator
Tropel Corporation
60 O'Connor Road
Fairport, New York 14450
(716)388-3566
lcrombach@tropel.com
****************************************************************