Here's the basic set of answers, followed by the respondents' names,
then the original question. If we resolve the problem itself, I'll
forward that in a follow-up summary.
(in no particular order)
--------------------------suggestions---------------------------------
[Use system logging to check for anomalies]
[Use extended tools to monitor system condition]
Maybe you can look into syslog, dmesg, and /var/adm/messages
for anything unusual happening with your system (disk
errors or le0 errors). Check the daemons (using top or
similar tools) for any process occupying a
large amount of CPU or memory. (Try out vmstat, iostat and
nfsstat.)
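The log check suggested above could be sketched roughly as follows. This is only an illustration: the helper name suspicious_lines and the keyword pattern are assumptions, not anything taken from Sun documentation, so adjust the pattern to whatever your system actually logs.

```python
import re

# Assumed keyword pattern: lines that mention the le0 interface or a
# disk error. This list is a guess for illustration, not a documented
# set of SunOS messages.
PATTERN = re.compile(r"\ble0\b|disk.*error|error.*disk", re.IGNORECASE)


def suspicious_lines(lines):
    """Return the lines that match PATTERN, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if PATTERN.search(line)]


# Typical use (the path comes from the suggestion above and may differ
# on your system):
#
#     with open("/var/adm/messages", errors="replace") as log:
#         for hit in suspicious_lines(log):
#             print(hit)
```

Feeding it the output of dmesg instead of a file works the same way, since it only needs an iterable of lines.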
--[Verify LAN condition, connections.] [Observe changes by moving suspect to new/different subnet]
An initial check would be to count the workstations on this subnet and see how the other workstations on the same subnet perform, e.g. their collision rates. If there are more than 50-60 workstations, try moving this system to a different subnet and see how it performs.
Per the data shown here, the collision rate is something like 5%, which is OK. A saturated network is more like 8% or more.
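For reference, the usual way to get a collision percentage from netstat -i is collisions divided by output packets. A minimal sketch of that arithmetic, using counters copied from the netstat output in the original question (the function name is just for illustration):

```python
def collision_rate(collisions, output_packets):
    """Collisions as a percentage of output packets."""
    if output_packets == 0:
        return 0.0
    return 100.0 * collisions / output_packets


# Cumulative le0 counters from the first row of the netstat output:
# 144505 output packets, 8688 collisions.
print(round(collision_rate(8688, 144505), 1))  # 6.0

# One of the 5-second intervals during a surge:
# 51 output packets, 35 collisions.
print(round(collision_rate(35, 51), 1))  # 68.6
```

The cumulative figure is in the range discussed above, but the per-interval figure during a surge is far higher, which is why watching the interval rows is more telling than the totals.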
--[Verify LAN condition] [Review CERT warnings for potential compromise signatures/responses]
Have you investigated the condition of your network? Excessive collisions usually indicate network hardware problems and/or some of the denial-of-service attacks mentioned in recent CERT advisories.
--[Error Rate is more important than Collision Rate] [Investigate possible external sources of error count/collisions.]
More important than the collisions are the errors. These generally indicate a bad network card though not necessarily on the Solaris system. Under normal circumstances, you should not see any errors. Hopefully someone else will have a suggestion as to how to isolate which system.
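To put a number on the errors, here is a rough sketch that computes the output error percentage for individual interval rows. It assumes the ten-column layout shown in the netstat output of the original question and reads only the five le0 columns. The function names are made up for this example.

```python
def parse_row(line):
    """Parse the le0 columns of one numeric `netstat -i <interval>` row.

    Assumed column order (as in the original question): input packets,
    input errs, output packets, output errs, colls, then the same five
    columns for Total, which are ignored here.
    """
    ipkts, ierrs, opkts, oerrs, colls = (int(f) for f in line.split()[:5])
    return {"ipkts": ipkts, "ierrs": ierrs,
            "opkts": opkts, "oerrs": oerrs, "colls": colls}


def output_error_pct(row):
    """Output errors as a percentage of output packets for one row."""
    if row["opkts"] == 0:
        return 0.0
    return 100.0 * row["oerrs"] / row["opkts"]


# Three interval rows copied from the original question.
rows = [
    "117 2 79 9 30 117 2 79 9 30",
    "67 0 51 7 35 67 0 51 7 35",
    "198 0 267 0 68 198 0 267 0 68",
]
for text in rows:
    print(round(output_error_pct(parse_row(text)), 1))
# prints 11.4, then 13.7, then 0.0
```

Rows with any nonzero percentage are the ones worth correlating with what else was on the wire at that moment.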
--[R&R (remove/replace) I/F Card]
This sounds suspiciously like your network card is failing. I'd replace that if I were you. I bet it fixes the problem.
--[External influence, killing system.]
It looks like someone else on your network is behaving badly; that many errors, in surges, looks like some other node is periodically dropping a whole lot of packets on the net and not caring about congestion.
Suggestion: use etherfind to look for a node sending out broadcasts. If you can get a sniffer or a copy of 'etherman', start it up and watch for surges.
--[Verify/install PATCH]
Perhaps patch 102430-02 might help.
Patch-ID# 102430-02
Keywords: macio le hard hang FSBE ss5 ss10 rdump sun4m
Synopsis: SunOS 4.1.4: le patch that fixes sun4m ethernet hang problems
Date: Jul/26/95
--[Observe error rate.] [Suspect induced problem, not inherent problem]
The collisions may or may not be a problem. Don't get fixated on that. However your errs count is very, very scary. You should not be within a factor of 100 of that!
I don't know anything about what else is on your network, but it is unlikely that there is anything wrong with your SS20. Far more likely that you have a problem with cabling, or with hubs, or with a bad NIC on a PC somewhere. What's changed recently?
Try etherfind on a couple of different hosts (snoop is better if you have it), look for unexpected traffic. Pull and re-seat everything. Is the problem isolated to this one host? Swap all networking HW related to it.
--Additional note: UNIX GURU Mailing List posted a "Tip of the Day" last week that applies here:
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
UNIX GURU UNIVERSE UNIX HOT TIP
Unix Tip #372- January 7, 1998
http://www.ugu.com/sui/ugu/show?tip.today =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
TALKING NFS 3 WHEN I ONLY TALK NFS 2
Most newer systems now talk NFS 3. With many systems still talking NFS 2, the newer system will eventually time out if a mount from an NFS 2 system is attempted.
The NFS 3 system will eventually fall back to the NFS 2 protocol, but to make life easier and quicker, especially when booting and mounting these types of filesystems, create your mount points with the "nfs2" entry added to the /etc/fstab or the /etc/vfstab file:
#=============================================================
# filesystem   directory   type   options           frequency  pass
#=============================================================
foo:/usr3      /usr3       nfs2   rw,bg,hard,intr   0          0
From a shell the line would read:
# mount -t nfs2 foo:/usr3 /usr3
------------------------------------------------------------------------
To unsubscribe to this list, mail to tips@ugu.com
Subject: unsubscribe tips
========================================================================
---------------------------respondents------------------------------
Thanks again all!
Jerome A Joseph    j_alphonse@hotmail.com
Chenthil Kumar     chenthil@lucent.com
Ronald Loftin      reloftin@mailbox.syr.edu
Harry Levinson     levinson@ll.mit.edu
Erwin Fritz        efritz@glja.com  http://www.glja.com
John Reynolds      reynolds@informix.com
Mark Henderson     mch@squirrel.com
Jay Lessert        jay_lessert@latticesemi.com
---------------------------Original question:------------------------
Jim Harmon wrote:
>
> Running SunOS 4.1.4 on SPARC20, 2 CPU, 128MB Memory.
> DNS/NIS Host.
>
> 3 times in the last 2 days our network has crawled to a stop.
>
> The first two times it seemed to take a reboot to clear the network
> problems, the last time, with some additional research in the Solaris
> 2.6 Answer Book, we found a few things to check that led us to
>
>    ifconfig le0 down
>    ifconfig le0 up
>
> Which cleared the net and allowed rebooting.
>
> Now, watching with
>
>    netstat -i <#>    (using # = 5 sec interval)
>
> I'm seeing collisions appearing at a larger rate than typical.
> (Normally we only see about 1-2% collisions over 2-3 weeks.)
>
> Now I'm getting:
>
>     input   (le0)     output            input  (Total)    output
> packets errs  packets errs  colls  packets errs  packets errs  colls
> 105649  8     144505  618   8688   119358  8     158214  618   8688
> 117     2     79      9     30     117     2     79      9     30
> 67      0     51      7     35     67      0     51      7     35
> 48      2     34      10    29     48      2     34      10    29
> 35      0     31      10    25     35      0     31      10    25
> 105     0     82      16    56     111     0     88      16    56
> 154     0     149     3     52     160     0     155     3     52
> 103     0     129     2     67     124     0     150     2     67
> 277     0     220     1     94     461     0     404     1     94
> 91      0     54      2     24     97      0     60      2     24
> 198     0     267     0     68     198     0     267     0     68
> 113     0     71      0     19     119     0     77      0     19
> 122     0     144     0     28     128     0     150     0     28
> 209     0     185     0     48     211     0     187     0     48
> 183     0     155     0     48     193     0     165     0     48
> 164     0     206     1     34     166     0     208     1     34
> 123     0     149     0     28     127     0     153     0     28
> 184     0     180     0     40     190     0     186     0     40
> 92      0     58      3     13     98      0     64      3     13
> 96      0     65      8     47     98      0     67      8     47
>
> every 2-3 screens, where the rest of the time I'm getting 1 or 2
> collisions per screen.
>
> With this going on, my access to search the archives is limited (When
> this host hangs, our entire net hangs).
>
> Here's my kernel info:
>
> *************** showrev version 1.15 *****************
> * Hostname: "<system>"
> * Hostid: "<xxxxxxxx>"
> * Kernel Arch: "sun4m"
> * Application Arch: "sun4"
> * Kernel Revision:
>   4.1.4_DBE1.4 (<SYSTEM>) #5: Mon Mar 24 15:43:35 EST 1997
> * Release: 4.1.4_DBE1.4
> *******************************************************
>
> Can anyone suggest any patches I may need or what else I can check to
> find the source of this problem? I'm looking everywhere, but some
> pointers would be greatly appreciated!
>
> TIA
>
> --
> Jim Harmon                 The Telephone Connection
> jim@telecnnct.com          Rockville, Maryland
--
Jim Harmon                 The Telephone Connection
jim@telecnnct.com          Rockville, Maryland