First of all, I am very grateful to the following people for their help and advice:
Rich Kulawiec <rsk@gsp.org>
Toby Creek <creek@aur.alcatel.com>
Karl Vogel <vogelke@c17mis.region2.wpafb.af.mil>
<oyeyemi_ade@jpmorgan.com>
Joe Yao <jsdy@gwyn.tux.org>
My original post:
++++++++++++++++++++++++++++++++++++++++++++++++++++
We have recently experienced multiple problems with our home directory disk.
Here is a sample fsck output:
# fsck /dev/rdsk/c0t2d0s6
** /dev/rdsk/c0t2d0s6
** Last Mounted on /export/home/d
** Phase 1 - Check Blocks and Sizes
** Phase 2 - Check Pathnames
DIRECTORY CORRUPTED I=30721 OWNER=root MODE=40755
SIZE=1536 MTIME=Aug 22 14:53 1995
SALVAGE? y
MISSING '.' I=30721 OWNER=root MODE=40755
SIZE=1536 MTIME=Aug 22 14:53 1995
CANNOT FIX, FIRST ENTRY IN DIRECTORY CONTAINS java_help
MISSING '..' I=30721 OWNER=root MODE=40755
SIZE=1536 MTIME=Aug 22 14:53 1995
FIX? y
I have been using clri and then fsck to repair the filesystem. The problem is
that we are getting more and more of these errors, and filesystem repair
has become my evening occupation. I'd appreciate any opinions on why
this might be happening.
Here is the setup: a 9 GB SCSI disk attached to an IPC running Solaris 2.5.1
(hostname newton).
The disk is exported to the rest of the network; the options are:
# niscat auto_master.org_dir
/fileservers/newton auto_newton -rw,intr,nosuid,retry=10,timeo=200
/- auto_direct -rw,intr,nosuid,retry=10,timeo=200
/fileservers/usr auto_local -rw,intr,nosuid,retry=10,timeo=200
The auto_direct table is used to automount home directories on all other machines.
# niscat auto_direct.org_dir
/home/a newton:/export/home/a
/home/b newton:/export/home/b
/home/c newton:/export/home/c
/home/d newton:/export/home/d
++++++++++++++++++++++++++++++++++++++++++++++++++++
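For anyone stuck in the same repair loop, here is a minimal sketch of how the damaged inode numbers could be pulled out of saved fsck output, so that it is easy to see whether the same inodes keep coming back. The function name fsck_bad_inodes is hypothetical, and the pattern assumes the "I=<number>" form shown in the sample output above; other fsck messages may need a different pattern.

```shell
# Hypothetical helper (not a standard tool): read saved fsck output on
# stdin and print each inode number it mentions, once, in numeric order.
# Assumes messages of the form "... I=30721 OWNER=root ..." as in the
# sample output above.
fsck_bad_inodes() {
    sed -n 's/.* I=\([0-9][0-9]*\) .*/\1/p' | sort -un
}

# Example use, assuming the fsck output was saved to a file beforehand
# (the file name is illustrative only):
#   fsck_bad_inodes < /var/tmp/fsck.c0t2d0s6.log
```

Comparing the lists from successive runs would show whether the damage stays on the same inodes or keeps spreading; the latter would point at the machine rather than at the disk surface.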
The problem became so aggravating that we decided to replace the hardware, so
I'll never know its exact cause. Since replacing the computer, we have had
no problems with the disk.
<oyeyemi_ade@jpmorgan.com> had a similar problem in the past, which was
traced to bad memory. This sounds like the most probable cause
of our problem (we had recently added memory to that machine).
Karl Vogel mailed the following hint:
Check your power supply. We replaced two disks that probably had
nothing wrong with them because our Storage Array power supply went to
hell.
Again, thanks a lot to all who replied!
Vladimir
vladimir@math.uic.edu