SUMMARY: fsck & news server

Joe R. Jah (jjah@sol.ccsf.cc.ca.us)
Mon, 14 Apr 1997 11:22:17 -0700 (PDT)

Many thanks to the following and those whose responses may be on the way:

Karl E. Vogel <vogelke@c17.wpafb.af.mil>
Justin Young <justiny@cluster.engr.subr.edu>
Ted Marigomen <tmarigom@trimedia.scs.philips.com>
Peter Marelas <maral@phase-one.com.au>
Brett Lymn <blymn@awadi.com.au>
Mattias Zhabinskiy <mattias@txc.com>

The original message:

> Hi Folks,
>
> I run a Netscape news server on a Tatung Compstation 5, Sun5 equivalent,
> running Solaris 2.5.
>
> Last week while I was reading messages on pine, as a regular user, the
> system crashed. It read on console that I should run fsck manually to fix
> the /export/home file system. I ran "fsck -y /export/home"; it took over
> four darn hours:^<.
>
> After the first half hour the messages became similar to:
>
> UNREF FILE I=503705 OWNER=news MODE=100664
> SIZE=1433 MTIME=Mar 23 16:55 1997
> RECONNECT? yes
>
> SORRY. NO SPACE IN lost+found DIRECTORY
> CLEAR? yes
>
> There must have been a couple of hundred thousand problem files.
>
> Later I removed everything from lost+found directory; then I did a
>
> find /export/home/local/news/spool -type f -mtime +7 -print>news.trash
>
> I found out that since March 20 my old news files had accumulated and for
> some reason, unknown to me, had crashed the system. I had to manually
> remove them too. I didn't find any helpful pointers in the logs either.
>
> How do I deal with a filled lost+found directory? How can I speed up fsck
> when there are large numbers of fixes? How do I pinpoint the cause of a
> crash?

===========

The answers:

------
Peter Marelas wrote:

We have had similar problems with news servers as well. The bottom line
is to have quality disks. After turning on crash dumps by uncommenting
the savecore lines in /etc/init.d/sysetup, we found that the system
crashed during the news expire process while trying to free an inode.
The disk was in a bad state, and it didn't agree with the OS.

I suggest you try formatting the disk and then running newfs. If this
doesn't work, replace it.

------
Ted Marigomen wrote:

On our Solaris 2.5 machines over here, we have experienced crashes
because of full file systems. We also experienced a crash because, for
some reason, the lost+found directory was deleted. So a preventive
measure, though not really a solution, is to monitor those file systems
and keep them from getting full.
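
[Editor's sketch, not part of Ted's reply.] One way to do this kind of
monitoring is to filter "df -k" output for file systems above a usage
limit. The function name over_threshold and the 90% limit below are
illustrative assumptions, and the awk call assumes an awk that supports
-v and sub() (probably nawk rather than the old /usr/bin/awk on Solaris).

```shell
# over_threshold LIMIT   (hypothetical helper name)
# Reads "df -k" style output on stdin and prints one line for each
# file system whose capacity column is at or above LIMIT percent.
over_threshold() {
    limit=$1
    awk -v limit="$limit" '
        # Skip the header line and any short line left behind when df
        # wraps a long device name onto a line of its own.
        NR > 1 && NF >= 5 {
            pct = $(NF - 1)          # capacity column, e.g. "98%"
            sub(/%/, "", pct)        # strip the percent sign
            if (pct + 0 >= limit)
                print $NF " is " $(NF - 1) " full"
        }'
}
```

A cron job could run something like "df -k | over_threshold 90" and mail
any output to the administrator.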

-----
Brett Lymn wrote:

You need to check your expiry of news. Either you are not running the
expire process or you are telling expire to keep things for too long.
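
[Editor's sketch, not part of Brett's reply.] A quick way to see whether
expiry is keeping up is to count spool files older than the intended
retention period, reusing the find test from the original message. The
function name count_old_articles is an assumption made for illustration.

```shell
# count_old_articles SPOOL DAYS   (hypothetical helper name)
# Prints how many regular files under SPOOL were last modified
# more than DAYS days ago.
count_old_articles() {
    spool=$1
    days=$2
    # tr strips the padding that some wc implementations add.
    find "$spool" -type f -mtime +"$days" -print | wc -l | tr -d ' '
}
```

For example, "count_old_articles /export/home/local/news/spool 7" should
stay close to zero if articles are meant to expire after a week.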

> How do I deal with a filled lost+found directory?

You need to create lots of files in the directory and then delete them
- this "reserves" directory entries for fsck to use in an emergency.
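
[Editor's sketch, not part of Brett's reply.] The create-then-delete
trick could look roughly like this. The function name reserve_slots and
the ".reserve.N" file names are assumptions made for illustration; pick
a count that suits the size of the file system.

```shell
# reserve_slots DIR COUNT   (hypothetical helper name)
# Creates COUNT empty placeholder files in DIR and removes them again,
# with the aim of leaving the directory enlarged for fsck to use.
reserve_slots() {
    dir=$1
    count=$2
    [ -d "$dir" ] || { echo "no such directory: $dir" >&2; return 1; }

    # Each placeholder consumes one directory entry.
    i=0
    while [ "$i" -lt "$count" ]; do
        : > "$dir/.reserve.$i" || return 1
        i=`expr $i + 1`
    done

    # Remove the placeholders; only the enlarged directory is wanted.
    i=0
    while [ "$i" -lt "$count" ]; do
        rm -f "$dir/.reserve.$i"
        i=`expr $i + 1`
    done
    return 0
}
```

Run as root, for example "reserve_slots /export/home/lost+found 2000",
where 2000 is only a placeholder value.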

> How can I speed up fsck when there are large numbers of fixes?

Faster disk? ;-) There is no real way of doing this, but what you
should probably do is split your news spool out of your home directory
partition and put it onto its own partition. This way, if the machine
dies, the problem is slightly more manageable. My usual attitude to news
is that if we lose our history, that's tough; in that situation I would
probably just newfs the news spool and leave it at that.

> How do I pinpoint the cause of a crash?

You need to enable crash dumps on your machine and have enough space
for the dump to be written. If you do this, then the operating system
should write its state out to swap on a crash and, during boot, take
that dump and write it to your file system so that you can analyse the
cause (or send the thing to Sun support for them to look at).

-----
Justin Young wrote:

In the future, mirror your /var/spool/news or whatever filesystem in
which your news articles are stored. I don't even think I need to tell
you to have a UPS that will do a *graceful* shutdown.

Both INN and DNEWS would have messed you up, too.

News servers leave a lot of open files that are extremely susceptible to
corruption if not properly closed.

-----
Karl E. Vogel wrote:

J> How do I deal with a filled lost+found directory?

I'd recommend a cron script that periodically checks all of your lost+found
directories and makes a fuss when it finds anything at all. lost+found
should always be empty.
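
[Editor's sketch, not part of Karl's reply.] A checker of this kind
might look like the following. The function name check_lost_found and
the wording of the warning are assumptions made for illustration.

```shell
# check_lost_found DIR...   (hypothetical helper name)
# Prints a warning for every listed directory that is not empty and
# returns 1 if any warning was printed, 0 otherwise.
check_lost_found() {
    rc=0
    for dir in "$@"; do
        [ -d "$dir" ] || continue
        # Count every entry except "." and "..".
        n=`ls -a "$dir" | sed -e '/^\.$/d' -e '/^\.\.$/d' | wc -l | tr -d ' '`
        if [ "$n" -gt 0 ]; then
            echo "WARNING: $dir contains $n entries"
            rc=1
        fi
    done
    return $rc
}
```

From cron, the output of something like
"check_lost_found /lost+found /export/home/lost+found" could be mailed
to root whenever it is non-empty.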

J> How do I pinpoint the cause of a crash?

If the crash left a corefile of some kind, /usr/sbin/crash might tell you
something about what was happening just before.

-----

Joe

_/ _/_/_/ _/ ____________ __o
_/ _/ _/ _/ ______________ _-\<,_
_/ _/ _/_/_/ _/ _/ ......(_)/ (_)
_/_/ oe _/ _/. _/_/ ah jjah@sol.ccsf.cc.ca.us