SUMMARY 2: Problem with Last

Tuan-Eng Tan (tetan@uwin.siu.edu)
Wed, 08 Apr 1998 10:26:48 -0500 (CDT)

Hi, all

After sending the first summary, I still received responses. They are
attached below. Thanks to all those who responded, and I apologize for the
late summary.

Date: Wed, 01 Apr 1998 18:27:33 +0200
From: "Simon Convey" <simon@iway.nl>

This is exactly the kind of thing which causes all those questions about "Hey,
df and du tell me I have space left, but I can't write files!!!" The above
will of course work, but the utmp daemon might have a handle open to that
inode. A move and a touch will generate a new inode number! So the running
utmp daemon will still be writing entries to the old inode, not to the new
file! A filename is a meaningless concept in UNIX; the system works with
inodes, not names. Your solution works fine if you kill the utmp daemon, do
the moves, and then restart the daemon. Or, if you don't care about keeping
the .old files, just cp /dev/null /var/adm/wtmp. This preserves the inode
number associated with the name. Admins should get into the habit of using
this technique for any log files which may be held open by processes. The
lsof and fuser tools can come in handy here.
This is really a programming issue, and it depends on whether the author
'stats' the file before adding each entry, or just stats the file once at
startup and puts it in append mode.
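Simon's "truncate, don't move" advice can be sketched in Perl. The helper name rotate_in_place is made up for illustration and is not part of any Solaris tool. It saves a copy as <file>.old and then empties the original through its existing name, which is what cp /dev/null does, so the inode stays the same. As Simon notes, this is only clean if the writer keeps the file in append mode; entries written between the copy and the truncate are lost.

```perl
use strict;
use warnings;
use File::Copy qw(copy);

# Hypothetical helper (not part of any Solaris tool): save a copy of a log
# that a daemon may hold open, then empty the original in place.
# Truncating through the existing name keeps the same inode, like
# "cp /dev/null file"; mv + touch would give the name a new inode instead.
# Assumes the writer uses append mode; anything written between the
# copy and the truncate is lost.
sub rotate_in_place {
    my ($path) = @_;
    copy($path, "$path.old")  or die "copy $path: $!";
    open(my $fh, '+<', $path) or die "open $path: $!";
    truncate($fh, 0)          or die "truncate $path: $!";
    close($fh)                or die "close $path: $!";
    return 1;
}
```

For example, rotate_in_place('/var/adm/wtmpx') would leave the old entries in /var/adm/wtmpx.old and an empty /var/adm/wtmpx whose inode number (as shown by ls -i) is unchanged.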

Date: Wed, 01 Apr 1998 10:39:18 -0500
From: Daniel Ellis <dellis@frycomm.com>

I am attaching a summary regarding this exact problem.

Message-ID: <34A8F730.47BF0083@frycomm.com>
Date: Tue, 30 Dec 1997 08:29:20 -0500
From: Daniel Ellis <dellis@frycomm.com>
Organization: Fry Communications, Inc.
To: Sun Managers <sun-managers@ra.mcs.anl.gov>
Subject: SUMMARY: utmp(x) and wtmp(x) history

After reading the responses, I decided the best way to repair the wtmpx
file was to write a (simple) Perl script to read in the records, cut out
the garbage (only 164 bytes of crap was screwing everything up), and
close the space back up (I should be a surgeon). This worked, and I also
found out lots of interesting stuff about the files in question. Thank
you to these people, who gave me sample scripts and a good history of the
[uw]tmpx? files.
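The "cut out the garbage and close the space back up" repair can be sketched in Perl as below, assuming you have already found the start offset and length of the bad bytes (for example by inspecting the file with od). cut_garbage is a hypothetical name, and it writes a new file instead of editing the live one. A plausible explanation for why only 164 bytes did so much damage: a Solaris wtmpx record is 372 bytes (the size read by the Perl reader further down), and 164 is not a multiple of it, so every record after the garbage is read out of alignment.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out with $len bytes removed starting at
# byte $offset, so the records after the gap line up on record boundaries
# again. Writes a new file rather than editing in place; stop the writers
# or work on a copy. Returns the size of the new file.
sub cut_garbage {
    my ($in, $out, $offset, $len) = @_;
    open(my $ifh, '<', $in) or die "open $in: $!";
    binmode($ifh);
    my $data = do { local $/; <$ifh> };      # slurp the whole binary file
    close($ifh);
    die "range lies past end of $in\n" if $offset + $len > length($data);
    substr($data, $offset, $len, '');        # splice out the bad bytes
    open(my $ofh, '>', $out) or die "open $out: $!";
    binmode($ofh);
    print {$ofh} $data;
    close($ofh) or die "close $out: $!";
    return length($data);
}
```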

BILLY <billy@student.adelaide.edu.au>
Jean-Philippe.LEROY@st.com
Jim Harmon <jharmon@telecnnct.com>
Chris_Marble@hmc.edu
jsdy@cais.com
Aleksandar Milivojevic <alex@srce.hr>
"Karl E. Vogel" <vogelke@c17mis.region2.wpafb.af.mil>

Original question:
====================
In short, can anyone tell me all they know about the utmp, wtmp, utmpx,
and wtmpx files?

I have read the man pages for [uw]tmpx? and fwtmp, know how to truncate
them and know how to rotate them, and realize that the "x" files are
extended versions of the non-"x" files, but I wonder:

why are all four files necessary, i.e. why is all accounting info not
kept in one huge file? Is this so older programs can still read the old
format of the utmp and wtmp?

why is there both a U-tmp(x) and a W-tmp(x) (emphasis on first letter)?
Again, I am curious why four files are needed.

what function each serves (sure, they help commands like who and write,
but more specifically, what commands rely on which files and why)

what is the history behind the files (since the x files are "extensions",
I assume that at some point there were only the utmp and wtmp)

is there an equivalent to fwtmp that can read the wtmpx file and write
it out in ASCII so I can try to repair my wtmpx file? This is the real
reason for this message: my wtmpx file is messed up somehow, because a
"last" command only lists people up to Dec 6. It looks as though no one
has logged in since then.

I could just truncate the file and get on with life, but I need to keep
the information intact (I do analysis on the connections to this
machine). The file is still growing, so new logins are still being
written, but there must be a bad spot in the file that "last"
chokes on. I might eventually write a C program to do what I want, but
I wanted to understand the history, structure, and uses of these files
first (also, maybe there is already a program out there). I will check if
there are any good backups of the wtmpx file from after Dec 6 (maybe it
just recently got hosed), but I would still like to know this stuff just
to be more educated.

Thank you.
====================

RESPONSES:
----------
=> why is there both a U-tmp(x) and a W-tmp(x) (emphasis on first letter)?
=> what function each serves (sure they help commands like who and write,
=> but more specifically what commands rely on which files and why)

utmp(x) contains the current state, and is used by things like finger(1),
write(1) and who(1).
wtmp(x) contains the login history, and is used by things like last(1).

=> is there an equivalent to fwtmp that can read the wtmpx file and write
=> it out in ascii so I can try to repair my wtmpx file?

Not that I know of... but [uw]tmp(x) manipulators are easy to write. In
Perl, you'd want something like this to read [uw]tmpx:

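use strict;
use warnings;

open(my $fh, '<', '/var/adm/utmpx') or die "utmpx: $!";  # or wtmpx, or whatever
binmode($fh);
# 372 = sizeof(struct futmpx); stop on a short (partial) final record
while (read($fh, my $utmpx, 372) == 372) {
    my ($user, $id, $line, $pid, $type, $exit_1, $exit_2, $tv_1, $tv_2,
        $session, $pad_1, $pad_2, $pad_3, $pad_4, $pad_5, $syslen, $host)
        = unpack('A32 A4 A32 l s ss xx ll l lllll s A257', $utmpx);
    # do stuff here
}
close($fh);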

Have a peek through /usr/include/utmp.h and utmpx.h to get an idea of the
structures and functions available. If it helps, I can send you a Perl hack
I wrote (from which I pulled the code above) that basically duplicates
"finger | sort".
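Building on the reader above, a who-like listing is a small filter: keep only USER_PROCESS entries (ut_type 7 in <utmpx.h>). This Perl sketch assumes the same Solaris 372-byte record layout and native byte order as that reader; current_logins is a hypothetical name.

```perl
use strict;
use warnings;

my $RECLEN       = 372;                                       # sizeof(struct futmpx), Solaris
my $TEMPLATE     = 'A32 A4 A32 l s ss xx ll l lllll s A257';  # same template as the reader above
my $USER_PROCESS = 7;                                         # ut_type value from <utmpx.h>

# Return [user, line, host, login_time] for each USER_PROCESS record in
# $fh, roughly the set of entries who(1) lists from utmpx.
sub current_logins {
    my ($fh) = @_;
    binmode($fh);
    my @logins;
    while (read($fh, my $rec, $RECLEN) == $RECLEN) {
        my ($user, $id, $line, $pid, $type, $e1, $e2, $sec, $usec,
            $session, @rest) = unpack($TEMPLATE, $rec);
        next unless $type == $USER_PROCESS;
        push @logins, [$user, $line, $rest[-1], $sec];    # $rest[-1] is ut_host
    }
    return @logins;
}
```

To try it, open /var/adm/utmpx for reading, pass the handle to current_logins, and print each entry's login time with scalar localtime($entry->[3]).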

----------

From a Unix administration book (accounting chapter):
"First of all, utmp is created by the init daemon when it runs for the
first time. wtmp must be created by the administrator. A record is
written in utmp for each terminal: for example, login writes the user name,
the remote node (if any), and the connection time. When the connection
ends, the init process cleans up this information. So the file size is more
or less stable and proportional to the number of terminals. The records are
similar in wtmp, but it contains two records per session: one for the
beginning and one for the end. This file needs to be cleaned
periodically, depending on the number of connections (number of terminals
and users)..."

To clean wtmp you just need to "cp /dev/null /var/adm/wtmp".
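The two-records-per-session layout described above is what lets last(1) report login durations. A simplified Perl sketch, assuming the Solaris 372-byte wtmpx layout used by the reader earlier in this summary, pairs each USER_PROCESS (7) record with the next DEAD_PROCESS (8) record on the same ut_line. Unlike the real last, it ignores reboots, time changes and other record types; sessions is a hypothetical name.

```perl
use strict;
use warnings;

my $RECLEN   = 372;                             # sizeof(struct futmpx), Solaris
my $TEMPLATE = 'A32 A4 A32 l s ss xx ll';       # leading fields through ut_tv
my ($USER_PROCESS, $DEAD_PROCESS) = (7, 8);     # ut_type values, <utmpx.h>

# Pair each login with the next logout on the same line. Returns
# [user, line, login_time, logout_time] entries; logout_time is undef
# when no logout record follows (still logged in, or lost in a crash).
sub sessions {
    my ($fh) = @_;
    binmode($fh);
    my (%open, @done);
    while (read($fh, my $rec, $RECLEN) == $RECLEN) {
        my ($user, $id, $line, $pid, $type, $e1, $e2, $sec) = unpack($TEMPLATE, $rec);
        if ($type == $USER_PROCESS) {
            # A new login on a busy line closes the old session without a time.
            push @done, [@{ $open{$line} }, undef] if $open{$line};
            $open{$line} = [$user, $line, $sec];
        } elsif ($type == $DEAD_PROCESS && $open{$line}) {
            push @done, [@{ delete $open{$line} }, $sec];
        }
    }
    push @done, map { [@$_, undef] } values %open;   # sessions never closed
    return @done;
}
```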

----------

There's an administrative command called "wtmpfix" that will probably do
what you're looking for.

Look for it in section 1M of the AnswerBook man pages.

It should be in the /usr/lib/acct directory.

----------

We wrote a program here to read in and trim the files as desired.
We didn't want to simply truncate, but to retain the last login date
for each user no matter how old. Our program is written in Perl
and should be readable and modifiable. Hope it helps.

[http://www3.hmc.edu/docs/coolstuff/wtmpx]
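The HMC program is only linked, not quoted, so what follows is not that program. It is an independent Perl sketch of the same idea, assuming the Solaris 372-byte wtmpx layout: keep every record stamped at or after a cutoff time, plus the most recent older login of each user who has not logged in since. Boot, run-level and logout records older than the cutoff are dropped. trim_records is a hypothetical name.

```perl
use strict;
use warnings;

my $RECLEN       = 372;                         # sizeof(struct futmpx), Solaris
my $TEMPLATE     = 'A32 A4 A32 l s ss xx ll';   # leading fields through ut_tv
my $USER_PROCESS = 7;                           # ut_type value from <utmpx.h>

# Hypothetical trimmer: copy records from $in_fh to $out_fh, keeping every
# record stamped at or after $cutoff (epoch seconds) plus the latest older
# login of each user who has not logged in since the cutoff.
# Returns the number of records written.
sub trim_records {
    my ($in_fh, $out_fh, $cutoff) = @_;
    binmode($_) for $in_fh, $out_fh;
    my (%old, %recent, @keep);
    while (read($in_fh, my $rec, $RECLEN) == $RECLEN) {
        my ($user, $id, $line, $pid, $type, $e1, $e2, $sec) = unpack($TEMPLATE, $rec);
        if ($sec >= $cutoff) {
            push @keep, $rec;
            $recent{$user} = 1 if $type == $USER_PROCESS;
        } elsif ($type == $USER_PROCESS) {
            $old{$user} = [$sec, $rec];         # later records replace earlier ones
        }
    }
    # Preserved old logins go first, in time order, then the recent records.
    my @saved = sort { $a->[0] <=> $b->[0] }
                map  { $old{$_} } grep { !$recent{$_} } keys %old;
    my @out = ((map { $_->[1] } @saved), @keep);
    print {$out_fh} @out;
    return scalar @out;
}
```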

----------

As you clearly have deduced, "tradition" accounts for a lot of this.

In the beginning, there were just the utmp and wtmp files, in /etc/.
The utmp file, as now, contains structures for those who are currently
logged in. With the introduction of System V, certain other processes
logged themselves into the utmp file [notably 'init'], and the locations
of terminal lines became fixed in the file - no longer would a new login
just insert itself in the first empty slot. This meant, too, that many
programs began to depend on the format of utmp and wtmp.

Meanwhile, again as from the beginning, "wtmp" was just the
concatenation of 'utmp' structures to indicate when users had logged in
and out, and when other system events (notably time changes and reboots)
had happened. No attempt was made to verify whether the file was intact
before appending another 'wtmp' record. This is of especial importance
to you, as we will see.

But along came networked logins, X Window sessions, and other things
that needed to be logged along with a 'utmp'/'wtmp' entry. Different
groups have reacted to this in different ways. Sun decided to add the
utmpx and wtmpx files. Some of the information is mirrored; but the
string lengths are notably longer. Other information is added, and
other information is omitted.

So, now, when a Sun program needs to get all of the information for a
given current login, it looks in both the utmp and the utmpx files. For
historical information, it looks in both the wtmp and wtmpx files.

I've had the problem you describe, when a 'wtmp' structure was partially
written to the "wtmp" file just when the machine went down. You need to
resync the file by reading as many good records as you can, skipping
over the bad record, and repeating. You have a particular problem with
the Sun solution, in that you might want to maintain consistency between
"wtmp" and "wtmpx". I did this by doing a 'who wtmp', 'dd'ing the
appropriate number of records, using 'dd' again to skip over the mangled
record, etc. This may or may not be more onerous when synchronizing
with the 'utmpx' structures in "wtmpx".
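One quick way to tell whether "wtmp" and "wtmpx" are still in step is to compare the record counts implied by their sizes. This Perl sketch assumes 36-byte struct utmp and 372-byte struct futmpx records (32-bit Solaris; check utmp.h and utmpx.h on your own system) and that both files are appended to together. A non-zero leftover means a partial record somewhere in that file. compare_counts is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed record sizes for 32-bit Solaris: struct utmp (utmp.h) is 36 bytes
# and the on-disk struct futmpx (utmpx.h) is 372. Check your own headers.
my %RECLEN = (wtmp => 36, wtmpx => 372);

# Hypothetical check: given the two file sizes in bytes, report how many
# whole records each holds, how many bytes are left over (a partial
# record), and whether the files appear to be in step.
sub compare_counts {
    my ($wtmp_size, $wtmpx_size) = @_;
    my %r;
    for my $pair ([wtmp => $wtmp_size], [wtmpx => $wtmpx_size]) {
        my ($name, $size) = @$pair;
        $r{"${name}_records"}  = int($size / $RECLEN{$name});
        $r{"${name}_leftover"} = $size % $RECLEN{$name};
    }
    $r{in_step} = ($r{wtmp_leftover} == 0 && $r{wtmpx_leftover} == 0
                   && $r{wtmp_records} == $r{wtmpx_records}) ? 1 : 0;
    return \%r;
}
```

Pass it the two file sizes (for example from stat on /var/adm/wtmp and /var/adm/wtmpx). A wtmpx leftover of 164 bytes, as in Daniel's case, would show up here immediately.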

----------

Sometimes after a crash you'll get a messed-up wtmpx file (because it was
not cleanly closed). If you look at wtmpx, you'll see that there is some
garbage (usually lots of zeros) that confuses commands like last.
Since wtmpx is a binary file, it will be hard to repair it by hand.
But you can write a small program (similar to last) that will read the
file and ignore errors in it.
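Such a program can resynchronize on its own. The Perl sketch below keeps records that look plausible and, after a bad one, slides forward one byte at a time until a plausible record starts again. The plausibility test is an assumption, not something last is known to do: ut_type between 1 and 10 (RUN_LVL through DOWN_TIME in Solaris <utmpx.h>; check UTMAXTYPE on your system), so an all-zero slot counts as garbage, and a ut_tv seconds value inside a time window you supply. A byte-shifted window can occasionally pass such checks by chance, so review the result. resync and plausible are hypothetical names.

```perl
use strict;
use warnings;

my $RECLEN   = 372;                          # sizeof(struct futmpx), Solaris
my $TEMPLATE = 'A32 A4 A32 l s ss xx ll';    # leading fields through ut_tv

# Assumed plausibility test: ut_type from 1 (RUN_LVL) to 10 (DOWN_TIME,
# assumed UTMAXTYPE; check <utmpx.h>), so all-zero EMPTY slots count as
# garbage, and a ut_tv seconds value inside the window you supply.
sub plausible {
    my ($rec, $tmin, $tmax) = @_;
    my ($user, $id, $line, $pid, $type, $e1, $e2, $sec) = unpack($TEMPLATE, $rec);
    return $type >= 1 && $type <= 10 && $sec >= $tmin && $sec <= $tmax;
}

# Walk the raw file contents; keep plausible records and, after a bad one,
# slide forward one byte at a time until a plausible record starts again.
# Returns (array ref of kept records, number of bytes skipped).
sub resync {
    my ($data, $tmin, $tmax) = @_;
    my ($pos, $skipped, @good) = (0, 0);
    while ($pos + $RECLEN <= length $data) {
        my $rec = substr($data, $pos, $RECLEN);
        if (plausible($rec, $tmin, $tmax)) {
            push @good, $rec;
            $pos += $RECLEN;
        } else {
            $pos++;
            $skipped++;
        }
    }
    $skipped += length($data) - $pos;        # trailing partial record, if any
    return (\@good, $skipped);
}
```

In practice you would slurp a copy of wtmpx in binary mode, call resync with a window covering the period the file really spans, and write the kept records to a new file for last to read.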

----------

Is /usr/lib/utmpd running? That should be started in /etc/rc2.d/S88utmpd.
It's supposed to correct distortions in the utmp and utmpx files, but
it can misbehave. The current version of utmpd seems to work quite
well as long as the defaults are set properly in /etc/default/utmpd:

SCAN_PERIOD=300
MAX_FDS = 0

These values come from

http://remus.rutgers.edu/~adrian/solaris/problems.html

We use the values

SCAN_PERIOD=30
MAX_FDS = 3

If none of this helps, try modifying the S88utmpd script to remove the
utmp and utmpx files from /var/adm, and then create new ones before
starting utmpd:

rm /var/adm/utmp /var/adm/utmpx
cp /dev/null /var/adm/utmp
cp /dev/null /var/adm/utmpx
chown root /var/adm/utmp*
chgrp bin /var/adm/utmp*
chmod 644 /var/adm/utmp*

----------


Date: Wed, 1 Apr 1998 08:42:22 -0500
From: David Thorburn-Gundlach <david@bae.uga.edu>

Tuan-Eng --

How big are your /var/adm/{u,w}tmp[x] files? IIRC, when they get too
big for some of the commands (though well below the filesystem limit --
especially under 2.6), you simply get empty output.

From: Jim Robertori <jimr@lucent.com>
Date: Wed, 1 Apr 1998 08:29:21 -0500 (EST)

One thing that you may have missed is that this can be an indication that you
have been hacked. Intruders often edit these files to erase traces of their
logins, so the trigger is this: if all the {u,w}tmp[x] files are corrupted and
no other files are damaged, what could have caused that other than the
accounting process? Usually the accounting process is as *CLEAN* as all the
others and rarely corrupts files.

-- 
Tuan-Eng Tan
Universities Water Information Network
Southern Illinois University at Carbondale