More details about gzip...

Michał Zalewski (lcamtuf@POLBOX.COM)
Sat, 27 Dec 1997 17:38:27 +0100

Here is an even more detailed report about gzip and its vulnerabilities.
To begin, a short description of a typical .gz header (more details in
RFC 1952). Here's a hex dump of a sample archive, Altered.gz, which was
attached to my previous letter about gzip:

offset | 00 01 02 03 04 05 06 07 08 09 0A 0B 0C 0D 0E 0F 10 11 12 13
-------+------------------------------------------------------------
value  | 1F 8B 08 00 00 00 00 00 00 03 95 00 00 00 00 00 00 00 00 00

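Offs | Size | Description
-----+------+-------------
00   | 2    | Magic number of a gzip archive, always 1F 8B
02   | 1    | Compression method, usually 08 (deflate)
03   | 1    | Flags; bit 08 is set if the original filename is stored
04   | 4    | Modification time
08   | 1    | Extra compression flags (depend on compression method)
09   | 1    | Operating system (00 - DOS/FAT, 03 - Unix)
0A   | ?    | If original filename stored: zero-terminated (ASCIZ) string
??   | ?    | Compressed data blocks
??   | 4    | CRC-32 checksum of the uncompressed data
??   | 4    | Size of the uncompressed file (modulo 2^32)

Multi-byte fields are stored least significant byte first. Optional
fields selected by other flag bits (04 extra field, 10 comment,
02 header CRC) are not shown here; see RFC 1952.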
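The table above can be turned into a small parser. This is a minimal Python sketch, assuming a single-member file and following the field layout and flag values of RFC 1952; the function name `parse_gzip_member` is hypothetical and is not part of gzip or of any library. It locates the start of the compressed data, which is the offset the rest of this letter is about.

```python
import struct

# FLG bits as defined in RFC 1952, section 2.3.1
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 0x01, 0x02, 0x04, 0x08, 0x10


def parse_gzip_member(data: bytes) -> dict:
    """Hypothetical helper: split one gzip member into header fields,
    raw deflate data and trailer (assumes exactly one member)."""
    if len(data) < 18:
        raise ValueError("too short for a gzip member")
    # Fixed 10-byte header; multi-byte fields are little-endian
    id1, id2, cm, flg, mtime, xfl, os_id = struct.unpack("<BBBBIBB", data[:10])
    if (id1, id2) != (0x1F, 0x8B):
        raise ValueError("bad magic number")
    pos = 10
    if flg & FEXTRA:    # 2-byte length followed by that many bytes
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        pos += 2 + xlen
    name = None
    if flg & FNAME:     # original filename, zero-terminated
        end = data.index(b"\x00", pos)
        name, pos = data[pos:end], end + 1
    if flg & FCOMMENT:  # comment, zero-terminated
        pos = data.index(b"\x00", pos) + 1
    if flg & FHCRC:     # 16-bit header CRC
        pos += 2
    crc32, isize = struct.unpack("<II", data[-8:])
    return {"cm": cm, "flg": flg, "mtime": mtime, "xfl": xfl, "os": os_id,
            "name": name, "data_offset": pos, "deflate": data[pos:-8],
            "crc32": crc32, "isize": isize}


# Altered.gz, rebuilt from the hex dump above
altered = bytes.fromhex("1f8b08" "00" "00000000" "00" "03" "9500" "00000000" "00000000")
info = parse_gzip_member(altered)
print(info["data_offset"], info["deflate"].hex())  # 10 9500

# With a stored filename, data starts at 0x0A + len(name) + 1 == 0x0B + len(name)
named = bytes.fromhex("1f8b08" "08" "00000000" "00" "03") + b"a.txt\x00" + bytes.fromhex("0300") + bytes(8)
print(parse_gzip_member(named)["data_offset"] == 0x0B + len(b"a.txt"))  # True
```

The second case shows where the "0x0B + length of original filename" offset used later in this letter comes from: the terminating zero byte of the name accounts for the extra 1.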

The header of each compressed data block consists of one bit, which is
set when this block is the last one, followed by two bits which describe
the compression method (RFC 1951 calls them BFINAL and BTYPE):

00 - uncompressed data
01 - Huffman algorithm with fixed codes
10 - Huffman with dynamic codes
11 - (reserved)
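One detail that is easy to trip over: deflate packs these fields starting from the least significant bit of the byte, so the last-block flag is bit 0 and the two method bits are bits 1-2, with bit 1 as the low-order bit of the value. A minimal Python sketch (the names `block_header` and `BTYPE_NAMES` are made up for illustration) shows that the bytes 0x95, 0xA5 and 0xB5 all decode to a final block using dynamic Huffman codes, while 0x03 (a final fixed-Huffman block, which is how zlib encodes an empty input) does not.

```python
# Names for the BTYPE values listed above (RFC 1951, section 3.2.3)
BTYPE_NAMES = {0: "uncompressed", 1: "fixed Huffman",
               2: "dynamic Huffman", 3: "reserved"}


def block_header(first_byte: int) -> tuple:
    """Return (BFINAL, BTYPE) for a block that starts on a byte boundary,
    such as the first block of a stream.

    Bits are consumed least significant first: BFINAL is bit 0,
    BTYPE is bits 1-2 (bit 1 is the low-order bit of BTYPE)."""
    bfinal = first_byte & 0x01
    btype = (first_byte >> 1) & 0x03
    return bfinal, btype


for b in (0x03, 0x95, 0xA5, 0xB5):
    bfinal, btype = block_header(b)
    print(f"0x{b:02X}: BFINAL={bfinal} BTYPE={btype:02b} ({BTYPE_NAMES[btype]})")
# 0x03: BFINAL=1 BTYPE=01 (fixed Huffman)
# 0x95: BFINAL=1 BTYPE=10 (dynamic Huffman)
# 0xA5: BFINAL=1 BTYPE=10 (dynamic Huffman)
# 0xB5: BFINAL=1 BTYPE=10 (dynamic Huffman)
```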

In a block compressed with dynamic Huffman codes, the block header is
followed by the information gzip uses to rebuild the compression trees,
and that's the most sensitive part of the archive (more about code
trees and deflate's Huffman coding can be found in RFC 1951). Hmm, now
the most interesting part: any change to this information may cause
undesirable effects. Usually gzip quits with a 'format violated'
message... but sometimes decompression attempts end in segmentation
faults, crashes, etc.
For example, when the compressed file is empty, changing the byte at
offset 0x0B + length of the original filename (0x0A, plus the name,
plus its terminating zero byte), or at offset 0x0A if there's no
original filename in the archive, to 0x95, 0xA5, 0xB5 or similar will
cause a segmentation fault under Linux and MS-DOS (or other funny
things under Win95 :). The behaviour of gzip strongly depends on the
archive contents, so I believe there's a way to exploit it. As proof,
I have attached to this letter two more, totally different examples:
dos-gpf.gz, which should cause a General Protection Fault under
DOS/Win95, and linux.gz, which works properly ("format violated" :)
only under MS-DOS.
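To see what the altered byte does to the tree information, here is a minimal Python sketch of the first steps of reading a dynamic-Huffman block header as laid out in RFC 1951, section 3.2.7: HLIT, HDIST and HCLEN, then the 3-bit lengths of the code-length code. `BitReader` and `dynamic_header` are hypothetical names; this is not gzip's inflate code, and it stops before building any tree. Since a decoder cannot tell where the deflate data ends, the sketch is fed the two data bytes of Altered.gz plus the eight trailer bytes that follow them.

```python
class BitReader:
    """Reads deflate bit fields: least significant bit of each byte first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # absolute bit position in data

    def bits(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]  # IndexError if input runs out
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value


# Order in which code-length-code lengths are stored (RFC 1951, 3.2.7)
CLEN_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]


def dynamic_header(stream: bytes) -> dict:
    r = BitReader(stream)
    bfinal, btype = r.bits(1), r.bits(2)
    if btype != 2:
        raise ValueError(f"not a dynamic Huffman block (BTYPE={btype})")
    hlit = r.bits(5) + 257   # number of literal/length codes
    hdist = r.bits(5) + 1    # number of distance codes
    hclen = r.bits(4) + 4    # number of code-length codes actually stored
    clen = [0] * 19
    for sym in CLEN_ORDER[:hclen]:
        clen[sym] = r.bits(3)
    return {"bfinal": bfinal, "hlit": hlit, "hdist": hdist,
            "hclen": hclen, "clen": clen}


# Deflate data of Altered.gz followed by its 8 zero trailer bytes
h = dynamic_header(bytes([0x95, 0x00]) + bytes(8))
print(h["hlit"], h["hdist"], h["hclen"], any(h["clen"]))  # 275 1 4 False
```

With 0x95 the block claims 275 literal/length codes, yet every code-length code length comes out as zero, so there is no code from which those 275 lengths could be decoded. A careful decoder has to reject such a block; which path the 1997 gzip code takes instead is not something this sketch shows.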

But this vulnerability still hasn't been properly exploited. Any ideas?
It's really hard work, but just imagine... the original gzip routines
are exactly... ahem, copied into many programs, including viewers,
compression utilities (WinZip) and other software... And .gz files are
very often decompressed transparently...

_______________________________________________________________________
Michal Zalewski [tel 9690] | finger 2 PGP [lcamtuf@boss.staszic.waw.pl]
=--------- [ echo "while [ -f \$0 ]; do \$0 &;done" >_;. _ ] ---------=

[Attachment: Dos-gpf.gz, base64-encoded]

H4sIAAAAAAAAAxrhAgBCVZ2gAgAAAA==

[Attachment: Linux.gz, base64-encoded]

H4sIAAAAAAAAAxwJDXDkAgDuT4nyBQAAAA==

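The two attachments can be inspected the same way. This is a minimal Python sketch, assuming (as holds for both samples) that the flags byte is 0, so the compressed data starts right at offset 0x0A; `summarize` is a hypothetical helper. It only reports header fields and the type of the first block, and does not try to reproduce the crashes.

```python
import base64

# The two attachments above, copied verbatim
SAMPLES = {
    "Dos-gpf.gz": "H4sIAAAAAAAAAxrhAgBCVZ2gAgAAAA==",
    "Linux.gz": "H4sIAAAAAAAAAxwJDXDkAgDuT4nyBQAAAA==",
}
BTYPE_NAMES = {0: "uncompressed", 1: "fixed Huffman",
               2: "dynamic Huffman", 3: "reserved"}


def summarize(raw: bytes) -> dict:
    """Hypothetical helper: header fields and first block type of a .gz file."""
    if raw[:3] != b"\x1f\x8b\x08":
        raise ValueError("not a deflate-compressed gzip file")
    if raw[3] != 0:
        raise ValueError("sketch only handles flags == 0 (no optional fields)")
    first = raw[10]                       # first byte of the deflate data
    return {
        "os": raw[9],
        "bfinal": first & 1,              # bit 0
        "btype": BTYPE_NAMES[(first >> 1) & 3],  # bits 1-2
        "deflate_bytes": len(raw) - 18,   # minus 10-byte header, 8-byte trailer
        "isize": int.from_bytes(raw[-4:], "little"),
    }


for name, b64 in SAMPLES.items():
    print(name, summarize(base64.b64decode(b64)))
```

Decoded this way, dos-gpf.gz starts with a non-final fixed-Huffman block and linux.gz with a non-final dynamic-Huffman block, so the two samples exercise different decoder paths than Altered.gz.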