Issue 2846: Gzip cannot handle zero-padded output + patch

There are cases when gzip produces/receives a zero-padded output, for example when creating a compressed tar archive with a pipe:

tar cz /dev/null > foo.tgz

ls -la foo.tgz
-rw-r----- 1 tadek tadek 10240 May 13 23:40 foo.tgz

tar tvfz foo.tgz
crw-rw-rw- root/root 1,3 2007-10-18 18:27:25 dev/null

This is known behavior (http://www.gzip.org/#faq8), and recent versions of gzip handle it gracefully by skipping all zero bytes after the end of the file (see gzip.c:1394-1406 in version 1.3.12).

The Python gzip module crashes on those files:

#:/python2.5/py2.5$ tar cz /dev/null > foo.tgz
tar: Removing leading `/' from member names
#:/python2.5/py2.5$ bin/python
Python 2.5.2 (r252:60911, May 14 2008, 00:02:24)
[GCC 4.0.3 (Ubuntu 4.0.3-1ubuntu5)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import gzip
>>> f=gzip.open("foo.tgz")
>>> f.read()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 220, in read
    self._read(readsize)
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 263, in _read
    self._read_gzip_header()
  File "/home/tadek/python2.5/py2.5/lib/python2.5/gzip.py", line 164, in _read_gzip_header
    raise IOError, 'Not a gzipped file'
IOError: Not a gzipped file

The proposed patch fixes this behavior by consuming any zero bytes that follow the end of a gzip member. I tested that it works with regular archives, zero-padded archives, concatenated archives, and concatenated zero-padded archives.
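For illustration only, here is a minimal sketch of the same idea. It is not the attached patch: it is written against the Python 3 zlib module rather than the 2.5 gzip.py internals, and the function name read_zero_padded_gzip is hypothetical. It decompresses one gzip member at a time and drops any zero bytes found between or after members. Note that the padding has to be skipped after the member is decoded; stripping trailing zeros from the raw file up front could also eat zero bytes that belong to the member's size trailer.

```python
import gzip
import zlib


def read_zero_padded_gzip(data):
    """Decompress concatenated gzip members, ignoring zero padding.

    Hypothetical helper for illustration; not part of the gzip module.
    """
    chunks = []
    while data:
        # 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
        member = zlib.decompressobj(16 + zlib.MAX_WBITS)
        chunks.append(member.decompress(data))
        chunks.append(member.flush())
        if not member.eof:
            raise EOFError("truncated gzip member")
        # Whatever follows the member is either zero padding or the next
        # member. A gzip header starts with 0x1f 0x8b, so dropping leading
        # zero bytes here cannot damage a following member.
        data = member.unused_data.lstrip(b"\x00")
    return b"".join(chunks)


# Usage: simulate a zero-padded archive and a concatenated, padded one.
padded = gzip.compress(b"hello") + b"\x00" * 100
concatenated = (gzip.compress(b"a") + b"\x00" * 10 +
                gzip.compress(b"b") + b"\x00" * 7)
print(read_zero_padded_gzip(padded))
print(read_zero_padded_gzip(concatenated))
```

Non-zero garbage after a member still raises zlib.error in this sketch, since it is treated as the start of a malformed member.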

Regards, Tadek