Issue 822668: tarfile exception on large .tar files

The following exception is thrown when I write a large amount of data (> 10 GB) directly to my tape streamer. Is this normal?

Traceback (most recent call last):
  File "/usr/local/metroField/fieldPlugins/Backup.py", line 184, in run
    self._doBackup()
  File "/usr/local/metroField/fieldPlugins/Backup.py", line 333, in _doBackup
    arc.close()
  File "/usr/local/metroField/fieldPlugins/Backup.py", line 533, in close
    self.tf.close()
  File "/usr/local/lib/python2.3/tarfile.py", line 1009, in close
    self.fileobj.close()
  File "/usr/local/lib/python2.3/tarfile.py", line 360, in close
    self.fileobj.write(struct.pack("<L", self.pos))
OverflowError: long int too large to convert


Hi, I think I've found the correct solution to the problem (though I haven't actually tested it). Looking in tarfile.py...

    358: if self.type == "gz":
    359:     self.fileobj.write(struct.pack("<l", self.crc))
    360:     self.fileobj.write(struct.pack("<L", self.pos))

...shows that this error only occurs for gzip-compressed archives (self.type == "gz"). Testing shows that the error occurs when self.pos >= sys.maxint*2+2, i.e. 2**32, that is, for files of 4 GB or larger. This is not good, since the newest tar and gzip versions can handle files larger than that.
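As a quick illustration of that boundary (a minimal sketch, not part of the original report): "<L" packs an unsigned 32-bit value, so 2**32-1 is the largest position that fits and 2**32 is the first one that fails. Python 2.3 raises OverflowError here; newer Python versions raise struct.error instead, so the sketch catches both:

    import struct

    struct.pack("<L", 2**32 - 1)    # fine: largest value a 4-byte unsigned field can hold
    try:
        struct.pack("<L", 2**32)    # first failing value, i.e. self.pos at exactly 4 GB
    except (OverflowError, struct.error) as e:
        print("overflow:", e)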

According to the gzip file format spec from www.wotsit.org, the last 4 bytes of a gzip file "contains the size of the original (uncompressed) input data modulo 2^32". All that has to be done is to perform this modulo calculation prior to the call to struct.pack. Here is my proposed fix:

    358: if self.type == "gz":
    359:     self.fileobj.write(struct.pack("<l", self.crc))
    360:     self.fileobj.write(struct.pack("<L", self.pos % 2**32))
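To check the assumption the fix rests on, here is a small sketch (the filename "test.gz" is just an example) that reads the 8-byte gzip trailer back and confirms that the last 4 bytes hold the uncompressed size modulo 2**32, preceded by the CRC-32:

    import gzip
    import struct
    import zlib

    data = b"x" * 100000
    with gzip.open("test.gz", "wb") as f:
        f.write(data)

    # The gzip trailer is two little-endian 32-bit fields: CRC-32, then ISIZE.
    with open("test.gz", "rb") as f:
        f.seek(-8, 2)                          # last 8 bytes of the file
        crc, isize = struct.unpack("<LL", f.read(8))

    assert crc == zlib.crc32(data) & 0xffffffff
    assert isize == len(data) % 2**32          # the modulo the proposed fix applies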

I also noted that in Jython 2.1, struct.pack('<L', sys.maxint*2+2) does not raise an OverflowError but wraps around and returns '\x00\x00\x00\x00'. This happens to yield the correct size calculation for gzip, but silently wrapping on overflow is probably not a good idea.

...johahn