I'm writing a program in Python that creates tar files of a certain
maximum size (to fit onto CD/DVD).  One of the problems I'm running
into is that when using compression, it's pretty much impossible to
determine in advance whether a file, once added to an archive, will
cause the archive size to exceed the maximum size.

I believe that to do this properly, you need to copy the state of the
tar file (basically the current file offset as well as the state of
the compression object), then add the file.  If the new size of the
archive exceeds the maximum, you need to restore the original state.

The critical part is being able to copy the compression object.
Without compression it is trivial to determine if a given file will
"fit" inside the archive.  When using compression, the compression
ratio of a file depends partially on all the data that has been
compressed prior to it.
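
To illustrate that last point, here is a small sketch (not part of the
patch) that compresses the same payload after two different histories
within one zlib stream.  The helper name compressed_size() and the
sample data are made up for the example; only standard zlib calls are
used.

```python
import zlib

def compressed_size(data, history):
    """Return how many compressed bytes `data` produces when it is
    compressed after `history` in the same zlib stream."""
    comp = zlib.compressobj()
    # Z_SYNC_FLUSH pushes out all pending output without ending the
    # stream, so the bytes for `history` and `data` can be told apart.
    comp.compress(history)
    comp.flush(zlib.Z_SYNC_FLUSH)
    body = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
    return len(body)

payload = b"the quick brown fox jumps over the lazy dog\n" * 20

# History that shares nothing with the payload.
size_after_unrelated = compressed_size(payload, b"0123456789" * 50)
# History identical to the payload: it can be encoded as back-references.
size_after_related = compressed_size(payload, payload)

print(size_after_unrelated, size_after_related)
```

The second number should come out smaller than the first, which is why
the size a file will occupy cannot be computed in isolation.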

The current implementation in the standard library does not allow you
to copy these compression objects in a useful way, so I've made some
minor modifications (patch attached) to the standard 2.4.2 library:
- Add copy() method to zlib compression object.  This returns a new
  compression object with the same internal state.  I named it copy()
  to keep it consistent with things like sha.copy().
- Add snapshot() / restore() methods to GzipFile and TarFile.  These
  work only in write mode.  snapshot() returns a state object.  Passing
  in this state object to restore() will restore the state of the
  GzipFile / TarFile to the state represented by the object.
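
As a rough sketch of how copy() enables this kind of rollback, the
example below works on a bare zlib stream rather than on GzipFile or
TarFile.  It assumes zlib compression objects provide the copy() method
described above.  The BoundedStream class, its try_add() and close()
methods and the 200-byte limit are hypothetical names and values chosen
for illustration; they are not the snapshot() / restore() code from the
patch.

```python
import random
import zlib

class BoundedStream:
    """Hypothetical helper: a zlib stream that rejects any chunk that
    would push the finished stream past max_size bytes."""

    def __init__(self, max_size):
        self.max_size = max_size
        self.comp = zlib.compressobj()
        self.out = bytearray()

    def try_add(self, data):
        saved = self.comp.copy()  # snapshot of the compressor state
        chunk = self.comp.compress(data) + self.comp.flush(zlib.Z_SYNC_FLUSH)
        # Finish a throwaway copy to learn how many trailer bytes
        # the stream would still need if we stopped here.
        tail = self.comp.copy().flush()
        if len(self.out) + len(chunk) + len(tail) > self.max_size:
            self.comp = saved  # roll back: discard the trial output
            return False
        self.out += chunk
        return True

    def close(self):
        self.out += self.comp.flush()  # Z_FINISH by default
        return bytes(self.out)

stream = BoundedStream(max_size=200)
compressible = b"a" * 1000
rng = random.Random(0)
noisy = bytes(rng.getrandbits(8) for _ in range(1000))  # incompressible

accepted_first = stream.try_add(compressible)  # small once compressed
rejected = stream.try_add(noisy)               # too big, rolled back
accepted_again = stream.try_add(b"tail")       # stream is still usable
archive = stream.close()
```

After the rejected chunk the stream decompresses to exactly the data
that was accepted, which is the behaviour snapshot() / restore() is
meant to give at the GzipFile / TarFile level.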

Future work:
- Decompression objects could use a copy() method too
- Add support for copying bzip2 compression objects

Although this patch isn't complete, does this seem like a good approach?

Cheers,
Chris