[Python-Dev] Copying zlib compression objects
Chris AtLee chris at atlee.ca
Fri Feb 17 17:48:28 CET 2006
I'm writing a program in Python that creates tar files of a certain maximum size (to fit onto CD/DVD). One of the problems I'm running into is that when using compression, it's pretty much impossible to tell in advance whether adding a file to an archive will push the archive past the maximum size.
I believe that to do this properly, you need to copy the state of the tar file (basically the current file offset as well as the state of the compression object), then add the file. If the new size of the archive exceeds the maximum, you need to restore the original state.
The critical part is being able to copy the compression object. Without compression it is trivial to determine if a given file will "fit" inside the archive. When using compression, the compression ratio of a file depends partially on all the data that has been compressed prior to it.
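To illustrate the dependence on prior data, here is a minimal sketch in present-day Python 3 (not the 2.4.2 library discussed here). The helper name size_after and the random test data are made up for the illustration; it only uses zlib.compressobj(), compress() and flush() with Z_SYNC_FLUSH.

```python
import random
import zlib

def size_after(history, data):
    """Return how many bytes `data` adds to a zlib stream that already holds `history`."""
    comp = zlib.compressobj()
    if history:
        comp.compress(history)
    # Push out everything produced so far so it is not counted below.
    comp.flush(zlib.Z_SYNC_FLUSH)
    added = comp.compress(data) + comp.flush(zlib.Z_SYNC_FLUSH)
    return len(added)

# Hypothetical sample: random bytes, which barely compress on their own.
rng = random.Random(0)
sample = bytes(rng.getrandbits(8) for _ in range(4000))

alone = size_after(b"", sample)              # nothing compressed before it
after_history = size_after(sample, sample)   # same bytes already in the stream
print(alone, after_history)
```

The second figure should come out much smaller than the first, because the compressor can refer back to the earlier copy. That is why the size a file adds to the archive cannot be computed from the file alone.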
The current implementation in the standard library does not allow you to copy these compression objects in a useful way, so I've made some minor modifications (patch attached) to the standard 2.4.2 library:
- Add copy() method to zlib compression object. This returns a new compression object with the same internal state. I named it copy() to keep it consistent with things like sha.copy().
- Add snapshot() / restore() methods to GzipFile and TarFile. These work only in write mode. snapshot() returns a state object. Passing this object to restore() returns the GzipFile / TarFile to the state it was in when the snapshot was taken.
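The "try it, then roll back" idea can be sketched on a bare zlib stream as below. This is only an illustration under stated assumptions: it is written for present-day Python 3, where zlib compression objects do have a copy() method; the class SizeLimitedStream and its try_add() / close() methods are hypothetical names, not the snapshot() / restore() code from the patch; and it ignores tar headers, padding and the gzip framing entirely.

```python
import random
import zlib

class SizeLimitedStream:
    """Hypothetical helper: a zlib stream that never grows past max_size bytes."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._comp = zlib.compressobj()
        self._out = bytearray()

    def try_add(self, data):
        """Add data if the finished stream would still fit; return True on success."""
        trial = self._comp.copy()        # work on a copy of the compressor state
        pending = trial.compress(data)
        # Flush a second copy to learn how many bytes finishing the stream would take.
        tail = trial.copy().flush()
        if len(self._out) + len(pending) + len(tail) > self.max_size:
            return False                 # "restore": the original state was never touched
        self._comp = trial               # commit the trial state
        self._out += pending
        return True

    def close(self):
        """Finish the stream and return the compressed bytes."""
        self._out += self._comp.flush()
        return bytes(self._out)

# Hypothetical usage: ten 1000-byte random chunks against a 3500-byte limit.
rng = random.Random(1)
chunks = [bytes(rng.getrandbits(8) for _ in range(1000)) for _ in range(10)]
limit = 3500
stream = SizeLimitedStream(limit)
accepted = [chunk for chunk in chunks if stream.try_add(chunk)]
archive = stream.close()
print(len(accepted), len(archive))
```

Rolling back here amounts to throwing the copy away. A real GzipFile / TarFile version would also have to record and restore the output file offset, as described above.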
Future work:
- Decompression objects could use a copy() method too
- Add support for copying bzip2 compression objects
Although this patch isn't complete, does this seem like a good approach?
Cheers,
Chris
-------------- next part --------------
A non-text attachment was scrubbed...
Name: snapshots.diff
Type: text/x-patch
Size: 3500 bytes
Desc: not available
Url : http://mail.python.org/pipermail/python-dev/attachments/20060217/5f26b769/attachment-0001.bin