[Python-Dev] Request for Pronouncement: PEP 441 - Improving Python ZIP Application Support
Thomas Wouters thomas at python.org
Mon Feb 23 20:41:35 CET 2015
On Mon, Feb 23, 2015 at 8:22 PM, Ethan Furman <ethan at stoneleaf.us> wrote:
On 02/23/2015 11:01 AM, Daniel Holth wrote:
> On Mon, Feb 23, 2015 at 1:49 PM, Paul Moore wrote:
>> On 23 February 2015 at 18:40, Brett Cannon wrote:
>>>
>>> Couldn't you just keep it in memory as bytes and then write directly over
>>> the file? I realize that's a bit wasteful memory-wise but it is possible.
>>> The docs could mention the memory cost is something to watch out for when
>>> doing an in-place replacement. Heck the code could even make it an
>>> io.BytesIO instance so the rest of the code doesn't have to care about this
>>> special case.
>>
>> I did consider this option, and I still quite like it. In fact,
>> originally I wrote the API to only be in-place, until I realised
>> that wouldn't work for things bigger than memory (but who has a Python
>> app that's bigger than RAM?)
>>
>> I'm happy to modify the API along these lines (details to be thrashed
>> out) if people think it's worthwhile.
>
> Sounds reasonable. It could be done by just reading the entire file
> contents after the shebang and re-writing them with the necessary
> offset all in RAM, truncating the file if necessary, without involving
> the zipfile module very much; the shebang could have some amount of
> padding by default; the file could just be re-compressed in memory
> depending on your appetite for complexity.
This could be a completely stupid question, but how does the zip file know where the individual files are? More to the point, does the index work via relative or absolute offset? If absolute, wouldn't the index have to be rewritten if the zip portion of the file moves?
Yes and no. The ZIP format uses a 'central directory', which holds a record for each file in the archive. The offsets are relative, although the specification is a little vague about what they are relative to in a plain .zip file: the wording talks about disk numbers, ZIP being from the era of floppy disks. You find the central directory by searching backwards from the end of the file for the end-of-central-directory record (or by reading a specific spot at the end, if you don't support archive comments; zipimport, for example, doesn't). It turns out you can locate the central directory from just that information, and as far as I know all tools do. However, there are still some offsets that would change if you add stuff to the front of the ZIP file (or remove it), and some zip tools will complain about them, though usually only in verbose mode.
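To make that concrete, here is a minimal sketch (not code from the PEP) that builds a tiny archive in memory, prepends a shebang line without rewriting anything, and compares the central directory offset recorded in the end-of-central-directory record with the position where the directory actually sits. The helper names `build_archive` and `locate_central_directory` are made up for illustration, and the sketch assumes a small archive with no archive comment and no ZIP64 records.

```python
import io
import struct
import zipfile

SHEBANG = b"#!/usr/bin/env python3\n"


def build_archive() -> bytes:
    """Create a one-file ZIP archive entirely in memory."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("__main__.py", "print('hello')\n")
    return buf.getvalue()


def locate_central_directory(data: bytes) -> tuple[int, int]:
    """Return (recorded_offset, actual_offset) of the central directory.

    Assumes no archive comment, so the last end-of-central-directory
    signature in the data is the real one.
    """
    eocd_pos = data.rfind(b"PK\x05\x06")
    # EOCD layout: signature, 4 x uint16, 2 x uint32, comment length (uint16).
    fields = struct.unpack("<4s4H2LH", data[eocd_pos:eocd_pos + 22])
    cd_size, cd_offset = fields[5], fields[6]
    # The central directory ends right where the EOCD record begins.
    return cd_offset, eocd_pos - cd_size


plain = build_archive()
prefixed = SHEBANG + plain  # data added in front, offsets left untouched

plain_recorded, plain_actual = locate_central_directory(plain)
pref_recorded, pref_actual = locate_central_directory(prefixed)

# The recorded offset is now stale by exactly the length of the prefix.
shift = pref_actual - pref_recorded

# zipfile still reads the prefixed data by working back from the end.
with zipfile.ZipFile(io.BytesIO(prefixed)) as zf:
    names = zf.namelist()
    source = zf.read("__main__.py")
```

Here `shift` comes out equal to `len(SHEBANG)`: the recorded offset is unchanged, but the directory has moved. A reader that derives the directory's position from the end of the file can compensate for that, which is presumably the kind of mismatch stricter tools warn about.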
-- Thomas Wouters <thomas at python.org>
Hi! I'm an email virus! Think twice before sending your email to help me spread!