gh-51067: Add remove() in ZipInfo by yudilevi2 · Pull Request #19358 · python/cpython (original) (raw)
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Conversation
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters
[ Show hidden characters]({{ revealButtonHref }})
Hello, and thanks for your contribution!
I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).
CLA Missing
Our records indicate the following people have not signed the CLA:
@nergall2
For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.
If you have recently signed the CLA, please wait at least one business day
before our records are updated.
You can check yourself to see if the CLA has been received.
Thanks again for the contribution, we look forward to reviewing it!
yudilevi2 changed the title
enhancement-6818: Add remove() in ZipInfo enhancement 6818: Add remove() in ZipInfo
yudilevi2 changed the title
enhancement 6818: Add remove() in ZipInfo bpo-6818: Add remove() in ZipInfo
This addition would be useful for my projects. Thanks!
I think this should change should include unit tests in https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py -- perhaps a test to remove a middle entry and one to remove the last entry in a zip file.
Does removing an entry truncate the zipfile to the new file length? If not, should it? Either way, a comment in the code and a test for verification would be helpful too.
Does this code work on both 32-bit and 64-bit zip files?
@yudilevi2, it seems that you haven't created a user on bugs.python.org (or if you have, then I can't find it.) For our CLA bot to recognize your signature you'll have to have a user there. See the above comments for links with more details.
Sorry for requiring this bureaucracy, but please just let us know if you run into any trouble or would like more help with this.
@yudilevi2, on b.p.o. the username yudilevi has a different GitHub username configured. That's likely why, for this PR, our bot isn't picking up on the fact that you've signed the CLA.
It seems that your options are:
- Change the GitHub user associated with your b.p.o. account to "yudilevi2"
- Create a new "yudilevi2" user on b.p.o., associate it with your yudilevi2 GitHub username, and sign the CLA again with that account.
Could we get an update on this? Is it just being held back by the CLA?
Sorry, there has been some confusion because I renamed the account. I did sign the CLA multiple times already so it should be good but I'm not sure it's all been registered correctly. How do I check?
Sorry, there has been some confusion because I renamed the account. I did sign the CLA multiple times already so it should be good but I'm not sure it's all been registered correctly. How do I check?
@yudilevi2 Checking here https://check-python-cla.herokuapp.com/ says you don't have an account associated with your Github account.
Was there any progress on this?
As to progress, nothing yet on my end, can't find the time to dig into the code/flows yet in order to write proper tests, will try to get to it soon.
Since this has gotten a bit stuck, I'm just chiming in to share that I came here while researching a legislative proposal to generate massive bulk data files of American legal documents. One of the sticking points I anticipate with the proposal to create bulk files is that sealed content will need to occasionally be removed from them. Currently, doing so with Python requires decompressing the zip, removing a file, and then recompressing it (doing it with the zip
CLI tool would work sans Python).
All this to say, I'm excited to see this PR and hope it can move forward. Thank you all.
I hope this remove()
get merge soon.
As mentioned, it will not get merged without tests and documentation.
if i == len(filelist) - 1: |
---|
entry_size = self.start_dir - info.header_offset |
else: |
entry_size = filelist[i + 1].header_offset - info.header_offset |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there anything in zipfile spec which says the header offsets have to be contiguous?
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't matter anyway. The implementation moves all bytes between the start of a section and the start of the next section. If there were to be random bytes between the end of a section and the start of the next section, they will be moved too.
# Make sure we have an info object |
---|
if isinstance(member, ZipInfo): |
# 'member' is already an info object |
zinfo = member |
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsafe. The ZipInfo
instances in self.filelist
will not compare equal with this zinfo even if the filename is matching - they compare by identity.
Additionally, if remove
is able to accept a ZipInfo
instance, then there needs to be handling of the possibility that the instance is not present in the zipfile at all.
I recommend to document that member is a string, and then use self.getinfo(member)
, avoiding these issues.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Comparing by identity when passing a zinfo is a feature and is consistent to the behavior of writestr
etc. I don't think this need to change.
It actually is a problem if the zinfo/filename does not exist in the zipfile at all, my revision has added a check to prevent that.
terryjreedy changed the title
bpo-6818: Add remove() in ZipInfo gh-51067: Add remove() in ZipInfo