gh-51067: Add remove() in ZipInfo by yudilevi2 · Pull Request #19358 · python/cpython (original) (raw)

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service andprivacy statement. We’ll occasionally send you account related emails.

Already on GitHub?Sign in to your account

Conversation

This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.Learn more about bidirectional Unicode characters

[ Show hidden characters]({{ revealButtonHref }})

yudilevi2

@yudilevi2

@the-knights-who-say-ni

Hello, and thanks for your contribution!

I'm a bot set up to make sure that the project can legally accept this contribution by verifying everyone involved has signed the PSF contributor agreement (CLA).

CLA Missing

Our records indicate the following people have not signed the CLA:

@nergall2

For legal reasons we need all the people listed to sign the CLA before we can look at your contribution. Please follow the steps outlined in the CPython devguide to rectify this issue.

If you have recently signed the CLA, please wait at least one business day
before our records are updated.

You can check yourself to see if the CLA has been received.

Thanks again for the contribution, we look forward to reviewing it!

@yudilevi2 yudilevi2 changed the titleenhancement-6818: Add remove() in ZipInfo enhancement 6818: Add remove() in ZipInfo

Apr 4, 2020

@yudilevi2 yudilevi2 changed the titleenhancement 6818: Add remove() in ZipInfo bpo-6818: Add remove() in ZipInfo

Apr 4, 2020

@cmeyer

This addition would be useful for my projects. Thanks!

I think this should change should include unit tests in https://github.com/python/cpython/blob/master/Lib/test/test_zipfile.py -- perhaps a test to remove a middle entry and one to remove the last entry in a zip file.

Does removing an entry truncate the zipfile to the new file length? If not, should it? Either way, a comment in the code and a test for verification would be helpful too.

Does this code work on both 32-bit and 64-bit zip files?

@csabella

@yudilevi2

@remilapeyre

@taleinat

@yudilevi2, it seems that you haven't created a user on bugs.python.org (or if you have, then I can't find it.) For our CLA bot to recognize your signature you'll have to have a user there. See the above comments for links with more details.

Sorry for requiring this bureaucracy, but please just let us know if you run into any trouble or would like more help with this.

@yudilevi2

@taleinat

@yudilevi2, on b.p.o. the username yudilevi has a different GitHub username configured. That's likely why, for this PR, our bot isn't picking up on the fact that you've signed the CLA.

It seems that your options are:

  1. Change the GitHub user associated with your b.p.o. account to "yudilevi2"
  2. Create a new "yudilevi2" user on b.p.o., associate it with your yudilevi2 GitHub username, and sign the CLA again with that account.

@pmdevita

Could we get an update on this? Is it just being held back by the CLA?

@yudilevi2

Sorry, there has been some confusion because I renamed the account. I did sign the CLA multiple times already so it should be good but I'm not sure it's all been registered correctly. How do I check?

@peterstanton

Sorry, there has been some confusion because I renamed the account. I did sign the CLA multiple times already so it should be good but I'm not sure it's all been registered correctly. How do I check?

@yudilevi2 Checking here https://check-python-cla.herokuapp.com/ says you don't have an account associated with your Github account.

@marchinidavide

Was there any progress on this?

@yudilevi2

@yudilevi2

As to progress, nothing yet on my end, can't find the time to dig into the code/flows yet in order to write proper tests, will try to get to it soon.

@mlissner

Since this has gotten a bit stuck, I'm just chiming in to share that I came here while researching a legislative proposal to generate massive bulk data files of American legal documents. One of the sticking points I anticipate with the proposal to create bulk files is that sealed content will need to occasionally be removed from them. Currently, doing so with Python requires decompressing the zip, removing a file, and then recompressing it (doing it with the zip CLI tool would work sans Python).

All this to say, I'm excited to see this PR and hope it can move forward. Thank you all.

@quyentruong

I hope this remove() get merge soon.

@wimglenn

As mentioned, it will not get merged without tests and documentation.

wimglenn

if i == len(filelist) - 1:
entry_size = self.start_dir - info.header_offset
else:
entry_size = filelist[i + 1].header_offset - info.header_offset

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there anything in zipfile spec which says the header offsets have to be contiguous?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't matter anyway. The implementation moves all bytes between the start of a section and the start of the next section. If there were to be random bytes between the end of a section and the start of the next section, they will be moved too.

wimglenn

# Make sure we have an info object
if isinstance(member, ZipInfo):
# 'member' is already an info object
zinfo = member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsafe. The ZipInfo instances in self.filelist will not compare equal with this zinfo even if the filename is matching - they compare by identity.

Additionally, if remove is able to accept a ZipInfo instance, then there needs to be handling of the possibility that the instance is not present in the zipfile at all.

I recommend to document that member is a string, and then use self.getinfo(member), avoiding these issues.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comparing by identity when passing a zinfo is a feature and is consistent to the behavior of writestr etc. I don't think this need to change.

It actually is a problem if the zinfo/filename does not exist in the zipfile at all, my revision has added a check to prevent that.

@terryjreedy terryjreedy changed the titlebpo-6818: Add remove() in ZipInfo gh-51067: Add remove() in ZipInfo

Dec 15, 2022

@python-cla-bot