Symlink (and other) handling of archives · Issue #5919 · pypa/pip
Following on from #5848 I've been looking into pip's handling of symlinks in zips (and tars). I've got some experiments to PR, but I think it might be more useful to have a quick conversation about what pip actually wants to do with symlinks in archives it handles, if anything.
The current state
At present, pip handles symlinks present in tar archives (largely because Python's TarFile handles them). By contrast, pip doesn't handle symlinks in zips properly (again, largely because Python's ZipFile doesn't); it extracts each one as a regular file containing the name of the target. However, the file attributes in a zip hold enough information to reconstruct symlinks accurately (the Info-ZIP tools demonstrably do this).
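As a rough sketch of how the file attributes can be used: for entries written on UNIX-like systems, the upper 16 bits of ZipInfo.external_attr hold the st_mode value, and a symlink entry stores its target path as the entry data. The helper name zip_symlink_target is hypothetical (it is not part of pip or zipfile), and the sketch assumes the target is UTF-8 encoded and does not check which system created the archive.

```python
import stat
import zipfile


def zip_symlink_target(zf, info):
    """Return the link target if `info` looks like a symlink, else None.

    Hypothetical helper for illustration only.
    """
    # Assumption: the archive was written with UNIX-style attributes, so
    # the high 16 bits of external_attr are the st_mode of the entry.
    mode = info.external_attr >> 16
    if stat.S_ISLNK(mode):
        # For a symlink entry, the stored data is the target path.
        return zf.read(info).decode("utf-8")
    return None
```

An extractor could call this for each entry from zf.infolist() and decide what to do with the ones that return a target, rather than writing them out as regular files.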
Then there are the OS differences. On UNIX-like platforms, symlinks (in tars) will be reconstructed normally. However, Windows presents certain challenges:
- Symlink support only appeared in Vista (is XP still a concern?)
- More importantly, it's a privileged operation (there's a capability for it, but it's only granted to administrators by default)
Hence, if pip is running as Administrator on a recent (Vista onwards) version of Windows, symlinks (in tars) will be reconstructed as symlinks. Otherwise, each one is extracted as a regular file containing the target's contents (i.e. the target file is duplicated under the symlink's name).
Questions
- Do we want pip to treat symlinks in all archives equally? I'm assuming that the current disparity between tars and zips should be corrected. Which leads onto...
- Do we want pip to handle symlinks at all? Given that symlink creation on Windows is unlikely to work without administrative privileges (and I don't think it's normal to run pip with administrative privileges on Windows?), there are several options:
  - The simplest option: treat the presence of symlinks as an error and fail if we find one
  - The "only on supported platforms" option: attempt to create the symlink and, if it fails, throw an error (or log a warning?)
  - The "always work" option: attempt to create the symlink; if it fails, extract the target file again under the symlink's name (as mentioned above, this is what pip's untar_file currently does via TarFile)
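To make the "always work" option concrete, here is a minimal sketch of the fallback, assuming the target is a regular file that has already been extracted. The function name link_or_copy is hypothetical and this is not how pip or TarFile implements it; it only illustrates the try-then-copy behaviour described above.

```python
import os
import shutil


def link_or_copy(target, link_path):
    """Create a symlink at link_path; copy the target file if that fails.

    Returns True if a real symlink was created, False if a copy was made.
    Hypothetical helper for illustration only.
    """
    try:
        os.symlink(target, link_path)
        return True
    except (OSError, NotImplementedError, AttributeError):
        # Symlinks unavailable or not permitted (e.g. unprivileged Windows).
        # A relative target is resolved against the link's own directory.
        source = os.path.join(os.path.dirname(link_path), target)
        shutil.copyfile(source, link_path)
        return False
```

The "only on supported platforms" option would replace the except branch with a raised error or a logged warning. Directory targets are deliberately left out of this sketch.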
Other stuff
Basically, I think unzip_file and untar_file need a bit of a rework to make them consistent with each other. While I'm at it, there are a few other things that I'd like to fix:
- I'm not particularly happy that untar_file handles symlinks by calling an "internal" method of TarFile (_extract_member); ideally I'd like to rework that to avoid such calls. I admit it's unlikely to change (it's been there almost unchanged since 2.7), but still, I don't like relying on undocumented methods
- ZipFile.extract fixes illegal characters in filenames when extracting on Windows (oddly, TarFile.extract doesn't). It might be useful to add this functionality too, especially if the "always work" option is selected (as presumably the expectation there is that all archives should extract successfully on all platforms)
- What about hard links, FIFOs, and devices (all perfectly valid in a tar, and in some cases, zips)? My gut instinct is: throw an error, or possibly just a warning?
- Finally, I'd like to throw in some protection against absolute paths in archives (like ZipFile.extract). Currently pip doesn't guard against this, and given that it's occasionally run as root on UNIX-like systems, that's a bit dangerous
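The last two points could be sketched as a pair of pre-extraction checks. Both function names are hypothetical and neither reflects pip's actual code: is_within_directory rejects member names that are absolute or that climb out of the destination with "..", and is_special_member flags the tar member types mentioned above. Symlink members, whose targets can also point outside the destination, are not covered here.

```python
import os
import tarfile


def is_within_directory(base_dir, member_name):
    """Return True if member_name, joined to base_dir, stays inside it.

    Hypothetical helper for illustration only.
    """
    base = os.path.abspath(base_dir)
    # os.path.join discards `base` when member_name is absolute, so an
    # absolute member name ends up outside `base` and is rejected below.
    dest = os.path.abspath(os.path.join(base, member_name))
    try:
        return os.path.commonpath([base, dest]) == base
    except ValueError:
        # Raised on Windows when the two paths are on different drives.
        return False


def is_special_member(member):
    """Return True for tar members that are hard links, FIFOs or devices."""
    # TarInfo.isdev() covers character devices, block devices and FIFOs;
    # TarInfo.islnk() is True for hard links (symlinks use issym()).
    return member.islnk() or member.isdev()
```

Whether a failed check should raise an error or only log a warning is exactly the open question; the sketch just returns a boolean and leaves that decision to the caller.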