Issue 26587: Possible duplicate entries in sys.path if .pth files are used with zip's (original) (raw)

Issue26587

Created on 2016-03-18 10:23 by tds333, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Files
File name Uploaded Description Edit
issue26587.diff SilentGhost,2016-03-20 11:53 review
site.patch tds333,2016-03-20 14:14 Patch for site with test review
site2.patch tds333,2016-03-26 17:33 review
Messages (13)
msg261958 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-18 10:23
In site.py there is the internal function _init_pathinfo() This function builds a set of path entries from sys.path. This is used to avoid duplicate entries in sys.path. But this function has a check if it is a directory with os.path.isdir(...). All this is fine as long as someone has a .zip file in sys.path or a zipfile subpath. Then the path entry is not part of the set. With this duplicate detection with none directories does not work. The fix is as simple as removing the os.path.isdir(...) line and fixing the indent. Also the docstring should be modified. Detected by using this function in a project reusing addsitedir(...) functionality to add another path with .pth processing.
msg261974 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-03-18 16:26
Could you provide a code example of your using addsitedir that results in duplicates?
msg262059 - (view) Author: Mandeep Singh Grang (mgrang) Date: 2016-03-19 23:37
Here is a testcase to reproduce the issue: > cat test.py import site, sys site.addsitedir('/foo/bar') print (sys.path) This prints just a single instance of '/foo/bar': ['/local/mnt/workspace/python/tst', '/foo/bar', '/usr/local/lib/python36.zip', '/local/mnt/workspace/python/src/Lib', '/local/mnt/workspace/python/src/Lib/plat-linux', '/local/mnt/workspace/python/build/build/lib.linux-x86_64-3.6-pydebug'] Now if we explicitly set PYTHONPATH to include '/foo/bar' > export PYTHONPATH=/foo/bar and then run test.py here is the output: ['/local/mnt/workspace/python/tst', '/foo/bar', '/usr/local/lib/python36.zip', '/local/mnt/workspace/python/src/Lib', '/local/mnt/workspace/python/src/Lib/plat-linux', '/local/mnt/workspace/python/build/build/lib.linux-x86_64-3.6-pydebug', '/foo/bar'] We see that there are duplicate entries for '/foo/bar'. As Wolfgang rightly said the issue comes from the check for "if os.path.isdir(dir)" inside _init_pathinfo() in site.py. On removing this check I no longer see the duplicate entry for '/foo/bar'. But since this is the first bug I am looking at I am not sure of the implications of removing this check. Can someone please confirm that what I see is indeed a failing test case, or is this the intended behavior? Thanks, Mandeep
msg262070 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-03-20 11:53
Thanks for this, Mandeep. I don't think it is entirely the same issue that Wolfgang is describing. He's particularly concerned about the clash of .zip files from the sys.path with ones coming from .pth files for example. And while the proposed fix would resolve the issue, I don't feel it would be entirely correct. For the record, I'm not even sure sys.path is guaranteed to be contain no duplicates, at least I wasn't able to find a statement to that effect in the docs. In any case, I'm adding Brett here who was seemingly the last one to touch that bit of code. I'm also attaching a path with what I think is a more appropriate and smaller fix.
msg262078 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-20 14:14
Extended unit test for the issue and patch for site.py.
msg262193 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-22 16:01
I think a fix for 3.6 only is ok, because it changes behaviour. But this is only an internal function with a "_". Should I add a test with a temporary created pth file?
msg262195 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-03-22 16:23
Unfortunately you can't simply remove that directory check because you don't want to blindly normalize case. If someone put some token value on sys.path for their use that was case-sensitive even on a case-insensitive OS then the proposed change would break those semantics (e.g. if someone put a URL in sys.path for a REST-based importer). The possibilities I see are: 1. Don't change anything; duplicate entries don't really hurt anything 2. Remove duplicate entries, but only normalize case for directories 3. Remove duplicate entries, but normalize case for anything that points to something on the filesystem (i.e. both directories and files)
msg262196 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-03-22 16:23
And the code under discussion can be found at https://hg.python.org/cpython/file/default/Lib/site.py#l133
msg262200 - (view) Author: SilentGhost (SilentGhost) * (Python triager) Date: 2016-03-22 16:45
I still think my fix is more appropriate as it ensures that known_paths and sys.path stay connected somehow.
msg262493 - (view) Author: Wolfgang Langner (tds333) * Date: 2016-03-26 17:33
Ok, I implemented point 3. Check if it is a dir or file and makepath only in this case. All other entries are added unmodified to the set. Added a test case also for an URL path. I think duplicate detection is now improved and it should break nothing.
msg263045 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-08 22:04
New changeset bd1af1a97c2e by Brett Cannon in branch 'default': Issue #26587: Allow .pth files to specify file paths as well as https://hg.python.org/cpython/rev/bd1af1a97c2e
msg263046 - (view) Author: Roundup Robot (python-dev) (Python triager) Date: 2016-04-08 22:08
New changeset 09dc97edf454 by Brett Cannon in branch '3.5': Issue #26587: Remove an incorrect statement from the docs https://hg.python.org/cpython/rev/09dc97edf454 New changeset 94d5c57ee835 by Brett Cannon in branch 'default': Merge w/ 3.5 for issue #26587 https://hg.python.org/cpython/rev/94d5c57ee835
msg263047 - (view) Author: Brett Cannon (brett.cannon) * (Python committer) Date: 2016-04-08 22:10
I simplified Wolfgang's patch by simply using os.path.exists(). That eliminated the one place where os.path.isdir() was in use that was too specific to directories where files were reasonable to expect. I also quickly tweaked the docs for the site module in 3.5 to not promise that files would work. If you think there is still an issue with keeping things tied together, SilentGhost, feel free to open another issue to track it.
History
Date User Action Args
2022-04-11 14:58:28 admin set github: 70774
2016-04-08 22:10:20 brett.cannon set status: open -> closedresolution: fixedmessages: + stage: patch review -> resolved
2016-04-08 22:08:09 python-dev set messages: +
2016-04-08 22:04:36 python-dev set nosy: + python-devmessages: +
2016-03-26 18:26:11 brett.cannon set assignee: brett.cannon
2016-03-26 17:33:45 tds333 set files: + site2.patchmessages: +
2016-03-22 16:45:37 SilentGhost set messages: +
2016-03-22 16:23:59 brett.cannon set messages: +
2016-03-22 16:23:34 brett.cannon set messages: +
2016-03-22 16:01:53 tds333 set messages: +
2016-03-20 14:14:26 tds333 set files: + site.patchmessages: +
2016-03-20 11:53:07 SilentGhost set files: + issue26587.diffnosy: + brett.cannonmessages: + keywords: + patchstage: test needed -> patch review
2016-03-19 23:37:28 mgrang set nosy: + mgrangmessages: +
2016-03-18 16:26:20 SilentGhost set versions: - Python 3.4nosy: + SilentGhostmessages: + stage: test needed
2016-03-18 10:23:00 tds333 create