Issue 26012: pathlib.Path().rglob() is fooled by symlink loops (original) (raw)
Issue26012
Created on 2016-01-05 02:13 by gvanrossum, last changed 2022-04-11 14:58 by admin. This issue is now closed.
Messages (14) | ||
---|---|---|
msg257511 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-05 02:13 |
I created a symlink loop as follows: mkdir tmp cd tmp ln -s ../tmp baz cd .. Then I tried to list it recursively using rglob(): >>> list(pathlib.Path('tmp').rglob('**/*') This caused an infinite regress: Traceback (most recent call last): File "", line 1, in File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1065, in rglob for p in selector.select_from(self): File "/Users/guido/src/cpython36/Lib/pathlib.py", line 549, in _select_from for p in successor_select(starting_point, is_dir, exists, listdir): File "/Users/guido/src/cpython36/Lib/pathlib.py", line 548, in _select_from for starting_point in self._iterate_directories(parent_path, is_dir, listdir): File "/Users/guido/src/cpython36/Lib/pathlib.py", line 538, in _iterate_directories for p in self._iterate_directories(path, is_dir, listdir): [...] File "/Users/guido/src/cpython36/Lib/pathlib.py", line 538, in _iterate_directories for p in self._iterate_directories(path, is_dir, listdir): File "/Users/guido/src/cpython36/Lib/pathlib.py", line 537, in _iterate_directories if is_dir(path): File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1303, in is_dir return S_ISDIR(self.stat().st_mode) File "/Users/guido/src/cpython36/Lib/pathlib.py", line 1111, in stat return self._accessor.stat(self) File "/Users/guido/src/cpython36/Lib/pathlib.py", line 371, in wrapped return strfunc(str(pathobj), *args) OSError: [Errno 62] Too many levels of symbolic links: 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz' | ||
msg257516 - (view) | Author: Josh Rosenberg (josh.r) * ![]() |
Date: 2016-01-05 05:25 |
So would the goal of a fix be something like dev/inode memoization to prevent traversing the same link twice? To provide an argument to prevent following symlinks at all (as in stuff like os.walk), possibly defaulting to followlinks=False? It looks like languages like Ruby never follow symlinks when performing recursive globs to avoid this issue ( https://stackoverflow.com/questions/357754/can-i-traverse-symlinked-directories-in-ruby-with-a-glob ). It's probably easiest to just block recursing through symlinks (at least by default); memoizing risks unbounded memory overhead for large directory trees. | ||
msg257519 - (view) | Author: Serhiy Storchaka (serhiy.storchaka) * ![]() |
Date: 2016-01-05 06:21 |
glob.glob() just stops too deep recursion. >>> import glob >>> glob.glob('tmp/**/*') ['tmp/baz/baz'] >>> glob.glob('tmp/**/*', recursive=True) ['tmp/baz', 'tmp/baz/baz', 'tmp/baz/baz/baz', 'tmp/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz', 'tmp/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz/baz'] | ||
msg257522 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-05 07:03 |
I agree it's easiest just not to traverse symlinks matching **. glob.py avoids the errors (but not the senseless recursion) by simply ignoring OSError coming out of listdir(). That might be a good idea anyways (it might even be a simpler way to avoid the PermissionError reported in #24120). | ||
msg257620 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-06 17:45 |
I'm going to fix this by skipping all symlinks in _RecursiveWildcardSelector. | ||
msg257628 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2016-01-06 18:36 |
New changeset 18f5b125a863 by Guido van Rossum in branch '3.4': Issue #26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). https://hg.python.org/cpython/rev/18f5b125a863 New changeset 9826dbad1252 by Guido van Rossum in branch '3.5': Issue #26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). (Merge 3.4->3.5) https://hg.python.org/cpython/rev/9826dbad1252 New changeset 36864abbfe02 by Guido van Rossum in branch 'default': Issue #26012: Don't traverse into symlinks for ** pattern in pathlib.Path.[r]glob(). (Merge 3.5->3.6) https://hg.python.org/cpython/rev/36864abbfe02 | ||
msg257683 - (view) | Author: Martin Panter (martin.panter) * ![]() |
Date: 2016-01-07 11:43 |
This seems to have broken test_pathlib, both on my computer and some Linux and BSD buildbots: ====================================================================== FAIL: test_rglob_symlink_loop (test.test_pathlib.PathTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/media/disk/home/proj/python/cpython/Lib/test/test_pathlib.py", line 1455, in test_rglob_symlink_loop self.assertEqual(given, {p / x for x in expect}) AssertionError: Items in the second set but not the first: PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirA/linkC') PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/fileB') PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/linkD') ====================================================================== FAIL: test_rglob_symlink_loop (test.test_pathlib.PosixPathTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "/media/disk/home/proj/python/cpython/Lib/test/test_pathlib.py", line 1455, in test_rglob_symlink_loop self.assertEqual(given, {p / x for x in expect}) AssertionError: Items in the second set but not the first: PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirA/linkC') PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/fileB') PosixPath('/media/disk/home/proj/python/cpython/build/test_python_27756/@test_27756_tmp/dirB/linkD') | ||
msg257699 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-07 16:44 |
Oops, sorry about that. I will see if I can figure out how to repro this -- on my own Mac it succeeds. In the mean time if you could have a look yourself (since you can repro it on your own computer) I'd be grateful. Initial hunches: maybe a case-insensitivity crept into the test? | ||
msg257708 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-07 18:44 |
Looks like the root cause of the breakage is actually that issue #24120 isn't quite fixed yet. I'm now able to repro on a Linux VM. | ||
msg257711 - (view) | Author: Roundup Robot (python-dev) ![]() |
Date: 2016-01-07 18:58 |
New changeset 4043e08e6e52 by Guido van Rossum in branch '3.4': Add another try/except PermissionError to avoid depending on listdir order. Fix issues #24120 and #26012. https://hg.python.org/cpython/rev/4043e08e6e52 New changeset 8a3b0c1fb3d3 by Guido van Rossum in branch '3.5': Add another try/except PermissionError to avoid depending on listdir order. Fix issues #24120 and #26012. (Merge 3.4->3.5) https://hg.python.org/cpython/rev/8a3b0c1fb3d3 New changeset 398cb8c183da by Guido van Rossum in branch 'default': Add another try/except PermissionError to avoid depending on listdir order. Fix issues #24120 and #26012. (Merge 3.5->3.6) https://hg.python.org/cpython/rev/398cb8c183da | ||
msg257712 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-07 18:59 |
Should be fixed for real now. | ||
msg257717 - (view) | Author: Martin Panter (martin.panter) * ![]() |
Date: 2016-01-07 21:15 |
Thanks, your latest change seems to have fixed it on my Linux computer, and most of the buildbots. However now there is a failure on <http://buildbot.python.org/all/builders/AMD64%20Windows7%20SP1%203.x>: ====================================================================== FAIL: test_rglob_common (test.test_pathlib.PathTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1460, in test_rglob_common "linkB/fileB", "dirA/linkC/fileB"]) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1451, in _check self.assertEqual(set(glob), { P(BASE, q) for q in expected }) AssertionError: Items in the second set but not the first: WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/linkB/fileB') WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirB/linkD/fileB') WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirA/linkC/fileB') ====================================================================== FAIL: test_rglob_common (test.test_pathlib.WindowsPathTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1460, in test_rglob_common "linkB/fileB", "dirA/linkC/fileB"]) File "C:\buildbot.python.org\3.x.kloth-win64\build\lib\test\test_pathlib.py", line 1451, in _check self.assertEqual(set(glob), { P(BASE, q) for q in expected }) AssertionError: Items in the second set but not the first: WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/linkB/fileB') WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirB/linkD/fileB') WindowsPath('C:/buildbot.python.org/3.x.kloth-win64/build/build/test_python_1700/@test_1700_tmp/dirA/linkC/fileB') I don’t have a Windows setup so I can’t really help much with this one. | ||
msg257718 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-07 21:30 |
I think I understand the Windows failure. I uncommented some tests that were previously broken due to the symlink loop and/or PermissionError, but one of these has a different expected outcome if symlinks don't work. I've pushed a hopeful fix for that (no Windows box here either, but the buildbots are good for this :-). | ||
msg257726 - (view) | Author: Guido van Rossum (gvanrossum) * ![]() |
Date: 2016-01-07 23:11 |
Looks like that Windows buildbot is now green. Of the stable bots only OpenIndiana is red, and it's unrelated. |
History | |||
---|---|---|---|
Date | User | Action | Args |
2022-04-11 14:58:25 | admin | set | github: 70200 |
2016-01-07 23:11:09 | gvanrossum | set | messages: + |
2016-01-07 21:30:15 | gvanrossum | set | messages: + |
2016-01-07 21:15:54 | martin.panter | set | messages: + |
2016-01-07 18:59:52 | gvanrossum | set | status: open -> closedmessages: + |
2016-01-07 18:58:59 | python-dev | set | messages: + |
2016-01-07 18:44:49 | gvanrossum | set | messages: + |
2016-01-07 16:44:14 | gvanrossum | set | messages: + |
2016-01-07 11:43:31 | martin.panter | set | status: closed -> opennosy: + martin.pantermessages: + |
2016-01-06 18:37:14 | gvanrossum | set | status: open -> closedtype: behaviorresolution: fixedstage: resolved |
2016-01-06 18:36:40 | python-dev | set | nosy: + python-devmessages: + |
2016-01-06 17:45:20 | gvanrossum | set | assignee: gvanrossummessages: + |
2016-01-05 07:03:20 | gvanrossum | set | messages: + |
2016-01-05 06:21:55 | serhiy.storchaka | set | nosy: + serhiy.storchakamessages: + |
2016-01-05 05:25:17 | josh.r | set | nosy: + josh.rmessages: + |
2016-01-05 02:39:44 | ned.deily | set | nosy: + pitrou |
2016-01-05 02:14:14 | gvanrossum | set | components: + Library (Lib) |
2016-01-05 02:13:51 | gvanrossum | create |