Issue 36128: ResourceReader for FileLoader inconsistently handles path separators (original) (raw)

Created on 2019-02-26 21:05 by indygreg, last changed 2022-04-11 14:59 by admin. This issue is now closed.

Messages (7)
msg336712 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-02-26 21:05
The implementation of the ResourceReader API for the FileLoader class in importlib/_bootstrap_external.py is inconsistent with regards to handling of path separators. Specifically, "is_resource()" returns False if "resource" has a path separator. But "open_resource()" will happily open resources containing a path separator. I would think the two would agree about whether a path with separators is a resource or not. The documentation at https://docs.python.org/3.7/library/importlib.html#importlib.abc.ResourceReader implies that resources in subdirectories should not be allowed. One can easily demonstrate this behavior oddity with Mercurial: (Pdb) p sys.modules['mercurial'].__spec__.loader.get_resource_reader('mercurial').open_resource('help/config.txt') <_io.FileIO name='/home/gps/src/hg/mercurial/help/config.txt' mode='rb' closefd=True> (Pdb) p sys.modules['mercurial'].__spec__.loader.get_resource_reader('mercurial').is_resource('help/config.txt') False The behavior has been present since the functionality was added (https://github.com/python/cpython/pull/5168).
msg336723 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-02-26 23:45
On Feb 26, 2019, at 13:05, Gregory Szorc <report@bugs.python.org> wrote: > > I would think the two would agree about whether a path with separators is a resource or not. The documentation at https://docs.python.org/3.7/library/importlib.html#importlib.abc.ResourceReader implies that resources in subdirectories should not be allowed. Historical context is in the standalone tracker: https://gitlab.com/python-devs/importlib_resources/issues/58 Clearly, with the current definition of the API, open_resource() should not allow slashes. Please follow up on the above link if you have opinions about changing the behavior. We would have an opportunity to relax that constraint in Python 3.8, if itโ€™s something we want to do.
msg356361 - (view) Author: Marcin Kasperski (Mekk) Date: 2019-11-11 14:13
Hmm, I noticed this but accidentally and tried to port https://github.com/pypa/setuptools/issues/1635 to new api. Well: >>> import multiprocessing >>> import sys >>> reader = sys.modules['multiprocessing'].__spec__.loader.get_resource_reader('multiprocessing') >>> reader.open_resource('../../../../etc/passwd') <_io.FileIO name='/usr/lib/python3.7/multiprocessing/../../../../etc/passwd' mode='rb' closefd=True> I suppose this is the case which deserve some thought (originally I faced it when some webapp used pkg_resources to provide static files and used resource api as a way to validate urls impacted by external input). Tested on python 3.7.3, on Ubuntu 19.04.
msg356378 - (view) Author: Barry A. Warsaw (barry) * (Python committer) Date: 2019-11-11 18:24
One simple restriction would be to disallow relative paths outside of the resource anchor location.
msg356408 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-11-12 03:12
I think disallowing relative paths that are parents of the current anchor point is a reasonable restriction and acceptable backwards incompatible behavior. Disallowing all relative paths with slashes is a larger issue. I would support that if designing/implementing things today as it simplifies things greatly. But since an implementation allowing slashes has shipped in 3.7 and 3.8, I'm not sure if the backwards incompatibility could be stomached, so I'm not going to advocate for it.
msg356409 - (view) Author: Gregory Szorc (indygreg) * Date: 2019-11-12 03:18
I just noticed that there is a parallel discussion ongoing in https://gitlab.com/python-devs/importlib_resources/issues/58.
msg394507 - (view) Author: Jason R. Coombs (jaraco) * (Python committer) Date: 2021-05-27 01:56
The preferred API as implemented in Python 3.9 and importlib_resources 1.1 is the `files()` API. This simpler API returns a Traversable object, a pathlib-like handle to the contents of a package. This approach side-steps the issues described above. In particular, `is_resource` no longer has a purpose. Path traversal is handled naturally through `Traversable.join_path`. Resources in subdirectories are now supported. Parent objects ('..') are allowed, but only incidentally and allowed in the same way as they're allowed for any Python code. That is, one can call `files('multiprocessing').joinpath('../../../../etc/passwd')`, but that provides no advantage over `pathlib.Path('/etc/passwd')`. I believe this new API addresses the concerns presented. Please open a new issue (here or in github.com/python/importlib_resources) if there are further concerns needing attention.
History
Date User Action Args
2022-04-11 14:59:11 admin set github: 80309
2021-05-27 01:56:33 jaraco set status: open -> closednosy: + jaracomessages: + resolution: fixedstage: resolved
2019-11-12 03๐Ÿ”ž08 indygreg set messages: +
2019-11-12 03:12:45 indygreg set messages: +
2019-11-11 18:24:24 barry set messages: +
2019-11-11 14:13:26 Mekk set nosy: + Mekkmessages: +
2019-02-27 22:58:33 brett.cannon set nosy: + brett.cannon
2019-02-26 23:45:24 barry set messages: +
2019-02-26 21:05:00 indygreg create