Issue 30906: os.path.join misjoins paths

Created on 2017-07-11 20:43 by mesheb82, last changed 2022-04-11 14:58 by admin. This issue is now closed.

Messages (12)
msg298181 - Author: mesheb82 (mesheb82) Date: 2017-07-11 20:43
I'm trying to join paths on Windows with data taken from a user-generated file. In doing so, I came across:

>>> os.path.join('dir1', '/dir2')
'/dir2'

I'd expect an error or:

'dir1\\dir2'

This has been tested and is consistent with Python 2.7.13 and 3.6.1.
msg298184 - Author: Paul Moore (paul.moore) (Python committer) Date: 2017-07-11 21:59
This is as documented - see https://docs.python.org/3.6/library/os.path.html#os.path.join ("If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component"). In this case, "/dir2" is an absolute path as it starts with a slash.
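A minimal sketch of the documented behaviour, using the ntpath and posixpath modules directly so that the result does not depend on the platform running it (the directory names are only illustrative):

```python
import ntpath      # Windows path rules, importable on any platform
import posixpath   # POSIX path rules, importable on any platform

# A component that starts with a separator discards everything before it.
windows_result = ntpath.join('dir1', '/dir2')
posix_result = posixpath.join('dir1', '/dir2')

# A purely relative second component is appended as usual.
windows_relative = ntpath.join('dir1', 'dir2')
posix_relative = posixpath.join('dir1', 'dir2')

print(windows_result)    # /dir2
print(posix_result)      # /dir2
print(windows_relative)  # dir1\dir2
print(posix_relative)    # dir1/dir2
```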
msg298186 - Author: Eryk Sun (eryksun) (Python triager) Date: 2017-07-11 22:27
This differs slightly from WinAPI PathCchCombineEx, which fails the example case as an invalid parameter. If the second path is rooted but without a drive or UNC share, then if the first path is relative it must be at least drive relative (e.g. "C:dir1"). Should Python's documented behavior change in 3.7 to match PathCchCombineEx in this case?
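The rule described above could be approximated in pure Python as follows. This is only a sketch: strict_join is a hypothetical helper, not part of ntpath, and it does not call or faithfully reproduce PathCchCombineEx. It simply rejects the one case mentioned in this message (second path rooted with no drive, first path with neither a drive nor a root) and otherwise defers to ntpath.join.

```python
import ntpath

SEPARATORS = ('\\', '/')


def strict_join(first, second):
    """Hypothetical stricter join: raise instead of silently dropping `first`."""
    first_drive, first_rest = ntpath.splitdrive(first)
    second_drive, second_rest = ntpath.splitdrive(second)

    first_rooted = first_rest[:1] in SEPARATORS
    second_rooted = second_rest[:1] in SEPARATORS

    # Second path is rooted but has no drive or UNC share, and the first
    # path offers neither a drive nor a root to anchor it to.
    if second_rooted and not second_drive and not first_drive and not first_rooted:
        raise ValueError(
            'cannot join rooted path %r to fully relative path %r' % (second, first)
        )
    return ntpath.join(first, second)


print(strict_join('C:dir1', '/dir2'))  # C:/dir2
print(strict_join('dir1', 'dir2'))     # dir1\dir2
```

Calling strict_join('dir1', '/dir2') raises ValueError in this sketch, whereas ntpath.join returns '/dir2'.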
msg298198 - Author: Steve Dower (steve.dower) (Python committer) Date: 2017-07-12 06:38
Arguably it isn't even against the documented behavior, since a component starting with a slash is not an absolute path. I'd be in favor of preserving the drive when encountering a component starting with a separator. Not sure of the value in changing the behavior in older versions - apparently I've never encountered this before, and I feel like I should have.
msg298199 - Author: Steve Dower (steve.dower) (Python committer) Date: 2017-07-12 06:39
> since a component starting with a slash *is not* an absolute path.
msg298202 - Author: Serhiy Storchaka (serhiy.storchaka) (Python committer) Date: 2017-07-12 08:01
I'm afraid that failing on os.path.join('', '/path') or os.path.join('.', '/path') could break a lot of code.

> I'd be in favor of preserving the drive when encountering a component starting with a separator.

Already done ().

>>> import ntpath
>>> ntpath.join('c:foo', '/bar')
'c:/bar'
msg298238 - Author: Eric V. Smith (eric.smith) (Python committer) Date: 2017-07-12 17:22
We absolutely cannot change this to give an error if the second or a subsequent parameter is absolute. I have code that reads user-named config files. If the path is relative, it's relative to a config directory, but it's allowed to be absolute:

config_filename = os.path.join(config_dir, user_supplied_name)
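The pattern described above might look like the following sketch. The function name resolve_config and the example paths are hypothetical, and posixpath is used in place of os.path only so that the output is the same on every platform.

```python
import posixpath


def resolve_config(config_dir, user_supplied_name):
    """Relative names land inside config_dir; absolute names override it."""
    return posixpath.join(config_dir, user_supplied_name)


# Hypothetical paths, chosen only for illustration.
relative_case = resolve_config('/etc/myapp', 'settings.ini')
absolute_case = resolve_config('/etc/myapp', '/home/user/custom.ini')

print(relative_case)  # /etc/myapp/settings.ini
print(absolute_case)  # /home/user/custom.ini
```

If join raised an error for an absolute second argument, the second call would fail instead of honouring the user's explicit path.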
msg298241 - Author: mesheb82 (mesheb82) Date: 2017-07-12 18:38
Testing on Python 2.7.12 through Windows 10 bash (so Linux), I find an inconsistency with the documented statement "If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component":

>>> os.path.join('dir', 'C:/dir2')
'dir/C:/dir2'

To me, this is very similar to the original problem (Windows 10, Python 2.7.13 and 3.6.1):

>>> os.path.join('dir1', '/dir2')
'/dir2'

I would argue that on Windows, '/dir2' is not an absolute path. Testing from cmd and powershell on Windows 10 from `C:`:

>>> cd /dir2
C:/dir2

I do agree, though, that it is a terrible idea to not respect the second parameter in:

os.path.join(absolute_path_or_local_path, absolute_path)

I think the question is what is considered an absolute path, and does that change depending on the OS?
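The platform dependence raised here can be seen by feeding the same string to both path flavours. This is a small sketch using ntpath and posixpath explicitly; on a given system, os.path is simply an alias for one of them.

```python
import ntpath
import posixpath

candidate = 'C:/dir2'

# POSIX rules have no notion of a drive, so 'C:' is just part of a file name.
posix_split = posixpath.splitdrive(candidate)
posix_joined = posixpath.join('dir', candidate)

# Windows rules split off the drive, and the remainder starts with a separator.
windows_split = ntpath.splitdrive(candidate)
windows_joined = ntpath.join('dir', candidate)

print(posix_split)     # ('', 'C:/dir2')
print(posix_joined)    # dir/C:/dir2
print(windows_split)   # ('C:', '/dir2')
print(windows_joined)  # C:/dir2
```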
msg298308 - Author: Steve Dower (steve.dower) (Python committer) Date: 2017-07-13 19:04
There's absolutely no risk of ignoring later parameters or raising a ValueError here, so please don't let those cloud the discussion.

The behaviour of Python 3.6 seems to be correct for every case except:

>>> os.path.join("C:\\dir1", "D:dir2")
D:dir2 (expected D:\dir1\dir2)

However, that's an incredible edge case that virtually nobody relies on and I'm sure nobody expects. The other combinations of relative and absolute paths seem to be correct.

I'm not convinced that changing the behaviour of Python 2.7 significantly improves either the maintainability or security of that release, so unless someone wants to argue about that I'm closing this as not a bug. (And if someone *does* want to argue about it, don't bother arguing with me :) )
msg298313 - Author: Eryk Sun (eryksun) (Python triager) Date: 2017-07-13 20:16
The difference compared to PathCchCombineEx stems from the following snippet in ntpath.join:

    if p_path and p_path[0] in seps:
        # Second path is absolute
        if p_drive or not result_drive:
            result_drive = p_drive
        result_path = p_path

The case that PathCchCombineEx fails is that the second path is rooted [*] but neither UNC nor drive-absolute (i.e. p_drive is empty) and the first path is relative but neither rooted nor drive-relative. When the second path is rooted but not absolute, PathCchCombineEx requires the joined path to use the root of the first path as determined by PathCchStripToRoot. The latter fails for a completely relative path (i.e. no root or drive), as it rightly should. The question is whether the join operation itself should fail because the first path has no root. Python makes a different choice, but it isn't necessarily wrong.

[*] Path Type       | Example
    ====================================
    Relative          file
    ---------------   --------------------
    Rooted            \file
    Drive-Relative    C:file
    ====================================
    Drive-Absolute    C:\file
    UNC               \\server\share\file
    ====================================
    Extended          \\?\C:\file
    Device
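The first five rows of the table above can be approximated with ntpath.splitdrive. The classify function below is a hypothetical helper written for illustration, not something ntpath provides, and it deliberately leaves out the extended and device forms.

```python
import ntpath

SEPARATORS = ('\\', '/')


def classify(path):
    """Roughly label a Windows path using the categories from the table."""
    drive, rest = ntpath.splitdrive(path)
    rooted = rest[:1] in SEPARATORS
    if not drive:
        return 'rooted' if rooted else 'relative'
    if drive[:1] in SEPARATORS:
        # splitdrive reports \\server\share as the "drive" of a UNC path.
        return 'unc'
    return 'drive-absolute' if rooted else 'drive-relative'


examples = ['file', '\\file', 'C:file', 'C:\\file', '\\\\server\\share\\file']
labels = [classify(p) for p in examples]
print(labels)
# ['relative', 'rooted', 'drive-relative', 'drive-absolute', 'unc']

# The case discussed above: rooted second path, fully relative first path.
print(ntpath.join('dir1', '\\dir2'))    # \dir2
# With a drive-relative first path, the drive is kept.
print(ntpath.join('C:dir1', '\\dir2'))  # C:\dir2
```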
msg298330 - Author: Eryk Sun (eryksun) (Python triager) Date: 2017-07-14 03:27
BTW, I don't see why one would expect join(r"C:\dir1", "D:dir2") to return r"D:\dir1\dir2" instead of "D:dir2". Python's result is in agreement with Windows PathCchCombineEx. Paths on different drives should not be combined. The first path has to be ignored:

    elif p_drive and p_drive != result_drive:
        if p_drive.lower() != result_drive.lower():
            # Different drives => ignore the first path entirely
            result_drive = p_drive
            result_path = p_path
            continue
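A short sketch contrasting the different-drive case with the same-drive case, using ntpath so that it runs on any platform:

```python
import ntpath

# Different drives: the first path is ignored entirely.
different_drive = ntpath.join('C:\\dir1', 'D:dir2')

# Same drive: the drive-relative second path is appended to the first.
same_drive = ntpath.join('C:\\dir1', 'C:dir2')

print(different_drive)  # D:dir2
print(same_drive)       # C:\dir1\dir2
```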
msg298341 - Author: Steve Dower (steve.dower) (Python committer) Date: 2017-07-14 05:58
Fair point. I was thinking of how chdir handles it, but of course that's relative to the cwd on D, so the rest of the path on C is ignored. D:dir2 is correct.
History
Date User Action Args
2022-04-11 14:58:48 admin set github: 75089
2017-07-14 05:58:23 steve.dower set messages: +
2017-07-14 03:27:41 eryksun set messages: +
2017-07-13 20:16:14 eryksun set messages: +
2017-07-13 19:04:06 steve.dower set status: open -> closed; resolution: not a bug; messages: +; stage: test needed -> resolved
2017-07-12 18:38:27 mesheb82 set messages: +
2017-07-12 17:22:02 eric.smith set nosy: + eric.smith; messages: +
2017-07-12 08:01:38 serhiy.storchaka set nosy: + serhiy.storchaka; messages: +
2017-07-12 06:39:19 steve.dower set messages: +
2017-07-12 06:38:28 steve.dower set versions: + Python 3.7, - Python 2.7, Python 3.6; resolution: not a bug -> (no value); messages: +; type: behavior; stage: resolved -> test needed
2017-07-11 22:27:04 eryksun set status: closed -> open; nosy: + eryksun; messages: +
2017-07-11 21:59:18 paul.moore set status: open -> closed; resolution: not a bug; messages: +; stage: resolved
2017-07-11 20:43:54 mesheb82 create