Issue 20443: code. co_filename should always be an absolute path (original) (raw)

Created on 2014-01-30 02:15 by yselivanov, last changed 2022-04-11 14:57 by admin. This issue is now closed.

Messages (32)

msg209699 - (view)

Author: Yury Selivanov (yselivanov) * (Python committer)

Date: 2014-01-30 02:15

There are many issues on tracker related to the relative paths in co_filename. Most of them are about introspection functions in inspect module--which are usually broken after 'os.chdir'.

Test case: create a file t.py:

def foo(): pass print(foo.code.co_filename)

Execute it with '$ python t.py' -> it will print 't.py'.

Ideally, when executing a python file, interpreter should expand all relative paths for file and code.co_filename attributes.

msg345501 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-13 11:43

PR 14053 is a different approach than PR 13527: compute the absolute path to the script filename in PyConfig_Read() just after parsing the command line.

msg345521 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-13 14:30

One of the side effect of my PR 14053 is that tracebacks are more verbose: the filename is now absolute rather than relative.

Currently:

$ python3 x.py Traceback (most recent call last): File "x.py", line 4, in func() File "x.py", line 2, in func bug NameError: name 'bug' is not defined

With my PR:

$ ./python x.py Traceback (most recent call last): File "/home/vstinner/prog/python/master/x.py", line 4, in func() File "/home/vstinner/prog/python/master/x.py", line 2, in func bug NameError: name 'bug' is not defined

msg345585 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-14 11:13

The site module tries to compute the absolute path of file and cached attributes of all modules in sys.modules:

def abs_paths(): """Set all module file and cached attributes to an absolute path""" for m in set(sys.modules.values()): if (getattr(getattr(m, 'loader', None), 'module', None) not in ('_frozen_importlib', '_frozen_importlib_external')): continue # don't mess with a PEP 302-supplied file try: m.file = os.path.abspath(m.file) except (AttributeError, OSError, TypeError): pass try: m.cached = os.path.abspath(m.cached) except (AttributeError, OSError, TypeError): pass

The path attribute isn't updated.

Another approach would be to hack importlib to compute the absolute path before loading a module, rather than trying to fix it afterwards.

One pratical problem: posixpath and ntpath are not available when importlib is setup, these modules are implemented in pure Python and so must be imported.

Maybe importlib could use a naive implementation of os.path.abspath(). Maybe the C function _Py_abspath() that I implemented in PR 14053 should be exposed somehow to importlib as a private function using a builtin module like _imp, so it can be used directly.

msg346207 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2019-06-21 11:56

Perhaps os._abspath_for_import? If importlib finds it, then it can handle making early paths absolute itself, otherwise it will defer doing that until it has the full external import system bootstrapped. (importlib does try to make all paths absolute as it loads modules - it's just that some early modules aren't loaded that way)

msg346262 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-21 22:35

Effect of my PR 14053 using script.py:

import sys print(f"{file=}") print(f"{sys.argv=}") print(f"{sys.path[0]=}")

Before:

$ ./python script.py file=script.py sys.argv=['script.py'] sys.path[0]=/home/vstinner/prog/python/master

With the change:

$ ./python script.py file=/home/vstinner/prog/python/master/script.py sys.argv=['/home/vstinner/prog/python/master/script.py'] sys.path[0]=/home/vstinner/prog/python/master

msg346524 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-25 13:03

New changeset 3939c321c90283b49eddde762656e4b1940e7150 by Victor Stinner in branch 'master': bpo-20443: _PyConfig_Read() gets the absolute path of run_filename (GH-14053) https://github.com/python/cpython/commit/3939c321c90283b49eddde762656e4b1940e7150

msg346789 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-28 00:18

Example of case where a module path is still relative:

import sys import os modname = 'relpath' filename = modname + '.py' sys.path.insert(0, os.curdir) with open(filename, "w") as fp: print("import sys", file=fp) print("mod = sys.modules[name]", file=fp) print("print(f'{file=}')", file=fp) print("print(f'{mod.file=}')", file=fp) print("print(f'{mod.cached=}')", file=fp) import(modname) os.unlink(filename)

Output:

file='./relpath.py' mod.file='./relpath.py' mod.cached='./pycache/relpath.cpython-39.pyc'

file and mod.file are relative paths: not absolute paths.

msg346800 - (view)

Author: Karthikeyan Singaravelan (xtreak) * (Python committer)

Date: 2019-06-28 04:46

This change seems to produce a new warning on my Mac.

./Modules/getpath.c:764:23: warning: incompatible pointer types passing 'char [1025]' to parameter of type 'const wchar_t *' (aka 'const int *') [-Wincompatible-pointer-types] _Py_isabs(execpath)) ^~~~~~~~ ./Include/fileutils.h:158:42: note: passing argument to parameter 'path' here PyAPI_FUNC(int) _Py_isabs(const wchar_t *path); ^ 1 warning generated.

msg346823 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-06-28 14:49

New changeset 3029035ef34c9bae0c8d965290cd9b273c8de1ea by Victor Stinner in branch 'master': bpo-20443: Fix calculate_program_full_path() warning (GH-14446) https://github.com/python/cpython/commit/3029035ef34c9bae0c8d965290cd9b273c8de1ea

msg354925 - (view)

Author: Michel Desmoulin (Michel Desmoulin)

Date: 2019-10-18 22:29

Isn't that change breaking compat ?

I'm assuming many scripts in the wild sys.argv[0] and play with it assuming it's relative.

I would expect such a change to be behind a flag or a future import.

msg354999 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2019-10-20 13:09

I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

That's straightforward to selectively revert, though: remove the "config->run_filename != NULL" clause at https://github.com/python/cpython/blob/24dc2f8c56697f9ee51a4887cf0814b6600c1815/Python/initconfig.c#L2201 and add a comment with the reason for the deliberate omission.

That way the OS-provided argv entry will continue to be passed through to sys.argv[0], while everywhere else will get the absolute path.

msg355055 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-21 11:26

What is the use case for having a relative sys.argv[0]?

Isn't that change breaking compat ?

Right. It has been made on purpose. The rationale can be found in the first message of this issue.

msg355085 - (view)

Author: Michel Desmoulin (Michel Desmoulin)

Date: 2019-10-21 17:04

The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did.

3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

Do we really want to make a habit of breaking a few things at every minor release ?

It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

I'm advocating that any breaking change,, even the tiniest, should be behind a future import or a flag, with warnings, and then switched on for good on Python 4.0.

Python is a language, not a simple library. Stability is key. Like Linus Torvald use to say: "do not break user space"

The fact this change is easy to fix and migrate is not a proper reason to ignore this important software rule, IMO. So many people and so much code rely on Python that the tinest glitch may have massive impact down the line.

Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

Le 21/10/2019 à 13:26, STINNER Victor a écrit :

STINNER Victor <vstinner@python.org> added the comment:

What is the use case for having a relative sys.argv[0]?

Isn't that change breaking compat ?

Right. It has been made on purpose. The rationale can be found in the first message of this issue.



Python tracker <report@bugs.python.org> <https://bugs.python.org/issue20443>


msg355088 - (view)

Author: Ammar Askar (ammar2) * (Python committer)

Date: 2019-10-21 17:44

I did a quick search to see what code would break from sys.argv[0] going relative

intext:"sys.argv[0]" ext:py site:github.com https://www.google.com/search?q=intext:"sys.argv%5B0%5D"+ext:py+site:github.com

and while most uses of it are to print usage messages. There is potentially code relying on it being a relative path that will break from this change:

I think Michel and Nick are correct here and the status-quo should be maintained for sys.argv[0]. Michel should not have to enumerate the use cases for a relative sys.argv[0].

msg355104 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-21 21:29

Nick Coghlan:

I think that's a valid point regarding sys.argv[0] - it's the import system and code introspection that wants(/needs) absolute paths, whereas sys.argv[0] gets used in situations (e.g. usage messages) where we should retain whatever the OS gave us, since that best reflects what the user entered.

Even before this cases, there were different cases where Python does modify sys.argv:

At the C level, the Py_GetArgcArgv() function provides the "original" argc and argv: before they are modified.

See open issues:

msg355105 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-21 21:44

I understand that the discussion is about my commit 3939c321c90283b49eddde762656e4b1940e7150 which changed multiple things:

Python now gets the absolute path of the script filename specified on the command line (ex: python3 script.py): the __file__ attribute of the __main__ module, sys.argv[0] and sys.path[0] become an absolute path, rather than a relative path.

I understand that the complain is not about sys.modules['main'].file nor sys.path[0], but only sys.argv[0].

The sys.argv documentation says:

"The list of command line arguments passed to a Python script. argv[0] is the script name (it is operating system dependent whether this is a full pathname or not)."

https://docs.python.org/dev/library/sys.html#sys.argv

I understand that before my change, sys.argv[0] was sometimes relative, and sometimes absolute. With my change, sys.argv[0] should now always be asolute.

There are cases where an absolute path is preferred. There are cases where a relative path is preferred. I'm trying to understand the scope of the issue.

https://github.com/google/python-gflags/blob/4f06c3d0d6cbe9b1fb90ee9fb1c082b3bf9285f6/gflags/flagvalues.py#L868-L869

I don't know this code. It seems like sys.argv[0] is used to lookup in a dict, and that dict keys are supposed to be "module names". I don't understand in which cases sys.argv[0] could be a module name, since sys.argv[0] seems like a relative or absolute path.

https://github.com/HcashOrg/hcOmniEngine/blob/f1acc2ba3640a8e1c651ddc90a86d569d00704fe/msc-cli.py#L12

The code starts by sys.argv.pop(0): remove sys.argv[0].

https://github.com/vmtk/vmtk/blob/64675f598e31bc6be3d4fba903fb59bf1394f492/PypeS/pyperun.py#L35

if sys.argv[0].startswith('pyperun'): ...

I understand that my change broke such test. Did this test work before my change when specifying the absolute path to run the script? Like path/to/pyperun.py rather than pyperun.py? Such code looks fragile: using os.path.basename() would be safer here.

https://github.com/apache/lucene-solr/blob/cfa49401671b5f9958d46c04120df8c7e3f358be/dev-tools/scripts/svnBranchToGit.py#L783

home = os.path.expanduser("~") svnWorkingCopiesPath = os.path.join(home, "svnwork") gitReposPath = os.path.join(home, "gitrepos") if sys.argv[0].startswith(svnWorkingCopiesPath) or sys.argv[0].startswith(gitReposPath): ...

On my Linux, os.path.expanduser("~") is an absolute path and so svnWorkingCopiesPath and gitReposPath should be absolute paths, no?

My change made this code more reliable, rather than breaking it, no?

msg355106 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-21 21:54

Michel Desmoulin: "The relevance of the use case isn't the problem. Even if people had been using it wrong for all this time, the update is still going to break their code if they did use it this way. And if it was possible, of course many people did."

You have to understand that the change has be done to fix some use cases and that Python 3.8.0 has been related with the change.

If we change again the behavior, we will get Python < 3.8.0, Python 3.8.0 and Python > 3.8.0 behaving differently. I would prefer to fully understand all use cases before starting to discuss changing the behavior again.

3.7 already broke a few packages by choosing to not do the True/False dance again and make async/await keywords. I took a lot of people by surprise, not because of the issue itself, but because of the change of philosophy compared of what Python had been in the past.

You have to understand that it was known since the start and it was a deliberate choice after balancing advantages and disadvantages.

https://www.python.org/dev/peps/pep-0492/#transition-plan

I'm only aware of a single function in a single project, Twisted. Are you aware of more projects broken by this change? Did it take long to fix them? Balance it with the ability to use async and await in cases which were not possible in Python 3.6 because they were not real keywords. The Python 3.6 grammar was fully of hack to emulate keywords.

Do we really want to make a habit of breaking a few things at every minor release ?

Usually, incompatible changes are discussed to balance if it's worth it or not.

There is no plan to break all user code just for your pleasure.

For the specific case of argv[0], the discussion occurred here on associated pull requests, and nobody mentioned any risk of backward compatibility.

FYI I'm working on this topic and I just proposed a the PEP 606 "Python Compatibility Version" to propose one possible solution to make incompatible changes less painful to users.

It's fair to expect, as a user, that any 3.x will be aiming to be forward compatible. This was the case for Python 2.X and was part of the popularity of the language: a reliable tech.

My PEP 606 tries to explain incompatible changes are needed: https://www.python.org/dev/peps/pep-0606/#the-need-to-evolve-frequently

Not to mention that after a decade of healing from the P2/3 transition, the community needs a rest.

I understand that but such comment doesn't help the discussion. See my previous comments and please try to help to fill the missing information to help us to take better decisions.

Since Python 3.8.0 has been release, a revert can make things even worse that the current state.

msg355218 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2019-10-23 13:35

This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

Unless it was backported in a Python 3.8 PR that didn't link back to this issue or the original Python 3.9 PR?

msg355219 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-23 14:14

This change isn't in Python 3.8 though, it's only in Python 3.9 (the PR merge date is after the beta 1 branch date).

Oh, you're right. Sorry, I thought that I made the change before 3.8 branch was created. In that case, we can do whatever we want :-) (We are free to revert.)

msg355278 - (view)

Author: Gregory P. Smith (gregory.p.smith) * (Python committer)

Date: 2019-10-24 05:46

Please revert. An absolute path changes semantics in many real world situations (altering symlink traversals, etc). People expect the current sys.argv[0] behavior which is "similar to C argv" and matches what was passed on the interpreter command line.

A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start.

To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code.

Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite.

msg355334 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-24 14:34

Please revert.

Ok ok, I will revert my change, but only on sys.argv[0].

A getcwd() call doesn't even have to succeed. A single file python program should still be able to run in that environment rather than fail to start.

The current implementation leaves a path unchanged if it fails to make it absolute:

config_run_filename_abspath:

wchar_t *abs_filename;
if (_Py_abspath(config->run_filename, &abs_filename) < 0) {
    /* failed to get the absolute path of the command line filename:
       ignore the error, keep the relative path */
    return _PyStatus_OK();
}
if (abs_filename == NULL) {
    return _PyStatus_NO_MEMORY();
}

PyMem_RawFree(config->run_filename);
config->run_filename = abs_filename;

with:

/* Get an absolute path. On error (ex: fail to get the current directory), return -1. On memory allocation failure, set *abspath_p to NULL and return 0. On success, return a newly allocated to *abspath_p to and return 0. The string must be freed by PyMem_RawFree(). */ int _Py_abspath(const wchar_t *path, wchar_t **abspath_p);

msg355335 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-10-24 14:38

To help address the original report filing issue, we could add a notion of .co_cwd to code objects for use in resolving co_filename. Allow it to be '' if getcwd failed at source load time. Code could check if co_cwd exists and join it with the co_filename. Also let co_cwd remain empty when there is no valid co_filename for the code.

Do you mean that co_filename attribute of code objects must remain relative?

Preaching: Code that calls os.chdir() is always a potential problem as it alters process global state. That call is best avoided as modifying globals is impolite.

I thought that calling os.chdir("/") is a good practice when launching a program as a daemon. And thne call os.fork() twice to detach the process from its parent and run it in background.

I commonly use os.chdir() in my programs for various reasons. A classical use case is to mimick a shell script doing something like (cd $DIRECTORY; command): temporary change the current directory.

cwd parameter of subprocess.Popen helps to avoid changing the currently directory, but sometimes subprocess is not used at all.

Maybe avoiding os.chdir() is a good practice, but it's hard to prevent users to call it. I don't see how os.chdir() is bad by design. I'm not sure that it's really useful to debate this neither :-)

msg355339 - (view)

Author: Gregory P. Smith (gregory.p.smith) * (Python committer)

Date: 2019-10-24 16:51

I think sys.argv[0] is the important one as program logic often tends to use that as an input.

I'm honestly unsure of if this will be as much of a problem for .co_filename or sys.path[0].

msg355340 - (view)

Author: Gregory P. Smith (gregory.p.smith) * (Python committer)

Date: 2019-10-24 16:53

(read that as: feel free to keep the behavior for co_filename and path[0] and lets see what happens :)

msg358094 - (view)

Author: Łukasz Langa (lukasz.langa) * (Python committer)

Date: 2019-12-09 14:16

Anything new here? 3.9.0a2 is due next week.

msg358114 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-12-09 16:16

Anything new here? 3.9.0a2 is due next week.

I had it in my TODO list but I lost the bpo number. Thanks for the reminder :-) I wrote PR 17534 that I plan to merge as soon as the CI tests pass.

msg358116 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-12-09 16:34

New changeset a1a99b4bb7cbe2dbc55a1d92c3c509b4466d3c3b by Victor Stinner in branch 'master': bpo-20443: No longer make sys.argv[0] absolute for script (GH-17534) https://github.com/python/cpython/commit/a1a99b4bb7cbe2dbc55a1d92c3c509b4466d3c3b

msg358128 - (view)

Author: Batuhan Taskaya (BTaskaya) * (Python committer)

Date: 2019-12-09 18:57

I am not sure that if co_filename can break backwards compability or not (because until CodeType.replace; it was usual to break code object construction, people who are using code objects may take this too) but yes PR 17534 was necessary.

My initial patch (PR 13527) proposed a future directive to do this (and poorly implemented the concept) which can be helpful (yet it needs a PEP).

msg358661 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2019-12-19 08:59

I reverted my change, so I remove "release blocker" priority.

msg359098 - (view)

Author: Alyssa Coghlan (ncoghlan) * (Python committer)

Date: 2019-12-31 04:56

With the sys.argv[0] change reverted, I think this overall issue is fixed now - code objects will get absolute paths, while sys.argv[0] will continue to reflect how main was identified.

msg359902 - (view)

Author: STINNER Victor (vstinner) * (Python committer)

Date: 2020-01-13 13:57

New changeset c1ee6e5e9b87c9812c6745c1dd6c1788a984f9f9 by Victor Stinner in branch 'master': bpo-20443: Update What's New In Python 3.9 (GH-17986) https://github.com/python/cpython/commit/c1ee6e5e9b87c9812c6745c1dd6c1788a984f9f9

History

Date

User

Action

Args

2022-04-11 14:57:57

admin

set

github: 64642

2020-01-13 13:57:20

vstinner

set

messages: +

2020-01-13 13:44:04

vstinner

set

pull_requests: + <pull%5Frequest17390>

2019-12-31 04:56:03

ncoghlan

set

status: open -> closed
priority: normal
messages: +

resolution: fixed
stage: patch review -> resolved

2019-12-19 08:59:14

vstinner

set

priority: release blocker -> (no value)

messages: +

2019-12-09 18:57:38

BTaskaya

set

nosy: + BTaskaya
messages: +

2019-12-09 16:34:06

vstinner

set

messages: +

2019-12-09 16:16:44

vstinner

set

messages: +

2019-12-09 16:08:09

vstinner

set

pull_requests: + <pull%5Frequest17012>

2019-12-09 14:16:10

lukasz.langa

set

nosy: + lukasz.langa
messages: +

2019-10-24 16:54:14

yselivanov

set

nosy: - yselivanov

2019-10-24 16:53:02

gregory.p.smith

set

messages: +

2019-10-24 16:51:34

gregory.p.smith

set

messages: +

2019-10-24 14:38:54

vstinner

set

messages: +

2019-10-24 14:34:07

vstinner

set

messages: +

2019-10-24 05:46:47

gregory.p.smith

set

priority: normal -> release blocker
nosy: + gregory.p.smith
messages: +

2019-10-23 14:14:55

vstinner

set

messages: +

2019-10-23 13:35:16

ncoghlan

set

messages: +

2019-10-21 21:54:32

vstinner

set

messages: +

2019-10-21 21:44:31

vstinner

set

messages: +

2019-10-21 21:29:31

vstinner

set

messages: +

2019-10-21 17:44:21

ammar2

set

nosy: + ammar2
messages: +

2019-10-21 17:04:40

Michel Desmoulin

set

messages: +

2019-10-21 11:26:11

vstinner

set

messages: +

2019-10-20 13:09:00

ncoghlan

set

messages: +
versions: + Python 3.9, - Python 3.5

2019-10-18 22:29:43

Michel Desmoulin

set

nosy: + Michel Desmoulin
messages: +

2019-06-28 14:49:41

vstinner

set

messages: +

2019-06-28 14:17:51

vstinner

set

pull_requests: + <pull%5Frequest14262>

2019-06-28 04:46:45

xtreak

set

nosy: + xtreak
messages: +

2019-06-28 00🔞23

vstinner

set

messages: +

2019-06-25 13:03:08

vstinner

set

messages: +

2019-06-21 22:35:17

vstinner

set

messages: +

2019-06-21 11:56:15

ncoghlan

set

messages: +

2019-06-14 11:13:33

vstinner

set

messages: +

2019-06-13 14:30:45

vstinner

set

messages: +

2019-06-13 11:43:55

vstinner

set

nosy: + vstinner
messages: +

2019-06-13 11:42:55

vstinner

set

pull_requests: + <pull%5Frequest13915>

2019-05-23 19:20:53

BTaskaya

set

keywords: + patch
stage: needs patch -> patch review
pull_requests: + <pull%5Frequest13441>

2014-01-30 02:15:56

yselivanov

set

nosy: + ncoghlan

2014-01-30 02:15:15

yselivanov

create