msg24985 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2005-04-10 13:10 |
In a number of situations, the .pyc files can become "corrupted" in a subtle way: the co_filename attribute of the code objects they contain becomes wrong. This can occur if we move or rename directories, or if we access the same set of files from two different locations (e.g. over NFS). This corruption doesn't prevent the .pyc files from working, but the interpreter loses the reference to the source file. It causes trouble in tracebacks, in the inspect module, etc. A simple fix would be to use the following logic when importing a .py file: if there is a corresponding .pyc file, in addition to checking the timestamp, check the co_filename attribute of the loaded object. If it doesn't point to the original .py file, discard the code object and ignore the .pyc file. Alternatively, we could force all co_filenames to point to the .py file when loading the .pyc file. I'll write a patch for whichever alternative seems better. |
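For illustration, the first alternative (compare co_filename with the path of the .py file and discard the cached code on a mismatch) could look roughly like the sketch below. It is written in Python rather than in the import machinery itself; the helper name `code_matches_source` and both paths are hypothetical, and the marshalled bytes only stand in for the payload of a .pyc file.

```python
import marshal


def code_matches_source(code, source_path):
    """Return True if the top-level code object records source_path as its file."""
    return code.co_filename == source_path


# Hypothetical paths: where the module was compiled, and where it lives now.
old_path = "/old/location/mod.py"
new_path = "/new/location/mod.py"

# Stand-in for the body of a .pyc file written while the module was at old_path.
payload = marshal.dumps(compile("def f():\n    return 1\n", old_path, "exec"))

# Later, the same bytes are loaded while importing from new_path.
code = marshal.loads(payload)

# Under the first alternative, a mismatch means the cached code is ignored
# and the .py file is recompiled instead.
stale = not code_matches_source(code, new_path)
```

Here `stale` ends up true because the unmarshalled code object still carries the old path.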
|
|
msg24986 - (view) |
Author: Martin v. Löwis (loewis) *  |
Date: 2007-03-28 12:01 |
I fail to see the corruption. It is quite desirable and normal to only ship pyc files - that the file name they refer to is actually present is not a requirement at all. |
|
|
msg24987 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2007-03-28 13:40 |
What I called "corruption" is the situation where both the .py and the .pyc files are present, but the filename stored in the .pyc co_filenames is no longer the valid absolute path of the corresponding .py file, for any reason (renaming, NFS views, etc.). This situation causes the tracebacks and the inspect module to fail to locate the .py file, which I consider a bug. |
|
|
msg24988 - (view) |
Author: Ziga Seilnacht (zseil) *  |
Date: 2007-04-03 07:16 |
This problem is reported quite often in the tracker, although it shows up in different places: http://www.python.org/sf/1666807 http://www.python.org/sf/1051638 I closed those bugs as duplicates of this one. The logging package is also affected: http://www.python.org/sf/1669498 http://www.python.org/sf/1633605 http://www.python.org/sf/1616422 |
|
|
msg24989 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2007-04-03 11:31 |
If you ask me, I think that when the importing system finds both a .py and a .pyc for a module, then it should ignore all co_filename and replace them with the real path of the .py file. I can't see any point in not doing so. There are many other quirks caused by .pyc files accidentally remaining around, but we cannot fix them all as long as the .pyc files are at the same time a cache for performance reasons and a redistributable program format (e.g. if "rm x.py" or "svn up" deletes a .py file, then the module is still importable via the .pyc left behind, a great way to overlook the fact that imports elsewhere in the project need to be updated). |
|
|
msg24990 - (view) |
Author: Ziga Seilnacht (zseil) *  |
Date: 2007-04-03 13:46 |
Wouldn't your first solution be simpler? Changing all co_filenames would require either changing various marshal.c functions, or traversing the code object returned by import.c/read_compiled_module(). Discarding the compiled code when the file names don't match would be simpler and only require minor changes in import.c/load_source_module(). |
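To make the "traverse the code object" option concrete, here is a minimal Python sketch of such a traversal. It is not the C change discussed in this thread: the function name `with_filename` and the paths are hypothetical, and it relies on `CodeType.replace()`, which assumes Python 3.8 or later. Nested code objects (functions, classes, lambdas) are reached through `co_consts`.

```python
import types


def with_filename(code, new_path):
    """Return a copy of code whose co_filename, and that of every nested
    code object found in co_consts, is set to new_path."""
    new_consts = tuple(
        with_filename(const, new_path) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_path, co_consts=new_consts)


# Hypothetical paths, for illustration only.
old_path = "/old/location/mod.py"
new_path = "/new/location/mod.py"

source = (
    "class A:\n"
    "    def m(self):\n"
    "        return 1\n"
    "\n"
    "def f():\n"
    "    return lambda: 2\n"
)
original = compile(source, old_path, "exec")
fixed = with_filename(original, new_path)
```

Code objects are immutable from Python, so the sketch builds new objects instead of patching the existing ones in place; `original` is left untouched.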
|
|
msg24991 - (view) |
Author: Ziga Seilnacht (zseil) *  |
Date: 2007-04-24 10:01 |
Here is a patch that implements arigo's last suggestion. File Added: update_co_filename.diff |
|
|
msg24992 - (view) |
Author: Armin Rigo (arigo) *  |
Date: 2007-05-02 19:42 |
It's an obscure detail, but I think that the .pyc file should not be rewritten again after we fix the co_filenames. Fixing the co_filenames is a very very cheap operation, and I can imagine cases where the same .py files are accessed from what appears to be two different paths, e.g. over NFS - this would cause .pyc files to be rewritten all the time, which is particularly bad if we have the example of NFS in mind. Not to mention that two python processes trying to write *different* data to the same .pyc file at the same time are going to create a mess, ending in a segfault the next time the broken .pyc is loaded. It's overall a mess, so let's play it safe. |
|
|
msg61587 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2008-01-23 16:04 |
If code objects grew a __module__ attribute (which functions already have), wouldn't it be just a matter of falling back on sys.modules[my_code_object.__module__].__file__ when my_code_object.co_filename points to a non-existent file? |
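A rough sketch of that fallback, for illustration only: code objects have no `__module__` attribute, so the version below starts from a function object, which does. The helper name `source_file_for`, the module name and the paths are all hypothetical.

```python
import os
import sys
import types


def source_file_for(func):
    """Prefer the path recorded in the code object; if that file does not
    exist, fall back on the __file__ of the function's module."""
    recorded = func.__code__.co_filename
    if os.path.exists(recorded):
        return recorded
    module = sys.modules.get(func.__module__)
    return getattr(module, "__file__", None)


# Register a stand-in module so the fallback has something to find.
demo_module = types.ModuleType("demo_module_for_fallback")
demo_module.__file__ = "/current/location/demo.py"  # hypothetical path
sys.modules[demo_module.__name__] = demo_module

# Define a function whose code object records a path that does not exist.
namespace = {"__name__": demo_module.__name__}
exec(compile("def g():\n    pass\n", "/no/such/dir/demo.py", "exec"), namespace)

resolved = source_file_for(namespace["g"])
```

Note that this still has to probe the file system to decide whether the recorded path is usable.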
|
|
msg79167 - (view) |
Author: Jean-Paul Calderone (exarkun) *  |
Date: 2009-01-05 17:11 |
This is causing problems for me as well. The attached patch no longer applies cleanly to trunk. I've attached an updated version which addresses the conflicts. The new behavior fixes the issues I have with the current behavior. It'd be great to have it applied.

> If code objects grew a __module__ attribute (which functions already
> have), wouldn't it be just a matter of falling back on
> sys.modules[my_code_object.__module__].__file__ when
> my_code_object.co_filename points to a non-existent file?

It'd be nice if it wasn't necessary to check to see if co_filename referred to an existing file. Can we have a solution which creates one definitive, correct way to determine the source file? |
|
|
msg79173 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-01-05 17:41 |
As Armin said, I think it's safer and simpler not to rewrite the pyc file when the filenames have been changed. (If you think changing the filenames can have a significant performance impact, you may want to benchmark it.) |
|
|
msg79175 - (view) |
Author: Jean-Paul Calderone (exarkun) *  |
Date: 2009-01-05 17:49 |
New version of the patch which doesn't rewrite pyc files attached. |
|
|
msg79279 - (view) |
Author: Antoine Pitrou (pitrou) *  |
Date: 2009-01-06 19:17 |
Committed to trunk and py3k, and backported to 2.6 and 3.0. Thanks! |
|
|