Issue 1180193: broken pyc files

Created on 2005-04-10 13:10 by arigo, last changed 2022-04-11 14:56 by admin. This issue is now closed.

Files
File name Uploaded Description
update_co_filename.diff zseil, 2007-04-24 10:01 patch against trunk revision 54933
update_co_filename.diff exarkun, 2009-01-05 17:11
update_co_filename.diff exarkun, 2009-01-05 17:49
Messages (13)
msg24985 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2005-04-10 13:10
In a number of situations, .pyc files can become "corrupted" in a subtle way: the co_filename attribute of the code objects they contain becomes wrong. This can occur if we move or rename directories, or if we access the same set of files from two different locations (e.g. over NFS). This corruption doesn't prevent the .pyc files from working, but the interpreter loses the reference to the source file. It causes trouble in tracebacks, in the inspect module, etc. A simple fix would be to use the following logic when importing a .py file: if there is a corresponding .pyc file, in addition to checking the timestamp, check the co_filename attribute of the loaded object. If it doesn't point to the original .py file, discard the code object and ignore the .pyc file. Alternatively, we could force all co_filenames to point to the .py file when loading the .pyc file. I'll write a patch for whichever alternative seems better.
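A minimal sketch of the situation described above, written for a current Python 3 (3.7 or later) rather than the interpreter of the time. The helper name `stale_co_filename_demo`, the file names, and the 16-byte header skip (the .pyc header size on 3.7+) are assumptions made for illustration; they do not come from the report.

```python
import marshal
import os
import py_compile
import shutil
import tempfile


def stale_co_filename_demo():
    """Compile a module, rename its directory, and return
    (path recorded in the .pyc, actual path of the .py file)."""
    root = tempfile.mkdtemp()
    try:
        old_dir = os.path.join(root, "old")
        new_dir = os.path.join(root, "new")
        os.mkdir(old_dir)

        src = os.path.join(old_dir, "mod.py")
        with open(src, "w") as f:
            f.write("def f():\n    return 1\n")

        # Write the .pyc next to the source; co_filename records `src`.
        py_compile.compile(src, cfile=os.path.join(old_dir, "mod.pyc"))

        # Renaming the directory leaves the path stored in the .pyc untouched.
        os.rename(old_dir, new_dir)

        with open(os.path.join(new_dir, "mod.pyc"), "rb") as f:
            f.read(16)  # assumption: 16-byte .pyc header (Python 3.7+)
            code = marshal.load(f)

        return code.co_filename, os.path.join(new_dir, "mod.py")
    finally:
        shutil.rmtree(root)


recorded, actual = stale_co_filename_demo()
print("recorded in .pyc:", recorded)
print("actual source:   ", actual)
```

The two printed paths differ: the .pyc still names the old directory even though both files now live in the new one.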
msg24986 - (view) Author: Martin v. Löwis (loewis) * (Python committer) Date: 2007-03-28 12:01
I fail to see the corruption. It is quite desirable and normal to only ship pyc files - that the file name they refer to is actually present is not a requirement at all.
msg24987 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-03-28 13:40
What I called "corruption" is the situation where both the .py and the .pyc files are present, but the filename stored in the .pyc co_filenames is no longer the valid absolute path of the corresponding .py file, for any reason (renaming, NFS views, etc.). This situation causes the tracebacks and the inspect module to fail to locate the .py file, which I consider a bug.
msg24988 - (view) Author: Ziga Seilnacht (zseil) * (Python committer) Date: 2007-04-03 07:16
This problem is reported quite often in the tracker, although it shows up in different places: http://www.python.org/sf/1666807 http://www.python.org/sf/1051638 I closed those bugs as duplicates of this one. The logging package is also affected: http://www.python.org/sf/1669498 http://www.python.org/sf/1633605 http://www.python.org/sf/1616422
msg24989 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-04-03 11:31
If you ask me, I think that when the import system finds both a .py and a .pyc for a module, it should ignore all co_filename values and replace them with the real path of the .py file. I can't see any point in not doing so. There are many other quirks caused by .pyc files accidentally remaining around, but we cannot fix them all as long as the .pyc files are at the same time a cache for performance reasons and a redistributable program format (e.g. if "rm x.py" or "svn up" deletes a .py file, then the module is still importable via the .pyc left behind, a great way to overlook the fact that imports elsewhere in the project need to be updated).
msg24990 - (view) Author: Ziga Seilnacht (zseil) * (Python committer) Date: 2007-04-03 13:46
Wouldn't your first solution be simpler? Changing all co_filenames would require either changing various marshal.c functions, or traversing the code object returned by import.c/read_compiled_module(). Discarding the compiled code when the file names don't match would be simpler and only require minor changes in import.c/load_source_module().
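The traversal option mentioned above can be sketched in Python. This is an illustration only, not the C code from the attached patch: the function name `update_co_filename` is hypothetical, it relies on `code.replace()` (available from Python 3.8), and it simplifies by overwriting every nested filename unconditionally.

```python
import types


def update_co_filename(code, new_path):
    """Return a copy of `code` whose co_filename is `new_path`,
    recursing into nested code objects stored in co_consts."""
    new_consts = tuple(
        update_co_filename(const, new_path)
        if isinstance(const, types.CodeType)
        else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_path, co_consts=new_consts)


# Usage: a module with a nested function, compiled under a stale path.
source = "def outer():\n    def inner():\n        pass\n    return inner\n"
stale = compile(source, "/old/mod.py", "exec")
fixed = update_co_filename(stale, "/new/mod.py")
```

Code objects are immutable at the Python level, so the sketch builds new objects instead of patching the existing ones in place.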
msg24991 - (view) Author: Ziga Seilnacht (zseil) * (Python committer) Date: 2007-04-24 10:01
Here is a patch that implements arigo's last suggestion. File Added: update_co_filename.diff
msg24992 - (view) Author: Armin Rigo (arigo) * (Python committer) Date: 2007-05-02 19:42
It's an obscure detail, but I think that the .pyc file should not be rewritten again after we fix the co_filenames. Fixing the co_filenames is a very very cheap operation, and I can imagine cases where the same .py files are accessed from what appears to be two different paths, e.g. over NFS - this would cause .pyc files to be rewritten all the time, which is particularly bad if we have the example of NFS in mind. Not to mention that two python processes trying to write *different* data to the same .pyc file at the same time are going to create a mess, ending in a segfault the next time the broken .pyc is loaded. It's overall a mess, so let's play it safe.
msg61587 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2008-01-23 16:04
If code objects grew a __module__ attribute (which functions already have), wouldn't it be just a matter of falling back on sys.modules[my_code_object.__module__].__file__ when my_code_object.co_filename points to a non-existent file?
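A rough sketch of that fallback, under one loud assumption: code objects have no `__module__` attribute, so the sketch starts from a function object, which does. The helper name `source_path` is hypothetical.

```python
import os
import sys


def source_path(func):
    """Best-effort path of the file defining `func`.

    Uses co_filename when it names an existing file; otherwise falls
    back on the __file__ of the module the function claims to live in.
    """
    path = func.__code__.co_filename
    if os.path.exists(path):
        return path
    module = sys.modules.get(func.__module__)
    # If the module is missing or has no __file__, return the stale path.
    return getattr(module, "__file__", None) or path
```

This still has to probe the filesystem for co_filename before falling back.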
msg79167 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-01-05 17:11
This is causing problems for me as well. The attached patch no longer applies cleanly to trunk. I've attached an updated version which addresses the conflicts. The new behavior fixes the issues I have with the current behavior. It'd be great to have it applied.

> If code objects grew a __module__ attribute (which functions already
> have), wouldn't it be just a matter of falling back on
> sys.modules[my_code_object.__module__].__file__ when
> my_code_object.co_filename points to a non-existent file?

It'd be nice if it wasn't necessary to check to see if co_filename referred to an existing file. Can we have a solution which creates one definitive, correct way to determine the source file?
msg79173 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-05 17:41
As Armin said, I think it's safer and simpler not to rewrite the pyc file when the filenames have been changed. (If you think changing the filenames can have a significant performance impact, you may want to benchmark it.)
msg79175 - (view) Author: Jean-Paul Calderone (exarkun) * (Python committer) Date: 2009-01-05 17:49
New version of the patch which doesn't rewrite pyc files attached.
msg79279 - (view) Author: Antoine Pitrou (pitrou) * (Python committer) Date: 2009-01-06 19:17
Committed to trunk and py3k, and backported to 2.6 and 3.0. Thanks!
History
Date User Action Args
2022-04-11 14:56:10 admin set github: 41838
2009-01-06 19:17:28 pitrou set status: open -> closed; resolution: accepted -> fixed; messages: +
2009-01-06 17:59:48 pitrou set resolution: accepted; versions: + Python 3.1, Python 2.7, - Python 2.6
2009-01-05 17:49:16 exarkun set files: + update_co_filename.diff; messages: +
2009-01-05 17:41:56 pitrou set messages: +
2009-01-05 17:11:10 exarkun set files: + update_co_filename.diff; nosy: + exarkun; messages: +; keywords: + patch
2009-01-05 16:55:35 amaury.forgeotdarc link issue4845 superseder
2008-01-23 16:04:20 pitrou set nosy: + pitrou; messages: +
2008-01-05 20:17:24 christian.heimes set type: enhancement; versions: + Python 2.6
2005-04-10 13:10:52 arigo create