[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

Brett Cannon brett at python.org
Mon Aug 31 04:43:48 CEST 2009


On Sun, Aug 30, 2009 at 19:34, Guido van Rossum<guido at python.org> wrote:

On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon<brett at python.org> wrote:

On Sun, Aug 30, 2009 at 17:24, Guido van Rossum<guido at python.org> wrote:

On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon<brett at python.org> wrote:

I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or can I let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case. In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah.

Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile...
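For readers who want to see the constraint from the original question, here is a minimal sketch on a current Python 3. The two paths are made up for illustration. It only shows that the filename given to compile() ends up in co_filename and that plain attribute assignment is refused; it is not code from the thread.

```python
# Illustrative paths only: pretend the module was compiled in one place
# and later imported from another.
code = compile("x = 1\n", "/build/tmp/mod.py", "exec")
print(code.co_filename)

# Code objects expose co_filename as a read-only member, so a plain
# assignment from Python is rejected.
try:
    code.co_filename = "/install/lib/mod.py"
    assignment_worked = True
except (AttributeError, TypeError):
    assignment_worked = False
print(assignment_worked)
```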

Definitely, which is why I will do it, just not tonight as I am tired of compatibility fixing for now. =)

I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with.

Remember, there are many code objects created from one pyc file. Adding it to marshal.load*() makes sense because then it's usable for other purposes too, and that attacks the issue from the root.

That was my thinking.

(in import.c it's done by update_compiled_module() right after read_compiled_module(), which is a thin wrapper around marshal.load()) I'm not sure how imp.exec would make sure that introspection of the loaded code objects always gets the right thing.
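The point about many code objects coming out of one pyc file can be sketched as follows. This is a rough illustration, not what import.c or any marshal API does: with_filename is a hypothetical helper, it relies on the code object's replace() method (available in Python 3.8 and later, long after this thread), and it rewrites every nested code object unconditionally.

```python
import types

def with_filename(code, new_path):
    """Return a copy of `code`, and of every code object nested in its
    constants, with co_filename set to `new_path`.

    Hypothetical helper for illustration; not a stdlib function.
    """
    new_consts = tuple(
        with_filename(const, new_path) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_path, co_consts=new_consts)

# A module with a function nested inside another function, so there are
# three code objects: the module body, outer, and inner.
source = "def outer():\n    def inner():\n        return 1\n    return inner\n"
code = compile(source, "/build/tmp/mod.py", "exec")
fixed = with_filename(code, "/install/lib/mod.py")

namespace = {"__file__": "/install/lib/mod.py"}
exec(fixed, namespace)
inner = namespace["outer"]()
print(inner.__code__.co_filename)
```

Rewriting only the top-level code object would leave outer and inner reporting the old path, which is why the fix has to walk co_consts.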

Basically it would be imp.exec(module, code, path) and it would tweak the code object before execution based on introspecting what the module had set for __file__. But might as well add the support to marshal.

(I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)

So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit gets set or that a write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff.

It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it.
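A rough sketch of the os.open() plus io.BufferedWriter suggestion, combined with copying the source file's permission bits minus the execute bits. write_bytecode and the file names are hypothetical, the payload is placeholder bytes rather than real bytecode, and removing any stale file before an exclusive create is a design choice of this sketch, not a description of importlib or import.c. It assumes a modern Python 3 on a POSIX-like system.

```python
import io
import os
import stat
import tempfile

def write_bytecode(source_path, bytecode_path, data):
    """Write `data` to `bytecode_path` with the source file's permission
    bits, minus every execute bit. Hypothetical helper for illustration."""
    mode = stat.S_IMODE(os.stat(source_path).st_mode) & ~0o111
    # The mode passed to os.open() only applies when the file is created,
    # so remove any stale file and then create exclusively.
    try:
        os.unlink(bytecode_path)
    except FileNotFoundError:
        pass
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | getattr(os, "O_BINARY", 0)
    fd = os.open(bytecode_path, flags, mode)  # the process umask still applies
    # FileIO takes ownership of the descriptor; closing the writer closes it.
    with io.BufferedWriter(io.FileIO(fd, "w")) as f:
        f.write(data)

# Demo in a scratch directory with an executable source file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "mod.py")
pyc = os.path.join(workdir, "mod.pyc")
with open(src, "w") as f:
    f.write("x = 1\n")
os.chmod(src, 0o755)

write_bytecode(src, pyc, b"placeholder bytes")
print(oct(stat.S_IMODE(os.stat(pyc).st_mode)))
```

Because the mode is derived from the source file, a read-only source yields a read-only bytecode file, and no execute bit is ever carried over.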

I will have to see what of that is implemented in C or in Python. I have always tried to keep all pure Python code out of importlib for bootstrapping reasons in order to keep the possibility of using importlib as the implementation of import. But maybe I should not be worrying about that right at the moment and instead do what keeps the code simple.

-Brett


