[Python-Dev] PEP 338 issue finalisation (was Re: 2.5 PEP) (original) (raw)

Guido van Rossum guido at python.org
Thu Feb 16 16:07:41 CET 2006


On 2/16/06, Nick Coghlan <ncoghlan at gmail.com> wrote:

Guido van Rossum wrote: > Do you have unit tests for everything? I believe I fixed a bug in the > code that reads a bytecode file (it wasn't skipping the timestamp).

[Hey, I thought I sent that just to you. Is python-dev really interested in this?]

I haven't worked the filesystem based tests into the unit tests yet, and even the manual tests I was using managed to leave out compiled bytecode files (as you noticed). I'll fix that.

Given I do my testing on Linux, I probably still would have forgotten the 'rb' mode definitions on the relevant calls to open() though. . .

But running the unit tests on Windows would have revealed the problem.

> +++ pep-0338.txt (working copy) > - The optional argument initglobals may be used to pre-populate > + The optional argument initglobals may be a dictionary used to pre-populate > the globals dictionary before the code is executed. The supplied > dictionary will not be modified.

I just realised that anything that's a legal argument to "dict.update" will work. I'll fix the function description in the PEP (and the docs patch as well).

I'm not sure that's a good idea -- you'll never be able to switch to a different implementation then.

> --- runpy.py Wed Feb 15 15:56:07 2006 > def getdata(self, pathname): > ! # XXX Unfortunately PEP 302 assumes text data :-( > ! return open(pathname).read()

Hmm. The PEP itself requests that a string be returned from getdata(), but doesn't require that the file be opened in text mode. Perhaps the PEP 302 emulation should use binary mode here? Otherwise there could be strange data corruption bugs on Windows.

But PEP 302 shows as its only example reading from a file with a .txt extension. Adding spurious \r characters is also data corruption. We should probably post to python-dev a request for clarification of PEP 302, but in the mean time I vote for text mode.

> --- 337,349 ---- > > # This helper is needed as both the PEP 302 emulation and the > # main file execution functions want to read compiled files > + # XXX marshal can also raise EOFError; perhaps that should be > + # turned into ValueError? Some callers expect ValueError. > def readcompiledfile(compiledfile): > magic = compiledfile.read(4) > if magic != imp.getmagic(): > raise ValueError("File not compiled for this Python version") > + compiledfile.read(4) # Throw away timestamp > return marshal.load(compiledfile)

I'm happy to convert EOFError to ValueError here if you'd prefer (using the string representation of the EOFError as the message in the ValueError). Or did you mean changing the behaviour in marshal itself?

No -- the alternative is to catch EOFError in _read_compiled_file()'s caller, but that seems worse. You should check marshal.c if it can raise any other errors (perhaps OverflowError?).

Also, perhaps it makes more sense to return None instead of raising ValueError? Since you're always catching it? (Or are you?)

> --- 392,407 ---- > loader = getloader(modname) > if loader is None: > raise ImportError("No module named " + modname) > + # XXX getcode() is an optional loader feature. Is that okay? > code = loader.getcode(modname)

If the loader doesn't provide access to the code object or the source code, then runpy can't really do anything useful with that module (e.g. if its a C builtin module). Given that PEP 302 states that if you provide getsource() you should also provide getcode(), this check should be sufficient to let runpy.runmodule get to everything it can handle.

OK. But a loader could return None from get_code() -- do you check for that? (I don't have the source handy here.)

A case could be made for converting the attribute error to an ImportError, I guess. . .

I'm generally not keen on that; leave it.

> filename = getfilename(loader, modname) > if runname is None: > runname = modname > + # else: > + # XXX Should we also set sys.modules[runname] = sys.modules[modname]? > + # I know of code that does "import main". It should probably > + # get the substitute main rather than the original main, > + # if runname != modname > return runmodulecode(code, initglobals, runname, > filename, loader, asscript)

Hmm, good point. How about a different solution, though: in runmodulecode, I could create a new module object and put it temporarily in sys.modules, and then remove it when done (restoring the original module if necessary).

That might work too.

What happens when you execute "foo.py" as main and then (perhaps indirectly) something does "import foo"? Does a second copy of foo.py get loaded by the regular loader?

That would mean any module with code that looks up "sys.modules[name]" would still work when run via runpy.runmodule or runpy.runfile.

Yeah, except if they do that, they're likely to also assign to that. Well, maybe that would just work, too...

I also realised that sys.argv[0] should be restored to its original value, too.

Yup.

I'd then change the "asscript" flag to "altersys" and have it affect both of the above operations (and grab the import lock to avoid other import or runmodulecode operations seeing the altered version of sys.modules).

Makes sense.

I do wonder if runpy.py isn't getting a bit over-engineered -- it seems a lot of the functionality isn't actually necessary to implement -m foo.bar, and the usefulness in other circumstances is as yet unproven. What do you think of taking a dose of YAGNI here? (Especially since I notice that most of the public APIs are very thin layers over exec or execfile -- people can just use those directly.)

> --- 439,457 ---- > > Returns the resulting top level namespace dictionary > First tries to run as a compiled file, then as a source file > + XXX That's not the same algorithm as used by regular import; > + if the timestamp in the compiled file is not equal to the > + source file's mtime, the compiled file is ignored > + (unless there is no source file -- then the timestamp > + is ignored)

They're doing different things though - the import process uses that algorithm to decide which filename to use (.pyo, .pyc or .py). This code in runfile is trying to decide whether the supplied filename points to a compiled file or a source file without a tight coupling to the specific file extension used (e.g. so it works for Unix Python scripts that rely on the shebang line to identify which interpreter to use to run them). I'll add a comment to that effect.

Ah, good point. So you never go from foo.py to foo.pyc, right?

Another problem that occurred to me is that the module isn't thread safe at the moment. The PEP 302 emulation isn't protected by the import lock, and the changes to sys.argv in runmodulecode will be visible across threads (and may clobber each other or the original if multiple threads invoke the function).

Another reason to consider cutting it down to only what's needed by -m; -m doesn't need thread-safety (I think).

On that front, I'll change getpathloader to acquire and release the import lock, and the same for runmodulecode when "altersys" is set to True.

OK, just be very, very careful. The import lock is not a regular mutex and if you don't release it you're stuck forever. Just use try/finally...

-- --Guido van Rossum (home page: http://www.python.org/~guido/)



More information about the Python-Dev mailing list