[Python-Dev] Should there be a way or API for retrieving from a code object a loader method and package file where the code comes from?

R. Bernstein rocky at panix.com
Tue Dec 23 17:36:48 CET 2008


Paul Moore writes:

2008/12/23 <rocky at gnu.org>:

What is wanted is a uniform way to get and describe a file location from a code object that takes into account that the file might be a member of an archive.

But a code object may not have come from a file.

Right. That's why I mentioned, for example, the "eval" and "exec" cases that you cite below. So remove the "file" in what is cited above and replace it with: "a uniform way to get information (not necessarily just the source text) about the location/origin of code from a code object."

Ignoring the interactive prompt (not because it's unimportant, just because people have a tendency to assume it's the only special case :-)) you need to consider code loaded via a PEP302 importer from (say) a sqlite database, or code created using compile(), or possibly even more esoteric means.

So I'm not sure your request is clearly specified.

Is the above any more clear?

Are there even guidelines for saying what string goes into a code object's co_filename? Clearly it should be related to the source code that generated the code, and there are various conventions that seem to exist when the code comes from an "eval" or an "exec".

I'm not aware of guidelines - the documentation for compile() says "The filename argument should give the file from which the code was read; pass some recognizable value if it wasn't read from a file ('<string>' is commonly used)" which is pretty noncommittal.
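To illustrate how little the convention pins down, here is a minimal sketch (written for current Python 3, not the 2.5 of this thread) showing that compile() simply stores whatever filename string it is handed. The second filename is an invented example value, not a convention from the documentation.

```python
# compile() keeps the filename argument as given; nothing checks that it
# names a real file.
code_default = compile("x = 1", "<string>", "exec")

# Any "recognizable value" is accepted. This one is made up for illustration.
code_custom = compile("x = 1", "<generated: config snippet>", "exec")

print(code_default.co_filename)
print(code_custom.co_filename)
```

A tool reading co_filename therefore cannot tell from the string alone whether it names a file, an archive member, or nothing on disk at all.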

But empirically it seems as though there's some variation. It could be an absolute file or a file with no root directory specified. (But is it possible to have things like "." and ".."?). And in the case of a member of a package what happens? Should it be just the member without the package? Or should it include the package name like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/tracer.py ?

Or be unspecified? If left unspecified as I gather it is now, it makes it more important to have some sort of common routine to be able to pick out the archive part in a filesystem from the member name inside the archive.
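A rough sketch of what such a common routine might look like follows. It assumes the archive is a zip file (as a zipped egg is) that still exists on disk, and the name split_archive_member is hypothetical, not an existing standard-library function.

```python
import os
import zipfile


def split_archive_member(filename):
    """Return (archive, member) if a leading part of filename is a zip
    archive on disk, otherwise (filename, None).

    Hypothetical helper: walks up the path until it reaches something
    that exists as a regular file, then checks whether that is a zip.
    """
    head, member_parts = filename, []
    while head:
        if os.path.isfile(head):
            if member_parts and zipfile.is_zipfile(head):
                # Zip member names use forward slashes on every platform.
                return head, "/".join(reversed(member_parts))
            return filename, None
        parent, tail = os.path.split(head)
        if parent == head or not tail:
            break
        member_parts.append(tail)
        head = parent
    return filename, None
```

Given a path such as the egg example above, this would return the egg path and the member name separately. For an ordinary file, or a string like "<string>", the second element is None.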

I think you need to be clear on why you want to know this information. Once it's clear what you're trying to achieve, it will be easier to say what the options are.

This is what I wrote originally (slightly modified):

A use case I am thinking of here is a stack trace, a debugger, or a tool which wants to show, in great detail, information from a code object obtained possibly via a frame object.

I find it kind of sucky to see in a traceback: "<string>" as opposed to the text (or prefix of the text) of the actual string that was passed. Or something that has been referred to as a "pseudo-file" like /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg/foo/bar.py when it is really member foo/bar.py of zipped egg /usr/lib/python2.5/site-packages/tracer-0.1.0-py2.5.egg.

(As a separate issue, it seems that zipimporter file locations inside setuptools may have a problem.)

Inside a debugger or an IDE, it is conceivable a person might want loader and module information, and if the code is part of an archive file, then member information. (If part of an eval string, then the eval string.)
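As a sketch of what a debugger can already collect from a frame today (current Python 3 assumed), the helper below gathers the filename, the module name and the PEP 302 loader if there is one. The function name describe_origin and the dictionary keys are made up for illustration.

```python
import sys


def describe_origin(frame):
    """Collect what a frame exposes about where its code came from.

    Hypothetical helper; the dictionary keys are illustrative only.
    """
    code = frame.f_code
    module_globals = frame.f_globals
    return {
        "filename": code.co_filename,
        "function": code.co_name,
        "line": frame.f_lineno,
        "module": module_globals.get("__name__"),
        # Importers set __loader__ on modules they load; it may be absent,
        # for example in globals handed to exec().
        "loader": module_globals.get("__loader__"),
    }


def example():
    # sys._getframe() returns the frame of the caller, here example().
    return describe_origin(sys._getframe())


info = example()
print(info["function"], info["module"], info["filename"])
```

Note what is missing: nothing here says whether the filename is a real file, an archive member, or an eval string, which is the gap being discussed.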

It sounds like you're trying to propose a stronger convention, to be enforced in the future.

Well, I wasn't sure if there was one. But I gather from what you write, there isn't. :-)

Yes, I would suggest a stronger convention. Or a more up-front statement that none is desired/forthcoming.

(At least, your suggestion of producing stack traces implies that you want stack trace code not to have to deal with the current situation). When PEP 302 was being developed, we were looking at similar issues. That's why I pointed you at get_source() - it was the best we could do with all the various conflicting requirements, and the fact that it's optional is because we had to cater for cases where there simply wasn't a meaningful answer. Frankly, backward compatibility requirements kill a lot of the options here.

Maybe what you want is a pair of linked conventions:

- co_filename (or a replacement) returns a (notionally opaque, but in practice a filename for file-based cases) token representing "the file or other object the code came from"

This would be nice.

- xxx.get_source_code(token) is a function (I don't know where, xxx is a placeholder for some "suitable" module) which, given such a token, returns the source, or None if there's no viable concept of "the source".
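One possible shape for such a function, sketched on top of the standard linecache module, is shown here. This is only an assumption about how it could be built, not a proposal from the thread. linecache.getlines() reads the file if it exists and otherwise falls back to a loader's get_source() when it is given the module's globals.

```python
import linecache


def get_source_code(token, module_globals=None):
    """Return the source text for a co_filename-style token, or None.

    Hypothetical function matching the description above. module_globals
    is optional; passing a frame's f_globals lets linecache consult a
    PEP 302 loader when the token is not a plain file on disk.
    """
    lines = linecache.getlines(token, module_globals)
    return "".join(lines) if lines else None
```

With this sketch a token like "<string>" yields None, since nothing records the text that was passed to eval or exec.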

There always is a viable concept of a source. It's whatever was done to get the code. For example, if it was via an eval then the source was the eval function and a string, same for exec. If it's via database access, well that then and some summary info about what's known about that.

Or maybe you want a (possibly separate) attribute of a code object, which holds a string containing a human-readable (but quite possibly not machine-parseable) value representing the "place the code came from" - co_filename is essentially this at the moment, and maybe your complaint is merely that you don't find its contents sufficiently human-readable in the case of the zipimport module (in which case you might want to search some of the archives for the discussions on the constraints imposed on zipimport, because objects on sys.path must be strings and cannot be arbitrary objects...)

There are two problems. One is displaying location information in an unambiguous way -- the pseudo-file above is ambiguous and so is "<string>", since there's no guarantee that an OS won't let a file be named that. The second problem is getting the information programmatically, as a debugger or an IDE might, so that it can be conveyed back to a user who might want to inspect surrounding source code or modules.

I'm sorry if this is a little rambling. I can appreciate that there's some sort of issue that you see here, but I don't yet see any practical way of changing things that would help. And as always, there's backward compatibility to consider - existing code isn't going to change, so new code has to be prepared to handle that.

I hope this is of some help,

Yes, thanks. At least I now have a clearer idea of where things stand.

Paul.


