[Python-Dev] Rewrite of import in Python source (sans docs) is complete
Calvin Spealman ironfroggy at gmail.com
Sun Jan 14 22:23:09 CET 2007
I am really looking to get into hacking on CPython, and I'm keenly interested in your security work (my top reason for hoping I can make PyCon; keeping fingers crossed!), so if you need help with this so that you can focus on other things, I'd be delighted to try my hand at the task. Do you have any docs up anywhere on what direction you hope this will go in from here?
On 1/5/07, Brett Cannon <brett at python.org> wrote:
After a few months' worth of work, I have finally gotten far enough in my import rewrite that I am willing to stick my neck out and say it is semantically complete! You can find it in the sandbox under import_in_py.
So, details of this implementation. I implemented PEP 302 importers/loaders for built-in, frozen, extension, .py, and .pyc files, along with rewriting the steps import goes through to do an import. I also developed an API for .py/.pyc file handling so that there is a generic filesystem importer/loader and a separate handler for .py/.pyc files. This should allow for (relatively) easy selective overriding of just how .py/.pyc files are stored (e.g., introducing a database backend) or how variants on .py/.pyc files are handled (e.g., Quixote's .ptl format).

This code has extensive tests, so I am fairly confident that it does what is expected of an import rewrite. There are actually more lines in the test file than in the implementation. There is also a mock implementation used for testing. It was interesting doing this in such a test-driven, XP style of only coding what I needed.

I have run this code through the entire regression test suite, and that is where you find the subtle differences between this implementation and the built-in import (you can see for yourself with the regrtest.sh shell script). First, test_pkg will fail because the new import currently adds a __loader__ attribute on all modules (that might change for security reasons) and test_pkg is an old, stdout-comparing test. Second, test_runpy fails because I have not implemented get_code on the filesystem loader, which runpy requires. Both are shallow issues that can be dealt with.

Third, and the hardest difference to deal with, is that you will get some warnings printed out that you normally don't see. This is because warnings.warn and its stacklevel argument don't have the effect people are used to when importing a deprecated module. Before, you could set stacklevel to 2 and the warning would look like it came from the import statement. But now, with import written in Python and thus on the call stack (compared to being in C and thus not showing up), two levels back is still in the import code.
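A minimal sketch of the stacklevel effect described above, not code from the sandbox. All function names here are hypothetical stand-ins: `python_level_import` plays the role of import machinery written in Python, and `warn_deprecated` plays the role of a deprecated module's top-level code. The only real API used is `warnings.warn` and `warnings.catch_warnings` from the standard library.

```python
import warnings


def warn_deprecated():
    # Stand-in for a deprecated module's top-level code (hypothetical).
    warnings.warn("this module is deprecated", DeprecationWarning, stacklevel=2)


def python_level_import():
    # Stand-in for import machinery written in Python (hypothetical):
    # it adds one extra Python frame between the user and the module.
    warn_deprecated()


def user_code_direct():
    # Models import implemented in C: no extra Python frame in between.
    warn_deprecated()


def user_code_via_python_import():
    # Models import implemented in Python.
    python_level_import()


CANDIDATES = (warn_deprecated, python_level_import,
              user_code_direct, user_code_via_python_import)


def attributed_line(func):
    # Run func and return the line number the warning was attributed to.
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        func()
    return caught[0].lineno


def owner(lineno):
    # Name of the function above whose body contains lineno.
    started = [f for f in CANDIDATES if f.__code__.co_firstlineno <= lineno]
    return max(started, key=lambda f: f.__code__.co_firstlineno).__name__


print(owner(attributed_line(user_code_direct)))
print(owner(attributed_line(user_code_via_python_import)))
```

In the first case the warning is attributed to the user's own function; in the second, the same `stacklevel=2` lands inside the intermediate Python-level frame, which is the behaviour the message describes.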
I really don't know how this should be dealt with, short of saying that the rule of thumb of going 2 stack levels back for a warning does not work when done at the import level.

It is not blazing fast at the moment. Some things, like the built-in and frozen importers/loaders, could be rewritten in C without huge issue. I am also sure I have made some stupid design decisions at various points in the code. There is benchmarking code in the sandbox called importbench, and it showed a 10x slowdown on a Linux box I was using in mid to late December when doing a fresh import of certain types (I don't remember exactly which kind off the top of my head). Because of this current slowness, I don't know whether people want to rush into trying to make this the default import implementation quite yet, or whether this is not too big of a thing since the common case of pulling out of sys.modules is not that much slower.

I know I am currently not planning on devoting the time to bootstrap it in, as I have my security work to finish first, along with other Python stuff that seems more pressing to me. And since (I think) I don't need to bootstrap it in order to finish my security work, I can't justify spending work time on it. But I can rearrange priorities if people really want to pursue this (especially if I can get some help with it).

As for the module's name, it is currently named 'importer', but that is bad since it conflicts with the idea of importers from PEP 302. I was thinking importlib, but I wanted to wait and see what other people thought. I don't know if you guys are okay with me checking this in without having it vetted by the community first, like we prefer all new modules to be. I have not done the LaTeX docs yet.

I think that is all of the details that I can think of. I am still working towards implementing the security needed so that an application that embeds Python can execute arbitrary code securely. I am giving a talk at PyCon on the topic for anyone interested.
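On the performance point above, the difference between a cached import (found in sys.modules) and a fresh one can be sketched as follows. This uses the importlib module of current Python 3, not the sandbox code or importbench, and the choice of `colorsys` as the module to import is arbitrary. Timings will vary by machine and are shown for illustration only.

```python
import importlib
import sys
import time

MODULE = "colorsys"  # arbitrary small stdlib module, chosen for illustration


def timed_import(name, fresh):
    # Import name, optionally evicting it from sys.modules first so that
    # the full finder/loader path runs instead of the cache lookup.
    if fresh:
        sys.modules.pop(name, None)
    start = time.perf_counter()
    module = importlib.import_module(name)
    return module, time.perf_counter() - start


first, _ = timed_import(MODULE, fresh=True)
cached, cached_seconds = timed_import(MODULE, fresh=False)
reloaded, fresh_seconds = timed_import(MODULE, fresh=True)

print(f"cached: {cached_seconds:.6f}s  fresh: {fresh_seconds:.6f}s")
```

The cached call returns the very same module object, while evicting the entry forces a new module object to be built, which is the path where an import implementation's speed actually matters.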
Special thanks go to Paul Moore, whom I talked to through most of the design of the code. Nick Coghlan also provided some handy feedback. And thanks to Neal Norwitz for bugging me about wanting something like this done. Plus thanks to everyone who has shown support. -Brett
Python-Dev mailing list Python-Dev at python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/ironfroggy%40gmail.com
-- Read my blog! I depend on your acceptance of my opinion! I am interesting! http://ironfroggy-code.blogspot.com/