[Python-Dev] requirements for moving import over to importlib?

Brett Cannon brett at python.org
Tue Feb 7 21:07:24 CET 2012


I'm going to start this off with the caveat that hg.python.org/sandbox/bcannon#bootstrap_importlib is not completely at feature parity, but getting there shouldn't be hard. There is a FAILING file that lists the tests that are not passing because of importlib bootstrapping, along with a comment on why (I think) they are failing. But no switch would ever happen until the test suite passes.

Anyway, to start this conversation I'm going to open with why I think removing most of the C code in Python/import.c and replacing it with importlib/_bootstrap.py is a positive thing.

One is maintainability. Antoine mentioned that if the switch happens, everyone is going to have to be able to fix code in importlib, and that's the point! I don't know about the rest of you, but I find Python code easier to work with than C code (and if you don't, you might be subscribed to the wrong mailing list =). I would assume that making changes or fixing bugs will be a lot easier with importlib than with import.c. So maintaining the import system should be easier.

Two is APIs. PEP 302 introduced the idea of an API for objects that can perform imports so that people can control it, enhance it, introspect it, etc. But as it stands right now, import.c implements none of PEP 302 for any built-in import mechanism. This mostly stems from positive thing #1 I just mentioned. But since I was able to write this code from scratch, I was able to design for (and extend) PEP 302 compliance to make sure the entire import system was exposed cleanly. This means it is now much easier to write a custom importer for quirky syntax, a different storage mechanism, etc.
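To make the custom-importer point concrete, here is a minimal sketch of an importer for "a different storage mechanism": modules whose source lives in a dict rather than on disk. The names DictImporter and IN_MEMORY_SOURCES and the greeting module are hypothetical. Note that it uses the find_spec()/exec_module() protocol that PEP 451 added in Python 3.4 (which superseded PEP 302's find_module()/load_module()), so it targets a current Python 3 rather than the sandbox branch discussed here.

```python
import importlib.abc
import importlib.util
import sys

# Hypothetical storage mechanism: module source kept in a dict instead of on disk.
IN_MEMORY_SOURCES = {
    "greeting": "MESSAGE = 'hello from memory'\ndef shout():\n    return MESSAGE.upper()\n",
}


class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Finder and loader that serves modules from a dict of source strings."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        # Only claim modules we know about; returning None lets later finders run.
        if fullname not in self.sources:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # None means "use the default module object creation".
        return None

    def exec_module(self, module):
        # Compile the stored source and run it in the new module's namespace.
        source = self.sources[module.__name__]
        code = compile(source, f"<dict:{module.__name__}>", "exec")
        exec(code, module.__dict__)


# Put our finder ahead of the standard ones so it is consulted first.
sys.meta_path.insert(0, DictImporter(IN_MEMORY_SOURCES))

import greeting

print(greeting.shout())  # HELLO FROM MEMORY
```

Returning None from find_spec() for unknown names is what lets this finder coexist with the normal path-based import machinery further down sys.meta_path.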

Three is multi-VM support. IronPython, Jython, and PyPy have all said they would love importlib to become the default import implementation so that all VMs have the same implementation. Some people have even said they will use importlib regardless of what CPython does simply to ease their coding burden, but obviously that still leaves open the possibility of subtle semantic differences that would go away if all VMs used the same implementation. So switching would lead to one less possible semantic difference between the various VMs.

So, those are the positives. What are the negatives? Performance, of course.

Now I'm going to be upfront and say I really did not want to have this performance conversation now, as I have done NO profiling or analysis of the algorithms used in importlib in order to tune performance (e.g. the function that handles case-sensitivity, which is on the critical path for importing source code, has a platform check that could go away if I instead had platform-specific versions of the function assigned to a global variable at startup). I also know that people have a bad habit of latching on to micro-benchmark numbers, especially for something like import, which is part of startup and is easy to measure. I did write importlib.test.benchmark to help measure the performance impact of any algorithmic changes I might make, but it isn't a real-world benchmark like what Unladen Swallow gave us (e.g. the two start-up benchmarks that use real-world apps -- hg and bzr -- aren't available on Python 3, so only normal_startup and nosite_startup can be used ATM).
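The platform-check idea can be sketched as follows. This is a hypothetical illustration, not importlib's code: the function names and the CASE_INSENSITIVE_PLATFORMS tuple are assumptions. On a case-insensitive file system, opening Decimal.py can succeed for a file actually named decimal.py, so the importer must confirm the exact name appears in the directory listing; on a case-sensitive file system the OS lookup already enforces case.

```python
import sys

# Assumed list of platform prefixes with typically case-insensitive file systems.
CASE_INSENSITIVE_PLATFORMS = ("win", "cygwin", "darwin")


def case_ok_checked_every_call(directory_entries, filename):
    """Naive version: re-tests sys.platform on every call (i.e. on the hot path)."""
    if sys.platform.startswith(CASE_INSENSITIVE_PLATFORMS):
        # Require an exact-case match in the directory listing.
        return filename in directory_entries
    # Case-sensitive file system: the OS lookup already enforced case.
    return True


def make_case_ok(platform):
    """Choose a platform-specific version once, e.g. at interpreter startup."""
    if platform.startswith(CASE_INSENSITIVE_PLATFORMS):
        def case_ok(directory_entries, filename):
            return filename in directory_entries
    else:
        def case_ok(directory_entries, filename):
            return True
    return case_ok


# Bound once at startup; imports then call case_ok() with no platform test.
case_ok = make_case_ok(sys.platform)
```

The second form moves the sys.platform test off the critical path: it runs once when case_ok is bound, and each import afterwards pays only for the call itself.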

IOW I really do not look forward to someone saying "importlib is so much slower at importing a module containing pass" when (a) that never happens, and (b) most programs do not spend their time importing but instead doing interesting work.

For instance, right now importlib runs python -c "import decimal" (which, BTW, is the largest module in the stdlib) 25% slower on my machine with a pydebug build (a non-debug build would probably be in my favor, as importlib uses more Python objects and a debug build adds sanity checks to each of them). But if you do something (very) slightly more interesting like python -m calendar, where there is a slight amount of work, then importlib is currently only 16% slower. So it all depends on how we measure (as usual).
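A rough way to reproduce this kind of comparison is to time whole interpreter runs rather than a single import. The sketch below is an assumed methodology, not the measurement behind the numbers above: time_command is a hypothetical helper that reports the median wall-clock time of several runs. Comparing import.c against importlib would mean running it once per build, replacing sys.executable with the path to each build's interpreter.

```python
import statistics
import subprocess
import sys
import time


def time_command(args, repeat=5):
    """Return the median wall-clock time, in seconds, of running `args` `repeat` times."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        # Discard output (python -m calendar prints a full year).
        subprocess.run(args, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


import_decimal = time_command([sys.executable, "-c", "import decimal"])
run_calendar = time_command([sys.executable, "-m", "calendar"])
print(f"import decimal: {import_decimal:.3f}s  -m calendar: {run_calendar:.3f}s")
```

Taking the median of several runs damps the noise from disk caching and process start-up that a single timing would pick up.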

So, if there is going to be some baseline performance target I need to hit to make people happy, I would prefer to know what that (real-world) benchmark is and what the performance target is going to be on a non-debug build. And if people are not worried about the performance, then I'm happy with that as well. =)


