
On Wed, Feb 8, 2012 at 20:26, Nick Coghlan <ncoghlan@gmail.com> wrote:

On Thu, Feb 9, 2012 at 2:09 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
> I guess my point was: why is there a function call in that case? The
> "import" statement could look up sys.modules directly.
> Or the built-in __import__ could still be written in C, and only defer
> to importlib when the module isn't found in sys.modules.
> Practicality beats purity.

I quite like the idea of having builtin __import__ be a *very* thin
veneer around importlib that just does the "is this in sys.modules
already so we can just return it from there?" checks and delegates
other more complex cases to Python code in importlib.

Poking around in importlib.__import__ [1] (as well as
importlib._gcd_import), I'm thinking what we may want to do is break
up the logic a bit so that there are multiple helper functions that a
C version can call back into so that we can optimise certain simple
code paths to not call back into Python at all, and others to only do
so selectively.

Step 1: separate out the "fromlist" processing from __import__ into a
separate helper function

    def _process_fromlist(module, fromlist):
        # Perform any required imports as per existing code:
        # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l987

Fine by me.

Step 2: separate out the relative import resolution from _gcd_import
into a separate helper function.

    def _resolve_relative_name(name, package, level):
        assert hasattr(name, 'rpartition')
        assert hasattr(package, 'rpartition')
        assert level > 0
        name = # Recalculate as per the existing code:
        # http://hg.python.org/cpython/file/aba513307f78/Lib/importlib/_bootstrap.py#l889
        return name

I was actually already thinking of exposing this as importlib.resolve_name() so breaking it out makes sense.

I also think it might be possible to expose a sort of importlib.find_module() that does nothing more than find the loader for a module (if available).

Step 3: Implement builtin __import__ in C (pseudo-code below):

    def __import__(name, globals={}, locals={}, fromlist=[], level=0):
        if level > 0:
            name = importlib._resolve_relative_import(name)
        try:
            module = sys.modules[name]
        except KeyError:
            # Not cached yet, need to invoke the full import machinery
            # We already resolved any relative imports though, so
            # treat it as an absolute import
            return importlib.__import__(name, globals, locals, fromlist, 0)
        # Got a hit in the cache, see if there's any more work to do
        if not fromlist:
            # Duplicate relevant importlib.__import__ logic as C code
            # to find the right module to return from sys.modules
        elif hasattr(module, "__path__"):
            importlib._process_fromlist(module, fromlist)
        return module

This would then be similar to the way main.c already works when it
interacts with runpy - simple cases are handled directly in C, more
complex cases get handed over to the Python module.
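
To make the split between the fast path and the full machinery concrete, here is a runnable pure-Python sketch of the same idea. The name fast_import is hypothetical, and the sketch simplifies the proposal in two ways that are assumptions of the example: relative imports (level > 0) are handed straight to importlib.__import__, and so is fromlist processing for packages.

```python
import importlib
import sys


def fast_import(name, globals=None, locals=None, fromlist=(), level=0):
    """Hypothetical thin veneer: answer from sys.modules when possible."""

    def slow_path(lvl):
        # Full import machinery, implemented in Python.
        return importlib.__import__(name, globals, locals, fromlist, lvl)

    if level > 0:
        # Simplification: no relative-name resolution on the fast path.
        return slow_path(level)
    module = sys.modules.get(name)
    if module is None:
        # Not cached yet (or explicitly blocked with a None entry).
        return slow_path(0)
    if not fromlist:
        # "import a.b.c" binds the top-level package "a".
        top = sys.modules.get(name.partition(".")[0])
        return top if top is not None else slow_path(0)
    if hasattr(module, "__path__"):
        # A package may still need submodules from fromlist imported.
        return slow_path(0)
    return module
```

For an already-imported plain module with a fromlist, the function returns the cached module without entering importlib at all; everything else falls back to the Python implementation.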

I suspect that if people want the case where you load from bytecode to be fast, then this will have to expand to include C functions and/or classes which can be used as accelerators. While this accelerates the common case of a sys.modules hit, it (probably) won't make Antoine happy enough for importing a small module from bytecode (importing large modules like decimal is already fast enough).