[Python-ideas] PEP for executing a module in a package containing relative imports

Brett Cannon brett at python.org
Fri Apr 20 05:38:42 CEST 2007


Some of you might remember a discussion that took place on this list about not being able to execute a script contained in a package that used relative imports (read the PEP if you don't quite get what I am talking about). The PEP below proposes a solution (along with a counter-solution).

Let me know what you think. I especially want to hear which proposal people prefer; the one in the PEP or the one in the Open Issues section. Plus I wouldn't mind suggestions on a title for this PEP. =)


PEP: XXX
Title: XXX
Version: $Revision: 52916 $
Last-Modified: $Date: 2006-12-04 11:59:42 -0800 (Mon, 04 Dec 2006) $
Author: Brett Cannon
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: XXX-Apr-2007

Abstract

Because of how name resolution works for relative imports in a world where PEP 328 is implemented, modules within a package that use relative imports can no longer be executed directly. This failure stems from the fact that the module being executed as the "main" module has its __name__ attribute set to "__main__" instead of leaving it as the actual, absolute name of the module. This breaks import's ability to resolve relative imports from the main module into absolute names.

In order to resolve this issue, this PEP proposes to change how a module is delineated as the module that is being executed as the main module. By leaving the __name__ attribute in a module alone and setting a module attribute named __main__ to a true value for the main module (and thus false in all others), proper relative name resolution can occur while still having a clear way for a module to know if it is being executed as the main module.

The Problem

With the introduction of PEP 328, relative imports became dependent on the __name__ attribute of the module performing the import. This is because the dots in a relative import are used to strip away parts of the calling module's name to calculate where in the package hierarchy a relative import should fall (prior to PEP 328, a relative import could fail and fall back on an absolute import, which had a chance of succeeding).

For instance, consider the import from .. import spam made from the bacon.ham.beans module (bacon.ham.beans is not a package itself, i.e., does not define __path__). Name resolution of the relative import takes the caller's name (bacon.ham.beans), splits on dots, and then slices off the last n parts based on the level (which is 2). In this example both ham and beans are dropped and spam is joined with what is left (bacon). This leads to the proper import of the module bacon.spam.
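The slicing step described above can be sketched as a small function. This is a minimal illustration of the described algorithm, not importlib's actual code; the name resolve_relative and its is_package flag are hypothetical, and the flag assumes that a package's own name already denotes its package, so one fewer component is dropped for the same level.

```python
def resolve_relative(caller_name: str, level: int, target: str,
                     is_package: bool = False) -> str:
    """Hypothetical sketch: turn a relative import into an absolute name.

    caller_name -- __name__ of the module doing the import
    level       -- number of leading dots in the import
    target      -- name after the dots, joined onto what remains
    is_package  -- True if the caller defines __path__
    """
    parts = caller_name.split(".")
    # A plain module's package is its parent, so each dot drops one part.
    # A package is its own package, so the first dot drops nothing.
    drop = level - 1 if is_package else level
    if drop >= len(parts):
        raise ImportError("attempted relative import beyond top-level package")
    base = parts[:len(parts) - drop]
    return ".".join(base + [target])


# The example from the text: "from .. import spam" inside bacon.ham.beans.
print(resolve_relative("bacon.ham.beans", 2, "spam"))  # bacon.spam

# The failure: when __name__ is '__main__' there is no package to slice.
try:
    resolve_relative("__main__", 1, "spam")
except ImportError as exc:
    print("ImportError:", exc)
```

The last call shows why the main module breaks: "__main__" contains no dots, so even a single-dot import has nothing left to anchor to.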

This reliance on the __name__ attribute of a module when handling relative imports becomes an issue when executing a script within a package. Because the executing script has its __name__ set to '__main__', import cannot resolve any relative imports. This leads to an ImportError if you try to execute a script in a package that uses any relative import.

For example, assume we have a package named bacon with an __init__.py file containing::

    from . import spam

Also create a module named spam within the bacon package (it can be an empty file). Now if you try to execute the bacon package (either through python bacon/__init__.py or python -m bacon), you will get an ImportError about trying to do a relative import from within a non-package. The import is valid, but because __name__ is set to '__main__', import concludes that bacon/__init__.py is not in a package since there are no dots in __name__. To see how the algorithm works, see importlib.Import._resolve_name() in the sandbox [#importlib]_.
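The failure can be reproduced with a short script that builds the bacon package in a temporary directory and runs bacon/__init__.py directly. This is a sketch assuming a modern Python 3 interpreter runs it; the exact wording of the error differs between versions, and only the python bacon/__init__.py form is exercised here.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_package_init() -> subprocess.CompletedProcess:
    """Create bacon/__init__.py and bacon/spam.py, then execute the package file."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg = Path(tmp) / "bacon"
        pkg.mkdir()
        (pkg / "__init__.py").write_text("from . import spam\n")
        (pkg / "spam.py").write_text("")  # an empty module is enough
        # Equivalent to running "python bacon/__init__.py" from tmp.
        return subprocess.run(
            [sys.executable, str(pkg / "__init__.py")],
            capture_output=True, text=True, cwd=tmp,
        )


result = run_package_init()
print("exit status:", result.returncode)
print(result.stderr.strip().splitlines()[-1])  # the ImportError line
```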

Currently a work-around is to remove all relative imports in the module being executed and make them absolute. This is unfortunate, though, as one should not be required to use a specific style of import in order to make a module in a package executable.

The Solution

The solution to the problem is to not change the value of __name__ in modules. But there still needs to be a way to let executing code know it is being executed as a script. This is handled with a new module attribute named __main__.

When a module is being executed as a script, __main__ will be set to a true value. For all other modules, __main__ will be set to a false value. This changes the current idiom of::

    if __name__ == '__main__': ...

to::

    if __main__: ...

The current idiom is not as obvious and could cause confusion for new programmers. The proposed idiom, though, does not require explaining why __name__ is set as it is.

With the proposed solution, the convenience of finding out what module is being executed by examining sys.modules['__main__'] is lost. To make up for this, the sys module will gain a main attribute (sys.main). It will contain a string with the name of the module that is considered the executing module.
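The proposed semantics can be simulated without changing the interpreter. In this minimal sketch, load_module is a hypothetical loader and sys_stub a hypothetical stand-in for the sys module; each module keeps its real dotted __name__, only the executing module gets a true __main__, and its name is recorded in the stand-in for sys.main.

```python
import types

# A module body written with the proposed idiom.
SOURCE = """
ran_as_script = False
if __main__:  # proposed replacement for: if __name__ == '__main__':
    ran_as_script = True
"""


def load_module(name: str, source: str, *, is_main: bool,
                sys_stub: types.SimpleNamespace) -> types.ModuleType:
    """Hypothetical loader following the proposal.

    __name__ keeps the real dotted name (so relative imports could resolve),
    and the module-level __main__ flag is true only for the executing module.
    """
    module = types.ModuleType(name)  # sets module.__name__ = name
    module.__main__ = is_main
    if is_main:
        sys_stub.main = name  # stand-in for the proposed sys.main
    exec(compile(source, name, "exec"), module.__dict__)
    return module


sys_stub = types.SimpleNamespace(main=None)  # hypothetical stand-in for sys
script = load_module("bacon.ham.beans", SOURCE, is_main=True, sys_stub=sys_stub)
helper = load_module("bacon.spam", SOURCE, is_main=False, sys_stub=sys_stub)

print(script.__name__, script.ran_as_script)  # bacon.ham.beans True
print(helper.__name__, helper.ran_as_script)  # bacon.spam False
print(sys_stub.main)                          # bacon.ham.beans
```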

A competing solution is discussed in Open Issues_.

Transition Plan

Using this solution will not work directly in Python 2.6: existing code depends on the semantics of having __name__ set to '__main__'. There is also the issue of pre-existing global variables named __main__ in a module. To deal with these issues, a two-step solution is needed.

First, a Py3K deprecation warning will be raised during AST generation when a global variable named __main__ is defined. This will help detect code that would reset the value of __main__ for a module. Because no warning is raised when a global variable is injected into a module from outside, though, the check is not fool-proof. But it should cover the vast majority of variable rebinding problems.
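A check of this kind can be sketched with the ast module. The function warn_main_bindings below is hypothetical and uses DeprecationWarning as a stand-in for the Py3K warning; it is deliberately conservative, reporting bindings of __main__ in any scope (assignments, def/class statements, and imports such as "import __main__") rather than only module globals.

```python
import ast
import warnings


def _import_bindings(node):
    """Yield the names an import statement binds in the current scope."""
    for alias in node.names:
        if alias.asname:
            yield alias.asname
        elif isinstance(node, ast.Import):
            yield alias.name.split(".")[0]  # "import a.b" binds "a"
        else:
            yield alias.name


def warn_main_bindings(source: str, filename: str = "<string>") -> list[int]:
    """Hypothetical check: warn wherever source binds the name __main__."""
    lines = set()
    for node in ast.walk(ast.parse(source, filename)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound = [node.id]
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            bound = [node.name]
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            bound = list(_import_bindings(node))
        else:
            continue
        if "__main__" in bound:
            lines.add(node.lineno)
    for lineno in sorted(lines):
        warnings.warn_explicit(
            "binding '__main__' conflicts with the proposed module attribute",
            DeprecationWarning, filename, lineno)
    return sorted(lines)


SAMPLE = "import __main__\n__main__ = True\nx = 1\n\ndef f():\n    return __main__\n"
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    flagged = warn_main_bindings(SAMPLE)
print(flagged)  # [1, 2]
```

Reading __main__ (as f() does) is not flagged; only statements that would rebind it are.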

Second, 2to3 [#2to3]_ will gain a rule to transform the current if __name__ == '__main__': ... idiom to the new one. While it will not help with code that checks __name__ outside of the idiom, that specific line makes up a large proportion of the code that ever checks whether __name__ is set to '__main__'.
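The effect of such a rule can be approximated with a regular expression. This is only a hypothetical stand-in: real 2to3 fixers are written against lib2to3's pattern grammar, whereas fix_main_idiom below rewrites just the exact idiom (either quote style) and leaves other comparisons, such as '__main__' == __name__, untouched.

```python
import re

# "if __name__ == '__main__':" (either quote style) at the start of a line.
_MAIN_IDIOM = re.compile(
    r"""^([ \t]*)if\s+__name__\s*==\s*(['"])__main__\2\s*:""", re.MULTILINE)


def fix_main_idiom(source: str) -> str:
    """Hypothetical, regex-based approximation of the proposed 2to3 rule."""
    # Keep the original indentation (group 1) and anything after the colon.
    return _MAIN_IDIOM.sub(r"\1if __main__:", source)


before = "def main():\n    pass\n\nif __name__ == '__main__':\n    main()\n"
print(fix_main_idiom(before))
```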

Open Issues

A counter-proposal to introducing the __main__ attribute on modules was to introduce a built-in with the same name. The value of the built-in would be the name of the module being executed (just like the proposed sys.main). This would lead to a new idiom of::

    if __name__ == __main__: ...

The perk of this idiom over the one proposed earlier is that its general semantics do not differ greatly from the current idiom.

The drawback is that the syntactic difference is subtle: the dropping of the quotes around __main__. Some believe that existing Python programmers will introduce bugs by adding the quotation marks out of habit. But one could argue that such a bug would be discovered quickly through testing, as it is a very shallow bug.

The other advantage of this proposal over the earlier one is that import code no longer has to set the value of __main__. By making it a built-in, import does not have to care about __main__, as the executing code will pick up the built-in __main__ itself. This simplifies the implementation of the proposal, as it only requires setting a built-in instead of changing import to set an attribute on every module, with exactly one module having a different value (much like the current implementation has to do to set __name__ to '__main__' in one module).
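The counter-proposal can also be simulated by handing exec() a private builtins dictionary, so the real builtins module is left untouched. In this sketch, run_with_builtin_main is hypothetical; a single built-in named __main__ holds the executing module's name, and no per-module attribute is set. The last call shows the shallow bug mentioned above: quoting '__main__' out of habit makes the check silently false.

```python
import builtins


def run_with_builtin_main(source: str, module_name: str, executing_name: str) -> dict:
    """Hypothetical: run source as module_name while a built-in named
    __main__ holds the name of the executing module (the counter-proposal)."""
    custom_builtins = dict(vars(builtins))  # copy; real builtins stay unchanged
    custom_builtins["__main__"] = executing_name  # the only thing that is set
    namespace = {"__name__": module_name, "__builtins__": custom_builtins}
    exec(compile(source, module_name, "exec"), namespace)
    return namespace


NEW_IDIOM = "is_main = __name__ == __main__\n"
OLD_HABIT = "is_main = __name__ == '__main__'\n"  # quotes added out of habit

print(run_with_builtin_main(NEW_IDIOM, "bacon.ham.beans", "bacon.ham.beans")["is_main"])  # True
print(run_with_builtin_main(NEW_IDIOM, "bacon.spam", "bacon.ham.beans")["is_main"])       # False
print(run_with_builtin_main(OLD_HABIT, "bacon.ham.beans", "bacon.ham.beans")["is_main"])  # False
```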

References

.. [#2to3] 2to3 tool (http://svn.python.org/view/sandbox/trunk/2to3/)

.. [#importlib] importlib (http://svn.python.org/view/sandbox/trunk/import_in_py/importlib.py?view=markup)

Copyright

This document has been placed in the public domain.

..
   Local Variables:
   mode: indented-text
   indent-tabs-mode: nil
   sentence-end-double-space: t
   fill-column: 70
   coding: utf-8
   End:


