[Python-Dev] Internal namespace proposal

David Hopwood david.nospam.hopwood at blueyonder.co.uk
Thu Jul 27 04:19:40 CEST 2006


[This message is cc:d to the e-lang list, but please take any replies to python-dev at python.org.]

Brett Cannon wrote:

> On 7/19/06, Ka-Ping Yee <cap-talk at zesty.ca> wrote:
>
>> OMG!!! Is all i can say at the moment. Very excited.

This is very encouraging. Thanks to ?!ng, Michael Chermside and others for making the case for capabilities.

> Also realize that I am using object-capabilities to secure the interpreter, not objects. That will be enough of a challenge to do for now. Who knows, maybe some day Python can support object-capabilities at the object level, but for now I am just trying to isolate and protect individual interpreters in the same process.

I think that the alternative of providing object-granularity protection domains straight away is more practical than you suggest, and I'd like to at least make sure that this possibility has been thoroughly explored.

Below is a first-cut proposal for enforcing namespace restrictions, i.e. support for non-public attributes and methods, on Python objects and modules. It is not sufficient by itself to provide capability security, but it could be the basis for doing that at object granularity.

(Note that this proposal would only affect sandboxed/restricted interpreters, at least for the time being. The encapsulation it provides is also useful for reasons other than security, and I think there is nothing about it that would be unreasonable to apply to an unrestricted interpreter, but for compatibility, that would have to be enabled by a future option or similar.)

Internal namespace proposal

Existing Python code tends to use a convention where the names of attributes and methods intended only for internal use are prefixed by '_'. This convention comes from PEP 8 <http://www.python.org/dev/peps/pep-0008/>, which says:

In addition, the following special forms using leading or trailing underscores are recognized (these can generally be combined with any case convention):

- _single_leading_underscore: weak "internal use" indicator. E.g. "from M import *" does not import objects whose name starts with an underscore.

- single_trailing_underscore_: used by convention to avoid conflicts with Python keyword, e.g. Tkinter.Toplevel(master, class_='ClassName')

- __double_leading_underscore: when naming a class attribute, invokes name mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).

- __double_leading_and_trailing_underscore__: "magic" objects or attributes that live in user-controlled namespaces. E.g. __init__, __import__ or __file__. Never invent such names; only use them as documented.

I propose that the "internal" status of names beginning with _ (including those beginning with __) should be enforced in restricted interpreters. This is better than introducing a new annotation, because it will do the right thing for existing code that follows this part of PEP 8.

More precisely:

A restricted interpreter refuses access to any object attribute or method with a name beginning with '_' (by throwing a new exception type 'InternalAccessException'), unless the access is from a method and its static target is that method's first argument variable.

Also, a restricted interpreter refuses access to any module-global variable or module-global function with a name beginning with '_' (by throwing 'InternalAccessException'), unless the access is statically from the same module.

(A method's first argument is usually called 'self', but that's just a convention. By "static target", I mean that to access an internal attribute _foo in a method with first argument 'self', you must write "self._foo"; attempting to access "x._foo" will fail even if 'x' happens to be the same object as 'self'. This allows such accesses to be reported at compile-time, rather than only at run-time.)
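To make the "static target" rule concrete, here is a minimal sketch of how such accesses could be found at compile time using the standard `ast` module. It is only an illustration under assumptions: the function name `find_internal_accesses` is hypothetical, a "method" is taken to be any function defined directly in a class body, and the sketch reports violations instead of raising the proposed InternalAccessException. It does not cover the module-global rule.

```python
import ast


def find_internal_accesses(source):
    """Return (line, attribute) pairs that the proposed rule would reject.

    An access to an attribute whose name starts with '_' is allowed only
    when it is written inside a method and its target is literally that
    method's first argument (usually 'self').
    """
    violations = []

    def walk(node, self_name, in_class_body):
        for child in ast.iter_child_nodes(node):
            if isinstance(child, ast.ClassDef):
                # Functions defined directly in this body count as methods.
                walk(child, None, True)
            elif isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                params = child.args.posonlyargs + child.args.args
                is_method = in_class_body and bool(params)
                walk(child, params[0].arg if is_method else None, False)
            else:
                if isinstance(child, ast.Attribute) and child.attr.startswith("_"):
                    target = child.value
                    allowed = (
                        self_name is not None
                        and isinstance(target, ast.Name)
                        and target.id == self_name
                    )
                    if not allowed:
                        violations.append((child.lineno, child.attr))
                walk(child, self_name, in_class_body)

    walk(ast.parse(source), None, False)
    return violations


SAMPLE = '''
class Account:
    def __init__(self, balance):
        self._balance = balance

    def same_as(self, other):
        return self._balance == other._balance

def peek(acct):
    return acct._balance
'''

for line, name in find_internal_accesses(SAMPLE):
    print("line %d: access to internal attribute %r" % (line, name))
```

In the sample, both uses of `self._balance` pass, while `other._balance` and `acct._balance` are flagged, even though `other` might turn out to be the same object as `self` at run time.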

I am using the term "internal" rather than "private" or "protected", because these semantics are not the same as either "private" or "protected" in C++ or Java. In Python with this change, an object can only access its own internal methods and attributes. In C++ and Java, an object can access private and protected members of other objects of the same class. The rationale for this difference is explained below.

The use of _single vs __double underscores encodes a useful distinction that would not change. Ignoring the point in the previous paragraph, a _single underscore is similar to "protected" in languages like C++ and Java, while a __double underscore is similar to "private". This is purely a consequence of the name mangling: if a class X and its subclass Y both name an attribute __foo, then we will end up with two attributes _X__foo and _Y__foo in instances of Y, which is the desired behaviour for private attributes. In the case of an attribute called _foo, OTOH, there can be only one such attribute per object, which is the desired behaviour for protected attributes. The name mangling also ensures that an object will not accidentally access a private attribute inherited from a superclass.

However, in the same example, an instance of Y can still deliberately access the copy of the attribute inherited from X by specifying _X__foo. There is no security problem here, because Y cannot do anything as a result that it could not have done by copying X's code, rather than inheriting from it. Notice that this is only true because we restrict an object to only accessing its own internal attributes and methods; if we followed C++'s semantics where an object can access protected members of any superclass, this would break security.
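The mangling behaviour described in the last two paragraphs can be observed in an ordinary, unrestricted Python interpreter. The classes X and Y follow the example in the text; the method names are made up for illustration.

```python
class X:
    def __init__(self):
        self.__foo = "set in X"      # stored on the instance as _X__foo

    def x_foo(self):
        return self.__foo            # compiled as self._X__foo


class Y(X):
    def __init__(self):
        super().__init__()
        self.__foo = "set in Y"      # stored on the instance as _Y__foo

    def y_foo(self):
        return self.__foo            # compiled as self._Y__foo

    def inherited_foo(self):
        # Deliberate access to the attribute that X's code created.
        return self._X__foo


y = Y()
mangled = sorted(name for name in vars(y) if name.endswith("__foo"))
print(mangled)
print(y.x_foo(), "/", y.y_foo(), "/", y.inherited_foo())
```

The instance ends up with two separate attributes, one per class, and Y reaches X's copy only by spelling out the mangled name.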

(Java solves this problem by applying a more complicated access rule for protected members, which I considered to be unintuitive. More details on request.)

__dict__ is an internal attribute. This means that an object can only directly reflect on itself. I know that there are other means of reflection (e.g. using the 'inspect' module); blocking these or making them safe is a separate issue.

If desired, it would be safe to add a '__publicdict__' attribute to each object, or a 'publicdict(object)' built-in. This would return a read-only dict, probably created lazily if needed, giving access only to public (non-internal) attributes and methods.
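As a rough illustration of what a 'publicdict(object)' built-in might return, here is a sketch in plain Python. It is an assumption-laden approximation, not a design: it uses `dir()` and `getattr()` to collect names, builds an eager snapshot instead of a lazily created view, and relies on `types.MappingProxyType` for read-only behaviour. The class `Point` is a made-up example.

```python
from types import MappingProxyType


def publicdict(obj):
    """Hypothetical built-in: read-only mapping of obj's public members."""
    public = {
        name: getattr(obj, name)
        for name in dir(obj)
        if not name.startswith("_")   # skip everything the proposal calls internal
    }
    return MappingProxyType(public)   # the proxy rejects item assignment


class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
        self._cache = {}              # internal, must not appear in the view

    def norm1(self):
        return abs(self.x) + abs(self.y)


view = publicdict(Point(3, -4))
print(sorted(view))
```

Because the mapping is a snapshot, later changes to the object are not reflected in it; a real implementation would have to decide whether the view should be live.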

__init__ is an internal method. This is as it should be, because it should not be possible to call __init__ on an existing object; only to have __init__ implicitly called when a new object is constructed.

__repr__ and __str__ are internal under these rules, and probably shouldn't be. Existing classes may expose private state in the strings returned by __repr__ or __str__, but in principle, there is nothing unsafe about being able to convert the public state of an object to a string. OTOH, this functionality is usually accessed via the built-ins 'repr' and 'str', which we could perhaps allow to access '__repr__' and '__str__' as a special case.

-- David Hopwood <david.nospam.hopwood at blueyonder.co.uk>


