[Python-Dev] doc for new restricted execution design for Python

Brett Cannon brett at python.org
Thu Jun 22 02:33:38 CEST 2006


I have been working on a design doc for restricted execution of Python as part of my dissertation for getting Python into Firefox to replace JavaScript on the web. Since this is dealing with security and messing that up can be costly, I am sending it to the list for any possible feedback.

I have already run the ideas past Neal, Guido, Jeremy, and Alex and everyone seemed to think the design was sound (thanks to them and Will for attending my meeting on it and giving me feedback that helped to shape this doc), so hopefully there are no major issues with the design itself. There are a couple of places (denoted with XXX) where there is an open issue still. Feedback on those would be great.

Anyway, here it is. I am going to be offline most of tomorrow so I probably won't get back to comments until Friday.

And just in case people are wondering, I plan on doing the implementation in the open on a branch within Python's repository so if this design works out it will end up in the core (as for when that would land, I don't know, but hopefully for 2.6).


Restricted Execution for Python
#######################################

About This Document

This document is meant to lay out the general design for re-introducing a restricted execution model for Python. This document should provide one with enough information to understand the goals for restricted execution, what considerations were made for the design, and the actual design itself. Design decisions should be clear and explain not only why they were chosen but possible drawbacks from taking that approach.

Goal

A good restricted execution model provides enough protection to prevent malicious harm from coming to the system, and no more. Barriers should be minimized so as to allow most code that does not do anything that would be regarded as harmful to run unmodified.

An important point to take into consideration when reading this document is to realize it is part of my (Brett Cannon's) Ph.D. dissertation. This means it is heavily geared toward restricted execution when the interpreter is working with Python code embedded in a web page. While great strides have been taken to keep the design general enough so as to allow all previous uses of the 'rexec' module [#rexec]_ to be able to use the new design, that is not the primary goal. This means if a design decision must be made for the embedded use case compared to sandboxing Python code in a Python application, the former will win out.

Throughout this document, the term "resource" represents anything that deserves possible protection. This ranges from things that have a physical representation (e.g., memory) to things that are more abstract and specific to the interpreter (e.g., sys.path).

When referring to the state of an interpreter, it is either "trusted" or "untrusted". A trusted interpreter has no restrictions imposed upon any resource. An untrusted interpreter has at least one, possibly more, resource with a restriction placed upon it.

.. contents::

Use Cases
/////////////////////////////

All use cases are based on how many untrusted or trusted interpreters are running in a single process.

When the Interpreter Is Embedded

Single Untrusted Interpreter

This use case is when an application embeds the interpreter and never has more than one interpreter running.

The main security issue to watch out for is default abilities being provided to the interpreter by accident. There must also be protection against resources that the interpreter needs for general use under the covers leaking into the untrusted interpreter.

Multiple Untrusted Interpreters

This use case is when multiple interpreters, all untrusted at varying levels, need to be running within a single application. This is the key use case that this proposed design is targeted at.

On top of the security issues from a single untrusted interpreter, there is one additional worry. Resources cannot end up being leaked into other interpreters where they are given escalated rights.

Stand-Alone Python

This use case is when someone has written a Python program that wants to execute Python code in one or more untrusted interpreters. This is the use case that 'rexec' attempted to fulfill.

The added security issue for this use case (on top of the ones for the other use cases) is preventing something from the trusted interpreter leaking into an untrusted interpreter and having elevated permissions. With multiple untrusted interpreters one did not have to worry about preventing actions from occurring that are disallowed for all untrusted interpreters. With this use case you do have to worry about the binary distinction between trusted and untrusted interpreters running in the same process.

Resources to Protect
/////////////////////////////

XXX Threading?

XXX CPU?

Filesystem

The most obvious facet of a filesystem to protect is reading from it. One does not want what is stored in /etc/passwd to get out. And one also does not want writing to the disk unless explicitly allowed for basically the same reason; if someone can write /etc/passwd then they can set the password for the root account.

But one must also protect information about the filesystem. This includes both the filesystem layout and permissions on files. This means pathnames need to be properly hidden from an untrusted interpreter.

Physical Resources

Memory should be protected. It is a limited resource on the system that can have an impact on other running programs if it is exhausted. Being able to restrict the use of memory would help alleviate issues from denial-of-service (DoS) attacks.

Networking

Networking is somewhat like the filesystem in terms of wanting similar protections. You do not want to let untrusted code make tons of socket connections or accept them to do possibly nefarious things (e.g., acting as a zombie).

You also want to prevent finding out information about the network you are connected to. This includes doing DNS resolution since that allows one to find out what addresses your intranet has or what subnets you use.

Interpreter

One must make sure that the interpreter is not harmed in any way. There are several ways to possibly do this. One is generating hostile bytecode. Another is some buffer overflow. In general any ability to crash the interpreter is unacceptable.

There is also the issue of taking it over. If one is able to gain control of the overall process through the interpreter then heightened abilities could be gained.

Types of Security
///////////////////////////////////////

As with most things, there are multiple approaches one can take to tackle a problem. Security is no exception. In general there seem to be two approaches to protecting resources.

Resource Hiding

By never giving code a chance to access a resource, you prevent it from being (ab)used. This is the idea behind resource hiding. This can help minimize security checks by only checking if someone should be given a resource. By having possession of a resource be what determines if one should be allowed to use it you minimize the checks to only when a resource is handed out.

This can be viewed as a passive system for security. Once a resource has been given to code there are no more checks to make sure the security model is not being violated.

The most common implementation of resource hiding is capabilities. In this type of system a resource's reference acts as a ticket that represents the right to use the resource. Once code has a reference it is considered to have full use of the resource that reference represents and no further security checks are performed.

To allow customizable restrictions one can pass references to wrappers of resources. This allows one to provide custom security to resources instead of requiring an all-or-nothing approach.
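As a rough illustration of handing out a wrapper instead of the resource itself, the sketch below gives code a read-only view of a file-like object. The names `make_read_only` and `ReadOnlyView` are hypothetical and are not part of the proposed design; the example is written for a current Python 3 interpreter.

```python
import io


def make_read_only(stream):
    """Return an object exposing only read access to `stream`.

    The underlying stream lives in a closure rather than in an instance
    attribute, so the wrapper has no obvious attribute leading back to it.
    """

    class ReadOnlyView:
        __slots__ = ()

        def read(self, size=-1):
            return stream.read(size)

        def readline(self):
            return stream.readline()

    return ReadOnlyView()


backing = io.StringIO("first line\nsecond line\n")
view = make_read_only(backing)
first = view.readline()
can_write = hasattr(view, "write")
```

Note that the closure cell holding the real stream is still reachable through reflection (e.g., via the function object's `__closure__`), which is exactly the kind of weakness in pure-Python wrappers that the rest of this document is concerned with.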

The problem with capabilities is that they require a way to control access to references. In languages such as Java that use a capability-based security system, namespaces provide the protection. By having private attributes and compartmentalized namespaces, references cannot be reached without explicit permission.

For instance, Java has a ClassLoader class that one can call to have a desired reference returned. The class does a security check to make sure the code should be allowed to access the resource, and then returns a reference as appropriate. And with private attributes in objects and packages not providing global attributes you can effectively hide references to prevent security breaches.

To use an analogy, imagine you are providing security for your home. With capabilities, security comes from not having any way to know where your house is without being told where it is; a reference to its location. You might be able to ask a guard (e.g., Java's ClassLoader) for a map, but if they refuse there is no way for you to guess its location without being told. But once you know where it is, you have complete use of the house.

And that complete access is an issue with a capability system. If someone played a little loose with a reference for a resource then you run the risk of it getting out. Once a reference leaves your hands it becomes difficult to revoke the right to use that resource. A capability system can be designed to do a check every time a reference is handed to a new object, but that can be difficult to do properly when grafting a new way to handle resources on to an existing system such as Python since the check is no longer at a point for requesting a reference but also at plain assignment time.

Resource Crippling

Another approach to security is to provide constant, proactive security checking of rights to use a resource. One can have a resource perform a security check every time someone tries to use a method on that resource. This pushes the security check to a lower level; from a reference level to the method level.

By performing the security check every time a resource's method is called, the worry of a resource's reference leaking out to insecure code is alleviated, since the resource cannot be used without authorization regardless of whether possession of the reference was ever granted. This does add extra overhead, though, by having to do so many security checks.
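A minimal sketch of method-level checking follows. `SecurityError` and `CrippledLog` are hypothetical names used only for illustration, and the check lives in Python here purely to keep the example short; a pure-Python check like this could be tampered with by the very code it restricts.

```python
class SecurityError(Exception):
    """Hypothetical stand-in for the document's undecided XXX exception."""


class CrippledLog:
    """A resource that re-checks rights on every method call."""

    def __init__(self, rights):
        self._rights = rights  # set consulted on each call
        self._lines = []

    def _check(self, right):
        if right not in self._rights:
            raise SecurityError("operation not permitted")

    def append(self, line):
        self._check("write")
        self._lines.append(line)

    def dump(self):
        self._check("read")
        return list(self._lines)


rights = {"read", "write"}
log = CrippledLog(rights)
log.append("hello")

# Revoke the right after the reference has already been handed out.
rights.discard("write")
try:
    log.append("again")
    revoked = False
except SecurityError:
    revoked = True
```

Because the check happens at call time, revoking a right takes effect even for code that already holds a reference, which is the property that resource hiding alone cannot offer.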

FreeBSD's jail system provides a system similar to this. Various system calls allow for basic usage, but knowing of the system call is not enough to grant usage. Every call of a system call requires checking that the proper rights have been granted to the user in order to allow for the system call to perform its action.

An even better example in FreeBSD's jail system is its protection of sockets. One can only bind a single IP address to a jail. Any attempt to do more or perform uses with the one IP address that is granted is prevented. The check is performed at every call involving the one granted IP address.

Using our home analogy, everyone in the world can know where your home is. But to access any door in your home, you have to pass a security check. The overhead is higher and slows down your movement in your home, but not caring if perfect strangers know where your home is prevents the worry of your address leaking out to the world.

The 'rexec' Module
///////////////////////////////////////

The 'rexec' module [#rexec]_ was based on the design used by Safe-Tcl [#safe-tcl]_. The design was essentially a capability system. Safe-Tcl allowed you to launch a separate interpreter where its global functions were specified at creation time. This prevented one from having any abilities that were not explicitly provided.

For 'rexec', the Safe-Tcl model was tweaked to better match Python's situation. An RExec object represented a restricted environment. Imports were checked against a whitelist of modules. You could also restrict the type of modules to import based on whether they were Python source, bytecode, or C extensions. Built-ins were allowed except for a blacklist of built-ins to not provide. Several other protections were provided; see documentation for the complete list.

With an RExec object created, one could pass in strings of code to be executed and have the result returned. One could execute code based on whether stdin, stdout, and stderr were provided or not.

The ultimate undoing of the 'rexec' module was how it handled access to objects that require no direct action to reach in normal Python. Importing modules requires a direct action, and thus can be protected against directly in the import machinery. But built-ins are accessible by default and require no direct action to access in normal Python; you just use their name since they are provided in all namespaces.

For instance, in a restricted interpreter, one only had to do del __builtins__ to gain access to the full set of built-ins. Another way is through using the gc module: gc.get_referrers(''.__class__.__bases__[0])[6]['file']. While both of these could be fixed (the former being a bug in 'rexec' and the latter by not allowing gc to be imported), they are examples of how things that require no proactive action on the part of the programmer to access in normal Python tend to leak out. This is an unfortunate side-effect of having all of that wonderful reflection in Python.
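To make the reflection problem concrete, here is a small sketch of reaching a large set of types starting from nothing but a string literal. It is written for a current Python 3 interpreter (where the 'file' type no longer exists), so it illustrates the same idea rather than reproducing the exploits above; the function name is hypothetical.

```python
def classes_reachable_from_literal():
    """Walk from a plain string literal to the direct subclasses of object."""
    # The last entry of the MRO is 'object', found without naming any built-in.
    root = "".__class__.__mro__[-1]
    return root, root.__subclasses__()


root, classes = classes_reachable_from_literal()
names = {cls.__name__ for cls in classes}
```

No import and no built-in name was needed to get here, which is why hiding names alone does not hide the objects behind them.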

There is also the issue that 'rexec' was written in Python which provides its own problems.

Much has been learned since 'rexec' was written about how Python tends to be used and where security issues tend to appear. Essentially Python's dynamic nature does not lend itself very well to passive security measures since the reflection abilities in the language lend themselves to getting around non-proactive security checks.

The Proposed Approach
///////////////////////////////////////

In light of where 'rexec' succeeded and failed along with what is known about the two main types of security and how Python tends to operate, the following is a proposal on how to secure Python for restricted execution.

First, security will be provided at the C level. By taking advantage of the language barrier of accessing C code from Python without explicit allowance (i.e., ignoring ctypes [#ctypes]_), direct manipulation of the various security checks can be substantially reduced and controlled.

Second, all proactive actions that code can do to gain access to resources will be protected through resource hiding. By having to go through Python to get to something (e.g., modules), a security check can be put in place to deny access as appropriate (this also ties into the separation between interpreters, discussed below).

Third, any resource that is usually accessible by default will use resource crippling. Instead of worrying about hiding a resource that is available by default (e.g., 'file' type), security checks within the resource will prevent misuse. Crippling can also be used for resources where an object could be desired, but not at its full capacity (e.g., sockets).

Performance should not be too much of an issue for resource crippling. Its main use is for I/O types: files and sockets. Since operations on these types are I/O bound and not CPU bound, the overhead for doing the security check should be a wash overall.

Fourth, the restrictions separating multiple interpreters within a single process will be utilized. This helps prevent the leaking of objects into different interpreters with escalated privileges. Python source code modules are reloaded for each interpreter, preventing an object that does not have resource crippling from being leaked into another interpreter unless explicitly allowed. C extension modules are shared by not reloading them between interpreters, but this is considered in the security design.

Fifth, Python source code is always trusted. Damage to a system is considered to be done from either hostile bytecode or at the C level. Thus protecting the interpreter and extension modules is the great worry, not Python source code. Python bytecode files, on the other hand, are considered inherently unsafe and will never be imported directly.

Attempts to perform an action that is not allowed by the security policy will raise an XXX exception (or subclass thereof) as appropriate.

Implementation Details

XXX prefix/module name; Restrict, Secure, Sandbox? Different tense?

XXX C APIs use abstract names (e.g., string, integer) since it has not been decided whether Python objects or C types (e.g., PyStringObject vs. char *) will be used

Support for untrusted interpreters will be a compilation flag. This allows the more common case of people not caring about protections to not have a performance hindrance when not desired. And even when Python is compiled for untrusted interpreter restrictions, when the running interpreter is trusted, there will be no accidental triggers of protections. This means that developers should be liberal with the security protections without worrying about there being issues for interpreters that do not need/want the protection.

At the Python level, the restricted built-in will be set based on whether the interpreter is untrusted or not. This will be set for all interpreters, regardless of whether untrusted interpreter support was compiled in or not.

For setting what is to be protected, the XXX for the untrusted interpreter must be passed in. This makes the protection very explicit and helps make sure you set protections for the exact interpreter you mean to.

The functions for checking for permissions are actually macros that take in at least an error return value for the function calling the macro. This allows the macro to return for the caller if the check failed and cause the XXX exception to be propagated. This helps eliminate any coding errors from incorrectly checking a return value on a rights-checking function call. For the rare case where this functionality is disliked, just make the check in a utility function and check that function's return value (but this is strongly discouraged!).

API

Memory

Protection

A memory cap will be allowed.

Modification to pymalloc will be needed to properly keep track of the allocation and freeing of memory. The same goes for the macros around the system malloc/free calls. This provides a platform-independent system for protection instead of relying on the operating system providing a service for capping memory usage of a process. It also allows the protection to be at the interpreter level instead of at the process level.
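The bookkeeping this implies can be sketched in a few lines. The real accounting would live in C inside pymalloc and the PyMem_*() macros; the Python class below, with the hypothetical names `MemoryCap` and `SecurityError`, only models the per-interpreter counter and the cap check.

```python
class SecurityError(Exception):
    """Hypothetical stand-in for the document's undecided XXX exception."""


class MemoryCap:
    """Track bytes in use for one interpreter and enforce an upper bound."""

    def __init__(self, limit):
        if limit < 0:
            raise ValueError("limit must be non-negative")
        self.limit = limit
        self.in_use = 0

    def allocate(self, nbytes):
        # Refuse the request before it happens if it would exceed the cap.
        if self.in_use + nbytes > self.limit:
            raise SecurityError("memory cap exceeded")
        self.in_use += nbytes

    def free(self, nbytes):
        self.in_use = max(0, self.in_use - nbytes)


cap = MemoryCap(limit=1024)
cap.allocate(1000)
try:
    cap.allocate(100)  # 1100 bytes would be over the 1024-byte cap
    denied = False
except SecurityError:
    denied = True
cap.free(500)
cap.allocate(100)  # fits again once memory has been released
```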

Why

Protecting excessive memory usage allows one to make sure that a DoS attack against the system's memory is prevented.

Possible Security Flaws

If code makes direct calls to malloc/free instead of using the proper PyMem_*() macros then the security check will be circumvented. But C code is supposed to use the proper macros or pymalloc and thus this issue is not with the security model but with code not following Python coding standards.

API

CPU

XXX Needed? Difficult to get right for all platforms. Would have to be very platform-specific.

Reading/Writing Files

Protection

The 'file' type will be resource crippled. The user may specify files or directories that are acceptable to be opened for reading/writing, or both.

All operations that either read, write, or provide info on a file will require a security check to make sure that it is allowed for the file that the 'file' object represents. This includes the 'file' type's constructor raising XXX instead of an IOError stating that a file does not exist, so that information about the filesystem is not improperly provided.

The security check will be done for all 'file' objects regardless of where the 'file' object originated. This prevents issues if the 'file' type or an instance of it was accidentally made available to an untrusted interpreter.
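One way to picture the check is as a lookup against whitelisted path prefixes, sketched below. `FilePolicy` and `SecurityError` are hypothetical names, paths are treated as POSIX-style strings, and the sketch only normalizes `..` components; a real check would also have to deal with symbolic links and would be done at the C level.

```python
import posixpath


class SecurityError(Exception):
    """Hypothetical stand-in for the document's undecided XXX exception."""


class FilePolicy:
    """Whitelist of paths that may be opened for reading and/or writing."""

    def __init__(self, readable=(), writable=()):
        self._roots = {
            "r": [posixpath.normpath(p) for p in readable],
            "w": [posixpath.normpath(p) for p in writable],
        }

    @staticmethod
    def _covers(root, path):
        # A root covers itself and anything beneath it.
        return path == root or path.startswith(root.rstrip("/") + "/")

    def is_allowed(self, path, mode):
        if mode not in self._roots:
            raise ValueError("mode must be 'r' or 'w'")
        path = posixpath.normpath(path)
        return any(self._covers(root, path) for root in self._roots[mode])

    def check(self, path, mode):
        # The same error is raised whether or not the file exists, so the
        # failure itself reveals nothing about the filesystem layout.
        if not self.is_allowed(path, mode):
            raise SecurityError("access denied")


policy = FilePolicy(readable=["/srv/app/data"], writable=["/srv/app/data/out"])
policy.check("/srv/app/data/input.txt", "r")  # passes silently
```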

Why

Allowing anyone to be able to arbitrarily read, write, or learn about the layout of your filesystem is extremely dangerous. It can lead to loss of data or data being exposed to people who should not have access.

Possible Security Flaws

Assuming that the method-level checks are correct and control over which files/directories are allowed is not exposed, 'file' object protection is secure, even when a 'file' object is leaked from a trusted interpreter to an untrusted one.

API

Extension Module Importation

Protection

A whitelist of extension modules that may be imported must be provided. A default set is given for stdlib modules known to be safe.

The import machinery will verify that a specified module name is allowed based on the type of module (Python source, Python bytecode, or extension module). Python bytecode files are never directly imported because of the possibility of hostile bytecode being present. Python source is always trusted based on the assumption that all resource harm is eventually done at the C level, thus Python code directly cannot cause harm. Thus only C extension modules need to be checked against the whitelist.

The requested module name is checked to make sure that it is on the whitelist if it is a C extension module. If the name is not on the whitelist an XXX exception is raised. Otherwise the import is allowed.
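The decision logic described above can be summarized as a small function. This is only a sketch: `check_import`, `SecurityError`, the kind constants, and the example whitelist are all hypothetical (the default whitelist is not given here), and the real check would sit inside the C import machinery.

```python
class SecurityError(Exception):
    """Hypothetical stand-in for the document's undecided XXX exception."""


SOURCE, BYTECODE, EXTENSION = "source", "bytecode", "extension"


def check_import(name, kind, whitelist):
    """Raise SecurityError unless importing `name` of type `kind` is allowed."""
    if kind == SOURCE:
        return  # Python source is always trusted
    if kind == BYTECODE:
        raise SecurityError("bytecode files are never imported directly")
    if kind == EXTENSION:
        if name in whitelist:
            return
        raise SecurityError("extension module %r is not whitelisted" % name)
    raise ValueError("unknown module kind: %r" % kind)


def is_import_allowed(name, kind, whitelist):
    try:
        check_import(name, kind, whitelist)
    except SecurityError:
        return False
    return True


example_whitelist = frozenset({"math", "time"})  # illustrative only
```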

Even if a Python source code module imports a C extension module in a trusted interpreter it is not a problem since the Python source code module is reloaded in the untrusted interpreter. When that Python source module is freshly imported the normal import check will be triggered to prevent the C extension module from becoming available to the untrusted interpreter.

For the 'os' module, a special restricted version will be used if the proper C extension module providing the correct abilities is not allowed. This will default to '/' as the path separator and provide as many reasonable abilities as possible from a pure Python module.

The 'sys' module is specially addressed in `Changing the Behaviour of the Interpreter`_.

By default, the whitelisted modules are:

Why

Because C code is considered unsafe, its use should be regulated. A whitelist allows one to explicitly decide that a C extension module should be considered safe.

Possible Security Flaws

If a trusted C extension module imports an untrusted C extension module and makes it an attribute of the trusted module there will be a breach in security. Luckily this is a rarity in extension modules.

There is also the issue of a C extension module calling the C API of an untrusted C extension module.

Lastly, if a trusted C extension module is loaded in a trusted interpreter and then loaded into an untrusted interpreter, there are no possible checks during module initialization for resources opened at that point, even if such checks exist in the init*() function, since the module is not initialized again.

All of these issues can be handled by never blindly whitelisting a C extension module. Added support for dealing with C extension modules comes in the form of `Extension Module Crippling`_.

API

Extension Module Crippling

Protection

By providing a C API for checking for allowed abilities, modules that have some useful functionality can do proper security checks for those functions that could provide insecure abilities while allowing safe code to be used (and thus not fully deny importation).

Why

Consider a module that provides a string processing ability. If that module provides a single convenience function that reads its input string from a file (with a specified path), the whole module should not be blocked from being used, just that convenience function. By whitelisting the module but having a security check on the one problem function, the user can still gain access to the safe functions. Even better, the unsafe function can be allowed if the security checks pass.
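The string-processing example can be sketched as follows. The real mechanism would be a C-level check in the extension module; here a hypothetical decorator `requires_trusted` and a module-level flag stand in for it, and `shout` / `shout_file` are made-up functions.

```python
import functools


class SecurityError(Exception):
    """Hypothetical stand-in for the document's undecided XXX exception."""


# Hypothetical flag; the design would keep this state at the C level.
_interpreter_trusted = False


def requires_trusted(func):
    """Refuse to run `func` unless the current interpreter is trusted."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if not _interpreter_trusted:
            raise SecurityError("%s is not allowed here" % func.__name__)
        return func(*args, **kwargs)

    return wrapper


def shout(text):
    """Safe: pure string processing."""
    return text.upper()


@requires_trusted
def shout_file(path):
    """Unsafe convenience function: reads its input from the filesystem."""
    with open(path) as handle:
        return shout(handle.read())
```

The module as a whole stays importable and `shout()` keeps working; only the convenience function that touches the filesystem is gated.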

Possible Security Flaws

If a C extension module developer incorrectly implements the security checks for the unsafe functions it could lead to undesired abilities.

API

Use PyXXX_Trusted() to protect unsafe code from being executed.

Hostile Bytecode

Protection

The code object's constructor is not callable from Python. Importation of .pyc and .pyo files is also prohibited.

Why

Without implementing a bytecode verification tool, there is no way of making sure that bytecode does not jump outside its bounds, thus possibly executing malicious code. It also presents the possibility of crashing the interpreter.

Possible Security Flaws

None known.

API

None.

Changing the Behaviour of the Interpreter

Protection

Only a subset of the 'sys' module will be made available to untrusted interpreters. Things to allow from the sys module:

Why

Filesystem information must be removed. Any settings that could possibly lead to a DoS attack (e.g., sys.setrecursionlimit()) or risk crashing the interpreter must also be removed.
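A sketch of exposing only a subset of 'sys' is shown below. The names in `ALLOWED_SYS_NAMES` are placeholders chosen for the example and are not the list this document intends to allow; `make_restricted_sys` is likewise hypothetical.

```python
import sys
import types

# Placeholder names for illustration only; not the document's actual list.
ALLOWED_SYS_NAMES = ("maxsize", "byteorder")


def make_restricted_sys(allowed=ALLOWED_SYS_NAMES):
    """Build a stand-in 'sys' module carrying only whitelisted attributes."""
    restricted = types.ModuleType("sys")
    for name in allowed:
        setattr(restricted, name, getattr(sys, name))
    return restricted


restricted_sys = make_restricted_sys()
```

Anything not copied over, such as `sys.path` or `sys.setrecursionlimit()`, is simply absent from the stand-in module.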

Possible Security Flaws

Exposing something that could lead to future security problems (e.g., a way to crash the interpreter).

API

None.

Socket Usage

Protection

Allow sending and receiving data to/from specific IP addresses on specific ports.
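As a sketch of what such a restriction might look like, the class below keeps a set of allowed (IP address, port) pairs and accepts only literal IP addresses, so no DNS lookup is involved in the decision. `SocketPolicy` and `SecurityError` are hypothetical names, and the addresses come from the range reserved for documentation.

```python
import ipaddress


class SecurityError(Exception):
    """Hypothetical stand-in for the document's undecided XXX exception."""


class SocketPolicy:
    """Whitelist of (IP address, port) pairs that may be contacted."""

    def __init__(self, allowed):
        self._allowed = {
            (ipaddress.ip_address(host), int(port)) for host, port in allowed
        }

    def check(self, host, port):
        try:
            address = ipaddress.ip_address(host)
        except ValueError:
            # Host names are rejected outright rather than resolved.
            raise SecurityError("only literal IP addresses are accepted") from None
        if (address, port) not in self._allowed:
            raise SecurityError("connection not permitted")

    def is_allowed(self, host, port):
        try:
            self.check(host, port)
        except SecurityError:
            return False
        return True


policy = SocketPolicy([("192.0.2.10", 443)])
```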

Why

Allowing arbitrary sending of data over sockets can lead to DoS attacks on the network and other machines. Limiting accepting data prevents your machine from being attacked by accepting malicious network connections. It also allows you to know exactly where communication is going to and coming from.

Possible Security Flaws

If someone managed to influence the DNS server in use, they could influence what IP addresses were used after a DNS lookup.

API

Network Information

Protection

Limit what information can be gleaned about the network the system is running on. This does not include restricting information on IP addresses and hosts that have been explicitly allowed for the untrusted interpreter to communicate with.

Why

With enough information from the network several things could occur. One is that someone could possibly figure out where your machine is on the Internet. Another is that enough information about the network you are connected to could be used against it in an attack.

Possible Security Flaws

As long as usage is restricted to only what is needed to work with allowed addresses, there are no security issues to speak of.

API

Filesystem Information

Protection

Do not allow information about the filesystem layout from various parts of Python to be exposed. This means blocking exposure at the Python level to:

Why

Exposing information about the filesystem is not allowed. From it one can figure out what operating system is in use, which can lead to vulnerabilities specific to that operating system being exploited.

Possible Security Flaws

Not finding every single place where a file path is exposed.

API

Threading

XXX Needed?

Stdin, Stdout, and Stderr

Protection

By default, sys.stdin, sys.stdout, and sys.stderr will be set to instances of cStringIO. Use of the normal stdin, stdout, and stderr can be explicitly allowed. XXX Or perhaps stdin and friends should just be blocked and all you get is sys.stdin and friends set to cStringIO.
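The effect of swapping in in-memory streams can be sketched with the standard library as it exists in current Python 3, where `io.StringIO` plays the role cStringIO played at the time of writing. The helper `run_with_captured_output` is a hypothetical name, and the sketch covers only stdout and stderr.

```python
import contextlib
import io


def run_with_captured_output(func):
    """Call `func` with stdout and stderr replaced by in-memory buffers."""
    out, err = io.StringIO(), io.StringIO()
    with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
        func()
    return out.getvalue(), err.getvalue()


def untrusted_snippet():
    print("hello from the sandbox")


captured_out, captured_err = run_with_captured_output(untrusted_snippet)
```

The embedding application decides what, if anything, to do with the captured text; nothing reaches the real streams.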

Why

Interference with stdin, stdout, or stderr should not be allowed unless desired.

Possible Security Flaws

Unless cStringIO instances can be used maliciously, none to speak of. XXX Use StringIO instances instead for even better security?

API

Adding New Protections

Protection

Allow for extensibility in the security model by being able to add new types of checks. This allows not only for Python to add new security protections in a backwards-compatible fashion, but to also have extension modules add their own as well.

An extension module can introduce a group for its various values to check, with a type being a specific value within a group. The "Python" group is specifically reserved for use by the Python core itself.
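The group/type arrangement might be modelled as a small registry, sketched below. `ProtectionRegistry` and its methods are hypothetical names and do not reflect the eventual C API; the only detail taken from the text is that the "Python" group is reserved for the core.

```python
class ProtectionRegistry:
    """Registry of protection types, organized by group."""

    RESERVED_GROUP = "Python"

    def __init__(self):
        self._groups = {self.RESERVED_GROUP: set()}
        self._granted = set()

    def register(self, group, type_name, *, core=False):
        """Declare a new protection type within a group."""
        if group == self.RESERVED_GROUP and not core:
            raise ValueError("the 'Python' group is reserved for the core")
        self._groups.setdefault(group, set()).add(type_name)

    def grant(self, group, type_name):
        """Grant a previously registered protection type."""
        if type_name not in self._groups.get(group, ()):
            raise KeyError((group, type_name))
        self._granted.add((group, type_name))

    def is_granted(self, group, type_name):
        return (group, type_name) in self._granted


registry = ProtectionRegistry()
registry.register("imagelib", "decode")  # an extension module's own group
registry.grant("imagelib", "decode")
```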

Why

We are all human. There is the possibility that a need for a new type of protection for the interpreter will present itself and thus need support. By providing an extensible way to add new protections it helps to future-proof the system.

It also allows extension modules to present their own set of security protections. That way one extension module can use the protection scheme presented by another that it is dependent upon.

Possible Security Flaws

Poor definitions by extension module users of how their protections should be used would allow for possible exploitation.

API

XXX Could also have PyXXXExtended prefix instead for the following functions

References
///////////////////////////////////////

.. [#rexec] The 'rexec' module (http://docs.python.org/lib/module-rexec.html)

.. [#safe-tcl] The Safe-Tcl Security Model (http://research.sun.com/technical-reports/1997/abstract-60.html)

.. [#ctypes] 'ctypes' module (http://docs.python.org/dev/lib/module-ctypes.html)


