[Python-Dev] The pysandbox project is broken
Victor Stinner victor.stinner at gmail.com
Tue Nov 12 22:16:55 CET 2013
Hi,
After having worked for 3 years on the pysandbox project to sandbox untrusted code, I have now reached a point where I am convinced that pysandbox is broken by design. Different developers tried to convince me before that the pysandbox design is unsafe, but I had to experience it myself to be convinced.
It would also be nice to help developers looking for a sandbox for their application. Please tell me if you know of sandbox projects for Python so I can redirect users of pysandbox to a safer solution. I already know about the PyPy sandbox.
I would like to share my experience because I know that other developers are using sandboxes in production and that there is a real need for sandboxing.
Origin of pysandbox
In 2010, a developer called Tav wrote a sandbox called "safelite.py": the sandbox hides sensitive attributes to separate a trusted namespace and an untrusted namespace. Tav challenged Python core developers to break his sandbox and... the sandbox was quickly broken. Even though it was quickly broken, I was convinced that Tav had found something interesting and that there is a real need for sandboxing Python. I continued his work by putting more protections on the untrusted namespace. I published pysandbox 1.0 in June 2010.
History of pysandbox
pysandbox was used to build an IRC bot on a French Python channel. The bot executed Python code in the sandbox. The bot was mainly used by hackers testing the sandbox to try to find a vulnerability. It was nice to have such an IRC bot on a Python help channel.
Three months after the release of pysandbox 1.0, the first vulnerability was found: it was possible to modify the builtins dictionary to hack the sandbox functions and so escape from the sandbox. I had to blacklist common instructions like "dict.pop()" or "del dict[key]" to protect the builtins dictionary. I would have preferred to use a custom type for builtins, but CPython requires a real dictionary: Python/ceval.c has an inlined version of PyDict_GetItem. For your information, I modified CPython 3.3 to accept arbitrary mapping types for builtins.
Just after this fix, another vulnerability was found: it was still possible to modify builtins using the dict.__init__() method. Access to this method was also blocked.
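To illustrate why these operations had to be blocked, here is a minimal sketch. It assumes a hypothetical guard function named "check" stored in the builtins dictionary that is handed to exec(); this is an illustration only, not pysandbox's actual code. Each snippet removes or replaces the guard from inside the untrusted namespace:

```python
def make_namespace():
    # Hypothetical stand-in for a sandbox protection stored in builtins.
    guarded = {"check": lambda: "guard active"}
    # The untrusted code sees this plain dict under the name __builtins__.
    return {"__builtins__": guarded}, guarded


# Three ways to mutate the builtins dictionary from untrusted code.
ATTACKS = {
    "del": "del __builtins__['check']",
    "pop": "__builtins__.pop('check')",
    # dict.__init__() with keyword arguments updates the dict in place.
    "init": "__builtins__.__init__(check=lambda: 'guard replaced')",
}


def run_attack(name):
    namespace, guarded = make_namespace()
    exec(ATTACKS[name], namespace)
    guard = guarded.get("check")
    # None means the guard was removed; otherwise call whatever is there now.
    return guard() if guard is not None else None


results = {name: run_attack(name) for name in ATTACKS}
```

In this sketch, "del" and "pop" leave no guard behind, and "init" silently swaps it for attacker-controlled code. Blocking each route one by one is exactly the kind of blacklist that keeps growing.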
Seven months later, new vulnerabilities were found. The "timeout" protection was removed because it is not effective on CPU-intensive functions implemented in C. And to work around a known bug in CPython crashing the interpreter, access to the type.__bases__ attribute was also blocked. But this protection has to be disabled on CPython 2.5 because of another CPython bug... Access to the func_defaults/__defaults__ attributes of a function was also blocked to protect the sandbox, even though it was not exploitable to escape from the sandbox.
Recent events
A few weeks ago, a security challenge targeted pysandbox. In less than one day, two vulnerabilities were found. First, the compile() builtin function was used to read an arbitrary file on the disk line by line using a syntax error: the line is displayed in the traceback. Second, a context manager was used to retrieve a traceback object: from traceback.tb_frame, it was possible to navigate through the frames (using frame.f_back) to retrieve a frame of the trusted namespace, and then use the f_globals attribute of that frame to retrieve a global name. Game over.
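The second attack can be sketched in a few lines. This is a simplified reconstruction, not the exploit that was actually submitted: the names SECRET, Grab, attack and run_untrusted are hypothetical, and the untrusted code runs through a plain exec() in a separate globals dictionary rather than inside pysandbox.

```python
SECRET = "only the trusted side should see this"  # hypothetical trusted global

UNTRUSTED_SOURCE = '''
class Grab:
    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, tb):
        self.tb = tb   # keep the traceback object handed to __exit__
        return True    # swallow the exception


def attack():
    grab = Grab()
    with grab:
        raise ValueError("deliberate")
    frame = grab.tb.tb_frame        # frame of attack() itself
    trusted_frame = frame.f_back    # frame of the trusted caller
    return trusted_frame.f_globals  # globals of the trusted namespace
'''


def run_untrusted():
    # The untrusted code gets its own globals, which do not contain SECRET.
    untrusted_globals = {}
    exec(UNTRUSTED_SOURCE, untrusted_globals)
    return untrusted_globals["attack"]()


leaked_globals = run_untrusted()
leaked_secret = leaked_globals.get("SECRET")
```

The untrusted code never names SECRET or any trusted object directly. It only follows attributes from an exception it raised itself, and ends up holding the globals dictionary of the function that called it.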
I fixed these two vulnerabilities in pysandbox 1.5.1: compile() is now blocked by default, and the access to traceback.tb_frame, frame.f_back and frame.f_globals has been blocked.
I also started to work on a new design of pysandbox (version currently called "pysandbox 1.6", might become pysandbox 2.0 later): run untrusted code in a subprocess to have a safer design. Using a subprocess, it becomes easier to limit the memory usage, set up a real timeout, limit bytes written to stdout, limit the size of data sent to and received from the child process, etc. But my main motivation was to not crash the whole application if the untrusted code exploits a known Python bug to crash the process. There are (too) many ways to crash Python using common types and functions...
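As a rough sketch of the subprocess idea (not the pysandbox 1.6 implementation, and not a security boundary on its own), the following helper runs a snippet in a child interpreter with a real timeout, an optional address-space limit, and a cap on the output that is kept. The function name run_in_child and all parameter names are assumptions made for this example; the memory limit relies on the POSIX-only resource module.

```python
import subprocess
import sys

try:
    import resource  # POSIX only; the memory limit is skipped elsewhere
except ImportError:
    resource = None


def run_in_child(source, timeout=5.0, max_output=4096, max_memory=None):
    """Run source in a child Python; return (returncode, stdout bytes)."""

    def limit_memory():
        # Runs in the child just before exec: cap its address space.
        resource.setrlimit(resource.RLIMIT_AS, (max_memory, max_memory))

    use_limit = resource is not None and max_memory is not None
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", source],  # -I: isolated mode
            stdout=subprocess.PIPE,
            stderr=subprocess.DEVNULL,
            timeout=timeout,  # the child is killed when this expires
            preexec_fn=limit_memory if use_limit else None,
        )
    except subprocess.TimeoutExpired:
        return None, b""
    # Only truncates what is kept; a stricter version would stop reading
    # (and kill the child) once max_output bytes have been received.
    return proc.returncode, proc.stdout[:max_output]
```

A crash or an infinite loop in the child then shows up as a return code or a timeout in the parent, instead of taking the whole application down. The child still has full access to the file system and network unless the operating system restricts it.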
The problem is that after each release it becomes harder to write Python code in the sandbox. For example, it becomes very hard to give the untrusted namespace access to objects from the trusted namespace, because the whole object must be serialized to be passed to the child process. It also becomes harder to debug bugs in the sandboxed code because the traceback feature doesn't work well in the sandbox.
Pysandbox is broken
In my opinion, the compile() vulnerability is the proof that it is not possible to put a sandbox in CPython. Blocking access to the open() builtin function and the file type constructor is not enough if unrelated functions can indirectly give access to the file system. Having read access to the file system is a critical vulnerability in pysandbox, and modifying CPython to not print the source code line in a traceback is also not acceptable.
I now agree that putting a sandbox in CPython is the wrong design. There are too many ways to escape the untrusted namespace using the various introspection features of the Python language. To guarantee the safety of a security product, the code should be carefully audited and the code to review must be as small as possible. Using pysandbox, the "code" is the whole Python core, which is a really huge code base. For example, the Python and Objects directories of Python 3.4 contain more than 126,000 lines of C code.
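A well-known illustration of the introspection problem, given here as a sketch and not as one of the attacks used against pysandbox: even with an empty builtins dictionary, untrusted code can start from a literal and reach every class currently loaded in the interpreter. The names UNTRUSTED_SOURCE and reachable are only for this example.

```python
# The untrusted source uses no names at all, only attribute access
# starting from an empty tuple literal.
UNTRUSTED_SOURCE = "found = ().__class__.__base__.__subclasses__()"

# Empty builtins: no open(), no __import__, no getattr().
namespace = {"__builtins__": {}}
exec(UNTRUSTED_SOURCE, namespace)

# Names of all direct subclasses of object visible to the untrusted code.
reachable = {cls.__name__ for cls in namespace["found"]}
```

Every one of those classes, and everything reachable from its attributes, is part of the surface that would have to be audited.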
The security of pysandbox is the security of its weakest part. A single bug is enough to escape the whole sandbox.
Attackers had original and different ideas like hacking builtins, using warnings, context managers, syntax errors, arbitrary bytecode, etc. It is hard to protect the untrusted namespace against all these different Python features.
It might be possible to invest a lot of time to add enough protections to secure the untrusted namespace, but that leads to my second point: pysandbox cannot be used in practice.
pysandbox cannot be used in practice
To protect the untrusted namespace, pysandbox installs a lot of different protections. Because of all these protections, it becomes hard to write Python code. Basic features like "del dict[key]" are denied. Passing an arbitrary object into the sandbox is not possible, because pysandbox is unable to proxy arbitrary objects.
For something more complex than evaluating "1+(2*3)", pysandbox cannot be used in practice, because of all these protections. Individual protections cannot be disabled: all protections are required to get a secure sandbox.
So what should be used to sandbox Python?
I developed pysandbox for fun in my free time. But I was contacted by different companies interested in using pysandbox in production on their web applications. So I think that there is a real need to execute arbitrary untrusted code.
I now think that putting a sandbox directly in Python cannot be secure. To build a secure sandbox, the whole Python process must be put in an external sandbox. There are, for example, projects using the Linux SECCOMP security feature to isolate the Python process.
PyPy has a similar design: it implements something similar to SECCOMP, but in a portable way.
Please tell me if you know of sandbox projects for Python so I can redirect users of pysandbox to a safer solution. I already know about the PyPy sandbox.
Victor