[Python-Dev] The pysandbox project is broken

Nick Coghlan ncoghlan at gmail.com
Wed Nov 13 07:54:30 CET 2013


On 13 Nov 2013 09:56, "Josiah Carlson" <josiah.carlson at gmail.com> wrote:
>
> Python-dev is for the development of the Python core language, the
> CPython runtime, and libraries. Your sandbox, despite using and requiring
> deep knowledge of the runtime, is not developing those things. If you had a
> series of requests for the language or runtime that would make your job
> easier, then your thread would be on-topic.

While it may seem off-topic at first glance, pysandbox started out as
Victor's attempt to prove wrong those of us who were saying this wouldn't
work, when he proposed replacing the long-dead rexec and Bastion modules with
something more robust.

I actually applaud his decision to post his final conclusion to the list,
even though it wasn't the outcome he was hoping for. Negative data is still
data :)

Cheers,
Nick.

>
> I replied off-list because I didn't want to contribute to the off-topic
> posting, but if posting on-list is required for you to pay attention, so be
> it.
>
> - Josiah
>
> On Nov 12, 2013 2:51 PM, "Victor Stinner" <victor.stinner at gmail.com> wrote:
>>
>> 2013/11/12 Josiah Carlson <josiah.carlson at gmail.com>:
>> > I'm replying off-list because I didn't want to bother the other folks in
>> > python-dev (also, your post might have been better on python-list, but I
>> > digress).
>>
>> I don't understand why you are writing to me directly. I won't reply
>> if you don't write publicly on python-dev.
>>
>> Summary of my email: it's not possible to write a sandbox in CPython.
>> So it's very specific to CPython internals. I'm not subscribed to
>> python-list.
>>
>> Victor
>>
>> >
>> > Long story short, I think that you are right, and I think that you are
>> > wrong.
>> >
>> > I think that you are right that your current pysandbox implementation is
>> > likely broken by design. You are starting from a completely working Python
>> > runtime, then eliminating/hiding/blocking certain features. This makes it a
>> > game of whack-a-mole: for every vulnerability you fix, a new one comes up
>> > later. The only way to fix this problem is to change your design.
>> >
>> > If you wanted to do it right, instead of removing things that are
>> > vulnerable, start by defining what is safe, and expose only those safe
>> > things. As an example, you did the right thing by splitting your main and
>> > subprocess into two pieces. But you don't need to serialize your objects
>> > from the trusted namespace to give access to the sandbox (that exposes your
>> > "trusted" objects to the sandbox in a raw manner, in obvious preparation for
>> > exploitation). Instead you would just expose a proxy object whose method
>> > calls/attribute references are made across your pipe (or socket, or
>> > whatever) to the trusted controlling process. Is it slower? Yes. Does it
>> > matter? Not if it keeps the sandbox secure.
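A minimal sketch of the proxy idea above, under stated assumptions: the channel is a `multiprocessing.Pipe`, messages are encoded as JSON rather than pickle (because `Connection.recv()` unpickles, and unpickling data from an untrusted peer is unsafe), and the trusted side runs in a thread only to keep the example self-contained; in the design described above the two pipe ends would live in separate processes. `Counter`, `ALLOWED`, `serve`, `Proxy`, `send_json` and `recv_json` are hypothetical names. The sketch only shows that the sandbox side holds a connection, never the trusted object; it is not a sandbox by itself.

```python
import json
import threading
from multiprocessing import Pipe


def send_json(conn, message):
    # Data-only encoding: Connection.recv() would unpickle, which is unsafe
    # when the other end is untrusted.
    conn.send_bytes(json.dumps(message).encode("utf-8"))


def recv_json(conn):
    return json.loads(conn.recv_bytes())


class Counter:
    """Hypothetical trusted object that is never handed to the sandbox."""

    def __init__(self):
        self.value = 0

    def increment(self, step=1):
        self.value += step
        return self.value


# Allowlist: the only method names the trusted side will execute.
ALLOWED = {"increment"}


def serve(conn, obj):
    """Trusted side: run allowlisted calls on obj and send back plain data."""
    while True:
        request = recv_json(conn)
        if request is None:  # shutdown message
            break
        try:
            name, args = request
            if not isinstance(name, str) or name not in ALLOWED:
                raise PermissionError(f"{name!r} is not allowed")
            send_json(conn, ["ok", getattr(obj, name)(*args)])
        except Exception as exc:
            # Failures travel back as text, so no exception object crosses the pipe.
            send_json(conn, ["error", str(exc)])


class Proxy:
    """Sandbox side: holds one end of the pipe, never the real object."""

    def __init__(self, conn):
        self._conn = conn

    def __getattr__(self, name):
        # Only called for names not found normally; each becomes a remote call.
        def remote_call(*args):
            send_json(self._conn, [name, list(args)])
            status, payload = recv_json(self._conn)
            if status != "ok":
                raise PermissionError(payload)
            return payload

        return remote_call


trusted_end, sandbox_end = Pipe()
server = threading.Thread(target=serve, args=(trusted_end, Counter()))
server.start()

proxy = Proxy(sandbox_end)
first = proxy.increment(5)
second = proxy.increment()
try:
    proxy.reset()  # not in ALLOWED, so the trusted side refuses it
    refused = None
except PermissionError as exc:
    refused = str(exc)

send_json(sandbox_end, None)  # stop the trusted-side loop
server.join()
print(first, second, refused)  # 5 6 'reset' is not allowed
```

Each call costs a round trip over the pipe, which is the "Is it slower? Yes" trade-off mentioned above.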
>> >
>> > Now if you start by saying, "what is allowed?", the most likely outcome
>> > is that you will more or less end up writing your own Python runtime. That's
>> > not necessarily a bad thing: if you know that a new runtime is your
>> > destination, you can look for a viable alternate-language runtime to begin
>> > with to short-circuit your work. The best option that I can come up with at
>> > this point is JavaScript as a destination language, as there are several
>> > Python-to-JavaScript compilers out there, JavaScript is sandboxed by design,
>> > and you can arbitrarily eliminate portions of the py->js compilation
>> > opportunities to eliminate attack vectors (specifically keeping only those
>> > that you know won't lead to an attack).
>> >
>> > Another option is Lua, though I don't really know of any viable
>> > Python-to-Lua transpilers out there.
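A much smaller illustration of the "start by defining what is safe" principle discussed above, assuming Python 3.8+: instead of passing a string to CPython's `eval()` and blocking dangerous features, this toy interprets a tiny allowlisted subset of expression syntax itself and rejects every other node type. `evaluate`, `_eval`, `BINARY_OPS` and `UNARY_OPS` are hypothetical names. This is not the py->js approach described above, and it is not a sandbox: `ast.parse` still runs inside CPython, and `_eval` is recursive, so very long expressions can raise `RecursionError`.

```python
import ast
import operator

# Allowlist: the only operators this toy evaluator implements.
# Pow is left out on purpose: "9**9**9" would tie up the CPU.
BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}
UNARY_OPS = {
    ast.UAdd: operator.pos,
    ast.USub: operator.neg,
}


def evaluate(source):
    """Evaluate integer arithmetic like "1+(2*3)" and refuse anything else."""
    tree = ast.parse(source, mode="eval")
    return _eval(tree.body)


def _eval(node):
    # Only the node types checked below are interpreted; eval() is never called.
    if isinstance(node, ast.Constant) and type(node.value) is int:
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in BINARY_OPS:
        return BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
    if isinstance(node, ast.UnaryOp) and type(node.op) in UNARY_OPS:
        return UNARY_OPS[type(node.op)](_eval(node.operand))
    # Names, calls, attributes, subscripts, ... are all rejected by default.
    raise ValueError(f"disallowed syntax: {type(node).__name__}")


print(evaluate("1+(2*3)"))  # 7
```

Growing this allowlist toward real programs is exactly the "end up writing your own Python runtime" path described above.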
>> >
>> > Good luck with whatever you decide to do.
>> >
>> > Regards,
>> >  - Josiah
>>
>> >
>> >
>> >
>> > On Tue, Nov 12, 2013 at 1:16 PM, Victor Stinner <victor.stinner at gmail.com>
>> > wrote:
>> >>
>> >> Hi,
>> >>
>> >> After working for 3 years on the pysandbox project to sandbox
>> >> untrusted code, I have now reached a point where I am convinced that
>> >> pysandbox is broken by design. Different developers tried to convince
>> >> me before that the pysandbox design is unsafe, but I had to experience
>> >> it myself to be convinced.
>> >>
>> >> It would also be nice to help developers looking for a sandbox for
>> >> their application. Please tell me if you know of sandbox projects for
>> >> Python so I can redirect users of pysandbox to a safer solution. I
>> >> already know about the PyPy sandbox.
>> >>
>> >> I would like to share my experience because I know that other
>> >> developers are using sandboxes in production and that there is a real
>> >> need for sandboxing.
>> >>
>> >>
>> >> Origin of pysandbox
>> >> ===================
>> >>
>> >> In 2010, a developer called Tav wrote a sandbox called "safelite.py":
>> >> the sandbox hides sensitive attributes to separate a trusted namespace
>> >> from an untrusted namespace. Tav challenged Python core developers to
>> >> break his sandbox and... the sandbox was quickly broken. Even though
>> >> it was quickly broken, I was convinced that Tav had found something
>> >> interesting and that there is a real need for sandboxing Python. I
>> >> continued his work by putting more protections on the untrusted
>> >> namespace. I published pysandbox 1.0 in June 2010.
>> >>
>> >>
>> >> History of pysandbox
>> >> ====================
>> >>
>> >> pysandbox was used to build an IRC bot on a French Python channel. The
>> >> bot executed Python code in the sandbox. It was mainly used by hackers
>> >> testing the sandbox to try to find a vulnerability. It was nice to
>> >> have such an IRC bot on a Python help channel.
>> >>
>> >> Three months after the release of pysandbox 1.0, the first
>> >> vulnerability was found: it was possible to modify the __builtins__
>> >> dictionary to hack the sandbox functions and so escape from the
>> >> sandbox. I had to blacklist common instructions like "dict.pop()" or
>> >> "del dict[key]" to protect the __builtins__ dictionary. I would have
>> >> preferred to use a custom type for __builtins__, but CPython requires
>> >> a real dictionary: Python/ceval.c has an inlined version of
>> >> PyDict_GetItem. For your information, I modified CPython 3.3 to accept
>> >> arbitrary mapping types for __builtins__.
>> >>
>> >> Just after this fix, another vulnerability was found: it was still
>> >> possible to modify __builtins__ using dict.__init__() method. The
>> >> access to this method was also blocked.
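The `dict.__init__()` escape can be reproduced in plain CPython 3 without pysandbox. In this hypothetical setup (`safe_builtins` and `run_untrusted` are illustrative names, not pysandbox code), a restricted builtins dict is shared between runs; blocking `del` and `pop()` alone would not protect it, because calling `__init__` again on an existing dict merges new keys into it in place.

```python
# Hypothetical trusted-side setup: one restricted builtins dict shared by every run.
safe_builtins = {"len": len}


def run_untrusted(source):
    """Run source with fresh globals but the same builtins dict each time."""
    namespace = {"__builtins__": safe_builtins}
    exec(source, namespace)
    return namespace


# Even with "del d[k]" and d.pop() blocked, dict.__init__ merges keyword
# arguments into an existing dict, mutating it in place.
run_untrusted("__builtins__.__init__(len=lambda obj: 42)")

# A later, unrelated run now sees the poisoned len().
later = run_untrusted("n = len('abc')")["n"]
print(later)  # 42
```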
>> >>
>> >> Seven months later, new vulnerabilities. The "timeout" protection was
>> >> removed because it is not effective on CPU-intensive functions
>> >> implemented in C. And to work around a known bug in CPython crashing
>> >> the interpreter, access to the type.__bases__ attribute was also
>> >> blocked. But this protection has to be disabled on CPython 2.5 because
>> >> of another CPython bug... Access to the func_defaults/__defaults__
>> >> attributes of a function was also blocked to protect the sandbox, even
>> >> though it was not exploitable to escape from the sandbox.
>> >>
>> >>
>> >> Recent events
>> >> ==============
>> >>
>> >> A few weeks ago, a security challenge targeted pysandbox. In less than
>> >> one day, two vulnerabilities were found. First, the compile() builtin
>> >> function was used to read an arbitrary file on the disk line by line
>> >> using a syntax error: the line is displayed in the traceback.
>> >> Second, a context manager was used to retrieve a traceback object:
>> >> from traceback.tb_frame, it was possible to navigate through the frames
>> >> (using frame.f_back) to retrieve a frame of the trusted namespace, and
>> >> then use the f_globals attribute of the frame to retrieve a global name.
>> >> Game over.
>> >>
>> >> I fixed these two vulnerabilities in pysandbox 1.5.1: compile() is now
>> >> blocked by default, and access to traceback.tb_frame, frame.f_back
>> >> and frame.f_globals is now blocked.
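The attribute chain blocked in 1.5.1 (traceback.tb_frame, frame.f_back, frame.f_globals) can be walked in plain CPython 3 with no sandbox involved. In this sketch a try/except stands in for the context manager used in the actual exploit, and `TRUSTED_SECRET`, `UNTRUSTED_SOURCE` and `run_untrusted` are hypothetical names. It shows that giving the untrusted code its own globals dict does not stop it from climbing back into its caller's frame.

```python
# Hypothetical global living in the trusted namespace.
TRUSTED_SECRET = "trusted-only value"

# Untrusted code: a try/except stands in for the context manager used in the
# real exploit; either way the result is a traceback object.
UNTRUSTED_SOURCE = """
try:
    raise ValueError
except ValueError as exc:
    frame = exc.__traceback__.tb_frame  # frame running this untrusted code
    while frame is not None and "TRUSTED_SECRET" not in frame.f_globals:
        frame = frame.f_back  # climb into the caller's frame
    leaked = frame.f_globals["TRUSTED_SECRET"] if frame is not None else None
"""


def run_untrusted(source):
    namespace = {}  # separate globals: TRUSTED_SECRET is not copied here
    exec(source, namespace)
    return namespace.get("leaked")


print(run_untrusted(UNTRUSTED_SOURCE))  # trusted-only value
```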
>> >>
>> >> I also started to work on a new design of pysandbox (version currently
>> >> called "pysandbox 1.6", might become pysandbox 2.0 later): run
>> >> untrusted code in a subprocess to have a safer design. Using a
>> >> subprocess, it becomes easier to limit the memory usage, set up a real
>> >> timeout, limit bytes written to stdout, limit the size of data sent to
>> >> and received from the child process, etc. But my main motivation was
>> >> to not crash the whole application if the untrusted code exploits a
>> >> known Python bug to crash the process. There are (too) many ways to
>> >> crash Python using common types and functions...
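A minimal sketch of the subprocess approach, assuming Python 3.7+ (for `subprocess.run(..., capture_output=True)`); `run_in_child` is a hypothetical helper, not the pysandbox 1.6 API. It covers only a wall-clock timeout, an output cap and crash isolation; it places no restriction on what the child process can do.

```python
import subprocess
import sys


def run_in_child(source, timeout=2.0, max_output=4096):
    """Run source in a separate interpreter (hypothetical helper).

    Shows resource limits and crash isolation only; it is not a sandbox.
    """
    try:
        proc = subprocess.run(
            # -I: isolated mode (ignore PYTHON* env vars and user site-packages)
            [sys.executable, "-I", "-c", source],
            capture_output=True,
            # Wall-clock limit enforced by the parent: the child is killed from
            # outside, so this also stops long-running C functions.
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return "timeout", b""  # run() has already killed and reaped the child
    status = "ok" if proc.returncode == 0 else f"exit {proc.returncode}"
    # Truncates what is handed back; a hard cap would read the pipe incrementally.
    return status, proc.stdout[:max_output]


print(run_in_child("print(1+(2*3))"))         # ('ok', b'7\n') on POSIX
print(run_in_child("while True: pass", 1.0))  # ('timeout', b'')
print(run_in_child("import os; os.abort()"))  # simulated crash: the parent keeps running
```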
>> >>
>> >> The problem is that after each release it becomes harder to write
>> >> Python code in the sandbox. For example, it becomes very hard to give
>> >> access to objects from the trusted namespace to the untrusted
>> >> namespace, because the whole object must be serialized to be passed to
>> >> the child process. It also becomes harder to debug the sandboxed code,
>> >> because the traceback feature doesn't work well in the sandbox.
>> >>
>> >>
>> >> Pysandbox is broken
>> >> ===================
>> >>
>> >> In my opinion, the compile() vulnerability is the proof that it is not
>> >> possible to put a sandbox in CPython. Blocking access to the open()
>> >> builtin function and the file type constructor is not enough if
>> >> unrelated functions can give access indirectly to the file system.
>> >> Having read access to the file system is a critical vulnerability in
>> >> pysandbox, and modifying CPython to not print the source code line in
>> >> a traceback is also not acceptable.
>> >>
>> >> I now agree that putting a sandbox in CPython is the wrong design.
>> >> There are too many ways to escape the untrusted namespace using the
>> >> various introspection features of the Python language. To guarantee
>> >> the safety of a security product, the code should be carefully
>> >> audited and the code to review must be as small as possible. Using
>> >> pysandbox, the "code" is the whole Python core, which is a really huge
>> >> code base. For example, the Python and Objects directories of Python
>> >> 3.4 contain more than 126,000 lines of C code.
>> >>
>> >> The security of pysandbox is the security of its weakest part. A
>> >> single bug is enough to escape the whole sandbox.
>> >>
>> >> Attackers had original and different ideas like hacking __builtins__,
>> >> using warnings, context managers, syntax errors, arbitrary bytecode,
>> >> etc. It is hard to protect the untrusted namespace against all these
>> >> different Python features.
>> >>
>> >> It might be possible to invest a lot of time in adding enough
>> >> protections to the untrusted namespace, but that leads to my second
>> >> point: pysandbox cannot be used in practice.
>> >>
>> >>
>> >> pysandbox cannot be used in practice
>> >> ====================================
>> >>
>> >> To protect the untrusted namespace, pysandbox installs a lot of
>> >> different protections. Because of all these protections, it becomes
>> >> hard to write Python code. Basic features like "del dict[key]" are
>> >> denied. Passing an object to the sandbox is not possible, because
>> >> pysandbox is unable to proxify arbitrary objects.
>> >>
>> >> For anything more complex than evaluating "1+(2*3)", pysandbox cannot
>> >> be used in practice because of all these protections. Individual
>> >> protections cannot be disabled: all protections are required to get a
>> >> secure sandbox.
>> >>
>> >>
>> >> So what should be used to sandbox Python?
>> >> =========================================
>> >>
>> >> I developed pysandbox for fun in my free time. But I was contacted by
>> >> different companies interested in using pysandbox in production in
>> >> their web applications. So I think that there is a real need to
>> >> execute arbitrary untrusted code.
>> >>
>> >> I now think that putting a sandbox directly in Python cannot be
>> >> secure. To build a secure sandbox, the whole Python process must be
>> >> put in an external sandbox. There are, for example, projects using the
>> >> Linux SECCOMP security feature to isolate the Python process.
>> >>
>> >> PyPy has a similar design: it implements something similar to SECCOMP,
>> >> but in a portable way.
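Resource limits are a much weaker mechanism than SECCOMP, but they illustrate the same principle described above: limits enforced by the operating system on the whole Python process rather than by the interpreter. This POSIX-only sketch applies `resource.setrlimit` in the child process; `run_limited` and `_limit_child` are hypothetical names. It caps CPU time only and filters no system calls; actual SECCOMP filtering needs the `prctl()`/`seccomp()` system calls or a library such as libseccomp, which are not shown here.

```python
import resource  # POSIX only
import signal
import subprocess
import sys


def _limit_child():
    # Runs in the child after fork() and before exec(), so the kernel, not
    # the interpreter, enforces these limits on the new Python process.
    resource.setrlimit(resource.RLIMIT_CPU, (1, 2))  # soft 1 s -> SIGXCPU, hard 2 s -> SIGKILL
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))  # no core dump files


def run_limited(source, wall_timeout=10.0):
    """Hypothetical helper: run source with an OS-enforced CPU-time cap."""
    proc = subprocess.run(
        [sys.executable, "-I", "-c", source],
        capture_output=True,
        preexec_fn=_limit_child,  # documented as unsafe if the parent uses threads
        timeout=wall_timeout,  # backstop for a child that sleeps instead of computing
    )
    return proc.returncode


code = run_limited("while True: pass")
# A negative return code means the child was killed by that signal number.
killed_by = signal.Signals(-code).name if code < 0 else None
print(code, killed_by)
```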
>> >>
>> >> Please tell me if you know of sandbox projects for Python so I can
>> >> redirect users of pysandbox to a safer solution. I already know about
>> >> the PyPy sandbox.
>> >>
>> >>
>> >> Victor
>> >> _______________________________________________
>> >> Python-Dev mailing list
>> >> Python-Dev at python.org
>> >> https://mail.python.org/mailman/listinfo/python-dev
>> >> Unsubscribe:
>> >>
https://mail.python.org/mailman/options/python-dev/josiah.carlson%40gmail.com
>> >
>> >
>
>

