Creating a reliable sandboxed Python environment

Mon May 25 22:44:34 EDT 2015

On Tue, May 26, 2015 at 12:24 PM, <davidfstr at gmail.com> wrote:
> I believe it is not possible to limit such operations at the Python level. The best you could do is try replacing all the standard library modules, but that is again just a blacklist - it won't prevent a determined attacker from doing things like constructing their own 'code' object and executing it.
>
> It might be necessary to isolate the Python process at the operating system level.
> * A chroot jail on Linux & OS X can limit access to the filesystem. Again this is just a blacklist.
> * No obvious way to block socket creation. Again this would be just a blacklist.
> * No obvious way to detect unapproved system calls and block them.
>
> In the limit, I could dynamically spin up a virtual machine and execute the Python program in the machine. However that's extremely expensive in computational time.
>
> Has anyone on this list attempted to sandbox Python programs in a serious fashion? I'd be interested to hear your approach.

Yes, I had a project along similar lines to yours, a few years back.
We wanted to let our end users customize our service using a Python
script. Our conclusions were:

1) As you say, it is fundamentally not possible to make this work at
the Python level.
2) It's extremely difficult to do at any other level, too.
3) Python is a great language, despite my then-boss's dislike of it.
4) Lua isn't as great a language, but it's much easier to sandbox.
5) Unicode is important, even if my then-boss took a lot of convincing
on that one. (Was a big point in Python's favour, and against Lua.)
6) Efficient transfer of complex structured data across a process
boundary is difficult.
7) Letting end users script your system safely is a fundamentally hard problem.
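
To make point 1 concrete, here is a minimal sketch (an illustration,
not code from the project described above) of why name-based
blacklisting fails. Even with every builtin removed from eval(),
plain attribute access on a literal still reaches object and, through
it, every class loaded in the interpreter. The names
restricted_globals and payload are just for this example.

```python
# An "empty" environment: no builtins at all are visible to the expression.
restricted_globals = {"__builtins__": {}}

# No names are used, only attribute access on a tuple literal.
payload = "().__class__.__base__.__subclasses__()"

# This succeeds and returns every direct subclass of object.
subclasses = eval(payload, restricted_globals)
print(len(subclasses), "classes reachable with no builtins")
```

From that list an attacker can go hunting for a class whose methods
expose modules or file objects, which is why stripping names is not a
security boundary.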

We ended up abandoning Python altogether and using ECMAScript (with
Google's V8 interpreter) as our scripting language, and even then, we
had to do all sorts of things to make it safe. (And I wouldn't bet my
life on it being safe even now. Not even sure I'd bet my data or
uptime on it being safe, either.)

My recommendation to you: If you absolutely have to run untrusted
Python code, don't concern yourself with *anything* that the Python
code can and can't do. You'll end up making gross and ugly hacks that
stop people from doing legitimate things, in an attempt to prevent
abuses. Instead, *just* guard yourself at the OS level - a chroot jail
to protect what matters, iptables rules to prevent anything going to
the outside world, an unprivileged user account with minimal
permissions, and ulimit on everything so they can't hurt you. Whatever
it takes, make it safe enough that you could run untrusted C code,
because trust me, it'll be fewer headaches than trying to sandbox
anything at the Python level. Or, worse, you won't get headaches,
you'll just have a flawed security model that eventually gets
exploited.
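
As a rough sketch of just the "ulimit everything" piece, the launcher
below starts the untrusted code in a child process with CPU-time and
address-space limits applied before exec. It assumes a Unix system
(the resource module is not available on Windows) and Python 3.7+.
The names run_limited and _cap and the default limits are invented
for this example. The chroot jail, the iptables rules and the
unprivileged account still have to be set up outside Python, and
this does nothing about filesystem or network access on its own.

```python
import resource
import subprocess
import sys


def _cap(limit, value):
    """Lower one resource limit, never exceeding the existing hard limit."""
    _soft, hard = resource.getrlimit(limit)
    if hard != resource.RLIM_INFINITY:
        value = min(value, hard)
    resource.setrlimit(limit, (value, value))


def run_limited(code, cpu_seconds=2, memory_bytes=1 << 30, wall_seconds=10):
    """Run `code` in a separate interpreter under kernel-enforced limits."""

    def apply_limits():
        # Runs in the child after fork() and before exec().
        _cap(resource.RLIMIT_CPU, cpu_seconds)   # CPU time, in seconds
        _cap(resource.RLIMIT_AS, memory_bytes)   # address space, in bytes

    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: isolated mode
        capture_output=True,
        text=True,
        timeout=wall_seconds,                # wall-clock backstop
        preexec_fn=apply_limits,
    )


result = run_limited("print('hi')")
print(result.returncode, result.stdout.strip())
```

The point of doing it this way is that the kernel enforces the
limits, so it makes no difference what tricks the Python code plays
inside the child.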

There are a couple of alternatives. You could go for a really extreme
protection system and actually spin up a virtual machine, where
they're welcome to do whatever they like, and it'll run inside X
amount of memory and Y amount of CPU. Pretty costly (the overhead of a
full OS for every client), but it'll work. Or you could go to the
other extreme: rather than permitting arbitrary Python code, allow
only a "Python-like syntax" in which people can manipulate the
input. You'd then need some special hacks to allow file I/O, so this
probably wouldn't work for your scenario, but imagine writing a
sed-like program that accepts Python code. You could do something
like this:

for line in infile:  # infile/outfile: already-open text files
    print(evaluate_user_code(line), file=outfile)

where evaluate_user_code() is a protected evaluator, like
ast.literal_eval() but additionally allowing access to one name
"line", which obviously would be the line in question.
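
One way such an evaluator might look is sketched below, assuming
Python 3.9+ (for the AST layout of subscripts). It never calls
eval(). It walks the parsed expression and accepts only a small
whitelist: str/int literals, the name "line", "+", indexing and
slicing, and a few zero-argument str methods. The helper
make_evaluator and the ALLOWED_METHODS set are hypothetical, and the
whitelist is only an example of the idea, not a vetted security
boundary.

```python
import ast

# Hypothetical whitelist of zero-argument str methods users may call.
ALLOWED_METHODS = {"strip", "upper", "lower", "title"}


def make_evaluator(user_source):
    """Parse user_source once and return evaluate_user_code(line)."""
    tree = ast.parse(user_source, mode="eval")

    def ev(node, line):
        if isinstance(node, ast.Constant) and isinstance(node.value, (str, int)):
            return node.value
        if isinstance(node, ast.Name) and node.id == "line":
            return line  # the only name that resolves to anything
        if isinstance(node, ast.BinOp) and isinstance(node.op, ast.Add):
            return ev(node.left, line) + ev(node.right, line)
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            operand = ev(node.operand, line)
            if isinstance(operand, int):  # allows negative indices
                return -operand
        if isinstance(node, ast.Subscript):
            target = ev(node.value, line)
            index = node.slice
            if isinstance(index, ast.Slice):
                bounds = (index.lower, index.upper, index.step)
                parts = [None if p is None else ev(p, line) for p in bounds]
                return target[slice(*parts)]
            return target[ev(index, line)]
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr in ALLOWED_METHODS
                and not node.args and not node.keywords):
            target = ev(node.func.value, line)
            if isinstance(target, str):
                return getattr(target, node.func.attr)()
        # Anything not explicitly allowed above is rejected.
        raise ValueError("disallowed expression: " + type(node).__name__)

    def evaluate_user_code(line):
        return ev(tree.body, line)

    return evaluate_user_code


evaluate_user_code = make_evaluator("line.strip().upper() + '!'")
print(evaluate_user_code("hello\n"))
```

Because this is a whitelist rather than a blacklist, something like
__import__('os') is refused with ValueError simply because nothing in
the evaluator knows how to handle it.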

But for your case, I think that'd require too many hacks to be useful.

ChrisA