[Python-Dev] Adding a builtins parameter to eval(), exec() and __import__().

Nick Coghlan ncoghlan at gmail.com
Thu Mar 8 13:52:36 CET 2012


On Thu, Mar 8, 2012 at 10:06 PM, Mark Shannon <mark at hotpy.org> wrote:
> I don't think it cleans up import, but I'll defer to Brett on that.
> I've included __import__() along with exec and eval as it is a place where
> new namespaces can be introduced into an execution.
> There may be others I haven't thought of.

runpy is another one.

However, the problem I see with "builtins" as a separate argument is
that it would be a lie.

The element that's most interesting about locals vs globals vs
builtins is the scope of visibility of their contents.

When I call out to another function in the same module, locals are not
shared, but globals and builtins are.

When I call out to code in a *different* module, neither locals nor
globals are shared, but builtins are still common.
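A minimal sketch of that visibility rule, assuming two throwaway
modules built with types.ModuleType (the names mod_a, mod_b, flag and
lookup are purely illustrative). A name set in one module's globals is
invisible to the other module, while a name placed in builtins is seen
by both:

```python
import builtins
import types

# Two throwaway modules (hypothetical names) standing in for separate files.
mod_a = types.ModuleType("mod_a")
mod_b = types.ModuleType("mod_b")

# mod_a defines a global 'flag'; mod_b only reads a name called 'flag'.
exec("flag = 'set in mod_a'\ndef read_flag():\n    return flag\n", mod_a.__dict__)
exec("def read_flag():\n    return flag\n", mod_b.__dict__)


def lookup(func):
    """Call func, returning None if the name lookup inside it fails."""
    try:
        return func()
    except NameError:
        return None


# Globals are per-module: mod_b cannot see mod_a's 'flag'.
a_sees = lookup(mod_a.read_flag)
b_sees = lookup(mod_b.read_flag)

# Builtins are common to both: a name added there is visible from mod_b,
# while mod_a still finds its own global first.
builtins.flag = "set in builtins"
try:
    a_after = lookup(mod_a.read_flag)
    b_after = lookup(mod_b.read_flag)
finally:
    del builtins.flag  # undo the change so nothing leaks out of the demo
```

Here a_sees and a_after are both 'set in mod_a', b_sees is None, and
b_after is 'set in builtins'.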

So there are two ways this purported extra "builtins" parameter could work:

1. Sandboxing - you try to genuinely give the execution context a
different set of builtins that's shared by all code executed, even
imports from other modules.  However, I assume this isn't what you
meant, since it is the domain of sandboxing utilities like Victor's
pysandbox and is known to be incredibly difficult to get right. That
difficulty led to the demise of both rexec and Bastion, and it is
behind the recent comments about known segfault vulnerabilities: they
are tolerable in the normal case of merely processing untrusted data
with trusted code, but anathema to a robust CPython native sandboxing
scheme that must still cope when the code itself is untrusted.

2. Chained globals - just an extra namespace that's chained behind the
globals dictionary for name lookup, not actually shared with code
invoked from other modules.
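The "not actually shared" behaviour can already be seen today with the
"__builtins__" key that CPython honours in a globals dict (an
implementation detail, used here only for illustration). In this
sketch, the helper module, fake_len and the variable names are all
hypothetical. The replacement len() applies only to the code executed
against the custom namespace, while the function from the other module
keeps using the real one:

```python
import types

# Hypothetical helper module whose function relies on the ordinary len().
helper = types.ModuleType("helper")
exec("def size(seq):\n    return len(seq)\n", helper.__dict__)


def fake_len(seq):
    """Stand-in builtin that is obviously not the real len()."""
    return -1


# CPython looks up builtins through the '__builtins__' entry of the
# globals dict, so this replaces len() for code run with these globals.
restricted_globals = {
    "__builtins__": {"len": fake_len},
    "size": helper.size,
}

results = {}  # receives the names assigned by the executed code
exec("direct = len('abc')\nvia_helper = size('abc')", restricted_globals, results)
```

Afterwards results["direct"] is -1 but results["via_helper"] is 3,
because helper.size() resolves len() through its own module globals.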

The second approach is potentially useful, but:

1. "builtins" is *not* the right name for it (because any other code
invoked will still be using the original builtins)
2. it's already trivial to achieve such chained lookups in 3.3 by
passing a collections.ChainMap instance as the globals parameter:
http://docs.python.org/dev/library/collections#collections.ChainMap

collections.ChainMap also has the virtue of working with any current
API that accepts a globals argument and can be extended to an
arbitrary level of chaining, whereas this suggestion requires that all
such APIs be expanded to accept a third parameter, and could still
only chain lookups one additional step in doing so.
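A rough sketch of the chained lookup, with one caveat worth checking
against the exec() documentation: CPython requires the globals
argument to be a real dict, so in this example the ChainMap is passed
in the locals position, which accepts any mapping. The names fallback,
primary and source are made up for the example:

```python
from collections import ChainMap

fallback = {"greeting": "hello"}  # hypothetical extra namespace, consulted last
primary = {}                      # first map in the chain; receives assignments
namespace = ChainMap(primary, fallback)

source = "message = greeting + ', world'\nlength = len(message)"

# globals must be a plain dict in CPython, so the ChainMap goes in as locals.
# Lookups try the ChainMap first, then the globals dict, then builtins.
exec(source, {}, namespace)
```

After this runs, primary holds message and length, fallback is
unchanged, and len() was still found in the normal builtins. Note that
functions defined inside the executed source resolve free names
through the globals dict, not the locals mapping, so the chaining in
this sketch only covers top-level code.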

So a big -1 from me.

Cheers,
Nick.

P.S. I've referenced this talk before, but Tim Dawborn's effort from
PyCon AU last year about the sandboxing setup for
http://www.ncss.edu.au/ should be required viewing for anyone wanting
to understand the kind of effort it takes to fairly comprehensively
protect host servers from attacks when executing arbitrary untrusted
Python code on CPython. Implementing such protection is certainly
*possible* (since Tim's talk is all about one way to do it), but it's
not easy, and Tim's approach uses Linux OS level sandboxing rather
than relying on a Python language level sandbox. This was
largely due to a university requirement that the sandbox solution be
language agnostic, but it also serves to protect the sandbox from the
documented attacks against the CPython interpreter. Tim reviews a few
interesting attempts to break the sandbox around the 5 minute mark in
https://www.youtube.com/watch?v=y-WPPdhTKBU. (I did suggest he grab
our test_crashers directory to see what happened when they were run in
the sandbox, but I doubt it would be much more interesting than merely
calling "sys.exit()")

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
