[Python-Dev] doc for new restricted execution design for Python

Brett Cannon brett at python.org
Wed Jul 5 20:56:09 CEST 2006


On 7/4/06, Ka-Ping Yee <python-dev at zesty.ca> wrote:
>
> Hi Brett,
>
> Here are some comments on the description of the restricted execution
> model that you posted.
>
> > When referring to the state of an interpreter, it is either "trusted" or
> > "untrusted".  A trusted interpreter has no restrictions imposed upon any
> > resource.  An untrusted interpreter has at least one resource, possibly
> > more, with a restriction placed upon it.
>
> In response to Guido's comment about confusing the words "trusted" and
> "untrusted", how about "empowered" and "restricted"?


Maybe.  I am really starting to lean towards "trusted" and "sandboxed".

> > When the Interpreter Is Embedded
> > ================================
> >
> > Single Untrusted Interpreter
> > ----------------------------
> >
> > This use case is when an application embeds the interpreter and never has
> > more than one interpreter running.
> >
> > The main security issue to watch out for is not having default abilities be
> > provided to the interpreter by accident.
>
> I'd rather rephrase this in the opposite direction.  The onus shouldn't
> be on the application to hunt down each possible dangerous authority and
> deactivate them all one by one.  The main security issue is to let the
> application choose which abilities it wants the restricted interpreter
> to have, and then ensure that the restricted interpreter gets only those
> abilities.


Right.  I am thinking more of an implementation screw-up that somehow
provides access to an object that has escalated rights.

> > Multiple Untrusted Interpreters
> > -------------------------------
> >
> > When multiple interpreters, all untrusted at varying levels, need to be
> > running within a single application.  This is the key use case that this
> > proposed design is targeted for.
> >
> > On top of the security issues from a single untrusted interpreter,
> > there is one additional worry.  Resources cannot end up being leaked
> > into other interpreters where they are given escalated rights.
>
> What is your model here for communication between interpreters?  If two
> interpreters can communicate, any attempt to "prevent leakage" of
> resources is meaningless.  When you say "leaked into other interpreters"
> are you talking about a Python object leaking or something else at a
> lower level?


I am talking about Python objects.

As for communication, I was planning on something included directly in
globals or some custom object to handle that.  I have not been focusing on
that aspect so far.

> Suppose for example that the application wants to embed two interpreters,
> P and Q, and that the application wants P to be able to write files but
> Q to be restricted against writing files.  When you say "leaked" above,
> that suggests to me that you want to prevent something like
>
>     # code running in P
>     import spam
>     f = open('/home/doofus/.ssh/authorized_keys', 'a')
>     spam.f = f
>
>     # code running in Q
>     import spam
>     spam.f.write('blargh')
>
> The above example supposes that P and Q can communicate through a
> shared module, spam, where they can pass Python objects.


Right.  But Python modules are separate per interpreter; only C extension
modules are shared between interpreters in any way.  And sharing an open file
like that is exactly the kind of thing that is bad, which is why C extension
modules must be whitelisted before they can be used.
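
To make the whitelisting idea concrete, here is a rough sketch of what I have
in mind; the hook name and the contents of the whitelist below are purely
illustrative, not part of the actual design:

    # Hypothetical sketch: a sandboxed interpreter refuses to import any C
    # extension module that the embedding application has not whitelisted.
    _ALLOWED_C_EXTENSIONS = set(['math', 'time'])   # example contents only

    def check_extension_import(name):
        # Imagined to run before a C extension module is loaded.
        if name not in _ALLOWED_C_EXTENSIONS:
            raise ImportError("C extension %r is not whitelisted for this "
                              "sandboxed interpreter" % (name,))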

> But notice that even if you prevent them from passing Python objects
> like open files, any form of communication is sufficient to leak
> resources:
>
>     # code running in P
>     def add_key(key):
>         f = open('/home/doofus/.ssh/authorized_keys', 'a')
>         f.write(key + '\n')
>         f.close()
>
>     import socket
>     s = socket.socket()
>     s.bind(('', 6666))
>     s.listen(1)
>     ns, addr = s.accept()
>     add_key(ns.recv(100))
>
>
>     # code running in Q
>     import webbrowser
>     webbrowser.open('http://localhost:6666/zebra')
>
> As long as P can listen for instructions from Q, it can give Q
> the power to write to the filesystem.


Right, which is why sockets and files are restricted and turned off by
default.  You have to give explicit permission to use either resource.
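
Roughly, I picture the embedding application expressing those explicit
permissions through something like the following; the names are made up for
illustration and are not a real API:

    # Illustration only: a default-deny policy for a sandboxed interpreter.
    class SandboxPolicy(object):
        def __init__(self):
            self.allowed_file_paths = set()   # no file access by default
            self.sockets_allowed = False      # no socket access by default

        def allow_file(self, path):
            self.allowed_file_paths.add(path)

        def allow_sockets(self):
            self.sockets_allowed = True

    policy = SandboxPolicy()
    policy.allow_file('/tmp/scratch.txt')   # explicit grant; sockets stay off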

> > Filesystem
> > ===================
> >
> > The most obvious facet of a filesystem to protect is reading from it.
> > One does not want what is stored in ``/etc/passwd`` to get out.  And
> > one also does not want writing to the disk unless explicitly allowed
> > for basically the same reason; if someone can write ``/etc/passwd``
> > then they can set the password for the root account.
>
> There's a big difference between modifying (or erasing) an existing file
> and writing a new file (e.g. for temporary storage).  If i give you a
> little filesystem of your own to play in, and it starts out empty, you
> can put whatever you want in it without violating my secrecy or the
> integrity of my files.
>
> I think you should be talking about this in terms of specifically
> what abilities you want to be able to allow, based on examples of
> real-life applications.


Fair enough.  But since files can only be listed specifically, you can grant
temporary file access by whitelisting a non-existent path for writing.  And if
the application does not want to expose an existing file, the sandboxed code
simply never gets access to it.
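
So the temporary-file case falls out of the same mechanism.  A rough sketch
(the guarded open() and the whitelist here are only for illustration):

    # Illustration only: the open() seen by sandboxed code checks a whitelist
    # of explicitly granted paths.  A granted path does not have to exist yet,
    # so write access to a fresh path gives the code a private scratch file.
    _ALLOWED_PATHS = set(['/tmp/scratch.txt'])

    def guarded_open(path, mode='r'):
        if path not in _ALLOWED_PATHS:
            raise IOError("access to %r is not allowed in this sandbox" % (path,))
        return open(path, mode)

    f = guarded_open('/tmp/scratch.txt', 'w')   # allowed: whitelisted path
    f.write('temporary data\n')
    f.close()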

> > Physical Resources
> > ===================
> >
> > Memory should be protected.  It is a limited resource on the system
> > that can have an impact on other running programs if it is exhausted.
> > Being able to restrict the use of memory would help alleviate issues
> > from denial-of-service (DoS) attacks.
>
> > Networking
> > ===================
> >
> > Networking is somewhat like the filesystem in terms of wanting similar
> > protections.  You do not want to let untrusted code make tons of socket
> > connections or accept them to do possibly nefarious things (e.g., acting
> > as a zombie).
> >
> > You also want to prevent finding out information about the network you are
> > connected to.  This includes doing DNS resolution since that allows one
> > to find out what addresses your intranet has or what subnets you use.
>
> Again, it's risky to describe only individual cases of things to
> prevent.  What networking abilities are safe or necessary for the
> kinds of applications you have in mind?  Start from nothing and
> work up from there.


That's the plan.  I am planning to go through the socket module function by
function, explicitly allowing access as warranted and blocking everything
else.  It is not going to be "let's block DNS and allow everything else".
Sorry if that wasn't clear.  This is mostly just to say "I plan on restricting
this kind of stuff; here is an example".
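
As a sketch of the function-by-function approach (the particular names allowed
below are just an example, not the actual list):

    import sys
    import socket as _socket

    # Illustration only: sandboxed code gets a stand-in for the socket module
    # that exposes nothing except an explicitly allowed set of names.
    _ALLOWED_SOCKET_NAMES = ['error', 'timeout']

    class RestrictedSocketModule(object):
        def __getattr__(self, name):
            if name in _ALLOWED_SOCKET_NAMES:
                return getattr(_socket, name)
            raise AttributeError("socket.%s is blocked in this sandbox" % name)

    # Hypothetical wiring: what the sandboxed interpreter sees on 'import socket'.
    sys.modules['socket'] = RestrictedSocketModule()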

> > Interpreter
> > ===================
> >
> > One must make sure that the interpreter is not harmed in any way.
> > There are several ways to possibly do this.  One is generating
> > hostile bytecode.  Another is some buffer overflow.  In general any
> > ability to crash the interpreter is unacceptable.
>
> This is hard for me to understand.  What exactly do you trust and
> not trust?  It seems to me that crashing an interpreter is only a
> problem if a single interpreter is running both trusted and untrusted
> code -- then if the untrusted code crashes the interpreter, the
> trusted code suffers.
>
> But there doesn't seem to be any such thing in your model.  Each
> interpreter is either trusted or untrusted.  If the interpreter is
> trusted, and the code running in it causes it to crash, i assume
> you would consider that to be the code's "own fault", right?
> And if the interpreter is untrusted, and the code running in it
> causes it to crash, then the code has only harmed itself.
>
> It seems to me that we need only be concerned about crashing when
> the crash of an embedded interpreter will bring down its host
> application, or there are multiple interpreters embedded at once
> and one interpreter causes another interpreter to crash.


Right.  But being embedded, won't any segfault of an interpreter bring down
the embedding application?

But you are correct, I am only concerned with preventing a crash of a
sandboxed interpreter.

> > Resource Hiding
> > =============================
> [...]
> > This can be viewed as a passive system for security.
> [...]
> > Resource Crippling
> > =============================
> > Another approach to security is to provide constant, proactive security
> > checking of rights to use a resource.
>
> I think you have this backwards.  Resource hiding is proactive:
> before untrusted code has a chance to abuse anything, you decide
> what you want to allow it to do.  It defaults to no access, and
> only gets access to resources you have proactively decided to provide.


I am using "proactive" in the sense of constantly checking rights against the
security model.

> Resource crippling is the opposite: it begins by giving carte blanche
> to the untrusted code, then you run around trying to plug holes
> by stopping everything you don't want.  This is a lot more work,
> and it is also much more dangerous.  If you forget to plug even
> one hole, you're hosed.


Yeah, I know, which is why I am only bothering with 'file' and 'socket'.

> Back to what you wrote about resource hiding:
>
> > This can be viewed as a passive system for security.  Once a resource
> > has been given to code there are no more checks to make sure the
> > security model is not being violated.
>
> This last sentence doesn't make any sense.  If you decided to give
> the resource, how is using the resource a violation?  Either you
> want to enable the resource or you don't.  If you want to enable
> it, give it; if you don't, don't give it.  As a criticism of the
> resource hiding approach, it's a red herring -- there's no way
> to interpret this sentence that doesn't make it also an
> unfalsifiable criticism of any possible security model.


Yeah, I figured that out after I wrote this.

> > The most common implementation of resource hiding is capabilities.

> > In this type of system a resource's reference acts as a ticket that
> > represents the right to use the resource.  Once code has a reference
> > it is considered to have full use of the resource it represents and
> > no further security checks are performed.
>
> Same thing.  What "further security checks" are we worried about?
> Would it check to see whether we've authorized the interpreter to
> have access to the resource ... which we already know to be true?
>
> > To allow customizable restrictions one can pass references to wrappers of
> > resources.  This allows one to provide custom security to resources instead
> > of requiring an all-or-nothing approach.
>
> The ability to customize security restrictions is an important
> advantage of the resource hiding approach, since resource crippling
> requires that the architect of the security model anticipate every
> possible security restriction that future programmers might need.
>
> Using resource crippling is analogous to removing "def" from the
> language and requiring Python programmers to only use functions
> that are provided in the built-in modules instead of writing their
> own functions.
>
> > To use an analogy, imagine you are providing security for your home.
> > With capabilities, security came from not having any way to know
> > where your house is without being told where it was; a reference
> > to its location.  You might be able to ask a guard (e.g., Java's
> > ClassLoader) for a map, but if they refuse there is no way for you
> > to guess its location without being told.  But once you knew where
> > it was, you had complete use of the house.
>
> This analogy is only fair if you compare it to the same analogy for
> the resource crippling approach.  Resource crippling doesn't get you
> any finer-grained control either!  The comparison story is:
>
>     With resource crippling, security comes from having a guard
>     at the door to your house.  When a Python interpreter comes
>     up to the door, the guard checks to see if the interpreter
>     has permission to enter the house, and if it does, then it
>     gets complete use of the house.
>
> Why is the granularity of control described as the whole house
> in the resource-hiding story, but as each door in the house in
> the resource-crippling story?


Because, as you said above, if you want someone to have the resource (the
house, or in more concrete terms, a 'file') you just give it to them.  If
you cripple it, though, you might provide a 'file' object but restrict how
many bytes are written.

But I also realize that resource hiding can handle this case too, by handing
out a wrapper object that provides the same protection.
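
For instance, a minimal sketch of such a wrapper (the names are my own, just
to show the shape of it):

    # The trusted side opens the real file but hands the sandboxed interpreter
    # only a wrapper that enforces a write quota.
    class LimitedWriteFile(object):
        def __init__(self, real_file, max_bytes):
            self._file = real_file
            self._remaining = max_bytes

        def write(self, data):
            if len(data) > self._remaining:
                raise IOError("write quota for this file is exhausted")
            self._remaining -= len(data)
            self._file.write(data)

        def close(self):
            self._file.close()

    limited = LimitedWriteFile(open('/tmp/out.txt', 'w'), 1024)
    limited.write('fine\n')          # within quota
    # limited.write('x' * 2000)      # would raise IOError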

> > And that complete access is an issue with a capability system.
> > If someone played a little loose with a reference for a resource
> > then you run the risk of it getting out.
>
> Could you be more specific about what you mean by "it getting out"?


Out of a trusted interpreter and ending up in a sandboxed interpreter
somehow.

> If you mean getting from a trusted interpreter to an untrusted
> interpreter -- then how is a resource going to travel between
> interpreters?


Beats me, but I am always scared of Armin and Samuele.  =)

It seems that your criticisms are aimed at resource crippling being a "plug
holes as needed, but if you foul up you are screwed" approach, with resource
hiding being more "fix the fundamental issues and just don't present access to
resources you don't want to give out (or wrap them accordingly)".  In general
I agree with this assessment.  But I also realize that Python was not designed
with security in mind, and there always seem to be new ways to get access to
'file'.  If I felt confident that I could find and hide 'file' as needed, I
would go that route immediately.  But I don't think I can (and Armin has said
this as well).

If you think you can help figure out every place a reference to 'file' can
be found through the standard interpreter, then fine, let's go that way.  I
just don't have faith this can be done effectively.
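
Just to show the sort of route I mean (this is the classic CPython 2.x
introspection trick, included only to illustrate why hiding the builtin name
by itself does not feel sufficient to me):

    # Even if the 'file' builtin is hidden from the namespace, CPython 2.x
    # lets code rediscover the type by walking object.__subclasses__()
    # starting from any innocuous object.
    for cls in ().__class__.__bases__[0].__subclasses__():
        if cls.__name__ == 'file':
            leaked = cls('/etc/passwd').read()   # full read access regained
            break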

-Brett


> Or if not, then are you thinking of a situation in which one
> piece of code is trusted with the resource, but another piece of
> code is not, and both are running in the same interpreter?
>