[Python-Dev] doc for new restricted execution design for Python

Ka-Ping Yee python-dev at zesty.ca
Tue Jul 4 12:32:00 CEST 2006


Hi Brett,

Here are some comments on the description of the restricted execution
model that you posted.

> When referring to the state of an interpreter, it is either "trusted" or
> "untrusted".  A trusted interpreter has no restrictions imposed upon any
> resource.  An untrusted interpreter has one or more resources with
> restrictions placed upon them.

In response to Guido's comment about confusing the words "trusted" and
"untrusted", how about "empowered" and "restricted"?

> When the Interpreter Is Embedded
> ================================
>
> Single Untrusted Interpreter
> ----------------------------
>
> This use case is when an application embeds the interpreter and never has more
> than one interpreter running.
>
> The main security issue to watch out for is making sure that default
> abilities are not provided to the interpreter by accident.

I'd rather rephrase this in the opposite direction.  The onus shouldn't
be on the application to hunt down each possible dangerous authority and
deactivate them all one by one.  The main security issue is to let the
application choose which abilities it wants the restricted interpreter
to have, and then ensure that the restricted interpreter gets only those
abilities.
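
A minimal sketch of that direction (the names here are hypothetical,
not a proposed API): run the untrusted code in a namespace that
contains only what the application has explicitly granted.  A bare
namespace alone is not a real sandbox in today's CPython, but it
shows the default-deny shape:

    # Hypothetical sketch: start from nothing, add explicit grants.
    def run_restricted(code, granted):
        ns = {'__builtins__': {}}   # no default abilities
        ns.update(granted)          # only what the app chose to give
        exec(code, ns)

    # This code gets len() and a logging function, and nothing else.
    run_restricted("log(len('abc'))",
                   {'len': len, 'log': lambda msg: None})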

> Multiple Untrusted Interpreters
> -------------------------------
>
> When multiple interpreters, all untrusted at varying levels, need to be
> running within a single application.  This is the key use case that this
> proposed design is targeted for.
>
> On top of the security issues from a single untrusted interpreter,
> there is one additional worry.  Resources cannot end up being leaked
> into other interpreters where they are given escalated rights.

What is your model here for communication between interpreters?  If two
interpreters can communicate, any attempt to "prevent leakage" of
resources is meaningless.  When you say "leaked into other interpreters"
are you talking about a Python object leaking or something else at a
lower level?

Suppose for example that the application wants to embed two interpreters,
P and Q, and that the application wants P to be able to write files but
Q to be restricted against writing files.  When you say "leaked" above,
that suggests to me that you want to prevent something like

    # code running in P
    import spam
    f = open('/home/doofus/.ssh/authorized_keys', 'a')
    spam.f = f

    # code running in Q
    import spam
    spam.f.write('blargh')

The above example supposes that P and Q can communicate through a
shared module, spam, where they can pass Python objects.

But notice that even if you prevent them from passing Python objects
like open files, any form of communication is sufficient to leak
resources:

    # code running in P
    def add_key(key):
        f = open('/home/doofus/.ssh/authorized_keys', 'a')
        f.write(key + '\n')
        f.close()

    import socket
    s = socket.socket()
    s.bind(('', 6666))
    s.listen(1)
    ns, addr = s.accept()
    add_key(ns.recv(100))


    # code running in Q
    import webbrowser
    webbrowser.open('http://localhost:6666/zebra')

As long as P can listen for instructions from Q, it can give Q
the power to write to the filesystem.

> Filesystem
> ===================
>
> The most obvious facet of a filesystem to protect is reading from it.
> One does not want what is stored in ``/etc/passwd`` to get out.  And
> one also does not want to allow writing to the disk unless explicitly
> permitted, for basically the same reason; if someone can write to
> ``/etc/passwd`` then they can set the password for the root account.

There's a big difference between modifying (or erasing) an existing file
and writing a new file (e.g. for temporary storage).  If I give you a
little filesystem of your own to play in, and it starts out empty, you
can put whatever you want in it without violating my secrecy or the
integrity of my files.

I think you should be talking about this in terms of specifically
what abilities you want to be able to allow, based on examples of
real-life applications.
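
For instance, "may create and write files, but only under this one
directory" is an ability real applications want, and it can be handed
out as a wrapper.  A hypothetical sketch (not hardened; it ignores
symlink games and such):

    import os, os.path

    def make_sandbox_open(sandbox):
        sandbox = os.path.realpath(sandbox)
        def sandbox_open(name, mode='w'):
            path = os.path.realpath(os.path.join(sandbox, name))
            if not path.startswith(sandbox + os.sep):
                raise IOError('outside the sandbox: %r' % (name,))
            return open(path, mode)
        return sandbox_open

    # Untrusted code receives sandbox_open, never the real open().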

> Physical Resources
> ===================
>
> Memory should be protected.  It is a limited resource on the system
> that can have an impact on other running programs if it is exhausted.
> Being able to restrict the use of memory would help alleviate issues
> from denial-of-service (DoS) attacks.
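
(For what it's worth, on Unix the embedding application can already
cap memory for the whole process with the standard resource module,
on platforms that support RLIMIT_AS.  Note that this is per-process:
with several interpreters in one process, one can still starve the
others, which argues for spelling out exactly what "restrict the use
of memory" should mean.)

    import resource

    # Cap the address space at 100 MB; allocations beyond the limit
    # fail, and Python turns the failed malloc into a MemoryError.
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    resource.setrlimit(resource.RLIMIT_AS, (100 * 1024 * 1024, hard))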

> Networking
> ===================
>
> Networking is somewhat like the filesystem in terms of wanting similar
> protections.  You do not want to let untrusted code make tons of
> socket connections, or accept incoming ones, to do possibly nefarious
> things (e.g., acting as a zombie).
>
> You also want to prevent finding out information about the network you are
> connected to.  This includes doing DNS resolution since that allows one
> to find out what addresses your intranet has or what subnets you use.

Again, it's risky to describe only individual cases of things to
prevent.  What networking abilities are safe or necessary for the
kinds of applications you have in mind?  Start from nothing and
work up from there.
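
For example, "may connect to this one address" is an ability that can
be handed out as a wrapper, with no DNS involved if the address is
given numerically.  A hypothetical sketch:

    import socket

    def make_connector(addr, port):
        def connect():
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.connect((addr, port))    # addr is a numeric IP: no DNS
            return s
        return connect

    # Untrusted code gets this callable, not the socket module.
    connect_home = make_connector('192.0.2.10', 80)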

> Interpreter
> ===================
>
> One must make sure that the interpreter is not harmed in any way.
> There are several ways harm could occur.  One is generating
> hostile bytecode.  Another is a buffer overflow.  In general, any
> ability to crash the interpreter is unacceptable.

This is hard for me to understand.  What exactly do you trust and
not trust?  It seems to me that crashing an interpreter is only a
problem if a single interpreter is running both trusted and untrusted
code -- then if the untrusted code crashes the interpreter, the
trusted code suffers.

But there doesn't seem to be any such thing in your model.  Each
interpreter is either trusted or untrusted.  If the interpreter is
trusted, and the code running in it causes it to crash, I assume
you would consider that to be the code's "own fault", right?
And if the interpreter is untrusted, and the code running in it
causes it to crash, then the code has only harmed itself.

It seems to me that we need only be concerned about crashing when
the crash of an embedded interpreter will bring down its host
application, or when there are multiple interpreters embedded at once
and one interpreter causes another interpreter to crash.
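
(For what it's worth, the hostile-bytecode worry is concrete in that
setting: CPython does not verify bytecode, so a handcrafted code
object can take down the whole process.  A sketch, using the Python 2
CodeType signature:)

    import types

    # BINARY_ADD (0x17) with an empty value stack, then RETURN_VALUE;
    # with no bytecode verification this pops garbage pointers and
    # will typically segfault the process.
    crasher = types.CodeType(0, 0, 0, 0, '\x17S', (), (), (),
                             '<hostile>', 'hostile', 1, '')
    # exec(crasher, {})    # uncomment to crash the whole process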

> Resource Hiding
> =============================
[...]
> This can be viewed as a passive system for security.
[...]
> Resource Crippling
> =============================
> Another approach to security is to provide constant, proactive security
> checking of rights to use a resource.

I think you have this backwards.  Resource hiding is proactive:
before untrusted code has a chance to abuse anything, you decide
what you want to allow it to do.  The code starts with no access, and
gets access only to resources you have proactively decided to provide.

Resource crippling is the opposite: it begins by giving carte blanche
to the untrusted code, then you run around trying to plug holes
by stopping everything you don't want.  This is a lot more work,
and it is also much more dangerous.  If you forget to plug even
one hole, you're hosed.
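
Python's own history shows how that goes: the old rexec module took
the crippling approach and was eventually disabled because holes kept
turning up.  A sketch of the classic oversight, in Python 2 terms:

    import __builtin__

    # Denylist approach: copy the builtins and remove open()...
    safe = dict(vars(__builtin__))
    del safe['open']
    # ...but Python 2 exposes the same ability as file(), so the
    # "plugged" hole is still wide open.
    exec("secrets = file('/etc/passwd').read()",
         {'__builtins__': safe})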

Back to what you wrote about resource hiding:

> This can be viewed as a passive system for security.  Once a resource
> has been given to code, there are no more checks to make sure the
> security model is not being violated.

This last sentence doesn't make any sense.  If you decided to give
the resource, how is using the resource a violation?  Either you
want to enable the resource or you don't.  If you want to enable
it, give it; if you don't, don't give it.  As a criticism of the
resource hiding approach, it's a red herring -- there's no way
to interpret this sentence that doesn't make it also an
unfalsifiable criticism of any possible security model.

> The most common implementation of resource hiding is capabilities.
> In this type of system a resource's reference acts as a ticket that
> represents the right to use the resource.  Once code has a reference
> it is considered to have full use of the resource it represents, and
> no further security checks are performed.

Same thing.  What "further security checks" are we worried about?
Would it check to see whether we've authorized the interpreter to
have access to the resource ... which we already know to be true?

> To allow customizable restrictions one can pass references to wrappers of
> resources.  This allows one to provide custom security to resources instead of
> requiring an all-or-nothing approach.

The ability to customize security restrictions is an important
advantage of the resource hiding approach, since resource crippling
requires that the architect of the security model anticipate every
possible security restriction that future programmers might need.
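
For example, attenuating a file to a read-only facet is a five-line
wrapper that any programmer can write on the spot.  A hypothetical
sketch (the attribute hiding here is cosmetic, not a security
boundary in plain Python):

    class ReadOnlyFile:
        def __init__(self, f):
            self._f = f
        def read(self, size=-1):
            return self._f.read(size)

    # Hand out ReadOnlyFile(open('data.txt')) instead of the file
    # itself; the holder has no write() or truncate() to call.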

Using resource crippling is analogous to removing "def" from the
language and requiring Python programmers to only use functions
that are provided in the built-in modules instead of writing their
own functions.

> To use an analogy, imagine you are providing security for your home.
> With capabilities, security came from not having any way to know
> where your house is without being told where it was; a reference
> to its location.  You might be able to ask a guard (e.g., Java's
> ClassLoader) for a map, but if they refuse there is no way for you
> to guess its location without being told.  But once you knew where
> it was, you had complete use of the house.

This analogy is only fair if you compare it to the same analogy for
the resource crippling approach.  Resource crippling doesn't get you
any finer-grained control either!  The comparison story is:

    With resource crippling, security comes from having a guard
    at the door to your house.  When a Python interpreter comes
    up to the door, the guard checks to see if the interpreter
    has permission to enter the house, and if it does, then it
    gets complete use of the house.

Why is the granularity of control described as the whole house
in the resource-hiding story, but as each door in the house in
the resource-crippling story?

> And that complete access is an issue with a capability system.
> If someone plays a little loose with a reference to a resource,
> then you run the risk of it getting out.

Could you be more specific about what you mean by "it getting out"?

If you mean getting from a trusted interpreter to an untrusted
interpreter -- then how is a resource going to travel between
interpreters?

Or if not, then are you thinking of a situation in which one
piece of code is trusted with the resource, but another piece of
code is not, and both are running in the same interpreter?



-- ?!ng

