[Python-Dev] In defense of Capabilities [was: doc for new restricted execution design for Python]

Thu Jul 6 00:51:49 CEST 2006

In response to Ka-Ping's comments on the subject of "Resource Hiding"
vs "Resource Crippling", Brett says:

> It seems that your criticisms are aimed at resource crippling
> being a "plug holes as needed but if you foul up you are screwed"
> with resource hiding being more "fix the fundamental issues and
> just don't present access to resources you don't want to give
> access to (or wrap accordingly)".  And in general I agree with
> this assessment.  But I also realize that Python was not designed
> for security in mind and there seems to be new ways to get access
> to 'file'.  If I felt confident that I could find and hide 'file'
> as needed, I would go that route immediately.  But I don't think I
> can (and Armin has said this as well).

I agree completely. "Resource Hiding" (specifically, Capabilities)
has the cleanest concept, yet there are valid reasons to worry
about implementing it in Python. However, I would like to point out
one other advantage that capabilities has over "Resource Crippling".

Resource Crippling implements the restrictions as changes in the
underlying C-level objects capable of performing dangerous operations.
That means that the restrictions are implemented by the interpreter.
The advantage is obvious: we can trust the interpreter. But the
disadvantages are (1) it's slow to fix (requires a bugfix release
followed by everyone in the world upgrading), and (2) it cannot be
extended by the user.

With resource crippling, you will need to decide just what kind of
restrictions the file type will implement. I believe you are
planning to restrict to a list of known filenames and known
directories for reading and for writing. (Actually, you said mode
string, but I presume that you won't maintain separate lists for
'r' and 'rb' modes.) Then there was discussion of whether the
directories ought to be recursive, whether the total number of
files opened ought to be restricted, whether the total size written
should be restricted, and even whether the size should be measured
in bytes or blocks. Such conversations could go on for a long time,
and in the end you must make some compromises.

If you were using capabilities, you would need to ensure that
restricted interpreters could only get the file object that they
were given. But then _all_ of these fancy versions of the
restrictions would be immediately supported: it would be up to the
users to create secure wrappers implementing the specific
restrictions desired.

I really like this feature of capabilities: that they can be
extended (well, restricted) by the user, not just by the language
implementer. That's a powerful feature, and I don't want to give
it up. But on the other hand, I don't quite see how to maintain
it -- here are my best ideas, perhaps they will help.

Python already has one essential ingredient for capabilities:
unforgable references. But it fails in two other ways: having
secure data, and having no auxiliary means of accessing
objects.

Python's powerful introspection and visible implementation
(eg: __dict__) make it impossible to encapsulate data in an
object in a way that prevents malicious users from accessing
it. But that is actually surprisingly easy to fix. Just create
a new type (maybe a new metaclass), implemented in C, which
contains private data and a means to control access to it. You
would provide a dict which would be stored privately without
access from Python, and then provide methods and attributes
along with a Python function for evaluating access to each. The
type would ensure that the access test was evaluated with
access to the private dict before any method or attribute was
accessed. Such a construct is simple enough that I believe we
could implement it and be reasonably confident that it was
reliably secure. (I have played with this in the past and been
reasonably pleased with the results.) Adding restrictions would
then incur some performance penalties, but that seems
unproblematic.

That leaves the other problem: auxiliary means of accessing
objects. There are things like gc.get_objects(). In the special
case of file, which is a type that's also dangerous, there are
tricks like "object().__class__.__subclasses__()". I would love
to believe that we could plug all of these holes, but experience
(rexec) proves otherwise. For something like sockets, I am
fairly sure that there's a clear bottleneck (the low-level
socket module), but still numerous existing libraries that use
this low-level module without being handed a capability.

So this is where my alternative plan starts to fall apart. Your
(Brett's) plan to use resource crippling for these kinds of
restrictions involves putting C code around all direct access
to files, sockets, or whatever resource is being accessed.
Perhaps instead of your C code doing the security checks
directly, it could make sure that the objects returned were
contained within the correct secure wrappers. That's OK so far
as it goes, but the C checks are per interpreter instance, so
how do we get them to apply the correct wrappers? Maybe the
interpreter maintains (in C) a stack of wrappers per-thread and
provides a means for stack frames (functions) to register
wrappers? We would wind up wrapping the wrappers, but that's
just the nature of capabilities. Perhaps it would look
something like this:

     def invoke_user_function():
         PyXXX.set_file_wrapper(ReadOnlyRestriction)
         PyXXX.set_socket_wrapper(
                 SingleDomainRestriction('example.com'))
         untrusted_object.do_stuff()

   ...

To sum up: I agree that you cannot rely on prevent all the
possible "python tricks", but I still think that capabilities
are a superior solution. I'd like to find a way to achieve
the user-customizability of capabilities without throwing
out the entire standard library -- maybe some hybrid of
"resource hiding" and "resource crippling". I can almost see
a way to achieve it, but there are still a few flaws.

-- Michael Chermside