[Python-Dev] In defense of Capabilities [was: doc for new restricted execution design for Python]

Thu Jul 6 01:26:08 CEST 2006

On 7/5/06, Michael Chermside <mcherm at mcherm.com> wrote:
>
> In response to Ka-Ping's comments on the subject of "Resource Hiding"
> vs "Resource Crippling", Brett says:
>
> > It seems that your criticisms are aimed at resource crippling
> > being a "plug holes as needed but if you foul up you are screwed"
> > with resource hiding being more "fix the fundamental issues and
> > just don't present access to resources you don't want to give
> > access to (or wrap accordingly)".  And in general I agree with
> > this assessment.  But I also realize that Python was not designed
> > with security in mind and there seem to be new ways to get access
> > to 'file'.  If I felt confident that I could find and hide 'file'
> > as needed, I would go that route immediately.  But I don't think I
> > can (and Armin has said this as well).
>
> I agree completely. "Resource Hiding" (specifically, Capabilities)
> has the cleanest concept, yet there are valid reasons to worry
> about implementing it in Python. However, I would like to point out
> one other advantage that capabilities have over "Resource Crippling".
>
> Resource Crippling implements the restrictions as changes in the
> underlying C-level objects capable of performing dangerous operations.
> That means that the restrictions are implemented by the interpreter.
> The advantage is obvious: we can trust the interpreter. But the
> disadvantages are (1) it's slow to fix (requires a bugfix release
> followed by everyone in the world upgrading), and (2) it cannot be
> extended by the user.
>
> With resource crippling, you will need to decide just what kind of
> restrictions the file type will implement. I believe you are
> planning to restrict to a list of known filenames and known
> directories for reading and for writing. (Actually, you said mode
> string, but I presume that you won't maintain separate lists for
> 'r' and 'rb' modes.) Then there was discussion of whether the
> directories ought to be recursive, whether the total number of
> files opened ought to be restricted, whether the total size written
> should be restricted, and even whether the size should be measured
> in bytes or blocks. Such conversations could go on for a long time,
> and in the end you must make some compromises.
>
> If you were using capabilities, you would need to ensure that
> restricted interpreters could only get the file object that they
> were given. But then _all_ of these fancy versions of the
> restrictions would be immediately supported: it would be up to the
> users to create secure wrappers implementing the specific
> restrictions desired.

I agree.  I would prefer this way of doing it.  But as I have said, making
sure that 'file' does not get out into the wild is tough.
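
As an illustration of the user-written wrappers described above, here is
a minimal sketch in current Python 3. QuotaWriter and its max_bytes
parameter are hypothetical names, not part of any proposed design, and
the wrapper is not secure by itself: the underlying object stays
reachable through the _raw attribute, which is exactly the
encapsulation problem discussed further down.

```python
import io


class QuotaWriter:
    """Hypothetical wrapper: forwards writes until a byte budget is spent."""

    def __init__(self, raw, max_bytes):
        self._raw = raw              # still reachable from pure Python code
        self._remaining = max_bytes

    def write(self, data):
        # Refuse the whole write if it would exceed the remaining budget.
        if len(data) > self._remaining:
            raise PermissionError("write quota exceeded")
        self._remaining -= len(data)
        return self._raw.write(data)


buf = io.BytesIO()
writer = QuotaWriter(buf, max_bytes=8)
writer.write(b"12345")               # 5 of 8 bytes used
try:
    writer.write(b"67890")           # would need 10 bytes in total
    quota_hit = False
except PermissionError:
    quota_hit = True
```

A limit on open files, a block-based quota, or a recursive directory
rule would be another small class of the same shape, which is the
extensibility being argued for.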

> I really like this feature of capabilities: that they can be
> extended (well, restricted) by the user, not just by the language
> implementer. That's a powerful feature, and I don't want to give
> it up. But on the other hand, I don't quite see how to maintain
> it -- here are my best ideas, perhaps they will help.
>
> Python already has one essential ingredient for capabilities:
> unforgeable references. But it fails in two other ways: having
> secure data, and having no auxiliary means of accessing
> objects.

Right.  Private attributes only exist at the C level.
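
A short sketch of why Python-level hiding is not enough, written for
current Python 3. The Guarded class and make_getter function are
made-up examples. Both the double-underscore convention and closure
variables can be read back through ordinary introspection.

```python
class Guarded:
    def __init__(self, secret):
        self.__secret = secret       # name-mangled to _Guarded__secret


def make_getter(secret):
    def getter():
        return secret                # 'secret' lives in a closure cell
    return getter


g = Guarded("token")
leak_via_mangling = g._Guarded__secret
leak_via_dict = vars(g)["_Guarded__secret"]

getter = make_getter("token")
leak_via_closure = getter.__closure__[0].cell_contents
```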

> Python's powerful introspection and visible implementation
> (e.g. __dict__) make it impossible to encapsulate data in an
> object in a way that prevents malicious users from accessing
> it. But that is actually surprisingly easy to fix. Just create
> a new type (maybe a new metaclass), implemented in C, which
> contains private data and a means to control access to it. You
> would provide a dict which would be stored privately without
> access from Python, and then provide methods and attributes
> along with a Python function for evaluating access to each. The
> type would ensure that the access test was evaluated with
> access to the private dict before any method or attribute was
> accessed. Such a construct is simple enough that I believe we
> could implement it and be reasonably confident that it was
> reliably secure. (I have played with this in the past and been
> reasonably pleased with the results.) Adding restrictions would
> then incur some performance penalties, but that seems
> unproblematic.

Right, but as you mention below this still does not protect the C level
objects such as 'file'.  If you can prevent references to 'file' from
getting out, you can change open() to return wrapped instances of 'file'
with the desired security measures in place.
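
Roughly, a replacement open() along those lines could look like the
sketch below (current Python 3). make_guarded_open and ReadOnlyFile are
hypothetical names, and the sketch only holds up under the assumption
stated above, that no reference to the real file type or the built-in
open() escapes. In pure Python the _raw attribute is still reachable.

```python
import os
import tempfile


class ReadOnlyFile:
    """Hypothetical wrapper exposing only read() and close()."""

    def __init__(self, raw):
        self._raw = raw

    def read(self, size=-1):
        return self._raw.read(size)

    def close(self):
        self._raw.close()


def make_guarded_open(allowed_dir):
    root = os.path.realpath(allowed_dir)

    def guarded_open(path):
        real = os.path.realpath(path)    # resolves ".." and symlinks
        try:
            inside = os.path.commonpath([root, real]) == root
        except ValueError:               # e.g. different drives on Windows
            inside = False
        if not inside:
            raise PermissionError("path outside allowed directory")
        return ReadOnlyFile(open(real, "r"))

    return guarded_open


with tempfile.TemporaryDirectory() as d:
    with open(os.path.join(d, "a.txt"), "w") as fh:
        fh.write("hello")
    guarded = make_guarded_open(d)
    f = guarded(os.path.join(d, "a.txt"))
    content = f.read()
    has_write = hasattr(f, "write")
    f.close()
    try:
        guarded(os.path.join(d, "..", "outside.txt"))
        escaped = True
    except PermissionError:
        escaped = False
```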

> That leaves the other problem: auxiliary means of accessing
> objects. There are things like gc.get_objects(). In the special
> case of file, which is a type that's also dangerous, there are
> tricks like "object().__class__.__subclasses__()". I would love
> to believe that we could plug all of these holes, but experience
> (rexec) proves otherwise. For something like sockets, I am
> fairly sure that there's a clear bottleneck (the low-level
> socket module), but still numerous existing libraries that use
> this low-level module without being handed a capability.
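
The __subclasses__() trick can be shown with a small sketch in current
Python 3, where the 'file' type no longer exists. Dangerous is a made-up
stand-in for a sensitive type. The evaluated code is given a namespace
that contains only 'object', yet it still reaches the class.

```python
class Dangerous:
    """Stand-in for a sensitive type such as Python 2's 'file'."""


# A namespace that never mentions Dangerous by name.
hidden_ns = {"__builtins__": {"object": object}}

expr = ("[c for c in object().__class__.__subclasses__() "
        "if c.__name__ == 'Dangerous']")
leaked = eval(expr, hidden_ns)
```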

Right, but modules that cheat simply don't get added to the whitelist.
This is why you should never blindly add modules to the whitelist of
extension modules that you can import.
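
A minimal sketch of an import whitelist in current Python 3, assuming
the restricted code runs with its own __builtins__ mapping.
ALLOWED_MODULES and make_restricted_import are hypothetical names, and
this is an illustration of the idea only, not a sandbox: it does nothing
about the introspection holes mentioned above.

```python
import builtins

ALLOWED_MODULES = frozenset({"math", "string"})   # hypothetical whitelist


def make_restricted_import(allowed):
    real_import = builtins.__import__

    def restricted_import(name, globals=None, locals=None,
                          fromlist=(), level=0):
        top_level = name.partition(".")[0]
        # Reject relative imports and anything not explicitly listed.
        if level != 0 or top_level not in allowed:
            raise ImportError("import of %r is not allowed" % name)
        return real_import(name, globals, locals, fromlist, level)

    return restricted_import


ns = {"__builtins__": {"__import__": make_restricted_import(ALLOWED_MODULES)}}
exec("import math\nroot = math.sqrt(16.0)", ns)

try:
    exec("import socket", ns)
    blocked = False
except ImportError:
    blocked = True
```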

> So this is where my alternative plan starts to fall apart. Your
> (Brett's) plan to use resource crippling for these kinds of
> restrictions involves putting C code around all direct access
> to files, sockets, or whatever resource is being accessed.

Just 'file' and sockets, nothing else.  Everything else is protected by not
allowing importation of the module.

> Perhaps instead of your C code doing the security checks
> directly, it could make sure that the objects returned were
> contained within the correct secure wrappers. That's OK so far
> as it goes, but the C checks are per interpreter instance, so
> how do we get them to apply the correct wrappers? Maybe the
> interpreter maintains (in C) a stack of wrappers per-thread and
> provides a means for stack frames (functions) to register
> wrappers? We would wind up wrapping the wrappers, but that's
> just the nature of capabilities. Perhaps it would look
> something like this:
>
>      def invoke_user_function():
>          PyXXX.set_file_wrapper(ReadOnlyRestriction)
>          PyXXX.set_socket_wrapper(
>                  SingleDomainRestriction('example.com'))
>          untrusted_object.do_stuff()
>
>    ...
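
The per-thread wrapper stack idea can be modelled in pure Python 3 as
follows. This is only a sketch of the bookkeeping: the proposal keeps
the stack in C inside the interpreter, and file_wrapper, open_resource,
ReadOnly and Truncated are all hypothetical names standing in for the
PyXXX calls in the quoted example.

```python
import io
import threading
from contextlib import contextmanager

_state = threading.local()           # one wrapper stack per thread


def _stack():
    if not hasattr(_state, "wrappers"):
        _state.wrappers = []
    return _state.wrappers


@contextmanager
def file_wrapper(wrapper):
    """Register a wrapper for the duration of a block."""
    stack = _stack()
    stack.append(wrapper)
    try:
        yield
    finally:
        stack.pop()


def open_resource(raw):
    """Apply every registered wrapper, oldest first."""
    obj = raw
    for wrapper in _stack():
        obj = wrapper(obj)
    return obj


class ReadOnly:
    def __init__(self, raw):
        self._raw = raw

    def read(self):
        return self._raw.read()


class Truncated:
    def __init__(self, raw, limit=4):
        self._raw = raw
        self._limit = limit

    def read(self):
        return self._raw.read()[:self._limit]


with file_wrapper(ReadOnly):
    with file_wrapper(Truncated):
        wrapped = open_resource(io.StringIO("abcdefgh"))
        data = wrapped.read()

raw = io.StringIO("plain")
unwrapped = open_resource(raw)       # no wrappers registered any more
```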
>
> To sum up: I agree that you cannot rely on preventing all the
> possible "python tricks", but I still think that capabilities
> are a superior solution.

And I have never disagreed with this.  It is just a "practicality vs.
purity" thing.

> I'd like to find a way to achieve
> the user-customizability of capabilities without throwing
> out the entire standard library -- maybe some hybrid of
> "resource hiding" and "resource crippling". I can almost see
> a way to achieve it, but there are still a few flaws.

It's already a hybrid solution with the import protections for capabilities
and the 'file' constructor lock-out for crippling.

-Brett