[Python-Dev] Internal namespace proposal

Thu Jul 27 04:19:40 CEST 2006

[This message is cc:d to the e-lang list, but please take any replies to
python-dev at python.org.]

Brett Cannon wrote:
> On 7/19/06, Ka-Ping Yee <cap-talk at zesty.ca> wrote:
> 
>> OMG!!!  Is all i can say at the moment.  Very excited.

This is very encouraging. Thanks to ?!ng, Michael Chermside and others for making
the case for capabilities.

> Also realize that I am using object-capabilities to secure the interpreter,
> not objects.  That will be enough of a challenge to do for now.  Who knows,
> maybe some day Python can support object-capabilities at the object level,
> but for now I am just trying to isolate and protect individual interpreters
> in the same process.

I think that the alternative of providing object-granularity protection domains
straight away is more practical than you suggest, and I'd like to at least make
sure that this possibility has been thoroughly explored.

Below is a first-cut proposal for enforcing namespace restrictions, i.e. support
for non-public attributes and methods, on Python objects and modules. It is not
sufficient by itself to provide capability security, but it could be the
basis for doing that at object granularity.

(Note that this proposal would only affect sandboxed/restricted interpreters,
at least for the time being. The encapsulation it provides is also useful
for reasons other than security, and I think there is nothing about it that
would be unreasonable to apply to an unrestricted interpreter, but for
compatibility, that would have to be enabled by a __future__ option or similar.)

Internal namespace proposal
===========================

Existing Python code tends to use a convention where the names of attributes
and methods intended only for internal use are prefixed by '_'. This convention
comes from PEP 8 <http://www.python.org/dev/peps/pep-0008/>, which says:

#  In addition, the following special forms using leading or trailing
#  underscores are recognized (these can generally be combined with any case
#  convention):
#
#   - _single_leading_underscore: weak "internal use" indicator.  E.g. "from M
#     import *" does not import objects whose name starts with an underscore.
#
#   - single_trailing_underscore_: used by convention to avoid conflicts with
#     Python keyword, e.g.
#
#     Tkinter.Toplevel(master, class_='ClassName')
#
#   - __double_leading_underscore: when naming a class attribute, invokes name
#     mangling (inside class FooBar, __boo becomes _FooBar__boo; see below).
#
#   - __double_leading_and_trailing_underscore__: "magic" objects or
#     attributes that live in user-controlled namespaces.  E.g. __init__,
#     __import__ or __file__.  Never invent such names; only use them
#     as documented.
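To make the "weak" part concrete: in today's Python the leading underscore only affects "from M import *"; a plain attribute access still succeeds. The sketch below builds a throwaway module (the name `demo_mod` is hypothetical) to show both halves.

```python
import sys
import types

# Build a throwaway module with one public and one "internal" global.
mod = types.ModuleType("demo_mod")
exec("public = 1\n_internal = 2\n", mod.__dict__)
sys.modules["demo_mod"] = mod

# "from M import *" skips underscore names (no __all__ is defined)...
ns = {}
exec("from demo_mod import *", ns)
print("public" in ns, "_internal" in ns)   # True False

# ...but nothing stops a direct attribute access: the convention is advisory.
print(mod._internal)                        # 2
```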

I propose that the "internal" status of names beginning with _ (including
those beginning with __) should be enforced in restricted interpreters. This
is better than introducing a new annotation, because it will do the right
thing for existing code that follows this part of PEP 8.

More precisely:

  A restricted interpreter refuses access to any object attribute or method
  with a name beginning with '_' (by raising a new exception type,
  'InternalAccessException'), unless the access is from a method and its
  static target is that method's first argument variable.

  Also, a restricted interpreter refuses access to any module-global
  variable or module-global function with a name beginning with '_' (by
  raising 'InternalAccessException'), unless the access is statically from
  the same module.

(A method's first argument is usually called 'self', but that's just a convention.
By "static target", I mean that to access an internal attribute _foo in a
method with first argument 'self', you must write "self._foo"; attempting to
access "x._foo" will fail even if 'x' happens to be the same object as 'self'.
This allows such accesses to be reported at compile-time, rather than only at
run-time.)
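A rough sketch of how the compile-time check could work, using the standard `ast` module. Assumptions: only functions defined directly in a class body count as methods (a nested def inside a method is treated as a non-method), the class and helper names are hypothetical, and the module-global rule is approximated by flagging "from M import _name" (an access like "M._name" is already caught by the attribute rule, because M is not a method's first argument).

```python
import ast

class InternalAccessChecker(ast.NodeVisitor):
    """Hypothetical static check for the proposed rules (a sketch).

    Flags expr._name unless expr is literally the first parameter of the
    innermost enclosing method, and flags "from M import _name".
    Bare names such as _helper are same-module accesses and are allowed.
    """

    def __init__(self):
        self.errors = []        # list of (lineno, message) pairs
        self._self_names = []   # stack: first-arg name of enclosing method, or None
        self._in_class = []     # stack: is the innermost scope a class body?

    def visit_ClassDef(self, node):
        self._in_class.append(True)
        self.generic_visit(node)
        self._in_class.pop()

    def visit_FunctionDef(self, node):
        # A function is a method only if it sits directly in a class body.
        is_method = bool(self._in_class) and self._in_class[-1]
        params = node.args.posonlyargs + node.args.args
        self._self_names.append(params[0].arg if is_method and params else None)
        self._in_class.append(False)
        self.generic_visit(node)
        self._in_class.pop()
        self._self_names.pop()

    visit_AsyncFunctionDef = visit_FunctionDef

    def visit_Attribute(self, node):
        if node.attr.startswith('_'):
            self_name = self._self_names[-1] if self._self_names else None
            target = node.value
            # Allowed only when the static target is the method's first argument.
            if not (isinstance(target, ast.Name) and target.id == self_name):
                self.errors.append((node.lineno, f"internal access to {node.attr!r}"))
        self.generic_visit(node)

    def visit_ImportFrom(self, node):
        for alias in node.names:
            if alias.name.startswith('_'):
                self.errors.append((node.lineno, f"import of internal name {alias.name!r}"))
        self.generic_visit(node)


def check(source):
    """Return the (lineno, message) pairs the hypothetical rule would report."""
    checker = InternalAccessChecker()
    checker.visit(ast.parse(source))
    return checker.errors


SAMPLE = '''\
class Account:
    def __init__(self, balance):
        self._balance = balance                  # allowed: target is 'self'

    def merge(self, other):
        return self._balance + other._balance    # rejected: target is 'other'

def peek(acct):
    return acct._balance                         # rejected: not a method
'''

if __name__ == "__main__":
    for lineno, msg in check(SAMPLE):
        print(lineno, msg)   # reports lines 6 and 9
```

Note that the check only looks at the spelling of the target, which is exactly what lets it run before the code executes: "other._balance" is rejected even if other happens to be self at run-time.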

I am using the term "internal" rather than "private" or "protected", because
these semantics are not the same as either "private" or "protected" in C++ or
Java. In Python with this change, an object can only access its own internal
methods and attributes. In C++ and Java, an object can access private and protected
members of other objects of the same class. The rationale for this difference is
explained below.
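One practical consequence, sketched with a hypothetical Money class: code that compares two instances cannot read "other._cents" under this rule, since the static target is 'other', so the class has to expose what comparison needs through a public method. Current Python runs this as written; the comments only mark which accesses the proposed rule would accept.

```python
class Money:
    def __init__(self, cents):
        self._cents = cents        # internal: reachable only as self._cents

    def cents(self):
        # Public accessor: how *another* Money object learns the value.
        return self._cents

    def __eq__(self, other):
        # C++/Java style would read other._cents directly; under the proposed
        # rule that access is refused because its static target is 'other'.
        if not isinstance(other, Money):
            return NotImplemented
        return self._cents == other.cents()

    def __hash__(self):
        return hash(self._cents)


print(Money(5) == Money(5), Money(5) == Money(6))   # True False
```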

The use of _single vs __double underscores encodes a useful distinction that
would not change. Ignoring the point in the previous paragraph, a _single
underscore is similar to "protected" in languages like C++ and Java, while a
__double underscore is similar to "private". This is purely a consequence of
the name mangling: if a class X and its subclass Y both name an attribute
__foo, then we will end up with two attributes _X__foo and _Y__foo in instances
of Y, which is the desired behaviour for private attributes. In the case of an
attribute called _foo, OTOH, there can be only one such attribute per object,
which is the desired behaviour for protected attributes. The name mangling
also ensures that an object will not *accidentally* access a private attribute
inherited from a superclass.

However, in the same example, an instance of Y can still deliberately access
the copy of the attribute inherited from X by specifying _X__foo. There is no
security problem here, because Y cannot do anything as a result that it could
not have done by copying X's code, rather than inheriting from it. Notice that
this is only true because we restrict an object to only accessing its own
internal attributes and methods; if we followed C++'s semantics where an object
can access protected members of any superclass, this would break security.
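A small illustration of the deliberate case, again in current Python with hypothetical names. Because _X__secret starts with a single underscore, it is not mangled again inside Y, so Y can reach the inherited copy through self. The second method shows the kind of access the proposed rule would refuse, even though today's interpreter allows it.

```python
class X:
    def __init__(self):
        self.__secret = 42             # stored as _X__secret


class Y(X):
    def peek_inherited(self):
        # Names like _X__secret are not re-mangled (single leading underscore),
        # so Y can deliberately name X's copy. The static target is 'self',
        # so the proposed rule would still allow this.
        return self._X__secret

    def peek_other(self, other):
        # Same attribute, but the static target is 'other': the proposal
        # refuses this, so Y cannot reach into an unrelated X instance.
        # Today's interpreter does not enforce that, so this still returns 42.
        return other._X__secret


print(Y().peek_inherited())    # 42
```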

(Java solves this problem by applying a more complicated access rule for
protected members, which I considered to be unintuitive. More details on request.)

__dict__ is an internal attribute. This means that an object can only directly
reflect on itself. I know that there are other means of reflection (e.g. using
the 'inspect' module); blocking these or making them safe is a separate issue.

If desired, it would be safe to add a 'publicdict' attribute to each object, or
a 'publicdict(object)' built-in. This would return a *read-only* dict, probably
created lazily if needed, giving access only to public (non-internal) attributes
and methods.
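A minimal sketch of what publicdict(obj) might look like, built on types.MappingProxyType for the read-only view. It assumes that dir() is an acceptable way to enumerate public names, and it builds the mapping eagerly rather than lazily; both are simplifications, not part of the proposal.

```python
from types import MappingProxyType


def publicdict(obj):
    """Hypothetical built-in (a sketch, not an existing API).

    Returns a read-only mapping of obj's public (non-'_') attributes and
    methods. Built eagerly here; the proposal suggests creating it lazily.
    """
    public = {}
    for name in dir(obj):
        if not name.startswith('_'):
            # Note: getattr runs properties/descriptors; a real version
            # might want to avoid triggering them.
            public[name] = getattr(obj, name)
    return MappingProxyType(public)


class Point:
    def __init__(self, x, y):
        self.x = x
        self._y = y

    def norm1(self):
        return abs(self.x) + abs(self._y)


p = Point(3, -4)
pd = publicdict(p)
print(sorted(pd))       # ['norm1', 'x']
print(pd['norm1']())    # 7
```

Assigning to pd['x'] raises TypeError, since a mapping proxy has no item assignment.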

__init__ is an internal method. This is as it should be, because it should not
be possible to call __init__ on an existing object; only to have __init__
implicitly called when a new object is constructed.
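Why this matters: in an unrestricted interpreter, anyone holding a reference can call __init__ again and reset an object's state. A hypothetical Counter class shows the effect:

```python
class Counter:
    def __init__(self, start=0):
        self.count = start

    def increment(self):
        self.count += 1


c = Counter()
c.increment()
c.increment()

# Any holder of c can "re-construct" it in place; this is the call the
# proposal forbids by making __init__ internal.
c.__init__()
print(c.count)            # 0: the accumulated state was silently reset

# Implicit construction is unaffected: Counter(5) still runs __init__.
print(Counter(5).count)   # 5
```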

__repr__ and __str__ are internal under these rules, and probably shouldn't be.
Existing classes may expose private state in the strings returned by __repr__
or __str__, but in principle, there is nothing unsafe about being able to
convert the public state of an object to a string. OTOH, this functionality
is usually accessed via the built-ins 'repr' and 'str', which we could perhaps
allow to access '__repr__' and '__str__' as a special case.
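One way to picture the repr/str special case, as a sketch only: a hypothetical Guarded wrapper refuses every '_' attribute looked up through it, but repr() and str() still work, because the built-ins look up __repr__ and __str__ on the type rather than through the instance's __getattribute__. Making InternalAccessException a subclass of AttributeError is an assumption. This is not a security boundary in current Python (for example, object.__getattribute__(g, '_target') bypasses it); it only illustrates where the special case would sit.

```python
class InternalAccessException(AttributeError):
    """Name from the proposal; subclassing AttributeError is an assumption."""


class Guarded:
    """Hypothetical wrapper: outside callers see only public attributes."""
    __slots__ = ('_target',)

    def __init__(self, target):
        object.__setattr__(self, '_target', target)

    def __getattribute__(self, name):
        # Every explicit lookup, including g.__repr__, passes through here.
        if name.startswith('_'):
            raise InternalAccessException(name)
        return getattr(object.__getattribute__(self, '_target'), name)

    def __setattr__(self, name, value):
        if name.startswith('_'):
            raise InternalAccessException(name)
        setattr(object.__getattribute__(self, '_target'), name, value)

    # repr()/str() use type-level lookup, so these act as the special case.
    def __repr__(self):
        return repr(object.__getattribute__(self, '_target'))

    def __str__(self):
        return str(object.__getattribute__(self, '_target'))


class Secret:
    def __init__(self):
        self._pin = 1234
        self.label = "vault"

    def __repr__(self):
        return f"Secret(label={self.label!r})"


g = Guarded(Secret())
print(g.label)      # vault
print(repr(g))      # Secret(label='vault')
try:
    g._pin
except InternalAccessException as exc:
    print("refused:", exc)   # refused: _pin
```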

-- 
David Hopwood <david.nospam.hopwood at blueyonder.co.uk>