Will python never intend to support private, protected and public?

Sun Oct 2 07:52:13 EDT 2005

bokr at oz.net (Bengt Richter) writes:
> I decided to read this thread today, and I still don't know exactly
> what your requirements are for "private" whatevers. No one seems to
> have discussed what you could do with properties and __getattribute__
> and metaclasses and decorators, yet there are many possibilities
> that might satisfy some or maybe all your requirements. I probably
> missed something...

Well, it's a discussion of why a certain feature might be useful, not
a claim that it's required.  Mike Meyer points out some reasons it
might be hard to do smoothly without changing Python's semantics in a
deep way (i.e. Python 3.0 or later).

The basic notion is that if x.private_var is a private instance
variable on x, then the only functions that are allowed to touch it
are bound methods on x.  Also there has to be some way to stop the
client (the "client" means code that calls methods on x) from
injecting new methods into x's class definition.

Alternatively, if x and y are objects, and the app calls x.foo(y),
then privacy means x cannot see any of y's private variables no matter
what x does (this is sort of the applet model).

Python used to have the rexec and Bastion modules, which did something
like this, but they required the client to be wrapped in a special
proxy-like object, and they turned out to still have hard-to-fix
holes, so they were removed.  Maybe if Python objects are organized a
bit differently in PyPy, something like Bastion can be done again.

> What "privately" seems to mean is that within a method using
> private_var, you'd still like the code to read self.private_var,
> and not be translated to self.__some_kind_of_extreme_mangling_private_var
> even if the mangling had vanishing collision probability, like a name
> built from a secure hash GUID algorithm, right?

Only if clients were somehow prevented from examining self.__dict__
to find the GUID-like name.
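To make that concrete, here is a minimal sketch, in current Python 3, of "extreme mangling" with a random uuid4-based name. The Account class and its _balance_name attribute are hypothetical, not anything proposed in the thread. The client never has to guess the name, because the instance dictionary lists it:

```python
import uuid


class Account:
    # Hypothetical "extreme mangling": the real attribute name gets a random
    # 128-bit hex suffix, so accidental collisions are practically impossible.
    _balance_name = "_balance_" + uuid.uuid4().hex

    def __init__(self, balance):
        setattr(self, self._balance_name, balance)

    def balance(self):
        return getattr(self, self._balance_name)


acct = Account(100)

# The client simply reads the only entry in the instance dict.
leaked_name, leaked_value = next(iter(vars(acct).items()))
print(leaked_value)  # 100
```

The class attribute _balance_name is just as visible, so the randomness only guards against accidents, not against a client that goes looking.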

> Also, if an outside access to an instance said inst.private_var = 123,
> would that be OK, binding a public attribute of that name as usual
> without interfering with the methods using the bona fide self.private_var?
> Or should it be an AttributeError (which would leak the name even
> if it didn't make access possible)?

If the mangled name were really long and random, the chance of such a
collision would be negligible.  But I think extreme mangling isn't the
way to go about this.  The right way is with some version of
__getattr__ that's harder to circumvent.
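For illustration, here is roughly what a naive version looks like, assuming CPython (it relies on sys._getframe) and current Python 3. It hooks __getattribute__ rather than __getattr__, because __getattr__ only runs after normal lookup fails. Guarded, _private and _trusted_code are hypothetical names. Reads of a listed name are refused unless the calling code object was defined in the class body, so methods injected later are refused too. The last lines show how easily it is walked around, which is exactly the "harder to circumvent" part:

```python
import sys


class Guarded:
    # Hypothetical guard: names in _private may only be read by code that was
    # defined in this class body (CPython-specific: uses sys._getframe).
    _private = frozenset({"secret"})

    def __init__(self, secret):
        self.secret = secret  # writes go through __setattr__, which is not guarded

    def secret_length(self):
        return len(self.secret)  # allowed: this code object is trusted

    def __getattribute__(self, name):
        if name in Guarded._private:
            caller = sys._getframe(1).f_code  # code of whoever did the lookup
            if caller not in Guarded._trusted_code:
                raise AttributeError(f"{name!r} is private")
        return object.__getattribute__(self, name)


# Snapshot the class's own functions once, so methods injected later
# (e.g. Guarded.steal = lambda self: self.secret) are not trusted.
Guarded._trusted_code = frozenset(
    f.__code__ for f in vars(Guarded).values() if hasattr(f, "__code__")
)

g = Guarded("hunter2")
print(g.secret_length())  # 7

# ...but the guard is trivial to walk around:
print(object.__getattribute__(g, "secret"))  # hunter2
print(g.__dict__["secret"])                  # hunter2
```

A client could also just rebind Guarded._trusted_code, which is the same class-tampering problem as injecting methods.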

> I take it that a clever subclassing trick (like the one a post
> somewhere in this thread showed) should not provide access, but how
> about the inspect module?  Or an expression like instance.__dict__[1]
> if that were the sneaky spelling of instance.private_var?  (Note that
> integers are not allowed as instance attribute names, even via
> getattr/setattr.  BTW and OTOH, interestingly, type.__setattribute__(cls, att)
> allows an integer att.)  Where do you want to draw the line?
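The integer-name aside can be checked quickly in current Python 3 (Thing is just a placeholder class). getattr and setattr insist on string names, but the instance __dict__ is an ordinary dictionary that stores any hashable key, where normal attribute syntax cannot reach it. Nothing stops vars() or inspect from listing it, though:

```python
class Thing:
    pass


t = Thing()
t.__dict__[1] = "hidden"  # a plain dict accepts any hashable key

try:
    getattr(t, 1)  # attribute names must be strings
except TypeError as exc:
    print("getattr refuses:", exc)

print(vars(t))  # {1: 'hidden'}
```

So the integer key hides the value from t.name syntax only; it is still in plain view to anyone who inspects the instance.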

If it's like Java, stuff like inspect should only work through special
interfaces in the interpreter, and should not be available to code in
the app unless some flags are set.

> I'm not trying to be provocative, just to get the requirements defined ;-)
> 
> What if you could define a class something like
> 
>     class C(object):
>         # privatize factory makes callable to make the changes and clean up after itself
>         __metaclass__ = privatize(method_foo='private_var1 pv2', method_bar='pv2')
>         ...
> 
> and have method_foo access self.private_var1 and self.pv2 as
> effectively private variables, and method_bar only self.pv2 and
> everything else work as usual? Would that meet your requirements?

I'm not sure; metaclasses confuse me too much.  But I think what I'm
imagining just can't be done in the current definition of Python.
Otherwise rexec/Bastion would be a special case of it and wouldn't
have had to be removed.  It would take a language change to fix it.

> I'm not sure I could do this with just a tricky metaclass, but I am
> pretty sure if you allowed byte-code-munging decoration calls from
> the metaclass to mess with method_foo and method_bar (as specified
> in the privatize factory call args), I could.

Yeah, in the extreme, you could just implement OOP as closures instead
of class instances, like Scheme does.
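A minimal sketch of that closure style in current Python 3 (make_account, deposit and current are hypothetical names). The balance lives in a local variable of the enclosing function, not in any attribute, so the object handed back to the client has nothing named balance on it:

```python
from types import SimpleNamespace


def make_account(balance=0):
    # `balance` is a local of make_account; only the inner functions see it.
    def deposit(amount):
        nonlocal balance
        if amount <= 0:
            raise ValueError("deposit must be positive")
        balance += amount
        return balance

    def current():
        return balance

    # The "object" is just a bundle of closures sharing the same state.
    return SimpleNamespace(deposit=deposit, current=current)


acct = make_account(10)
acct.deposit(5)
print(acct.current())            # 15
print(hasattr(acct, "balance"))  # False
```

Each call to make_account creates a fresh cell for balance, so separate accounts do not share state.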

> This would still leave the access door open via something like
> type(instance).__dict__['method_foo'].func_code.co_consts[1][id(instance),
> 'priv_var'] so, again, where do you want to draw the line?  Starting
> and communicating with a slave debugger in another (privileged)
> process?

Ugh, yes, I guess that would allow reaching inside closures and
getting the variables out.  But it looks CPython-specific.  Maybe with
PyPy, there could be a way to prevent that.
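The co_consts path above does depend on CPython's code objects, but in current Python 3 a closure can be opened even without that: __closure__ on functions and cell_contents on cell objects are documented attributes. A small sketch (make_checker is a hypothetical example):

```python
def make_checker(secret):
    def check(guess):
        return guess == secret  # `secret` is a free variable held in a cell

    return check


check = make_checker("hunter2")

print(check.__code__.co_freevars)          # ('secret',)
print(check.__closure__[0].cell_contents)  # hunter2
```

So closures give privacy from ordinary attribute access, not from a client willing to use introspection.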

> But the bottom line question is, would you actually use this privacy feature?

I have a crypto library right now that implements private vars by
putting the sensitive objects in a completely separate Python
interpreter from the client app, and communicating with proxy objects
in the client by RPC through sockets.  I couldn't think of a more
secure way to do it.  But of course the RPC overhead is pretty severe
compared with a normal method call.
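The shape of that arrangement can be sketched in current Python 3. This is not the actual library: here the "vault" is a second interpreter started with subprocess, requests travel as JSON lines over its stdin/stdout pipes rather than over sockets, and VaultProxy, the sign/verify operations and the HMAC-SHA256 key are hypothetical stand-ins. The key is generated inside the child and never crosses the pipe, so nothing in the client process can read it, and every method call pays a full round trip:

```python
import json
import subprocess
import sys

# Source for the separate "vault" interpreter. The key is created there and is
# never written back over the pipe; only signatures and True/False come back.
VAULT_SOURCE = r'''
import hashlib, hmac, json, os, sys

key = os.urandom(32)
while True:
    line = sys.stdin.readline()
    if not line:                      # client closed the pipe: shut down
        break
    req = json.loads(line)
    sig = hmac.new(key, req["msg"].encode("utf-8"), hashlib.sha256).hexdigest()
    if req["op"] == "sign":
        result = sig
    elif req["op"] == "verify":
        result = hmac.compare_digest(sig, req["sig"])
    else:
        result = None
    sys.stdout.write(json.dumps({"result": result}) + "\n")
    sys.stdout.flush()
'''


class VaultProxy:
    """Client-side stand-in for the key: each method is one RPC round trip."""

    def __init__(self):
        self._proc = subprocess.Popen(
            [sys.executable, "-c", VAULT_SOURCE],
            stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
        )

    def _call(self, **request):
        self._proc.stdin.write(json.dumps(request) + "\n")
        self._proc.stdin.flush()
        return json.loads(self._proc.stdout.readline())["result"]

    def sign(self, msg):
        return self._call(op="sign", msg=msg)

    def verify(self, msg, sig):
        return self._call(op="verify", msg=msg, sig=sig)

    def close(self):
        self._proc.stdin.close()  # child sees EOF and exits its loop
        self._proc.wait()
        self._proc.stdout.close()


vault = VaultProxy()
sig = vault.sign("attack at dawn")
print(vault.verify("attack at dawn", sig))  # True
print(vault.verify("attack at noon", sig))  # False
vault.close()
```

A socket-based version would have the same structure; pipes just keep the sketch self-contained.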

> Or maybe, what are your real requirements? ;-)

Well, there are several levels being discussed:

1) Something that fixes the broken name mangling in the current
system, but still doesn't try to defeat intentional unmangling.
Currently, if you have a class with the same name as one of its
superclasses, the name mangling can fail even at its existing purpose
of preventing accidental collisions.

2) Something that makes code audits more reliable, by guaranteeing
that private variables are inaccessible outside the class definition,
except perhaps when the client code does something very bizarre like
bytecode munging.  Using a documented name mangling feature would not
count as bizarre--it's there to be used, so you have to assume that
it's being used.

3) Something that makes serious effort to enforce privacy against
determined attackers, like Java does.  This can be used to implement
either privilege separation inside an app (like my crypto library now
uses a separate process and RPC for that purpose) or as a container
for potentially hostile foreign code (applet sandbox).

I'm not really calling for anything to be added to Python (especially
at levels 2 or 3), since the difficulties are pretty substantial.  I'm
mainly trying to answer the people who say such a feature is useless.