data hiding/namespace pollution

Mon Oct 31 07:34:05 EST 2005

On Mon, 31 Oct 2005 10:35:19 +0000, Alex Hunsley wrote:

> There's no really specific questions in this post, but I'm looking for 
> people's thought on the issues within...
> 
> 
> The two main versions I've encountered for data pseudo-hiding 
> (encapsulation) in python are:
> 
> method 1:
> 
> _X  - (single underscore) - just cosmetic, a convention to let someone
>        know that this data should be private.

Not quite.

In modules, names starting with one or more underscores (_X, __X, etc.)
are not copied over when you import the module using "from module import *"
(unless the module explicitly lists them in __all__).
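
A quick way to see this for yourself. This is only a sketch: demo_mod is a
made-up module built in memory so the example is self-contained, and the
attribute names are invented for illustration.

```python
import sys
import types

# Build a throwaway module in memory (hypothetical name "demo_mod").
demo_mod = types.ModuleType("demo_mod")
demo_mod.public = 1
demo_mod._X = 2
setattr(demo_mod, "__X", 3)
sys.modules["demo_mod"] = demo_mod

# Run the star-import inside a fresh namespace and see what arrives.
namespace = {}
exec("from demo_mod import *", namespace)
imported = sorted(n for n in namespace if n != "__builtins__")
print(imported)

# The underscore names are still reachable through the module object.
print(demo_mod._X)

del sys.modules["demo_mod"]
```

Only the name without a leading underscore is copied; the others are
skipped by the star-import but are not hidden in any stronger sense.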

In classes, instance._X is just a convention "this is private, don't touch
unless you really have to".

> method 2:
> 
> __X - (double underscore) - mangles the name (in a predictable way).
>        Avoids name pollution.

Again, not quite: mangling only happens to __X names that appear inside a
class definition (they become _ClassName__X), not to names at module level.
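
Something like the following shows the mangling in action. It is a minimal
sketch; the class and attribute names are invented for illustration.

```python
class Parrot:
    def __init__(self):
        self.__voltage = 4000    # stored as _Parrot__voltage

class Norwegian(Parrot):
    def __init__(self):
        Parrot.__init__(self)
        self.__voltage = 0       # stored as _Norwegian__voltage, no clash

n = Norwegian()
names = sorted(vars(n))
print(names)

# The mangled name is predictable, so the data is not truly hidden.
print(n._Parrot__voltage)
```

Each class gets its own mangled name, which is what protects a subclass
from accidentally clobbering the parent's attribute.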

> How often does either tend to get used? Personally, I'd be a little 
> worried about using method 1, because namespace clashes could happen. Is 
> this overly paranoid?

You are no more likely to have instance._attribute clash than you are to
have instance.attribute clash.

In fact, since each class is its own namespace, it is only an issue if you
are subclassing. And that is an argument for better documentation: if you
tell people your class uses semi-private attribute _X, and they still
accidentally over-write it, that is their fault exactly as if they
accidentally over-wrote public methods like .append().
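
For what it's worth, here is roughly what such a subclassing accident looks
like. This is a sketch only; Base, Careless and _cache are hypothetical
names, not anything from the original question.

```python
class Base:
    def __init__(self):
        self._cache = {}         # documented as semi-private

    def lookup(self, key):
        return self._cache.get(key)

class Careless(Base):
    def __init__(self):
        Base.__init__(self)
        self._cache = None       # accidentally reuses the parent's name

try:
    Careless().lookup("x")
    broke = False
except AttributeError:
    # None has no .get(), so the parent's method falls over
    broke = True

print(broke)
```

Exactly the same thing would happen if the attribute were called cache
instead of _cache; the underscore neither causes nor prevents the clash.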

> Also, I presume that rather than people writing their own manual getter 
> and setter methods, they tend to use either overloading on __getattr__ 
> and __setattr__, or the Property class (which itself uses aforementioned 
>   methods). Overloading __getattr__ etc. seems more attractive to me, as 
> then I can capture access to unknown names, and raise an exception!

You don't need to overload __getattr__ to raise an exception when you
access unknown names:

py> class Parrot:
...     canSpeak = True  # note mixed case
...
py> p = Parrot()
py> p.canspeak
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: Parrot instance has no attribute 'canspeak'

In any case, that sounds like you are just making work for yourself.
What are you doing, manually keeping a list of "allowed" attributes which
you check before hand?

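# warning: untested
class Spanish_Inquisition:
    ALLOWED = ['comfy_chair', 'shrubbery']
    def __getattr__(self, name):
        # __getattr__ only runs after normal lookup has failed, so
        # the name is not in self.__dict__ by the time we get here.
        if name in self.ALLOWED:
            raise AttributeError("%s has not been set yet" % name)
        raise AttributeError("No such attribute: %s" % name)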

Yuck yuck yuck. Slow, unnecessary, and of course you might think you know
what attributes your class needs, but you can never predict when your
class's users will want to add attributes you never thought of.

There is, at least, an argument in favour of using that technique for
enforcing something like attribute declarations:

    def __setattr__(self, name, value):
        if name in self.ALLOWED:
            self.__dict__[name] = value
        else:
            raise AttributeError("That attribute hasn't been declared.")

although that just leads into the whole "bondage and discipline language"
can of worms.
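
Fleshed out into something runnable, the declaration idea might look like
the sketch below. The class name Declared is made up, and raising
AttributeError rather than some other exception is my assumption about
what callers would expect.

```python
class Declared:
    ALLOWED = ('comfy_chair', 'shrubbery')

    def __setattr__(self, name, value):
        # Only names listed in ALLOWED may ever be assigned.
        if name in self.ALLOWED:
            object.__setattr__(self, name, value)
        else:
            raise AttributeError(
                "attribute %r hasn't been declared" % name)

d = Declared()
d.shrubbery = "nice"

try:
    d.shruberry = "typo"         # misspelt on purpose
    typo_caught = False
except AttributeError:
    typo_caught = True

print(typo_caught)
```

Note that this only polices assignment through the instance; anyone
determined can still write to d.__dict__ directly.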

In any case, you already have a perfectly good list of attributes.
Actually, two lists, one for class attributes and one for instance
attributes:

instance.__class__.__dict__.keys() 
instance.__dict__.keys() 
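
As a rough illustration (the Parrot attributes here are invented for the
example), the two lookups give you something like this:

```python
class Parrot:
    canSpeak = True              # class attribute

    def __init__(self):
        self.name = "Polly"      # instance attribute

p = Parrot()

instance_attrs = sorted(p.__dict__.keys())

# The class dict also holds __init__, __doc__ and friends, so filter
# out the double-underscore names to see just the ones we defined.
class_attrs = sorted(k for k in p.__class__.__dict__.keys()
                     if not k.startswith('__'))

print(instance_attrs)
print(class_attrs)
```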

Keeping two lots of the same data around is usually a recipe for trouble.
Just wait until you delete an attribute, and then forget to remove it from
your ALLOWED list, and watch the fun and games when you start getting
unexpected errors.

> (I really don't like the idea of random attribute name typos going 
> unnoticed when accessing attributes in a class!)

This is no more a problem than getting random name typos when accessing
any objects in Python. In many people's experience, it is mostly -- but
not always -- those who don't use Python very much who worry about the
lack of declarations. In practice, if you are testing your code
sufficiently, you won't miss the lack of declarations. Declarations are
only good for picking up a tiny subset of bugs, and proper testing will
pick up those same bugs -- and many more -- without the need for declaring
variables and/or attributes.

No doubt there will be some who disagree. Let me postscript my comments
with YMMV, and remind folks that even if declarations are the best thing
since the transistor, Python currently doesn't have them and all the
arguing in the world won't change that.

-- 
Steven.