ANNOUNCE: Thesaurus - a recursive dictionary subclass using attributes

Wed Dec 12 18:12:58 EST 2012

On Wed, Dec 12, 2012 at 3:20 PM, Dave Cinege <dave at cinege.com> wrote:
> On Wednesday 12 December 2012 15:42:36 Ian Kelly wrote:
>
>> def __getattribute__(self, name):
>>     if name.startswith('__') and name.endswith('__'):
>>         return super(Thesaurus, self).__getattribute__(name)
>>     return self.__getitem__(name)
>
> Ian,
>
> Tested, and works as you advertised.
>
> Isn't super() deprecated? I've replaced it with this:
> -return super(Thesaurus, self).__getattribute__(name)
> +return dict.__getattribute__(self, name)

It's not deprecated.  Some people consider it harmful, and others
disagree.  I was once in the former camp but have shifted somewhat
toward the latter.

> Aside from a more palatable result in the python shell for otherwise bad
> code...does this get me anything else? Is it really worth the performance hit
> of 2 string comparisons for every getattribute call?

It could affect real code, not just interactive code.  Any time you
unthinkingly choose an attribute name that happens to be the name of a
dict method (e.g. 'items', which otherwise seems like a rather
innocent variable name), that's a potential bug.  Depending on how
that attribute is subsequently accessed, the bug might not even be
noticed immediately.
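
To make that concrete, here is a minimal sketch of the two approaches side by side.  The class names AttrFallback and KeyFirst are made up for illustration and are not the actual Thesaurus code; KeyFirst simply reuses the __getattribute__ body quoted above.

```python
class AttrFallback(dict):
    """Hypothetical __getattr__ version: consulted only when the
    normal attribute lookup has already failed."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class KeyFirst(dict):
    """Hypothetical __getattribute__ version: every non-dunder name
    goes straight to key lookup."""
    def __getattribute__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return super(KeyFirst, self).__getattribute__(name)
        return self.__getitem__(name)


a = AttrFallback(items=[1, 2, 3])
b = KeyFirst(items=[1, 2, 3])

shadowed = a.items   # the bound dict.items method, not the stored list
stored = b.items     # the stored list
```

With the __getattr__ version the stored value is still reachable as a['items'], but attribute access silently hands back the method instead, which is the kind of bug that can go unnoticed for a while.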

The performance hit compared to the __getattr__ version on my system
is about 1.3 microseconds per call, as measured by timeit, or around
40%.  For comparison, the performance hit of using the __getattr__
version versus just using a global variable is about 1.7 microseconds
per call, or around 4000%.  For my own use, I don't consider that
substantial enough to worry about, as I'm not in the business of
writing code that would be making hundreds of thousands of accesses
per second.
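
For anyone who wants to repeat the measurement, something along these lines should do.  This is only a sketch: it assumes Python 3.5 or later for the globals argument to timeit.repeat, the class names are made up, and the numbers will differ from machine to machine, so none are shown.

```python
import timeit


class AttrFallback(dict):
    # Hypothetical __getattr__ version.
    def __getattr__(self, name):
        return self[name]


class KeyFirst(dict):
    # Hypothetical __getattribute__ version, as quoted earlier.
    def __getattribute__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return super(KeyFirst, self).__getattribute__(name)
        return self.__getitem__(name)


fallback = AttrFallback(x=1)
keyfirst = KeyFirst(x=1)
x = 1

NUMBER = 100000  # executions per timing run


def per_call_usec(stmt):
    """Best of three runs, converted to microseconds per execution."""
    best = min(timeit.repeat(stmt, repeat=3, number=NUMBER,
                             globals=globals()))
    return best / NUMBER * 1e6


t_global = per_call_usec('x')             # plain global lookup
t_fallback = per_call_usec('fallback.x')  # __getattr__ version
t_keyfirst = per_call_usec('keyfirst.x')  # __getattribute__ version
```

Subtracting t_fallback from t_keyfirst gives the per-call cost of the two string comparisons on your own system.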

> Should the idea of implementing what Thesaurus does in mainline python ever
> happen, those 10 lines of code will likely spark a 3 month jihad about how to
> properly do in python what up until now hasn't been something you do in
> python.

The basic idea of proxying attribute access on a dict to key lookup is
pretty common, actually.  It likely won't ever make it into the
standard library because 1) there's no clear agreement on what it
should look like; 2) it's easy to roll your own; and 3) it looks too
much like JavaScript.  That last objection probably isn't valid:
attribute proxying is annoying and cumbersome when it automatically
happens on every single object in the language, but it's much more
manageable when you have a single type like Thesaurus that you can use
only in the instances where you actually want it.
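
As an illustration of point 2, a roll-your-own version can be as short as the sketch below.  AttrProxy is a made-up name, and this is just one of many possible designs (which is really point 1), not what Thesaurus does.

```python
class AttrProxy(dict):
    """Hypothetical minimal dict whose keys can also be read, set and
    deleted as attributes."""

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails, so dict
        # methods such as items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Store every attribute assignment as a key.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name)


cfg = AttrProxy()
cfg.host = 'localhost'
cfg.port = 8080
del cfg.port
```

Only the one type behaves this way; ordinary dicts and every other object in the program are untouched.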

> To me for i in range(len(l)) seems like simpler, faster, tighter code for this
> now. It's duly noted that enumerate() is more pythonic and I'm an old fart that
> still thinks too much in state machine. I've added except Exception per your
> advice.

Your intuition about what "seems faster" can lead you astray.  Using Python 2.7:

>>> import timeit
>>> timerD = timeit.Timer('for i in range(len(seq)): x = seq[i]', 'seq = range(5)')
>>> timerE = timeit.Timer('for i, x in enumerate(seq): pass', 'seq = range(5)')
>>> min(timerD.repeat(3))
0.8711640725291545
>>> min(timerE.repeat(3))
0.7172601545726138

Of course, that's running each loop a million times, so the difference
here really is pretty negligible.