ANNOUNCE: Thesaurus - a recursive dictionary subclass using attributes
Ian Kelly
ian.g.kelly at gmail.com
Wed Dec 12 18:12:58 EST 2012
On Wed, Dec 12, 2012 at 3:20 PM, Dave Cinege <dave at cinege.com> wrote:
> On Wednesday 12 December 2012 15:42:36 Ian Kelly wrote:
>
>> def __getattribute__(self, name):
>>     if name.startswith('__') and name.endswith('__'):
>>         return super(Thesaurus, self).__getattribute__(name)
>>     return self.__getitem__(name)
>
> Ian,
>
> Tested, and works as you advertised.
>
> Isn't super() deprecated? I've replaced it with this:
> -return super(Thesaurus, self).__getattribute__(name)
> +return dict.__getattribute__(self, name)
It's not deprecated. Some people consider it harmful, and others
disagree. I was once in the former camp but have shifted somewhat
toward the latter.
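The two spellings only behave differently under multiple inheritance. Here is a minimal sketch in Python 3 syntax; the class names (`ThesaurusSuper`, `ThesaurusDict`, `Tracing`, `ViaSuper`, `ViaDict`) and the `traced` list are hypothetical, made up for illustration. `super()` hands the dunder lookup to the next class in the MRO, while a hard-coded `dict.__getattribute__` skips any class sitting between Thesaurus and dict.

```python
traced = []  # hypothetical log of lookups seen by the Tracing mixin


class ThesaurusSuper(dict):
    """Dunder lookups go to the next class in the MRO via super()."""
    def __getattribute__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return super(ThesaurusSuper, self).__getattribute__(name)
        return self.__getitem__(name)


class ThesaurusDict(dict):
    """Dunder lookups go straight to dict, bypassing the MRO."""
    def __getattribute__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return dict.__getattribute__(self, name)
        return self.__getitem__(name)


class Tracing(dict):
    """Hypothetical mixin that records every attribute lookup it receives."""
    def __getattribute__(self, name):
        traced.append(name)
        return super(Tracing, self).__getattribute__(name)


# MRO of each: Via*, Thesaurus*, Tracing, dict, object
class ViaSuper(ThesaurusSuper, Tracing):
    pass


class ViaDict(ThesaurusDict, Tracing):
    pass
```

With `ViaSuper`, looking up `.x` reaches `Tracing` (it sees `'__getitem__'`); with `ViaDict`, `Tracing` is never consulted. With single inheritance from dict alone, both spellings end up at the same place.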
> Aside from a more palatable result in the python shell for otherwise bad
> code...does this get me anything else? Is it really worth the performance hit
> of 2 string comparisons for every getattribute call?
It could affect real code, not just interactive code. Any time you
unthinkingly choose an attribute name that happens to be the name of a
dict method (e.g. 'items', which otherwise seems like a rather
innocent variable name), that's a potential bug. Depending on how
that attribute is subsequently accessed, the bug might not even be
noticed immediately.
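A minimal sketch of the `items` collision, assuming the earlier `__getattr__` version looked roughly like the first class below (both class names are hypothetical). `__getattr__` only runs after normal lookup fails, so the inherited `dict.items` method shadows the key. The `__getattribute__` version returns the key instead.

```python
class GetattrThesaurus(dict):
    """__getattr__ is only consulted when normal attribute lookup fails."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class GetattributeThesaurus(dict):
    """__getattribute__ intercepts every lookup; only dunders reach dict."""
    def __getattribute__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return super(GetattributeThesaurus, self).__getattribute__(name)
        return self.__getitem__(name)


a = GetattrThesaurus(items=[1, 2, 3])
b = GetattributeThesaurus(items=[1, 2, 3])

print(a.items)  # the bound dict.items method, not [1, 2, 3]
print(b.items)  # [1, 2, 3]
```

The flip side is that non-dunder dict methods are no longer reachable as attributes on `b`: `b.keys` raises KeyError, so you would call `dict.keys(b)` instead.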
The performance hit compared to the __getattr__ version on my system
is about 1.3 microseconds per call, as measured by timeit, or around
40%. For comparison, the performance hit of using the __getattr__
version versus just using a global variable is about 1.7 microseconds
per call, or around 4000%. For my own use, I don't consider that
substantial enough to worry about, as I'm not in the business of
writing code that would be making hundreds of thousands of accesses
per second.
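For anyone who wants to repeat the comparison, here is a rough sketch using Python 3's `timeit` (the `globals=` argument needs 3.5+). The class names, the `spam` key, and the `per_call_us` helper are hypothetical. Absolute numbers will differ by machine and Python version, so no expected output is given.

```python
import timeit


class GetattrThesaurus(dict):
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)


class GetattributeThesaurus(dict):
    def __getattribute__(self, name):
        if name.startswith('__') and name.endswith('__'):
            return super(GetattributeThesaurus, self).__getattribute__(name)
        return self.__getitem__(name)


a = GetattrThesaurus(spam=1)
b = GetattributeThesaurus(spam=1)
spam = 1  # plain global, as the baseline


def per_call_us(stmt, number=200_000):
    """Best of 3 runs of `stmt` against module globals, in microseconds per call."""
    best = min(timeit.repeat(stmt, repeat=3, number=number, globals=globals()))
    return best / number * 1e6


if __name__ == '__main__':
    for stmt in ('spam', 'a.spam', 'b.spam'):
        print('%-8s %.3f us' % (stmt, per_call_us(stmt)))
```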
> Should the idea of implementing what Thesaurus does in mainline python ever
> happen, those 10 lines of code will likely spark a 3 month jihad about how to
> properly do in python what up until now hasn't been something you do in
> python.
The basic idea of proxying attribute access on a dict to key lookup is
pretty common, actually. It likely won't ever make it into the
standard library because 1) there's no clear agreement on what it
should look like; 2) it's easy to roll your own; and 3) it looks too
much like JavaScript. That last probably isn't valid; attribute
proxying is annoying and cumbersome when it automatically happens on
every single object in the language; it's much more manageable when
you have a single type like Thesaurus that you can use only in the
instances where you actually want it.
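As an illustration of how little "rolling your own" takes, here is a minimal opt-in sketch (the name `AttrDict` is hypothetical, and this is not the Thesaurus implementation). Two details worth getting right: translate KeyError into AttributeError so `getattr(obj, name, default)` and `hasattr` keep working, and remember that with `__getattr__` real dict methods still win over keys.

```python
class AttrDict(dict):
    """Hypothetical opt-in type: attribute access falls back to key lookup."""

    def __getattr__(self, name):
        # Only reached when normal lookup fails, so dict methods take priority.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None

    def __setattr__(self, name, value):
        # Store attributes as keys instead of in the instance __dict__.
        self[name] = value

    def __delattr__(self, name):
        try:
            del self[name]
        except KeyError:
            raise AttributeError(name) from None


config = AttrDict(host='localhost')
config.port = 8080                        # stored as config['port']
print(config)                             # {'host': 'localhost', 'port': 8080}
print(getattr(config, 'missing', None))   # None, because AttributeError was raised
```

Ordinary dicts elsewhere in the program are untouched; only objects you explicitly create as `AttrDict` get the proxying.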
> To me for i in range(len(l)) seems like simpler, faster, tighter code for this
> now. It's duly noted that enumerate() is more Pythonic and I'm an old fart that
> still thinks too much in state machines. I've added except Exception per your
> advice.
Your intuition about what "seems faster" can lead you astray. Using Python 2.7:
>>> timerD = timeit.Timer('for i in range(len(seq)): x = seq[i]', 'seq = range(5)')
>>> timerE = timeit.Timer('for i, x in enumerate(seq): pass', 'seq = range(5)')
>>> min(timerD.repeat(3))
0.8711640725291545
>>> min(timerE.repeat(3))
0.7172601545726138
Of course, that's running each loop a million times (timeit's default), so
the gap of about 0.15 seconds works out to roughly 0.15 microseconds per
loop, which really is pretty negligible.
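A Python 3 version of the same comparison might look like the sketch below (the names `SETUP`, `INDEXED`, `ENUMERATED` and `best_of_three` are hypothetical). In Python 3, `range(5)` is a lazy range object rather than a list, so the setup uses `list(range(5))` to match the 2.7 session. Timings vary with machine and interpreter, so no expected numbers are given.

```python
import timeit

SETUP = 'seq = list(range(5))'
INDEXED = 'for i in range(len(seq)): x = seq[i]'   # index-based loop
ENUMERATED = 'for i, x in enumerate(seq): pass'     # enumerate-based loop


def best_of_three(stmt, number=1_000_000):
    """Minimum total time, in seconds, over 3 runs of `number` executions."""
    return min(timeit.repeat(stmt, setup=SETUP, repeat=3, number=number))


if __name__ == '__main__':
    print('range(len()):', best_of_three(INDEXED))
    print('enumerate(): ', best_of_three(ENUMERATED))
```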