Mutable objects which define __hash__ (was Re: Why are tuples immutable?)

Bengt Richter bokr at oz.net
Thu Dec 30 05:06:00 EST 2004


On Thu, 30 Dec 2004 17:36:57 +1000, Nick Coghlan <ncoghlan at iinet.net.au> wrote:

>Bengt Richter wrote:
>> Essentially syntactic sugar to avoid writing id(obj) ? (and to get a little performance
>> improvement if they're written in C). I can't believe this thread came from the
>> lack of such sugar ;-)
>
>The downside of doing it that way is you have no means of getting from the id() 
>stored as a key back to the associated object. Meaningful iteration (including 
>listing of contents) becomes impossible. Doing the id() call at the Python level 
>instead of internally to the interpreter is also relatively expensive.
ISTM

    d[id(obj)] = obj, classifier_func(obj)

gets around the iteration problem (IIRC a very similar suggestion was somewhere in this thread).
But if the id call is a significant portion of the cycle budget, yeah, might want to
"pursue" a collections solution ;-)
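
A minimal sketch of that idiom, for illustration only. The classifier_func
body and the sample lists are made up here and are not from the thread. Keeping
the object in the value both allows listing the contents and keeps the object
alive, so its id() cannot be reused by another object while the entry exists.

```python
def classifier_func(obj):
    # Hypothetical classifier: label a list by whether it is empty.
    return 'empty' if not obj else 'non-empty'

d = {}
a, b = [], [1, 2]
for obj in (a, b):
    # Key on identity; store the object itself next to its classification
    # so it can be recovered when iterating over the dict.
    d[id(obj)] = obj, classifier_func(obj)

# Meaningful iteration: every value carries the original object.
listing = [(obj, kind) for obj, kind in d.values()]
```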

>
>> Or, for that matter, (if you are the designer) giving the objects an
>> obj.my_classification attribute (or indeed, property, if dynamic) as part
>> of their initialization/design?
>
>The main mutable objects we're talking about here are Python lists.
And really non-mutated Python lists?
>Selecting an alternate classification schemes using a subclass is the current
>recommended approach - this thread is about alternatives to that.
I'm getting the impression your meaning of "classification" is less about classifying
objects according to their interesting features than about how to associate the resulting
kind-of-thing info with the objects for more efficient access than recalculating.
In which case ISTM it is an optimization problem that depends intimately on the
particular features of interest in the data, etc.
>
>I generally work with small enough data sets that I just use lists for 
>classification (sorting test input data into inputs which worked properly, and 
>those which failed for various reasons). However, I can understand wanting to 
>use a better data structure when doing frequent membership testing, *without* 
>having to make fundamental changes to an application's object model.
>
The DYFR thing ever lurks ;-)

>> Or subclass your graph node so you can do something readable like
>>     if node.is_leaf: ...
>> instead of
>>     if my_obj_classification[id(node)] == 'leaf': ...
>I'd prefer:
>   if node in leaf_nodes:
>     ...
Which is trivial to code, except for optimization issues, right? ;-)

>
>Separation of concerns suggests that a class shouldn't need to know about all 
>the different ways it may be classified. And mutability shouldn't be a barrier 
>to classification of an object according to its current state.
Agreed. I didn't mean to imply otherwise. I did mention possibly
memoizing classification functions as an optimization approach ;-)
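
One way such memoizing might look, as a rough sketch under assumptions: the
names memoize_by_identity and classify are hypothetical, and the cache is keyed
on id(obj) so that unhashable objects such as lists can be used. Note the
caveat it exposes: if the object is mutated after the first call, the cached
classification is stale.

```python
def memoize_by_identity(func):
    cache = {}  # id(obj) -> (obj, result); holding obj keeps the id valid

    def wrapper(obj):
        try:
            return cache[id(obj)][1]
        except KeyError:
            result = func(obj)
            cache[id(obj)] = obj, result
            return result

    return wrapper

calls = []  # records each real (non-cached) classification

@memoize_by_identity
def classify(seq):
    # Hypothetical classifier based on the current length of the sequence.
    calls.append(seq)
    return 'short' if len(seq) < 3 else 'long'

data = [1, 2]
first = classify(data)    # computed
second = classify(data)   # served from the cache
data.extend([3, 4])
stale = classify(data)    # still the cached answer, despite the mutation
```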

>
>>>Hence why I suggested Antoon should consider pursuing collections.identity_dict 
>>>and collections.identity_set if identity-based lookup would actually address his 
>>>requirements. Providing these two data types seemed like a nice way to do an end 
>>>run around the bulk of the 'potentially variable hash' key problem.
>> 
>> I googled for those ;-) I guess pursuing meant implementing ;-)
>
>Yup. After all, the collections module is about high-performance datatypes for 
>more specific purposes than the standard builtins. identity_dict and 
>identity_set seem like natural fits for dealing with annotation and 
>classification problems where you don't want to modify the class definitions for 
>the objects being annotated or classified.
Well, at least they ought to be comparatively easy to do.
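
For what it's worth, a pure-Python approximation is short. This is only a
sketch: collections.identity_dict does not exist, the class name IdentityDict
is invented here, and a C implementation would be needed to get the
performance benefit discussed above.

```python
from collections.abc import MutableMapping

class IdentityDict(MutableMapping):
    """Mapping keyed on object identity rather than on hash/equality."""

    def __init__(self):
        self._data = {}  # id(key) -> (key, value)

    def __getitem__(self, key):
        try:
            return self._data[id(key)][1]
        except KeyError:
            raise KeyError(key) from None

    def __setitem__(self, key, value):
        # Storing the key keeps it alive, so its id stays unique.
        self._data[id(key)] = key, value

    def __delitem__(self, key):
        del self._data[id(key)]

    def __iter__(self):
        # Iteration yields the original key objects, not their ids.
        return (key for key, _ in self._data.values())

    def __len__(self):
        return len(self._data)

x = [1]
y = [1]          # equal to x, but a distinct object
notes = IdentityDict()
notes[x] = 'passed'
notes[y] = 'failed'
x.append(2)      # mutating the key does not disturb the lookup
```

An identity_set could be built the same way, storing id(obj) -> obj.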
>
>I don't want the capability enough to pursue it, but Antoon seems reasonably 
>motivated :)
Let's see what happens ;-)

Regards,
Bengt Richter


