__hash__ and ordered vs. unordered collections

MRAB python at mrabarnett.plus.com
Mon Nov 20 14:31:07 EST 2017


On 2017-11-20 17:47, Josh B. wrote:
> Suppose we're implementing an immutable collection type that comes in unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.
> 
> We implement __eq__ such that MyColl(some_elements) == MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).
> 
What if there are duplicate elements?

Should that be MyColl(some_elements) == MyOrderedColl(other_elements) 
iff len(some_elements) == len(other_elements) and set(some_elements) == 
set(other_elements)?

> But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff list(some_elements) == list(other_elements).
> 
> This works just like dict and collections.OrderedDict, in other words.
> 
> Since our collection types are immutable, let's say we want to implement __hash__.
> 
> We must ensure that our __hash__ results are consistent with __eq__. That is, we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements), then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).
> 
> Now for the question: Is this useful? I ask because this leads to the following behavior:
> 
>>>> unordered = MyColl([1, 2, 3])
>>>> ordered = MyOrderedColl([3, 2, 1])
>>>> s = {ordered, unordered}
>>>> len(s)
> 1
>>>> s = {ordered}
>>>> unordered in s
> True
>>>> # etc.
> 
> In other words, sets and mappings can't tell unordered and ordered apart; they're treated like the same values.
> 
> This is a bit reminiscent of:
> 
>>>> s = {1.0}
>>>> True in s
> True
>>>> d = {1: int, 1.0: float, True: bool}
>>>> len(d)
> 1
>>>> # etc.
> 
> The first time I encountered this was a bit of an "aha", but to be clear, I think this behavior is totally right.
> 
> However, I'm less confident that this kind of behavior is useful for MyColl and MyOrderedColl. Could anyone who feels more certain one way or the other please explain the rationale and possibly even give some real-world examples?
> 
If MyColl(some_elements) == MyOrderedColl(other_elements), then 
len({MyColl(some_elements), MyOrderedColl(other_elements)}) == 1 seems 
right.

As for which one is in the set:

 >>> {1, 1.0}
{1}
 >>> {1.0, 1}
{1.0}

So if MyColl(some_elements) == MyOrderedColl(other_elements), then 
{MyColl(some_elements), MyOrderedColl(other_elements)} == 
{MyColl(some_elements)}.



More information about the Python-list mailing list