insert unique data in a list

mattia gervaz at gmail.com
Mon Dec 14 18:24:31 EST 2009


Il Mon, 14 Dec 2009 21:53:38 +0000, Steven D'Aprano ha scritto:

> On Mon, 14 Dec 2009 17:13:24 +0000, mattia wrote:
> 
>> Il Sun, 13 Dec 2009 21:17:28 -0800, knifenomad ha scritto:
>> 
>>> On 12월14일, 오후12시42분, Steven D'Aprano
>>> <ste... at REMOVE.THIS.cybersource.com.au> wrote:
>>>> On Sun, 13 Dec 2009 17:19:17 -0800, knifenomad wrote:
>>>> > this makes the set type hashable.
>>>>
>>>> > class Set(set):
>>>> >     __hash__ = lambda self: id(self)
>>>>
>>>> That's a *seriously* broken hash function.
>>>>
>>>> >>> key = "voila"
>>>> >>> d = { Set(key): 1 }
>>>> >>> d
>>>>
>>>> {Set(['i', 'a', 'l', 'o', 'v']): 1}>>> d[ Set(key) ]
>>>>
>>>> Traceback (most recent call last):
>>>>   File "<stdin>", line 1, in <module>
>>>> KeyError: Set(['i', 'a', 'l', 'o', 'v'])
>>>>
>>>> --
>>>> Steven
>>> 
>>> of course it is broken as long as it uses it's instance id. i added
>>> this to notify that unhashable can become hashable implementing
>>> __hash__ inside the class. which probably set to None by default.
>> 
>> Ok, nice example, but I believe that using id() as the hash function
>> can lead to unexpected collisions.
> 
> No, you have that backwards. Using id() as the hash function means that
> equal keys will hash unequal -- rather than unexpected collisions, it
> leads to unexpected failure-to-collide-when-it-should-collide.
> 
> And it isn't a "nice example", it is a terrible example.
> 
> Firstly, the example fails to behave correctly. It simply doesn't work
> as advertised.
> 
> Secondly, and much worse, it encourages people to do something dangerous
> without thinking about the consequences. If it is so easy to hash
> mutable objects, why don't built-in lists and sets don't have a __hash__
> method? Do you think that the Python developers merely forgot?
> 
> No, there is good reason why mutable items shouldn't be used as keys in
> dicts or in sets, and this example simply papers over the reasons why
> and gives the impression that using mutable objects as keys is easy and
> safe when it is neither.
> 
> Using mutable objects as keys or set elements leads to surprising,
> unexpected behaviour and hard-to-find bugs. Consider the following set
> with lists as elements:
> 
> L = [1, 2]
> s = Set()  # set that allows mutable elements s.add(L)
> s.add([1, 2, 3])
> 
> So far so good. But what should happen now?
> 
> L.append(3)
> 
> The set now has two equal elements, which breaks the set invariant that
> it has no duplicates.
> 
> Putting the problem of duplicates aside, there is another problem:
> 
> L = [1, 2]
> s = Set([L])
> L.append(3)
> 
> There are two possibilities: either the hash function of L changes when
> the object changes, or it doesn't. Suppose it changes. Then the hash of
> L after the append will be different from the hash of L before the
> append, and so membership testing (L in s) will fail.
> 
> Okay, suppose we somehow arrange matters so that the hash of the object
> doesn't change as the object mutates. This will solve the problem above:
> we can mutate L as often as we like, and L in s will continue to work
> correctly. But now the hash of an object doesn't just depend on it's
> value, but on its history. That means that two objects which are
> identical can hash differently, and we've already seen this is a
> problem.
> 
> There is one final approach which could work: we give the object a
> constant hash function, so that all objects of that type hash
> identically. This means that the performance of searches and lookups in
> sets and dicts will fall to that of lists. There is no point in paying
> all the extra overhead of a dict to get behaviour as slow, or slower,
> than a list.
> 
> In other words, no matter what you do, using mutable objects as keys or
> set elements leads to serious problems that need to be dealt with. It
> simply isn't true that all you need to do to make mutable objects usable
> in dicts or sets is to add a hash function. That part is trivial.

I agree with you, and in fact I'm inserting tuples in my set. All the 
workaroun to use somehow mutable object are poor attemps to solve in a 
quick-and-dirty way a difficult problem like hashing. But I think that 
during the discussion we have slowly forgot the main topic of my 
question, that was insert unique objects in a container. Hash are good to 
retrieve items in constant time, and when we are dealing with collisions 
we have to provide some solutions, like chaining or open addressing. Note 
also that in hash collisions happen and the hash function is used to 
retrieve items, not to insert unique items.



More information about the Python-list mailing list