Sets in Python

Thu Sep 20 00:02:03 EDT 2007

On Sep 19, 7:17 pm, Steven D'Aprano <st... at REMOVE-THIS-
cybersource.com.au> wrote:
> On Wed, 19 Sep 2007 20:58:03 +0000, Karthik Gurusamy wrote:
> > While it's easy to explain the behavior, I think the decision to dis-
> > allow mutable items as keys is a bit arbitrary. There is no need for
> > dict to recompute hash
>
> What???
>
> Of course it does. How else can it look up the key? Because it (somehow)
> just recognizes that it has seen the key before? How? By magic?

You answered it yourself later. For a mapping service, hash is just
one way to do things. What you need is for each item in the
collection, a unique key.
How you go from the key to the value is not something a programmer
needs to know.
Your mind is set on thinking on hash alone and hence you don't see
beyond it.

>
> > (first of all, a user doesn't even need to know
> > if underneath 'hashing' is used -- the service is just a mapping between
> > one item to another item).
>
> The user doesn't need to know the mechanism, but the dict does. Dicts are
> implemented as hash tables. I suppose they could be implemented as
> something else (what? linear lists? some sort of tree?) but the same
> considerations must be made:

Oh yes. If the keys are all integers (or any set of items that can be
ordered), why not an avl. It has guaranteed O(log N) while a hash in
worst case is O(N). Why you want to tie yourself to the drawbacks of
one datastructure? Understand your goal is not to provide a hash; but
to provide a mapping service.

 the dict must be able to find keys it has
> seen before. How is the dict supposed to recognise the key if the key has
> changed?
>
> > Since we know hashing is used, all that is needed is, a well-defined way
> > to construct a hash out of a mutable. "Given a sequence, how to get a
> > hash" is the problem.
>
> Nonsense. That's not the problem. The problem is how to get exactly the
> same hash when the sequence has changed.

Yes, if you keep thinking hash is the only tool you got.

>
> In other words, if you have two mutable objects M1 and M2, then you
> expect:
>

No. I don't expect. I expect the hash to be different. Why do you keep
thinking it's the mappings responsibility to take care of a changing
key.

> hash(M1) == hash(M2) if and only if M1 and M2 are equal
> hash(M1) != hash(M2) if M1 and M2 are unequal
>
> but also:
>
> if M1 mutates to become equal to M2, hash(M1) must remain the same while
> still being different from hash(M2).
>
> That means that hash() now is a non-deterministic function. hash([1,2,3])
> will vary according to how the list [1,2,3] was constructed!
>
> Obviously something has to give, because not all of these things are
> mutually compatible.
>
> > If later the given sequence is different, that's
> > not the dict's problem.
>
> Data structures don't have problems. Programmers do. And language
> designers with sense build languages that minimize the programmers
> problems, not maximize them.

Yes, here you talk about a different goal altogether. Here comes the
'arbitrary' part I mentioned.

>
> > So if the list changes, it will result in a different hash and we will
> > get a hash-miss. I doubt this is in anyway less intuitive than dis-
> > allowing mutable items as keys.
>
> The choices for the language designer are:
>
> (1) Invent some sort of magical non-deterministic hash function which
> always does the Right Thing.

Nope, just say if the new sequence is different, you don't find the
item in the dict.

>
> (2) Allow the hash of mutable objects to change, which means you can use
> mutable objects as keys in dicts but if you change them, you can no
> longer find them in the dict. They'll still be there, using up memory,
> but you can't get to them.

In the new model, at the time of addition, you need to remember the
key at that time. If it's a list, you make a copy of the items.

>
> (3) Simply disallow mutable objects as keys.
>
> Alternative 1 is impossible, as we've seen, because the requirements for
> the Right Thing are not mutually compatible.
>
> Alternative (2) leads to hard-to-find, hard-to-diagnose bugs where you
> store objects in a dict only for them to mysteriously disappear later.
> Worse, it could lead to bugs like the following hypothetical:

Of course they can be reached with.. for k in dict...
>
> >>> M = [1, 2, 3]
> >>> D = {M: 'parrot'} # pretend this works
> >>> D
>
> {[1, 2, 3]: 'parrot'}>>> M.append(4)
> >>> D
>
> {[1, 2, 3, 4]: 'parrot'}>>> D[[1, 2, 3, 4]]

No, in the new way, the key still remains [1, 2, 3]
What was changed is M. Not the  key given to dict at the time of
addition.
Again I'm not describing today's behavior; it's in the new way.

>
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> KeyError: [1, 2, 3, 4]
>
> Try explaining that one to programmers: they can SEE the key in the dict
> when they print it, but they can't get it or delete it because the hash
> has changed.

No they don't. They see the key at the time of addition ([1,2,3])
>
> Alternative 3 is easy to deal with: simply don't use mutable objects as
> keys. That's what Python does. Sure, the programmer sometimes needs to
> work around the lack (convert the list into a tuple, or a string, or
> pickle it, whatever...) which on rare occasions is hard to do, but at
> least there are no mysterious, hard to track down bugs.

When I first saw key's must'be be mutable, it did appear to me to be
mysterious. There was unnecessary tighter coupling between
implementation details and the service exposed to the programmer. (As
I see it, the root cause of all this is, the dict does not make a copy
of the key at the time of item addition, it just makes a new reference
to the same object)

Karthik

>
> --
> Steven.