Dictionary .keys() and .values() should return a set [with Python 3000 in mind]

vatamane at gmail.com vatamane at gmail.com
Sat Jul 1 16:01:57 EDT 2006


This has been bothering me for a while. Just want to find out if it
just me or perhaps others have thought of this too: Why shouldn't the
keyset of a dictionary be represented as a set instead of a list? I
know that sets were introduced a lot later and lists/dictionaries were
used instead but I think "the only correct way" now is for the
dictionary keys and values to be sets. Presently {1:0,2:0,3:0}.keys()
will produce [1,2,3] but it could also produce [3,1,2] or [3,2,1] on a
different machine architecture. The Python documentation states that:
"""
Keys and values are listed in an _arbitrary_(my emphasis) order which
is non-random, varies across Python implementations, and depends on the
dictionary's history of insertions and deletions.
"""

So on the same machine it will be the case that:  {1:0, 2:0,
3:0}.keys() == {3:0, 2:0, 1:0}.keys() is True. But if there are 2 lists
of keys of the same dictionary, one is pickled on another machine, with
a different "arbitrary non-random" ordering, then the keys will not be
equal to each other. It seems like the problem could be solved by
returning a set instead.

The same thing goes for the values(). Here most people will argue that
values are not necessarily unique, so a list is more appropriate, but
in fact the values are unique it is just that more than one key could
map to the same value.  What is 'seen' in dictionary is the mapping
rule.  Also the values are not ordered and should not be indexable --
they should be a set. Just as  the keys(), one ordered list of values
from a dictionary on one machine will not necessarily be equal to
another list of values an on another machine, while in fact, they
should be.

On a more fundamental and general level, a dictionary is actually an
explicit function, also called a 'map'. A set of keys, traditionally
called a 'domain' are mapped to a set of values, traditionally called a
'range'. This mapping produces at most a surjective function (i.e. two
or more keys can map to same value and all the values are mapped to by
some keys).  Notice that the traditional counterparts to keys and
values are sets and not just lists. This seems like theory babble, but
in the case of Python staying 'true' to the theory is usually a
GoodThing(tm).

I love Python primarily because it does something in only one, correct,
and reasonable way. The same principle should probably apply to Python
itself including to its built-in data structures.

Of course the compatibilty will be broken. Any code relying on a
certain ordering of keys() or values() returned would need to be
updated. One could argue though that such code was not entirely correct
in the first place to asssume that, and should be fixed anyway.

Obviously this fix should not be in Python 2.X but perhaps would be
worth considering for Python 3000. What does everyone think?




More information about the Python-list mailing list