[Python-3000] Iterators for dict keys, values, and items == annoying :)

Guido van Rossum guido at python.org
Fri Mar 31 00:06:16 CEST 2006


On 3/30/06, Nick Coghlan <ncoghlan at gmail.com> wrote:
> Adam DePrince wrote:
> > There seemed to be a concensus in the community on the size of the view
> > proposal, and I'm reimplementing the PEP to reflect that.  But what I
> > can't resolve is the other anciliary issue: "To list or iter."  I'm not
> > yet ready to resolve that issue.  The views don't resolve it either, and
> > by their nature are biased towards the iter approach.  They provide
> > __iter__ because its light weight to do, but there is no way a light
> > weight view can provide you with ordering information from an unordered
> > datastore.  Now, as a means of resolving this conflict, I'm open to the
> > notion of a view implementing both __iter__ and an explicit .list method
> > to avoid any extra overhead in generating a list from an iter instead of
> > directly from the dict as we do now.
>
> Umm, the whole point of the views discussion is the realisation that "list or
> iterator" is a false dichotomy. The correct answer is "new iterable that looks
> like a container in its own right, but is really just a view of the original".
>
> As far as the value-based comparison goes, yes, in reality the view will need
> to have greater knowledge of the underlying container than in my sample
> classes in order to be sure of getting a consistent ordering from the
> underlying objects.
>
> As Python's own set and dict show, however, unordered collections can be
> legitimately compared by value.

Boy. I definitely need to make time to read this discussion and the PEP.

Java does it this way and I think we can do the same thing:

keys() and items() return views that behave like sets; values()
returns a view that behaves like a collection (aka multiset or bag).
Neither behaves like a list, which means that the order is unspecified
(even though of course iteration reveals an order, there's nothing
that says the order needs to remain the same).

We need to be careful with copying Java's definition of equality
though -- for sets, any two objects implementing the Set interface
must compare equal iff they are contained in each other (the standard
mathematical definition), and for collections they give two options:
reference equality (i.e. they are the same object) or some other
symmetric equality that however cannot equal a set or list containing
the same values (sets can only be equal to other sets, and lists only
to other lists).

That's quite different from Python's equality definitions, which don't
take interfaces into account but only concrete types. Adam correctly
pointed out the bugs in Nick's __eq__ implementation (depending on the
order), but this is easy enough to fix (though expensive to execute
since it would require casting keys and items to sets, and doing some
kind of multiset comparison for values).

But that doesn't answer the question about the following code

 a = {1: "one", 2: "two"}
 b = set(a.keys())
 c = a.keys()     # assuming keys() returns a "set view"
 print b == c

Here b is a concrete set object; c is a set view of a's keys. These
are presumably different types (since the concrete set contains its
own hash table while the set view just contains a reference to the
dict a). So Nick's definition of __eq__ makes them unequal. However
Java would consider them equal (since both implement the set
interface, and both contain the same elements).

We could say that sets and set-like objects should be comparable, but
that requires us to define exactly what we consider set-like; this is
difficult in a world of duck typing. It would also presumably require
us (out of fairness if anything) to allow sequences to be compared for
equality, so [1,2,3] and (1,2,3) would henceforth compare equal. And
similar for mappings. All with the same problems.

(And we'd also have to do it for collections/multisets/bags -- I don't
like Java's cop-out there, and in fact I'd like multisets to be
comparable to sets, with the semantics that a multiset can be equal to
a set only if the multiset has no duplicates.)

But do we really want this? It's a pretty serious change in basic
semantics of collection data types, *and* it requires us to find a way
to determine unequivocally whether something is a set,  sequence,
mapping, or multiset (and it can't be more than one!).

--
--Guido van Rossum (home page: http://www.python.org/~guido/)


More information about the Python-3000 mailing list