[Python-checkins] r53151 - peps/trunk/pep-3106.txt

Sat Dec 23 06:22:30 CET 2006

Author: guido.van.rossum
Date: Sat Dec 23 06:22:30 2006
New Revision: 53151

Modified:
   peps/trunk/pep-3106.txt
Log:
Clarified lots of issues, added more open issues.


Modified: peps/trunk/pep-3106.txt
==============================================================================

--- peps/trunk/pep-3106.txt	(original)
+++ peps/trunk/pep-3106.txt	Sat Dec 23 06:22:30 2006
@@ -15,9 +15,10 @@
 
 This PEP proposes to change the .keys(), .values() and .items()
 methods of the built-in dict type to return a set-like or
-multiset-like object whose contents are derived of the underlying
-dictionary rather than a list which is a copy of the keys, etc.; and
-to remove the .iterkeys(), .itervalues() and .iteritems() methods.
+multiset-like (== bag-like) object whose contents are derived of the
+underlying dictionary rather than a list which is a copy of the keys,
+etc.; and to remove the .iterkeys(), .itervalues() and .iteritems()
+methods.
 
 The approach is inspired by that taken in the Java Collections
 Framework [1]_.
@@ -31,40 +32,43 @@
 .itervalues() and .iteritems().  The idea is that code that currently
 (in 2.x) reads::
 
-    for x in d.iterkeys(): ...
+    for k, v in d.iteritems(): ...
 
 should be rewritten as::
 
-    for x in d.keys(): ...
+    for k, v in d.items(): ...
 
-and code that currently reads::
+(and similar for .itervalues() and .iterkeys(), except the latter is
+redundant since we can write that loop as ``for k in d``.)
+
+Code that currently reads::
 
     a = d.keys()    # assume we really want a list here
 
-should be rewritten as
+(etc.) should be rewritten as
 
     a = list(d.keys())
 
 There are (at least) two ways to accomplish this.  The original plan
 was to simply let .keys(), .values() and .items() return an iterator,
-i.e. exactly what iterkeys(), itervalues() and iteritems() return
-in Python 2.x.  However, the Java Collections Framework [1]_ suggests
+i.e. exactly what iterkeys(), itervalues() and iteritems() return in
+Python 2.x.  However, the Java Collections Framework [1]_ suggests
 that a better solution is possible: the methods return objects with
-set behavior (for .keys() and .items()) or multiset behavior (for
-.values()) that do not contain copies of the keys, values or items,
-but rather reference the underlying dict and pull their values out of
-the dict as needed.
+set behavior (for .keys() and .items()) or multiset (== bag) behavior
+(for .values()) that do not contain copies of the keys, values or
+items, but rather reference the underlying dict and pull their values
+out of the dict as needed.
 
 The advantage of this approach is that one can still write code like
 this::
 
-    a = d.keys()
-    for x in a: ...
-    for x in a: ...
-
-Effectively, iter(d.keys()) in Python 3.0 does what d.iterkeys() does
-in Python 2.x; but in most contexts we don't have to write the iter()
-call because it is implied by a for-loop.
+    a = d.items()
+    for k, v in a: ...
+    for k, v in a: ...
+
+Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
+d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
+have to write the iter() call because it is implied by a for-loop.
 
 The objects returned by the .keys() and .items() methods behave like
 sets with limited mutability; they allow removing elements, but not
@@ -83,9 +87,9 @@
     if a.keys() == b.keys(): ...
 
 and similarly for values.  (Two multisets are deemed equal if they
-have the same elements with the same cardinalities,
-e.g. the multiset {1, 2, 2} is equal to the multiset {2, 1, 2} but
-differs from the multiset {1, 2}.)
+have the same elements with the same cardinalities, e.g. the multiset
+{1, 2, 2} is equal to the multiset {2, 1, 2} but differs from the
+multiset {1, 2}.)
 
 These operations are thread-safe only to the extent that using them in
 a thread-unsafe way may cause an exception but will not cause
@@ -108,11 +112,13 @@
 Specification
 =============
 
-I'll try pseudo-code to specify the semantics::
+I'm using pseudo-code to specify the semantics::
 
     class dict:
 
-        # Omitting all other dict methods for brevity
+        # Omitting all other dict methods for brevity.
+        # The .iterkeys(), .itervalues() and .iteritems() methods
+        # will be removed.
 
         def keys(self):
             return d_keys(self)
@@ -151,18 +157,23 @@
         def clear(self):
             self.__d.clear()
 
-        def copy(self):
-            return set(self)
-
         # The following operations should be implemented to be
         # compatible with sets; this can be done by exploiting
-	# the above primitive operations:
+        # the above primitive operations:
         #
         #   <, <=, ==, !=, >=, > (returning a bool)
         #   &, |, ^, - (returning a new, real set object)
         #   &=, -= (updating in place and returning self; but not |=, ^=)
         #
         # as well as their method counterparts (.union(), etc.).
+        #
+        # To specify the semantics, we can specify x == y as:
+        #
+        #   set(x) == set(y)   if both x and y are d_keys instances
+        #   set(x) == y        if x is a d_keys instance
+        #   x == set(y)        if y is a d_keys instance
+        #
+        # and so on for all other operations.
 
     class d_items:
 
@@ -180,9 +191,13 @@
                 yield key, self.__d[key]
 
         def remove(self, (key, value)):
+            if (key, value) not in self:
+                raise KeyError((key, value))
             del self.__d[key]
 
         def discard(self, item):
+            # Defined in terms of 'in' and .remove() so overriding
+            # those will update discard appropriately.
             if item in self:
                 self.remove(item)
 
@@ -192,10 +207,55 @@
         def clear(self):
             self.__d.clear()
 
-        def copy(self):
-            return set(self)
-
-        # As well as the same set operations as mentioned for d_keys above.
+        # As well as the set operations mentioned for d_keys above.
+        # However the specifications suggested there will not work if
+        # the values aren't hashable.  Fortunately, the operations can
+        # still be implemented efficiently.  For example, this is how
+        # intersection can be specified:
+
+        def __and__(self, other):
+            if isinstance(other, (set, frozenset, d_keys)):
+                result = set()
+                for item in other:
+                    if item in self:
+                        result.add(item)
+                return result
+            if not isinstance(other, d_items):
+                return NotImplemented
+            d = {}
+            if len(other) < len(self):
+                self, other = other, self
+            for item in self:
+                if item in other:
+                    key, value = item
+                    d[key] = value
+            return d.items()
+
+        # And here is equality:
+
+        def __eq__(self, other):
+            if isinstance(other, (set, frozenset, d_keys)):
+                if len(self) != len(other):
+                    return False
+                for item in other:
+                    if item not in self:
+                        return False
+                return True
+            if not isinstance(other, d_items):
+                return NotImplemented
+            if len(self) != len(other):
+                return False
+            for item in self:
+                if item not in other:
+                    return False
+            return True
+
+        def __ne__(self, other):
+            # XXX Perhaps object.__ne__() should be defined this way.
+            result = self.__eq__(other)
+            if result is not NotImplemented:
+                result = not result
+            return result
 
     class d_values:
 
@@ -206,7 +266,9 @@
             return len(self.__d)
 
         def __contains__(self, value):
-            # Slow!  Do we even want to implement this?
+            # This is slow, and it's what "x in y" uses as a fallback
+            # if __contains__ is not defined; but I'd rather make it
+            # explicit that it is supported.
             for v in self:
                  if v == value:
                      return True
@@ -216,32 +278,71 @@
             for key in self.__d:
                 yield self.__d[key]
 
-        # Do we care about the following?
+        def __eq__(self, other):
+            if not isinstance(other, d_values):
+                return NotImplemented
+            if len(self) != len(other):
+                return False
+            # XXX Sometimes this could be optimized, but these are the
+            # semantics: we can't depend on the values to be hashable
+            # or comparable.
+            o = list(other)
+            for x in self:
+                try:
+                    o.remove(x)
+                except ValueError:
+                    return False
+            return True
+
+        def __ne__(self, other):
+            result = self.__eq__(other)
+            if result is not NotImplemented:
+                result = not result
+            return result
+
+Note that we don't implement .copy() -- the presence of a .copy()
+method suggests that the copy has the same type as the original, but
+that's not feasible without copying the underlying dict.  If you want
+a copy of a specific type, like list or set, you can just pass one
+of the above to the list() or set() constructor.
 
-        def pop(self):
-            return self.__d.popitem()[1]
 
-        def clear(self):
-            return self.__d.clear()
+Open Issues
+===========
 
-        def copy(self):
-            # XXX What should this return?
+I've left out the implementation of various set operations.  These
+could still present surprises.
 
-        # Should we bother implementing set-like operations on
-        # multisets?  If so, how about mixed operations on sets and
-        # multisets?  I'm not sure that these are worth the effort.
-
-I'm soliciting better names than d_keys, d_values and d_items; these
-classes will be public so that their implementations may be reused by
-the .keys(), .values() and .items() methods of other mappings.  (Or
-should they?)
+Should d_values have mutating methods (pop(), clear())?  Strawman: no.
 
+Should d_values implement set operations (as defined for multisets).
+Strawman: no.
 
-Open Issues
-===========
+Should d_keys, d_values and d_items have a public instance variable or
+method through which one can retrieve the underlying dict?  Strawman:
+yes (but what should it be called?).
+
+I'm soliciting better names than d_keys, d_values and d_items.  These
+classes could be public so that their implementations could be reused
+by the .keys(), .values() and .items() methods of other mappings.  Or
+should they?
+
+Should the d_keys, d_values and d_items classes be reusable?
+Strawman: yes.
+
+Should they be subclassable?  Strawman: yes (but see below).
+
+A particularly nasty issue is whether operations that are specified in
+terms of other operations (e.g. .discard()) must really be implemented
+in terms of those other operations; this may appear irrelevant but it
+becomes relevant if these classes are ever subclassed.  Historically,
+Python has a really poor track record of specifying the semantics of
+highly optimized built-in types clearly in such cases; my strawman is
+to continue that trend.  Subclassing may still be useful to *add* new
+methods, for example.
 
-Should the d_keys, d_values and d_items classes be reusable?  Should
-they be subclassable?
+I'll leave the decisions (especially about naming) up to whoever
+submits a working implementation.
 
 
 References