[Python-checkins] r53151 - peps/trunk/pep-3106.txt
guido.van.rossum
python-checkins at python.org
Sat Dec 23 06:22:30 CET 2006
Author: guido.van.rossum
Date: Sat Dec 23 06:22:30 2006
New Revision: 53151
Modified:
peps/trunk/pep-3106.txt
Log:
Clarified lots of issues, added more open issues.
Modified: peps/trunk/pep-3106.txt
==============================================================================
--- peps/trunk/pep-3106.txt (original)
+++ peps/trunk/pep-3106.txt Sat Dec 23 06:22:30 2006
@@ -15,9 +15,10 @@
This PEP proposes to change the .keys(), .values() and .items()
methods of the built-in dict type to return a set-like or
-multiset-like object whose contents are derived of the underlying
-dictionary rather than a list which is a copy of the keys, etc.; and
-to remove the .iterkeys(), .itervalues() and .iteritems() methods.
+multiset-like (== bag-like) object whose contents are derived of the
+underlying dictionary rather than a list which is a copy of the keys,
+etc.; and to remove the .iterkeys(), .itervalues() and .iteritems()
+methods.
The approach is inspired by that taken in the Java Collections
Framework [1]_.
@@ -31,40 +32,43 @@
.itervalues() and .iteritems(). The idea is that code that currently
(in 2.x) reads::
- for x in d.iterkeys(): ...
+ for k, v in d.iteritems(): ...
should be rewritten as::
- for x in d.keys(): ...
+ for k, v in d.items(): ...
-and code that currently reads::
+(and similar for .itervalues() and .iterkeys(), except the latter is
+redundant since we can write that loop as ``for k in d``.)
+
+Code that currently reads::
a = d.keys() # assume we really want a list here
-should be rewritten as
+(etc.) should be rewritten as
a = list(d.keys())
There are (at least) two ways to accomplish this. The original plan
was to simply let .keys(), .values() and .items() return an iterator,
-i.e. exactly what iterkeys(), itervalues() and iteritems() return
-in Python 2.x. However, the Java Collections Framework [1]_ suggests
+i.e. exactly what iterkeys(), itervalues() and iteritems() return in
+Python 2.x. However, the Java Collections Framework [1]_ suggests
that a better solution is possible: the methods return objects with
-set behavior (for .keys() and .items()) or multiset behavior (for
-.values()) that do not contain copies of the keys, values or items,
-but rather reference the underlying dict and pull their values out of
-the dict as needed.
+set behavior (for .keys() and .items()) or multiset (== bag) behavior
+(for .values()) that do not contain copies of the keys, values or
+items, but rather reference the underlying dict and pull their values
+out of the dict as needed.
The advantage of this approach is that one can still write code like
this::
- a = d.keys()
- for x in a: ...
- for x in a: ...
-
-Effectively, iter(d.keys()) in Python 3.0 does what d.iterkeys() does
-in Python 2.x; but in most contexts we don't have to write the iter()
-call because it is implied by a for-loop.
+ a = d.items()
+ for k, v in a: ...
+ for k, v in a: ...
+
+Effectively, iter(d.keys()) (etc.) in Python 3.0 will do what
+d.iterkeys() (etc.) does in Python 2.x; but in most contexts we don't
+have to write the iter() call because it is implied by a for-loop.
The objects returned by the .keys() and .items() methods behave like
sets with limited mutability; they allow removing elements, but not
@@ -83,9 +87,9 @@
if a.keys() == b.keys(): ...
and similarly for values. (Two multisets are deemed equal if they
-have the same elements with the same cardinalities,
-e.g. the multiset {1, 2, 2} is equal to the multiset {2, 1, 2} but
-differs from the multiset {1, 2}.)
+have the same elements with the same cardinalities, e.g. the multiset
+{1, 2, 2} is equal to the multiset {2, 1, 2} but differs from the
+multiset {1, 2}.)
These operations are thread-safe only to the extent that using them in
a thread-unsafe way may cause an exception but will not cause
@@ -108,11 +112,13 @@
Specification
=============
-I'll try pseudo-code to specify the semantics::
+I'm using pseudo-code to specify the semantics::
class dict:
- # Omitting all other dict methods for brevity
+ # Omitting all other dict methods for brevity.
+ # The .iterkeys(), .itervalues() and .iteritems() methods
+ # will be removed.
def keys(self):
return d_keys(self)
@@ -151,18 +157,23 @@
def clear(self):
self.__d.clear()
- def copy(self):
- return set(self)
-
# The following operations should be implemented to be
# compatible with sets; this can be done by exploiting
- # the above primitive operations:
+ # the above primitive operations:
#
# <, <=, ==, !=, >=, > (returning a bool)
# &, |, ^, - (returning a new, real set object)
# &=, -= (updating in place and returning self; but not |=, ^=)
#
# as well as their method counterparts (.union(), etc.).
+ #
+ # To specify the semantics, we can specify x == y as:
+ #
+ # set(x) == set(y) if both x and y are d_keys instances
+ # set(x) == y if x is a d_keys instance
+ # x == set(y) if y is a d_keys instance
+ #
+ # and so on for all other operations.
class d_items:
@@ -180,9 +191,13 @@
yield key, self.__d[key]
def remove(self, (key, value)):
+ if (key, value) not in self:
+ raise KeyError((key, value))
del self.__d[key]
def discard(self, item):
+ # Defined in terms of 'in' and .remove() so overriding
+ # those will update discard appropriately.
if item in self:
self.remove(item)
@@ -192,10 +207,55 @@
def clear(self):
self.__d.clear()
- def copy(self):
- return set(self)
-
- # As well as the same set operations as mentioned for d_keys above.
+ # As well as the set operations mentioned for d_keys above.
+ # However the specifications suggested there will not work if
+ # the values aren't hashable. Fortunately, the operations can
+ # still be implemented efficiently. For example, this is how
+ # intersection can be specified:
+
+ def __and__(self, other):
+ if isinstance(other, (set, frozenset, d_keys)):
+ result = set()
+ for item in other:
+ if item in self:
+ result.add(item)
+ return result
+ if not isinstance(other, d_items):
+ return NotImplemented
+ d = {}
+ if len(other) < len(self):
+ self, other = other, self
+ for item in self:
+ if item in other:
+ key, value = item
+ d[key] = value
+ return d.items()
+
+ # And here is equality:
+
+ def __eq__(self, other):
+ if isinstance(other, (set, frozenset, d_keys)):
+ if len(self) != len(other):
+ return False
+ for item in other:
+ if item not in self:
+ return False
+ return True
+ if not isinstance(other, d_items):
+ return NotImplemented
+ if len(self) != len(other):
+ return False
+ for item in self:
+ if item not in other:
+ return False
+ return True
+
+ def __ne__(self, other):
+ # XXX Perhaps object.__ne__() should be defined this way.
+ result = self.__eq__(other)
+ if result is not NotImplemented:
+ result = not result
+ return result
class d_values:
@@ -206,7 +266,9 @@
return len(self.__d)
def __contains__(self, value):
- # Slow! Do we even want to implement this?
+ # This is slow, and it's what "x in y" uses as a fallback
+ # if __contains__ is not defined; but I'd rather make it
+ # explicit that it is supported.
for v in self:
if v == value:
return True
@@ -216,32 +278,71 @@
for key in self.__d:
yield self.__d[key]
- # Do we care about the following?
+ def __eq__(self, other):
+ if not isinstance(other, d_values):
+ return NotImplemented
+ if len(self) != len(other):
+ return False
+ # XXX Sometimes this could be optimized, but these are the
+ # semantics: we can't depend on the values to be hashable
+ # or comparable.
+ o = list(other)
+ for x in self:
+ try:
+ o.remove(x)
+ except ValueError:
+ return False
+ return True
+
+ def __ne__(self, other):
+ result = self.__eq__(other)
+ if result is not NotImplemented:
+ result = not result
+ return result
+
+Note that we don't implement .copy() -- the presence of a .copy()
+method suggests that the copy has the same type as the original, but
+that's not feasible without copying the underlying dict. If you want
+a copy of a specific type, like list or set, you can just pass one
+of the above to the list() or set() constructor.
- def pop(self):
- return self.__d.popitem()[1]
- def clear(self):
- return self.__d.clear()
+Open Issues
+===========
- def copy(self):
- # XXX What should this return?
+I've left out the implementation of various set operations. These
+could still present surprises.
- # Should we bother implementing set-like operations on
- # multisets? If so, how about mixed operations on sets and
- # multisets? I'm not sure that these are worth the effort.
-
-I'm soliciting better names than d_keys, d_values and d_items; these
-classes will be public so that their implementations may be reused by
-the .keys(), .values() and .items() methods of other mappings. (Or
-should they?)
+Should d_values have mutating methods (pop(), clear())? Strawman: no.
+Should d_values implement set operations (as defined for multisets).
+Strawman: no.
-Open Issues
-===========
+Should d_keys, d_values and d_items have a public instance variable or
+method through which one can retrieve the underlying dict? Strawman:
+yes (but what should it be called?).
+
+I'm soliciting better names than d_keys, d_values and d_items. These
+classes could be public so that their implementations could be reused
+by the .keys(), .values() and .items() methods of other mappings. Or
+should they?
+
+Should the d_keys, d_values and d_items classes be reusable?
+Strawman: yes.
+
+Should they be subclassable? Strawman: yes (but see below).
+
+A particularly nasty issue is whether operations that are specified in
+terms of other operations (e.g. .discard()) must really be implemented
+in terms of those other operations; this may appear irrelevant but it
+becomes relevant if these classes are ever subclassed. Historically,
+Python has a really poor track record of specifying the semantics of
+highly optimized built-in types clearly in such cases; my strawman is
+to continue that trend. Subclassing may still be useful to *add* new
+methods, for example.
-Should the d_keys, d_values and d_items classes be reusable? Should
-they be subclassable?
+I'll leave the decisions (especially about naming) up to whoever
+submits a working implementation.
References
More information about the Python-checkins
mailing list