[Python-checkins] r55278 - peps/trunk/pep-3119.txt

guido.van.rossum python-checkins at python.org
Sat May 12 01:36:50 CEST 2007


Author: guido.van.rossum
Date: Sat May 12 01:36:47 2007
New Revision: 55278

Modified:
   peps/trunk/pep-3119.txt
Log:
Streamlined the containers section.


Modified: peps/trunk/pep-3119.txt
==============================================================================
--- peps/trunk/pep-3119.txt	(original)
+++ peps/trunk/pep-3119.txt	Sat May 12 01:36:47 2007
@@ -7,7 +7,7 @@
 Type: Standards Track
 Content-Type: text/x-rst
 Created: 18-Apr-2007
-Post-History: 26-Apr-2007
+Post-History: 26-Apr-2007, 11-May-2007
 
 
 Abstract
@@ -30,6 +30,9 @@
 Functions (GFs), but about clarifying philosophical issues like "what
 makes a set", "what makes a mapping" and "what makes a sequence".
 
+There's also a companion PEP 3141, which defines ABCs for numeric
+types.
+
 
 Acknowledgements
 ----------------
@@ -329,7 +332,7 @@
 
     C()  # works
 
-**Notes:** The ``@abstractmethod`` decorator should only be used
+**Note:** The ``@abstractmethod`` decorator should only be used
 inside a class body, and only for classes whose metaclass is (derived
 from) ``ABCMeta``.  Dynamically adding abstract methods to a class, or
 attempting to modify the abstraction status of a method or class once
@@ -337,6 +340,11 @@
 affects subclasses derived using regular inheritance; "virtual
 subclasses" registered with the ``register()`` method are not affected.
 
+It has been suggested that we should also provide a way to define
+abstract data attributes.  As it is easy to add these in a later
+stage, and as the use case is considerably less common (apart from
+pure documentation), we punt on this for now.
+
 **Implementation:** The ``@abstractmethod`` decorator sets the
 function attribute ``__isabstractmethod__`` to the value ``True``.
 The ``ABCMeta.__new__`` method computes the type attribute
@@ -358,19 +366,14 @@
 useful as an end-point for a super-call in framework using a
 cooperative multiple-inheritance [7]_, [8]_.
 
-**Open issues:** Should we also provide a standard way to declare
-abstract data attributes?  If so, how should these be spelled?
-Perhaps place ``@abstractattribute`` decorators on properties?  Or use
-an ``@attributes(name1, name2, ...)`` class decorator?  Strawman:
-let's back off on this; it's easy enough to add this later.
-
 
 ABCs for Containers and Iterators
 ---------------------------------
 
 The ``collections`` module will define ABCs necessary and sufficient
 to work with sets, mappings, sequences, and some helper types such as
-iterators and dictionary views.
+iterators and dictionary views.  All ABCs have the above-mentioned
+``ABCMeta`` as their metaclass.
 
 The ABCs provide implementations of their abstract methods that are
 technically valid but fairly useless; e.g. ``__hash__`` returns 0, and
@@ -381,7 +384,8 @@
 Some ABCs also provide concrete (i.e. non-abstract) methods; for
 example, the ``Iterator`` class has an ``__iter__`` method returning
 itself, fulfilling an important invariant of iterators (which in
-Python 2 has to be implemented anew by each iterator class).
+Python 2 has to be implemented anew by each iterator class).  These
+ABCs can be considered "mix-in" classes.
 
 No ABCs defined in the PEP override ``__init__``, ``__new__``,
 ``__str__`` or ``__repr__``.  Defining a standard constructor
@@ -390,27 +394,19 @@
 representation for a collection is similarly left up to individual
 implementations.
 
-
-Ordering ABCs
-'''''''''''''
-
-These ABCs are closer to ``object`` in the ABC hierarchy.
-
-``PartiallyOrdered``
-    This ABC defines the 4 inequality operations ``<``, ``<=``, ``>=``,
-    ``>``.  (Note that ``==`` and ``!=`` are defined by ``object``.)
-    Classes deriving from this ABC should implement a partial order
-    as defined in mathematics.  [9]_
-
-``TotallyOrdered``
-    This ABC derives from ``PartiallyOrdered``.  It adds no new
-    operations but implies a promise of stronger invariants.
-    Classes deriving from this ABC should implement a total order
-    as defined in mathematics.  [10]_
-
-**Open issues:** Where should these live?  The ``collections`` module
-doesn't seem right, but making them built-ins seems a slippery slope
-too.
+**Note:** There are no ABCs for ordering operations (``__lt__``,
+``__le__``, ``__ge__``, ``__gt__``).  Defining these in a base class
+(abstract or not) runs into problems with the accepted type for the
+second operand.  For example, if class ``Ordering`` defined
+``__lt__``, one would assume that for any ``Ordering`` instances ``x``
+and ``y``, ``x < y`` would be defined (even if it just defines a
+partial ordering).  But this cannot be the case: If both ``list`` and
+``str`` derived from ``Ordering``, this would imply that ``[1, 2] <
+(1, 2)`` should be defined (and presumably return False), while in
+fact (in Python 3000!)  such "mixed-mode comparisons" operations are
+explicitly forbidden and raise ``TypeError``.  See PEP 3100 and [14]_
+for more information.  (This is a special case of a more general issue
+with operations that take another argument of the same type: 
 
 
 One Trick Ponies
@@ -418,23 +414,22 @@
 
 These abstract classes represent single methods like ``__iter__`` or
 ``__len__``.
-
+        
 ``Hashable``
     The base class for classes defining ``__hash__``.  The
-    ``__hash__`` method should return an ``Integer`` (see "Numbers"
-    below).  The abstract ``__hash__`` method always returns 0, which
-    is a valid (albeit inefficient) implementation.  **Invariant:** If
-    classes ``C1`` and ``C2`` both derive from ``Hashable``, the
-    condition ``o1 == o2`` must imply ``hash(o1) == hash(o2)`` for all
-    instances ``o1`` of ``C1`` and all instances ``o2`` of ``C2``.
-    IOW, two objects shouldn't compare equal but have different hash
-    values.
+    ``__hash__`` method should return an integer.  The abstract
+    ``__hash__`` method always returns 0, which is a valid (albeit
+    inefficient) implementation.  **Invariant:** If classes ``C1`` and
+    ``C2`` both derive from ``Hashable``, the condition ``o1 == o2``
+    must imply ``hash(o1) == hash(o2)`` for all instances ``o1`` of
+    ``C1`` and all instances ``o2`` of ``C2``.  IOW, two objects
+    should never compare equal but have different hash values.
 
     Another constraint is that hashable objects, once created, should
     never change their value (as compared by ``==``) or their hash
     value.  If a class cannot guarantee this, it should not derive
     from ``Hashable``; if it cannot guarantee this for certain
-    instances only, ``__hash__`` for those instances should raise a
+    instances, ``__hash__`` for those instances should raise a
     ``TypeError`` exception.
 
     **Note:** being an instance of this class does not imply that an
@@ -453,7 +448,11 @@
     The base class for classes defining ``__next__``.  This derives
     from ``Iterable``.  The abstract ``__next__`` method raises
     ``StopIteration``.  The concrete ``__iter__`` method returns
-    ``self``.
+    ``self``.  Note the distinction between ``Iterable`` and
+    ``Iterator``: an ``Iterable`` can be iterated over, i.e. supports
+    the ``__iter__`` methods; an ``Iterator`` is what the built-in
+    function ``iter()`` returns, i.e. supports the ``__next__``
+    method.
 
 ``Sized``
     The base class for classes defining ``__len__``.  The ``__len__``
@@ -471,57 +470,54 @@
     ``Iterable``, then ``(x in o for x in o)`` should be a generator
     yielding only True values for any instance ``o`` of ``C``.
 
-    **Open issues:** strictly speaking, there are three variants of
-    this method's semantics.  The first one is for sets and mappings,
-    which is fast: O(1) or O(log N).  The second one is for membership
-    checking on sequences, which is slow: O(N).  The third one is for
-    subsequence checking on (character or byte) strings, which is also
-    slow: O(N).  Would it make sense to distinguish these?  The
-    signature of the third variant is different, since it takes a
-    sequence (typically of the same type as the method's target)
-    intead of an element.  For now, I'm using the same type for all
-    three.  This means that is is possible for ``x in o`` to be True
-    even though ``x`` is never yielded by ``iter(o)``.  A suggested
-    name for the third form is ``Searchable`` (though people have
-    objected against this name on the grounds that it has the wrong
-    association).
+**Open issues:** Conceivably, instead of using the ABCMeta metaclass,
+these classes could override ``__instancecheck__`` and
+``__subclasscheck__`` to check for the presence of the applicable
+special method; for example::
+
+    class Sized(metaclass=ABCMeta):
+        @abstractmethod
+        def __hash__(self):
+            return 0
+        @classmethod
+        def __instancecheck__(cls, x):
+            return hasattr(x, "__len__")
+        @classmethod
+        def __subclasscheck__(cls, C):
+            return hasattr(C, "__bases__") and hasattr(C, "__len__")
+
+This has the advantage of not requiring explicit registration.
+However, the semantics hard to get exactly right given the confusing
+semantics of instance attributes vs. class attributes, and that a
+class is an instance of its metaclass; the check for ``__bases__`` is
+only an approximation of the desired semantics.  **Strawman:** Let's
+do it, but let's arrange it in such a way that the registration API
+also works.
 
 
 Sets
 ''''
 
-These abstract classes represent various stages of "set-ness".  The
+These abstract classes represent read-only sets and mutable sets.  The
 most fundamental set operation is the membership test, written as ``x
-in s`` and implemented by ``s.__contains__(x)``.  This is already
-taken care of by the `Container`` class defined above.  Therefore, we
-define a set as a sized, iterable container for which certain
+in s`` and implemented by ``s.__contains__(x)``.  This operation is
+already defined by the `Container`` class defined above.  Therefore,
+we define a set as a sized, iterable container for which certain
 invariants from mathematical set theory hold.
 
 The built-in type ``set`` derives from ``MutableSet``.  The built-in
-type ``frozenset`` derives from ``HashableSet``.
-
-You might wonder why we require a set to be sized -- surely certain
-infinite sets can be represented just fine in Python.  For example,
-the set of even integers could be defined like this::
-
-    class EvenIntegers(Container):
-        def __contains__(self, x):
-            return x % 2 == 0
-
-However, such sets have rather limited practical value, and deciding
-whether one such set is a subset of another would be difficult in
-general without using a symbolic algebra package.  So I consider this
-out of the scope of a pragmatic proposal like this.
+type ``frozenset`` derives from ``Set`` and ``Hashable``.
 
 ``Set``
-    This is a sized, iterable, partially ordered container, i.e. a
-    subclass of ``Sized``, ``Iterable``, ``Container`` and
-    ``PartiallyOrdered``.  Not every subset of those three classes is
-    a set though!  Sets have the additional invariant that each
-    element occurs only once (as can be determined by iteration), and
-    in addition sets define concrete operators that implement the
-    inequality operations as subclass/superclass tests.  In general,
-    the invariants for finite sets in mathematics hold. [11]_
+
+    This is a sized, iterable container, i.e., a subclass of
+    ``Sized``, ``Iterable`` and ``Container``.  Not every subclass of
+    those three classes is a set though!  Sets have the additional
+    invariant that each element occurs only once (as can be determined
+    by iteration), and in addition sets define concrete operators that
+    implement the inequality operations as subclass/superclass tests.
+    In general, the invariants for finite sets in mathematics
+    hold. [11]_
 
     Sets with different implementations can be compared safely,
     (usually) efficiently and correctly using the mathematical
@@ -539,57 +535,33 @@
     can be compared to those for equality (and then they always
     compare unequal).
 
+    This class also defines concrete operators to compute union,
+    intersection, symmetric and asymmetric difference, respectively
+    ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.  These
+    operators should return instances of ``Set``.  The default
+    implementations call the overridable class method
+    ``_from_iterable()`` with an iterable argument.  This factory
+    method's default implementation returns a ``frozenset`` instance;
+    it may be overridden to return another appropriate ``Set``
+    subclass.
+
+    Finally, this class defines a concrete method ``_hash`` which
+    computes the hash value from the elements.  Hashable subclasses of
+    ``Set`` can implement ``__hash__`` by calling ``_hash`` or they
+    can reimplement the same algorithm more efficiently; but the
+    algorithm implemented should be the same.  Currently the algorithm
+    is fully specified only by the source code [15]_.
+
     **Note:** the ``issubset`` and ``issuperset`` methods found on the
     set type in Python 2 are not supported, as these are mostly just
     aliases for ``__le__`` and ``__ge__``.
 
-    **Open issues:** should we define comparison of instances of
-    different concrete set types this way?
-
-``ComposableSet``
-    This is a subclass of ``Set`` that defines abstract operators to
-    compute union, intersection, symmetric and asymmetric difference,
-    respectively ``__or__``, ``__and__``, ``__xor__`` and ``__sub__``.
-    These operators should return instances of ``ComposableSet``.  The
-    abstract implementations return no meaningful values but raise
-    ``NotImplementedError``; this is because any generic
-    implementation would have to create new instances but the ABCs
-    don't (and shouldn't, IMO) provide an API for creating new
-    instances.  The implementations of these operators should ensure
-    that the results match the mathematical definition of set
-    composition. [11]_
-
-    **Open issues:** Should ``__or__`` and friends be abstract or
-    concrete methods?  Making them abstract means that every
-    ComposableSet implementation must reimplement all of them.  But
-    making them concrete begs the question of the actual return type:
-    since the ABC doesn't (and IMO shouldn't) define the constructor
-    signature for subclasses, the concrete implementations in the ABC
-    don't have an API to construct a new instance given an iterable.
-    Perhaps the right choice is to have a static concrete factory
-    function ``fromiterable`` which takes an iterable and returns
-    a ``ComposableSet`` instance.  Subclasses can override this and
-    benefit from the default implementations of ``__or__`` etc.; or
-    they can override ``__or__`` if they want to.
-
-``HashableSet``
-    This is a subclass of both ``ComposableSet`` and ``Hashable``.  It
-    implements a concrete ``__hash__`` method that subclasses should
-    not override; or if they do, the subclass should compute the same
-    hash value.  This is so that sets with different implementations
-    still hash to the same value, so they can be used interchangeably
-    as dictionary keys.  (A similar constraint exists on the hash
-    values for different types of numbers and strings.)
-
-    **Open issues:** Spell out the hash algorithm.  Should there be
-    another ABC that derives from Set and Hashable, but not from
-    Composable?
-
 ``MutableSet``
-    This is a subclass of ``ComposableSet`` implementing additional
-    operations to add and remove elements.  The supported methods have
-    the semantics known from the ``set`` type in Python 2 (except
-    for ``discard``, which is modeled after Java):
+
+    This is a subclass of ``Set`` implementing additional operations
+    to add and remove elements.  The supported methods have the
+    semantics known from the ``set`` type in Python 2 (except for
+    ``discard``, which is modeled after Java):
 
     ``.add(x)``
         Abstract method returning a ``bool`` that adds the element
@@ -635,21 +607,18 @@
 Mappings
 ''''''''
 
-These abstract classes represent various stages of mapping-ness.  The
-``Mapping`` class represents the most common read-only mapping API.
-However, code *accepting* a mapping is encouraged to check for the
-``BasicMapping`` ABC when iteration is not used.  This allows for
-certain "black-box" implementations that can look up values by key but
-don't provide a convenient iteration API.  A hypothetical example
-would be an interface to a hierarchical filesystem, where keys are
-pathnames relative to some root directory.  Iterating over all
-pathnames would presumably take forever, as would counting the number
-of valid pathnames.
+These abstract classes represent read-only mappings and mutable
+mappings.  The ``Mapping`` class represents the most common read-only
+mapping API.
 
 The built-in type ``dict`` derives from ``MutableMapping``.
 
-``BasicMapping``
-    A subclass of ``Container`` defining the following methods:
+``Mapping``
+
+    A subclass of ``Container``, ``Iterable`` and ``Sized``.  The keys
+    of a mapping naturally form a set.  The (key, value) pairs (which
+    must be tuples) are also referred to as items.  The items also
+    form a set.  Methods:
 
     ``.__getitem__(key)``
         Abstract method that returns the value corresponding to
@@ -664,25 +633,13 @@
         Concrete method returning ``True`` if ``self[key]`` does not
         raise ``KeyError``, and ``False`` if it does.
 
-``Mapping``
-    A subclass of ``BasicMapping``, ``Iterable`` and ``Sized``.  The
-    keys of a mapping naturally form a set.  The (key, value) pairs
-    are also referred to as items.  The items also form a set.
-    Methods:
-
     ``.__len__()``
-        Abstract method returning the length of the key set.
+        Abstract method returning the number of distinct keys (i.e.,
+        the length of the key set).
 
     ``.__iter__()``
         Abstract method returning each key in the key set exactly once.
 
-    ``.__eq__(obj)``
-        Concrete method for comparing mappings.  Two mappings, even
-        with different implementations, can be compared for equality,
-        and are considered equal if and only if their item sets are
-        equal.  **Open issues:** should we define comparison of
-        instances of different concrete mapping types this way?
-
     ``.keys()``
         Concrete method returning the key set as a ``Set``.  The
         default concrete implementation returns a "view" on the key
@@ -712,11 +669,6 @@
     i.e. iterating over the items, keys and values should return
     results in the same order.
 
-``HashableMapping``
-    A subclass of ``Mapping`` and ``Hashable``.  The values should be
-    instances of ``Hashable``.  The concrete ``__hash__`` method
-    should be equal to ``hash(m.items())``.
-
 ``MutableMapping``
     A subclass of ``Mapping`` that also implements some standard
     mutating methods.  Abstract methods include ``__setitem__``,
@@ -728,13 +680,15 @@
 Sequences
 '''''''''
 
-These abstract classes represent various stages of sequence-ness.
+These abstract classes represent read-only sequences and mutable
+sequences.
 
 The built-in ``list`` and ``bytes`` types derive from
 ``MutableSequence``.  The built-in ``tuple`` and ``str`` types derive
-from ``HashableSequence``.
+from ``Sequence`` and ``Hashable``.
 
 ``Sequence``
+
     A subclass of ``Iterable``, ``Sized``, ``Container``.  It
     defines a new abstract method ``__getitem__`` that has a somewhat
     complicated signature: when called with an integer, it returns an
@@ -747,15 +701,11 @@
 
     **Open issues:** Other candidate methods, which can all have
     default concrete implementations that only depend on ``__len__``
-    and ``__getitem__`` with an integer argument: __reversed__, index,
-    count, __add__, __mul__, __eq__, __lt__, __le__.
-
-``HashableSequence``
-    A subclass of ``Sequence`` and ``Hashable``.  The concrete
-    ``__hash__`` method should implements the hashing algorithms used
-    by tuples in Python 2.
+    and ``__getitem__`` with an integer argument: ``__reversed__``,
+    ``index``, ``count``, ``__add__``, ``__mul__``.
 
 ``MutableSequence``
+
     A subclass of ``Sequence`` adding some standard mutating methods.
     Abstract mutating methods: ``__setitem__`` (for integer indices as
     well as slices), ``__delitem__`` (ditto), ``insert``, ``append``,
@@ -765,24 +715,14 @@
     ``sort()`` -- that is only required to exist on genuine ``list``
     instances.
 
-**Open issues:** If all the elements of a sequence are totally
-ordered, the sequence itself can be totally ordered with respect to
-other sequences containing corresponding items of the same type.
-Should we reflect this by making ``Sequence`` derive from
-``TotallyOrdered``?  Or ``Partiallyordered``?  Also, we could easily
-define comparison of sequences of different types, so that e.g.
-``(1, 2, 3) == [1, 2, 3]`` and ``(1, 2) < [1, 2, 3]``.  Should we?
-(It might imply ``["a", "b"] == "ab"`` and ``[1, 2] == b"\1\2"``.)
-
 
 Strings
 -------
 
-Python 3000 has two built-in string types: byte strings (``bytes``),
-deriving from ``MutableSequence``, and (Unicode) character strings
-(``str``), deriving from ``HashableSequence``.  They also derive from
-``TotallyOrdered``.  If we were to introduce ``Searchable``, they
-would also derive from that.
+Python 3000 will likely have at least two built-in string types: byte
+strings (``bytes``), deriving from ``MutableSequence``, and (Unicode)
+character strings (``str``), deriving from ``Sequence`` and
+``Hashable``.
 
 **Open issues:** define the base interfaces for these so alternative
 implementations and subclasses know what they are in for.  This may be
@@ -790,29 +730,6 @@
 ``bytes`` type).
 
 
-Numbers
--------
-
-ABCs for numerical types are defined in PEP 3141.
-
-
-Guidelines for Writing ABCs
----------------------------
-
-Some suggestions for writing ABCs:
-
-* Use the ``@abstractmethod`` decorator.
-
-* Define abstract methods that could be useful as an end point when
-  called via a super chain.
-
-* Define concrete methods that are very simple permutations of
-  abstract methods (e.g. ``Mapping.get``).
-
-* Keep abstract classes small, one per use case instead of one per
-  concept.
-
-
 ABCs vs. Alternatives
 =====================
 
@@ -940,6 +857,12 @@
 .. [13] ABCMeta sample implementation
    (http://svn.python.org/view/sandbox/trunk/abc/xyz.py)
 
+.. [14] python-dev email ("Comparing heterogeneous types")
+   http://mail.python.org/pipermail/python-dev/2004-June/045111.html
+
+.. [15] Function ``frozenset_hash()`` in Object/setobject.c
+   (http://svn.python.org/view/python/trunk/Objects/setobject.c)
+
 
 Copyright
 =========


More information about the Python-checkins mailing list