Alternative iterator syntax

Wed Feb 21 06:32:03 EST 2001

Following a suggestion by Jeff Petkau <jpet at eskimo.com>,
(http://mail.python.org/pipermail/python-list/2001-February/029944.html),
here is a different proposal for iterators.

Like the colon-syntax proposal, it achieves the following:

1. It provides an extensible iterator interface without pretending to
   provide random access to elements.
2. It resolves the endless "i indexing sequence" debate.
3. It allows performance enhancements to dictionary iteration.
4. It is completely backward compatible.

The gist of this proposal is that iterator syntax and semantics are not
special.  In particular,

1. Their semantics are obvious regardless of their origins and context.
2. They are generally applicable, not tied to sequences and mappings.
3. They reduce existing special syntax without introducing new ones.
4. They are backward compatible in a straightforward way.
5. The same syntax is applicable to "for..in"  and "in".
6. It provides a magic hook for the list() function.

The following text contains six sections: definitions, predefined usage,
builtins, issues, rationale, appendix.

-------------------------------------------------------

Definitions of Iterators:

Iterators MUST define three magic methods:
    __next__
    __contains__
    __list__

They MAY also define additional magic methods:
    __arity__
    __prev__
    __first__
    __last__
    __reset__

These magic methods SHOULD satisfy the following semantic restrictions
when they are defined:

    __next__() returns items of the iterator one by one.  It raises
    IndexError at the end.

    __contains__(x) returns true iff __next__() would return x at some
    point.

    __list__() returns a list of all x that would be returned by
    __next__().

    __arity__() returns n iff x is an n-tuple for every x in __list__().

    __prev__() iterates in the opposite direction of __next__().

    __first__() and __last__() return the first and last elements
    __next__() would have returned.

    __reset__() resets the iterator pointer to the start.

They MAY but NEED NOT define a __len__ method, which may or may not
return a semantically correct result.

A callable iterator x MUST define x.__call__() == x.__list__().  An easy
way to make an iterator x callable is to set

    x.__call__ = x.__list__

Iterator usage in the following contexts is predefined:
    1. for loops
    2. in operator
    3. list() function

Since the semantics of each iterator x are completely specified by
x.__list__(), we shall use such descriptions when convenient.
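
As an illustration only, the three required methods plus the optional
__reset__ might look as follows for an iterator over a list.  This is a
minimal sketch, not part of the proposal: the class name ListIterator is
hypothetical, __list__ and __reset__ are not hooks that any interpreter
recognizes, and it assumes that __list__ and __contains__ describe all
items rather than only those not yet returned.

```python
class ListIterator:
    """Hypothetical iterator over a list, following the proposed protocol."""

    def __init__(self, data):
        self._data = list(data)
        self._pos = 0

    def __next__(self):
        # The proposal signals the end of iteration with IndexError.
        if self._pos >= len(self._data):
            raise IndexError("iterator exhausted")
        item = self._data[self._pos]
        self._pos += 1
        return item

    def __contains__(self, x):
        # Assumption: membership is tested against all items.
        return x in self._data

    def __list__(self):
        # Assumption: returns all items, regardless of the current position.
        return list(self._data)

    def __reset__(self):
        self._pos = 0

    # Make the iterator callable, so that x() == x.__list__().
    __call__ = __list__


it = ListIterator("abc")
first = it.__next__()      # advances the iterator by one item
everything = it()          # same result as it.__list__()
```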

-------------------------------------------------------

Predefined Usage of Iterators:

1. The looping structure:

  Consider the structure

    for a, b, c, ... in A:
        do something
    else:
        do otherthing

  If A defines __next__, the above is interpreted as the equivalent of
  (__item is an invisible variable):

    while 1:
        try:
            __item = A.__next__()
        except IndexError:
            # iterator exhausted without a break: run the else clause
            do otherthing
            break
        a, b, c, ... = __item
        do something

  In addition, if A.__arity__ is defined, it will be checked before the
  loop to avoid ValueError: unpack tuple of wrong size.  This concept
  itself may also be generalized to allow safe tuple unpacking (see
  below).

  Else if A defines __list__, then "for x in A:" is interpreted as "for
  x in A.__list__():".  Note that every list is an iterator (see below).

  Else if A is a callable object, then "for x in A:" is interpreted as
  "for x in A():".

2. The in operator:

  Consider the expression
    (a, b, c, ...) in A

  If A defines __contains__, the above is interpreted as the equivalent
  of:
    A.__contains__((a,b,c, ...))

  Else if A defines __list__, then "x in A" is interpreted as "x in
  A.__list__()".  Note that every list is an iterator (see below).

  Else if A is a callable object, then "x in A" is interpreted as "x in
  A()".

3. Get full list:

  Consider the expression
    list(A)

  If A.__list__ is defined, the above is interpreted as
    A.__list__().

  Else it is interpreted as in Python 2.0.

  This could be used to short circuit costly __getitem__ iterations
  whenever possible.

  If A is a callable iterator, the above could also be written as
    A()
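
The lookup order described in this section can be imitated with ordinary
functions.  The following is a rough sketch under stated assumptions:
items_of, contains, to_list and Countdown are hypothetical names, and
the real proposal would place this dispatch inside the interpreter
rather than in helper functions.

```python
def items_of(A):
    """Yield the items of A in the order the proposed for loop would."""
    if hasattr(A, "__next__"):
        while True:
            try:
                item = A.__next__()
            except IndexError:       # proposed end-of-iteration signal
                return
            yield item
    elif hasattr(A, "__list__"):
        yield from A.__list__()
    elif callable(A):
        yield from A()
    else:
        yield from A                 # ordinary iteration as a fallback


def contains(A, x):
    """Evaluate "x in A" using the proposed lookup order."""
    if hasattr(A, "__contains__"):
        return A.__contains__(x)
    if hasattr(A, "__list__"):
        return x in A.__list__()
    if callable(A):
        return x in A()
    return x in A


def to_list(A):
    """Evaluate list(A), preferring the proposed __list__ hook."""
    if hasattr(A, "__list__"):
        return A.__list__()
    return list(A)


class Countdown:
    """Hypothetical iterator that only defines __next__."""

    def __init__(self, n):
        self.n = n

    def __next__(self):
        if self.n <= 0:
            raise IndexError("countdown finished")
        self.n -= 1
        return self.n + 1


for value in items_of(Countdown(3)):
    print(value)

# A plain method is callable, so it is looped over through its result.
for key in items_of({"a": 1}.keys):
    print(key)
```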

-------------------------------------------------------

Applications to Builtins and Other Common Situations:

1. Sequences.  The builtin sequence types (list, tuple, string) are
  changed to iterators by adding the three magic methods __next__,
  __contains__ and __list__.  The following is always true:

    x.__list__() == list(x)

  In addition, each sequence object x defines two attributes (indexes,
  items) that are themselves iterators.  The arity of indexes is 1, that
  of items is 2.

    x.indexes.__list__() == range(len(x))
    x.items.__list__() == zip(range(len(x)), x)

  They can be used in the following way:

    a = "abc"
    for i in a.indexes:   # i iterates over 0, 1, 2
    for i,x in a.items:   # i, x iterate over (0,'a'), (1,'b'), (2,'c')
    if i in a.indexes:    # equiv. to i in range(len(a))
    if (i,x) in a.items:  # equiv. to i in a.indexes and a[i]==x.
    b = a.items()         # equiv. to b = zip(range(len(a)), a)

2. Mappings.  Each mapping object x defines three attributes (keys,
  values, items) that are callable iterators.  The arity of keys and
  values is 1, that of items is 2.

    x.keys.__list__() == x.keys()
    x.values.__list__() == x.values()
    x.items.__list__() == x.items()

  They can be used in the following way:

    d = {...}
    for k in d.keys:        # lazy iteration
    for v in d.values:      # lazy iteration
    for k,v in d.items:     # lazy iteration
    if k in d.keys:         # equiv. to d.has_key(k)
    if v in d.values:       # equiv. to v in d.values()
    if (k, v) in d.items:   # equiv. to d.has_key(k) and d[k]==v.

  Furthermore, all existing code that treats them as methods continues
  to work, thanks to equivalences like

    d.keys() == d.keys.__call__() == d.keys.__list__()

  It is easy to convert existing code to the new syntax with global
  search and replacement in an editor:

      Replace all ".items():" with ".items:" 
      Replace all ".keys():" with ".keys:" 
      Replace all ".values():" with ".values:"

  Such replacements have the net effect of reducing code clutter.  Except
  for the last one, they may also improve performance.

3. Splitting of texts.  Several line-reading methods can be consolidated:

    for line in file.readlines:    # equivalent to file.xreadlines()
    for line in x.splitlines:      # lazy split
    lines = file.readlines()       # just as now
    line = file.readline()         # equiv. to file.readlines.__next__()
    for c in sys.stdin.readchars:  # if the OS allows this

  It is possible to define additional attributes of iterators that are
  also iterators.  Here are some possibilities:

    for x,y,z in file.readlines.splitwords:
                          # iterate over lines split as words 

    for block in file.read.split("\n\n"):
                          # split a file into empty-line separated blocks
                          # without reading the whole file in.

4. The Iterator class.  An iterator class could be included with
  standard Python, so that one could write:

    a = Iterator([1,2,3])       # Or a = Iterator(); a.__list__ = [1,2,3]
    for x in a: do something

  This is more useful than a plain "for x in [1,2,3]" in
  situations where iterators are passed around, where iterations could
  be interrupted, or where iterators have attributes that are also
  iterators.  Users could subclass this to achieve the effects they
  want.

  One example is the above read.split.  Another example is when you want
  a server to start returning some results before it gets them all:

    for record in database.query(question):

  This is by no means tied to existing sequence and mapping classes.  It
  also provides easy syntax to express ideas like

    for k in dict.keys.sorted:  # If sorting is done in a different thread
                                # it does not need to be completed before
                                # the first key is returned.

  Many operations on lists could be well defined on iterators without
  generating a new list.  Subclasses of Iterator can be defined to do
  things like

    map(func, iterator)         # return an iterator without a new list
    [f(x) for x in iterator]    # ditto

    filter(func, iterator)      # return an iterator without a new list
    [x for x in iterator if func(x)] # ditto
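
The lazy map and filter mentioned in item 4 could be sketched as
subclasses of such an Iterator class.  This is an illustration under
assumptions, not a definitive design: the names Iterator, MapIterator
and FilterIterator are hypothetical, __list__ here drains whatever
items remain, and __contains__ is left out to keep the sketch short.

```python
class Iterator:
    """Hypothetical base class wrapping a sequence."""

    def __init__(self, data=()):
        self._data = list(data)
        self._pos = 0

    def __next__(self):
        if self._pos >= len(self._data):
            raise IndexError("iterator exhausted")
        item = self._data[self._pos]
        self._pos += 1
        return item

    def __list__(self):
        # Collect the remaining items through __next__, so subclasses
        # only need to override __next__.
        result = []
        while True:
            try:
                result.append(self.__next__())
            except IndexError:
                return result


class MapIterator(Iterator):
    """Apply func to each item of source, one item at a time."""

    def __init__(self, func, source):
        self._func = func
        self._source = source

    def __next__(self):
        # IndexError from the source propagates as the end signal.
        return self._func(self._source.__next__())


class FilterIterator(Iterator):
    """Return only the items of source for which func is true."""

    def __init__(self, func, source):
        self._func = func
        self._source = source

    def __next__(self):
        while True:
            item = self._source.__next__()
            if self._func(item):
                return item


squares = MapIterator(lambda x: x * x, Iterator([1, 2, 3, 4]))
even_squares = FilterIterator(lambda x: x % 2 == 0, squares)
result = even_squares.__list__()   # no intermediate list of squares is built
```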

-------------------------------------------------------

Compatibility and Conversion Issues:

This proposal introduces net gains for users without additional cost.
It is completely backward compatible in application code.  But to take
advantage of the new features, it is necessary to make certain changes
in user code.

First consider what it takes to use iterators.  For example, consider
UserDict.  To use the iterators, it is necessary to change

    for x in d.keys():
to
    for x in d.keys:

For old code applied to new objects (where d.keys is an iterator), there
is no advantage, because d.keys() would generate a new list.  There is no
harm either.  This is why d.keys must be a callable iterator.

For new code applied to old objects (where d.keys is a method), the
looping structure is interpreted as over d.keys(), which is a list.
There is no harm either.  This is why we define the for loop specifically
for callable objects.  Otherwise this would be a major obstacle for
users converting old code to new code, because they would have to
remember which subclasses of UserDict had been converted so that keys is
now an iterator.

Now consider what it takes to produce iterators.  Again take UserDict as
an example.  It is necessary to change

    def keys(self): return self.data.keys()
to
    keys = data.keys

Any subclass that redefines keys, values or items also needs to be
converted if it is to use iterators.  It might be necessary to
introduce one more level of indirection using __getattr__, as
illustrated by the following idiom

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "keys": return self.data.keys
        elif name == "values": return self.data.values
        elif name == "items": return self.data.items
        raise AttributeError(name)

Analogous treatment applies to sequences.  For example, in UserList it
is necessary to define

    __next__ = data.__next__
    def __list__(self): return self.data
    __contains__ = data.__contains__

This covers cases like

    for x in []:
    for x in UserList():
    for x in {}.keys:
    for x in UserDict().keys:

In conclusion, users can use "x in X" without remembering whether X is
an iterator, a method, or a sequence.  Implementers can replace all
list-returning methods with iterators whenever necessary, assured that
user code expecting lists would still get what it expects.
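
To make the "callable iterator" requirement for d.keys concrete, here is
a hedged sketch.  KeysIterator and NewDict are hypothetical names.  The
__iter__ method is an assumption added only so that the example runs
with the existing for statement; under the proposal that role would be
played by __next__.

```python
class KeysIterator:
    """Hypothetical callable iterator over the keys of a dict."""

    def __init__(self, data):
        self._data = data

    def __list__(self):
        return list(self._data.keys())

    # Old code calling d.keys() still receives a list.
    __call__ = __list__

    def __contains__(self, key):
        return key in self._data

    def __iter__(self):
        # Assumption: stands in for the proposed __next__ hook.
        return iter(self._data)


class NewDict:
    """Hypothetical UserDict-like class whose keys is a callable iterator."""

    def __init__(self, data=None):
        self.data = dict(data or {})

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name == "keys":
            return KeysIterator(self.data)
        raise AttributeError(name)


d = NewDict({"a": 1, "b": 2})
old_style = d.keys()                 # old code: a list, as before
new_style = [k for k in d.keys]      # new code: no intermediate list
found = "a" in d.keys                # membership through __contains__
```

Both spellings produce the same keys, which is the point of the
compatibility argument above.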

-------------------------------------------------------

Rationale:

1. Following Python tradition, using magic methods is preferred to
   adding special syntaxes.  Special syntaxes are only added if existing
   syntax space is not large enough to call these magic methods, which
   is not the case here.

2. Iterators should not be limited in what they can emit, so the items
   should be allowed to be tuples of any desired arity.  For example, an
   iterator over spatial points may be used as "x,y,z in points".  This
   generality is not possible with the colon based special syntax that
   is only applicable to lists and dicts.

3. Iterators should be completely clear about what they iterate over
   (keys, values, items, indexes, lines, ...) without the assistance of
   context.  Not depending on context is one of the major advantages of
   Python over Perl, and should be adhered to if at all possible.  The
   proposed syntax does not make any implicit assumptions about what is
   to be iterated over.

4. This is completely backward compatible.  Any conversion can be done
   one by one. It can also be performed by global search and
   replacement.  This is not so for the special colon syntax.

5. Any conversion to the new syntax has the immediate effect of removing
   an empty () while possibly improving performance.  Nothing else
   changes.  The conversion does not require implementers and users to
   keep in sync.

6. It plays well with list comprehension, which has a syntax lacking in
   delimiters.  The alternative colon syntax would have added delimiters
   at the wrong places.

7. The arity attribute also solves the problem of tuple unpacking where
   the correctness of syntax cannot be determined in advance of an
   iteration.

8. The similarity between for loops and in operators offers such a
   conceptual simplification that it should not be dropped unless
   absolutely necessary.

-------------------------------------------------------

Appendix: Application of the __arity__ syntax to general tuple unpacking.

This is independent of the iterator issue, but it is closely related to
the outputs of iterators.  The __arity__ syntax proposed above appears
to also solve tuple unpacking problems elsewhere.

Tuple unpacking is perhaps the only place where the validity of Python
assignment syntax depends on the value.  It is therefore often desirable
to convey this information before it is too late.  The __arity__
attribute may be used for this task.

    def func():
        __arity__ = 2
        x = (1,2)
        return x

    a, b = func()        # Guaranteed success for unpacking
    a = func()           # a becomes a tuple
    a, b, c = func()     # UnpackError: attempt to change arity from 2 to 3.

    def func():
        __arity__ = 2
        x = 2
        return x
    a, b = func()        # ArityError: func arity is 2, but returns 1.
    a = func()           # ArityError: func arity is 2, but returns 1.
    a, b, c = func()     # UnpackError: attempt to change arity from 2 to 3.

Both UnpackError and ArityError are subclasses of ValueError.  This
could be implemented this way: If the left hand side of an assignment is
a tuple while the right hand side is a call to f, check that the arity
of the tuple matches f.__arity__ unless f.__arity__ is None.

All functions not defining an __arity__ attribute default to
__arity__==None.  Therefore one can write things like

    if f.__arity__ == 1: x = f()
    elif f.__arity__ == 2: x,y = f()
    else: items = f()
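
The checks described in this appendix can be approximated without any
change to the language.  The following is only a sketch under
assumptions: the decorator arity and the helper unpack are hypothetical
stand-ins for what the interpreter would do at the assignment, and a
non-tuple return value is counted as arity 1.

```python
import functools


class ArityError(ValueError):
    """The function returned a value whose arity differs from its __arity__."""


class UnpackError(ValueError):
    """The assignment target has a different arity than the function."""


def arity(n):
    """Hypothetical decorator that declares and enforces f.__arity__ == n."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            got = len(result) if isinstance(result, tuple) else 1
            if got != n:
                raise ArityError(
                    f"{func.__name__} arity is {n}, but returns {got}.")
            return result
        wrapper.__arity__ = n
        return wrapper
    return decorate


def unpack(func, count, *args, **kwargs):
    """Call func for a tuple target of `count` names, checking arity first."""
    declared = getattr(func, "__arity__", None)
    if declared is not None and declared != count:
        raise UnpackError(
            f"attempt to change arity from {declared} to {count}.")
    return func(*args, **kwargs)


@arity(2)
def pair():
    return (1, 2)


a, b = unpack(pair, 2)     # the arity is checked before the call is made
whole = pair()             # a single target simply receives the tuple
```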

Huaiyu Zhu   <hzhu at users.sourceforge.net>