Alternative iterator syntax
Huaiyu Zhu
hzhu at users.sourceforge.net
Wed Feb 21 06:32:03 EST 2001
Following a suggestion by Jeff Petkau <jpet at eskimo.com>,
(http://mail.python.org/pipermail/python-list/2001-February/029944.html),
here is a different proposal for iterators.
Like the colon-syntax proposal, it achieves the following:
1. It provides an extensible iterator interface without pretending to
provide random access to elements.
2. It resolves the endless "i indexing sequence" debate.
3. It allows performance enhancements to dictionary iteration.
4. It is completely backward compatible.
The gist of this proposal is that iterator syntax and semantics are not
special. In particular,
1. Their semantics are obvious regardless of their origins and context.
2. They are generally applicable, not tied to sequences and mappings.
3. They reduce existing special syntax without introducing new ones.
4. They are backward compatible in a straightforward way.
5. The same syntax is applicable to "for..in" and "in".
6. It provides a magic hook for the list() function.
The following text contains six sections: definitions, predefined usage,
builtins, issues, rationale, appendix.
-------------------------------------------------------
Definitions of Iterators:
Iterators MUST define three magic methods:
__next__
__contains__
__list__
They MAY also define additional magic methods:
__arity__
__prev__
__first__
__last__
__reset__
These magic methods SHOULD satisfy the following semantic restrictions
when they are defined:
__next__() returns items of the iterator one by one. It raises
IndexError at the end.
__contains__(x) returns true iff __next__() would return x at some
point.
__list__() returns a list of all x that would be returned by
__next__().
__arity__() returns n iff x is an n-tuple for every x in __list__().
__prev__() iterates in the opposite direction of __next__()
__first__() and __last__() return the first and last elements
__next__() would have returned.
__reset__() resets the iterator pointer to the start.
They MAY but NEED NOT define a __len__ method, which may or may not
return a semantically correct result.
A callable iterator x MUST define x.__call__() == x.__list__(). An easy
way to make an iterator x callable is to set
x.__call__ = x.__list__
Iterator usage in the following contexts is predefined:
1. for loops
2. in operator
3. list() function
Since the semantics of each iterator x is completely specified by
x.__list__(), we shall use such description when convenient.
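As a rough illustration of the three required methods, here is a minimal sketch in present-day Python. The class name ListIterator and its internals are assumptions made for this example, not part of the proposal. It also assumes that __list__() reports every item from the start rather than only the ones not yet consumed, which the text above leaves open.

```python
class ListIterator:
    """Hypothetical iterator following the protocol described above."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0

    def __next__(self):
        # The proposal signals exhaustion with IndexError.
        if self._pos >= len(self._items):
            raise IndexError("iterator exhausted")
        item = self._items[self._pos]
        self._pos += 1
        return item

    def __contains__(self, x):
        return x in self._items

    def __list__(self):
        # Assumption: all items, counted from the start.
        return list(self._items)

    def __reset__(self):
        self._pos = 0

    # Callable iterator: x() == x.__list__()
    __call__ = __list__


it = ListIterator("abc")
first = it.__next__()   # consumes one item
everything = it()       # same list as it.__list__()
```

Binding __call__ at class level stands in for the instance assignment x.__call__ = x.__list__, since current Python looks up __call__ on the type rather than on the instance.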
-------------------------------------------------------
Predefined Usage of Iterators:
1. The looping structure:
Consider the structure
for a, b, c, ... in A:
    do something
else:
    do otherthing
If A defines __next__, the above is interpreted as equivalent to
(__got is an invisible variable):
__got = 0
while 1:
    try:
        a, b, c, ... = A.__next__()
        __got = 1
        do something
    except IndexError:
        break
if not __got:
    do otherthing
In addition, if A.__arity__ is defined, it will be checked before the
loop to avoid ValueError: unpack tuple of wrong size. This concept
itself may also be generalized to allow safe tuple unpacking (see
below).
Else if A defines __list__, then "for x in A:" is interpreted as "for
x in A.__list__():". Note that every list is an iterator (see below).
Else if A is a callable object, then "for x in A:" is interpreted as
"for x in A():".
2. The in operator:
Consider the expression
(a, b, c, ...) in A:
If A defines __contains__, the above is interpreted as equivalent to:
A.__contains__((a,b,c, ...))
Else if A defines __list__, then "x in A" is interpreted as "x in
A.__list__()". Note that every list is an iterator (see below).
Else if A is a callable object, then "x in A" is interpreted as "x in
A()".
3. Get full list:
Consider the expression
list(A)
If A.__list__ is defined, the above is interpreted as
A.__list__().
Else it is interpreted as in Python 2.0.
This could be used to short circuit costly __getitem__ iterations
whenever possible.
If A is a callable iterator, the above could also be written as
A()
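The three lookup orders in this section can be sketched as ordinary functions in present-day Python. The helper names for_each, contains and to_list, and the demo classes Countdown and Squares, are assumptions made for this illustration. The real proposal would change the interpreter, and this sketch leaves out the else clause and the __arity__ check.

```python
def for_each(A, body):
    """Run body(item) for each item, using the lookup order of rule 1."""
    if hasattr(A, "__next__"):
        while True:
            try:
                item = A.__next__()
            except IndexError:  # the proposal's end-of-iteration signal
                break
            body(item)
    elif hasattr(A, "__list__"):
        for item in A.__list__():
            body(item)
    elif callable(A):
        for item in A():
            body(item)
    else:
        for item in A:  # anything else: ordinary iteration
            body(item)


def contains(x, A):
    """Evaluate 'x in A' using the lookup order of rule 2."""
    if hasattr(A, "__contains__"):
        return A.__contains__(x)
    if hasattr(A, "__list__"):
        return x in A.__list__()
    if callable(A):
        return x in A()
    raise TypeError("object does not support 'in'")


def to_list(A):
    """Evaluate list(A), preferring the __list__ hook of rule 3."""
    hook = getattr(A, "__list__", None)
    return hook() if hook is not None else list(A)


class Countdown:
    """Hypothetical iterator that defines only __next__."""
    def __init__(self, n):
        self.n = n

    def __next__(self):
        if self.n <= 0:
            raise IndexError("exhausted")
        self.n -= 1
        return self.n + 1


class Squares:
    """Hypothetical sequence whose __list__ bypasses __getitem__."""
    def __init__(self, n):
        self.n = n
        self.getitem_calls = 0

    def __getitem__(self, i):
        self.getitem_calls += 1
        if i >= self.n:
            raise IndexError(i)
        return i * i

    def __list__(self):
        return [i * i for i in range(self.n)]


seen = []
for_each(Countdown(3), seen.append)        # __next__ branch
for_each(lambda: ["x", "y"], seen.append)  # callable branch
squares = Squares(4)
as_list = to_list(squares)                 # __list__ hook, no __getitem__ calls
```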
-------------------------------------------------------
Applications to Builtins and Other Common Situations:
1. Sequences. The builtin sequence types (list, tuple, string) are
changed to iterators by adding the three magic methods __next__,
__contains__ and __list__. The following is always true:
x.__list__() == list(x)
In addition, each sequence object x defines two attributes (indexes,
items) that are themselves iterators. The arity of indexes is 1, that
of items is 2.
x.indexes.__list__() == range(len(x))
x.items.__list__() == zip(range(len(x)), x)
They can be used in the following way
a = "abc"
for i in a.indexes: # i iterate over 0, 1, 2
for i,x in a.items: # i, x iterate over (0,'a'), (1,'b'), (2,'c')
if i in a.indexes: # equiv. to i in range(len(a))
if (i,x) in a.items: # equiv. to i in a.indexes and a[i]==x.
b = a.items() # equiv. to b = zip(range(len(a)), a)
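The two sequence attributes could be modelled roughly as below. Indexes and Items are hypothetical helper classes written in present-day Python. They implement only __arity__, __list__ and __contains__, and leave out __next__ for brevity.

```python
class Indexes:
    """Hypothetical stand-in for the proposed x.indexes attribute."""
    def __init__(self, seq):
        self.seq = seq

    def __arity__(self):
        return 1

    def __list__(self):
        return list(range(len(self.seq)))

    def __contains__(self, i):
        return i in range(len(self.seq))


class Items:
    """Hypothetical stand-in for the proposed x.items attribute."""
    def __init__(self, seq):
        self.seq = seq

    def __arity__(self):
        return 2

    def __list__(self):
        return list(zip(range(len(self.seq)), self.seq))

    def __contains__(self, pair):
        i, x = pair
        return i in range(len(self.seq)) and self.seq[i] == x


a = "abc"
pairs = Items(a).__list__()    # (index, element) pairs
has_b = (1, "b") in Items(a)   # 1 is a valid index and a[1] == 'b'
has_3 = 3 in Indexes(a)        # same question as 3 in range(len(a))
```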
2. Mappings. Each mapping object x defines three attributes (keys,
values, items) that are callable iterators. The arity of keys and
values is 1, that of items is 2.
x.keys.__list__() == x.keys()
x.values.__list__() == x.values()
x.items.__list__() == x.items()
They can be used in the following way
d = {...}
for k in d.keys: # lazy iteration
for v in d.values: # lazy iteration
for k,v in d.items: # lazy iteration
if k in d.keys: # equiv. to d.has_key(k)
if v in d.values: # equiv. to v in d.values()
if (k, v) in d.items: # equiv. to d.has_key(k) and d[k]==v.
Furthermore, all existing code that treats them as methods continues
to work, thanks to equivalences like
d.keys() == d.keys.__call__() == d.keys.__list__()
It is easy to convert existing code to the new syntax with global
search and replacement in an editor:
Replace all ".items():" with ".items:"
Replace all ".keys():" with ".keys:"
Replace all ".values():" with ".values:"
Such replacements have the net effect of reducing code clutter. Except
for the last one, they may also improve performance.
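One way to picture a callable iterator for mappings, sketched in present-day Python. CallableKeys and LazyDict are hypothetical names. The sketch uses __iter__ for the lazy loop because that is what the current for statement looks for, in place of the __next__/IndexError protocol proposed here.

```python
class CallableKeys:
    """Hypothetical stand-in for the proposed d.keys attribute."""
    def __init__(self, data):
        self._data = data

    def __iter__(self):
        # Lazy: keys are handed out one at a time, no list is built.
        return iter(self._data)

    def __contains__(self, k):
        return k in self._data

    def __list__(self):
        return list(self._data)

    # Old-style d.keys() keeps working and still returns a list.
    __call__ = __list__


class LazyDict:
    """Hypothetical mapping whose keys attribute is a callable iterator."""
    def __init__(self, data):
        self._data = dict(data)
        self.keys = CallableKeys(self._data)


d = LazyDict({"a": 1, "b": 2})
looped = [k for k in d.keys]   # new style: iterate over the attribute
called = d.keys()              # old style: call it and get a list
```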
3. Splitting of texts. Several line-reading methods can be consolidated:
for line in file.readlines: # equivalent to file.xreadlines()
for line in x.splitlines: # lazy split
lines = file.readlines() # just as now
line = file.readline() # equiv. to file.readlines.__next__()
for c in sys.stdin.readchars: # if the os allows this
It is possible to define additional attributes of iterators that are
also iterators. Here are some possibilities:
for x,y,z in file.readlines.splitwords:
# iterate over lines split as words
for block in file.read.split("\n\n"):
# split a file into empty-line separated blocks
# without reading the whole file in.
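The blank-line block split can be approximated today with a generator, a feature that did not exist when this was written. split_blocks is a hypothetical helper. Unlike a literal split("\n\n"), it treats any run of blank lines as one separator.

```python
import io


def split_blocks(lines):
    """Yield blank-line-separated blocks from an iterable of lines, lazily."""
    block = []
    for line in lines:
        if line.strip() == "":
            if block:
                yield "".join(block)
                block = []
        else:
            block.append(line)
    if block:
        yield "".join(block)


# io.StringIO stands in for an open file; only one block is held at a time.
text = io.StringIO("a\nb\n\nc\n\n\nd\n")
blocks = list(split_blocks(text))
```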
4. The Iterator class. An iterator class could be included with
standard python, so that one could write:
a = Iterator([1,2,3]) # Or a = Iterator(); a.__list__ = [1,2,3]
for x in a: do something
This is more useful compared with plain "for x in [1,2,3]" in
situations where iterators are passed around, where iterations could
be interrupted, or where iterators have attributes that are also
iterators. Users could subclass this to achieve the effects they
want.
One example is the above read.split. Another example is when you want
a server to start returning some results before it gets them all
for record in database.query(question):
This is by no means tied to existing sequence and mapping classes. It
also provides easy syntax to express ideas like
for k in dict.keys.sorted: # If sorting is done in a different thread
# it does not need to be completed before
# the first key is returned.
Many operations on list could be well defined on iterators without
generating a new list. Subclasses of Iterator can be defined to do
things like
map(func, iterator) # return an iterator without a new list
[f(x) for x in iterator] # ditto
filter(func, iterator) # return an iterator without a new list
[x for x in iterator if func(x)] # ditto
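For comparison, present-day Python behaves this way out of the box: the built-in map and filter return lazy iterators, and generator expressions do the same for the comprehension forms. The sketch below uses a hypothetical source, numbers(), that records what it has produced so the laziness is visible.

```python
produced = []


def numbers():
    """Hypothetical source that records each value as it is produced."""
    for n in range(1, 6):
        produced.append(n)
        yield n


squares = map(lambda n: n * n, numbers())
before = list(produced)   # nothing has been pulled from the source yet
first = next(squares)     # pulls exactly one value
after = list(produced)

evens = filter(lambda n: n % 2 == 0, range(1, 6))
doubled = (2 * n for n in evens)   # generator expression, also lazy
result = list(doubled)             # the list is built only here
```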
-------------------------------------------------------
Compatibility and Conversion Issues:
This proposal introduces net gains for users without additional cost.
It is completely backward compatible in application code. But to take
advantage of the new features, it is necessary to make certain changes
in user code.
First consider what it takes to use iterators. For example, consider
UserDict. To use the iterators, it is necessary to change
for x in d.keys():
to
for x in d.keys:
For old code applied to new objects (where d.keys is an iterator), there is
no advantage, because d.keys() would generate a new list. There is no
harm either. This is why d.keys must be a callable iterator.
For new code applied to old objects (where d.keys is a method), the
looping structure is interpreted as over d.keys(), which is a list.
There is no harm either. This is why we define the for loop specifically
for callable objects. Otherwise this would be a major obstacle for
users converting old code to new code, because they would have to
remember which subclass of UserDict had been converted so that keys is now
an iterator.
Now consider what it takes to produce iterators. Take UserDict as an
example again. It is necessary to change
def keys(self): return self.data.keys()
to
keys = data.keys
Any subclass that redefines keys, values and items also needs to be
converted if it is to use iterators. It might be necessary to
introduce one more level of indirection using __getattr__, as
illustrated by the following idiom
def __getattr__(self, name):
    if name == "keys": return self.data.keys
    elif name == "values": return self.data.values
    elif name == "items": return self.data.items
    else: raise AttributeError(name)
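A runnable version of this delegation idiom might look like the following. DelegatingDict is a hypothetical class for illustration. It forwards to the bound methods of an ordinary dict, which are not the callable iterators proposed here.

```python
class DelegatingDict:
    """Hypothetical UserDict-like wrapper that forwards three attributes."""

    def __init__(self, data=None):
        self.data = {} if data is None else data

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name in ("keys", "values", "items"):
            return getattr(self.data, name)
        raise AttributeError(name)


d = DelegatingDict({"a": 1})
keys = list(d.keys())
items = list(d.items())
```

Raising AttributeError for every other name matters: without it, a misspelled attribute would silently evaluate to None.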
Analogous treatment applies to sequences. For example, in UserList it
is necessary to define
__next__ = data.__next__
def __list__(self): return self.data
__contains__ = data.__contains__
This covers cases like
for x in []:
for x in UserList():
for x in {}.keys:
for x in UserDict().keys:
In conclusion, users can use "x in X" without remembering whether X is
an iterator, a method, or a sequence. Implementers can replace all
list-returning methods with iterators whenever necessary, assured that user
code expecting lists would still get what it expects.
-------------------------------------------------------
Rationale:
1. Following Python tradition, using magic methods is preferred to
adding special syntaxes. Special syntaxes are only added if existing
syntax space is not large enough to call these magic methods, which
is not the case here.
2. Iterators should not be limited to what they can emit, so the items
should be allowed to be tuples of any desired arity. For example, an
iterator over spatial points may be used as "x,y,z in points". This
generality is not possible with the colon based special syntax that
is only applicable to lists and dicts.
3. Iterators should be completely clear about what they iterate over
(keys, values, items, indexes, lines, ...) without the assistance of
context. Not depending on context is one of the major advantages of
python over Perl, and should be adhered to if at all possible. The
proposed syntax does not make any implicit assumptions about what is to
be iterated over.
4. This is completely backward compatible. Any conversion can be done
one by one. It can also be performed by global search and
replacement. This is not so for the special colon syntax.
5. Any conversion to the new syntax has the immediate effect of removing
an empty () while possibly improving performance. Nothing else
changes. The conversion does not require implementers and users to
keep in sync.
6. It plays well with list comprehension, which has a syntax lacking in
delimiters. The alternative colon syntax would have added delimiters
at the wrong places.
7. The arity attribute also solves the problem of tuple unpacking where
the correctness of syntax cannot be determined in advance of an
iteration.
8. The similarity between for loops and in operators offers such a
conceptual simplification that it should not be dropped unless
absolutely necessary.
-------------------------------------------------------
Appendix: Application of the __arity__ syntax to general tuple unpacking.
This is independent of the iterator issue, but it is closely related to
the outputs of iterators. The __arity__ syntax proposed above appears
to also solve tuple unpacking problems elsewhere.
Tuple unpacking is perhaps the only place where the validity of python
assignment syntax depends on the value. It is therefore often desirable
to convey this information before it is too late. The __arity__
attribute may be used for this task.
def func():
    __arity__ = 2
    x = (1,2)
    return x
a, b = func() # Guaranteed success for unpacking
a = func() # a becomes a tuple
a, b, c = func() # UnpackError: attempt to change arity from 2 to 3.
def func():
    __arity__ = 2
    x = 2
    return x
a, b = func() # ArityError: func arity is 2, but returns 1.
a = func() # ArityError: func arity is 2, but returns 1.
a, b, c = func() # UnpackError: attempt to change arity from 2 to 3.
Both UnpackError and ArityError are subclasses of ValueError. This
could be implemented this way: If the left hand side of an assignment is
a tuple while the right hand side is a call to f, check that the arity
of the tuple matches f.__arity__ unless f.__arity__ is None.
All functions not defining __arity__ attribute default to
__arity__==None. Therefore one can write things like
if f.__arity__ == 1: x = f()
elif f.__arity__ == 2: x,y = f()
else: items = f()
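The check described above can be approximated at run time with a decorator. Everything here is a sketch under assumptions: the arity decorator and the unpack helper are hypothetical, and the check happens when the function is called rather than in the compiler or the assignment statement.

```python
import functools


class ArityError(ValueError):
    pass


class UnpackError(ValueError):
    pass


def arity(n):
    """Declare that a function returns an n-tuple and verify it on each call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            size = len(result) if isinstance(result, tuple) else 1
            if size != n:
                raise ArityError(
                    f"{func.__name__} arity is {n}, but returns {size}")
            return result
        wrapper.__arity__ = n
        return wrapper
    return decorate


def unpack(f, count):
    """Call f, refusing up front if count targets cannot match f.__arity__."""
    declared = getattr(f, "__arity__", None)
    if declared is not None and declared != count:
        raise UnpackError(
            f"attempt to change arity from {declared} to {count}")
    return f()


@arity(2)
def pair():
    return (1, 2)


a, b = unpack(pair, 2)   # declared arity matches the two targets
```

A function without the decorator has no __arity__ attribute, which unpack treats like the __arity__ == None default described above.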
Huaiyu Zhu <hzhu at users.sourceforge.net>