[Python-Dev] Iterators (PEP 234)

M.-A. Lemburg mal@lemburg.com
Tue, 06 Feb 2001 12:54:50 +0100


Ka-Ping Yee wrote:
> 
> On 5 Feb 2001, M.-A. Lemburg wrote:
> > > > The .iterator() method would have to return an object which
> > > > provides an iterator API (at C level to get the best performance).
> > >
> > > Okay, provide an example.  Write this iterator() method in Python.
> > > Now answer: how does 'for' know whether the thing to the right of
> > > 'in' is an iterator or a sequence?
> >
> > Simple: have the for-loop test for a type slot and have
> > it fallback to __getitem__ in case it doesn't find the slot API.
> 
> For the third time: write an example, please.  It will help a lot.

Ping, what do you need an example for ? The above sentence says
it all:

for x in obj:
    ...

This will work as follows:

1. if obj exposes the iteration slot, say tp_nextitem, the for loop
   will call this slot without argument and assign the returned
   object to x

2. if obj does not expose tp_nextitem, then the for loop will
   construct an integer starting at 0 and pass this to the
   sq_item slot or __getitem__ method and assign the returned
   value to x; the integer is then replaced with an incremented
   integer

3. both techniques work until the slot or method in question
   returns an IndexError exception

The current implementation doesn't have 1. This is the only
addition it takes to get iterators to work together well with
the for-loop -- there are no backward compatibility issues here,
because the tp_nextitem slot will be a new one.

Since the for-loop can avoid creating temporary integers,
iterations will generally run a lot faster than before. Also,
iterators have access to the object's internal representation,
so data access is also faster.

> > Sorry, Ping, I didn't know you have a PEP for iterators already.
> 
> I posted it on this very boutique (i mean, mailing list) a week ago
> and messages have been going back and forth on its thread since then.
> 
> On 31 Jan 2001, Ka-Ping Yee wrote:
> | Okay, i have written a draft PEP that tries to combine the
> | "elt in dict", custom iterator, and "for k:v" issues into a
> | coherent proposal.  Have a look:
> |
> |     http://www.lfw.org/python/pep-iterators.txt
> |     http://www.lfw.org/python/pep-iterators.html
> 
> Okay.  I apologize for my impatient tone, as it comes from the
> expectation that anyone would have read the document before trying
> to discuss it.  I am very happy to get *new* information, the
> discovery of new errors in my thinking, better and interesting
> arguments; it's just that it's exasperating to see arguments
> repeated that were already made, or objections raised that were
> already carefully thought out and addressed.  From now on, i'll
> stop resisting the urge to paste the text of proposals inline
> (instead of politely posting just URLs) so you won't miss them.

I must have missed those postings... don't have time to read
all of python-dev anymore :-(
 
> > Done. Didn't know it exists, though (why isn't the PEP#
> > in the subject line ?).
> 
> It didn't have a number at the time i posted it.  Thank you
> for updating the subject line.
> 
> > Since the object can have multiple methods to construct
> > iterators, all you need is *one* iterator API. You don't
> > need a slot which returns an iterator object -- leave
> > that decision to the programmer, e.g. you can have:
> >
> > for key in dict.xkeys():
> > for value in dict.xvalues():
> > for items in dict.xitems():
> 
> Three points:
> 
> 1.  We have syntactic support for mapping creation and lookup,
> and syntactic support for mapping iteration should mirror it.
> 
> 2.  IMHO
> 
>     for key:value in dict:
> 
> is much easier to read and explain than
> 
>     for (key, value) in dict.xitems():
> 
> (Greg?  Could you test this claim with a survey question?)
> 
> To the newcomer, the former is easy to understand at a surface
> level.  The latter exposes the implementation (an implementation
> that is still there in PEP 234, but that the programmer only has
> to worry about if they are going deeper and writing custom
> iteration behaviour).  This separates the work of learning into
> two small, digestible pieces.

Tuples are well-known basic Python types. Why should 
(key,value) be any harder to understand than key:value.
What would you tell a newbie that writes:

for key:value in sequence:
    ....

where sequence is a list of tuples and finds that this doesn't
work ?

Besides, the items() method has been around for ages, so switching
from .items() to .xitems() in programs will be just as easy as
switching from range() to xrange().

I am -0 on the key:value thingie. If you want it as a way to
construct or split associations, fine. But it is really not
necessary to be able to iterate over dictionaries.

> 3. Furthermore, this still doesn't solve the backward-compatibility
> problem that PEP 234 takes great care to address!  If you write
> your for-loops
> 
>     for (key, value) in dict.xitems():
> 
> then you are screwed if you try to replace dict with any kind of
> user-implemented dictionary-like replacement (since you'd have to
> go back and implement the xitems() method on everything).

Why is that ? You'd just have to add .xitems() to UserDict and
be done with it. This is how we have added new dictionary methods
all along. I don't see your point here.

Sure, if you want to use a new feature you will have to think
about whether it can be used with your data-types. What you
are trying to do here is maintain forward compatibility at the
cost of making iteration much more complicated than it really is.
 
> If, in order to maintain compatibility with the existing de-facto
> dictionary interface, you write your for-loops
> 
>     for (key, value) in dict.items():
> 
> then now you are screwed if dict is a built-in dictionary, since
> items() is supposed to construct a list, not an iterator.

I'm not breaking backward compatibility -- the above will still
work like it has before since lists don't have the tp_nextitem
slot.
 
> > for entry in matrix.xrow(1):
> > for entry in matrix.xcolumn(2):
> > for entry in matrix.xdiag():
> 
> These are fine, since matrices are not core data types with
> syntactic support or a de-facto emulation protocol.
> 
> > for i,element in sequence.xrange():
> 
> This is just as bad as the xitems() issue above -- probably worse --
> since nobody implements xrange() on sequence-like objects, so now
> you've broken compatibility with all of those.
> 
> We want this feature to smoothly extend and work with existing objects
> with a minimum of rewriting, ideally none.  PEP 234 achieves this ideal.

Again, you are trying to achieve forward compatibility. If people
want better performance, than they will have to add new functionality
to their types -- one way or another.
 
> > Since for-loops can check for the type slot, they can use an
> > optimized implementation which avoids the creation of
> > temporary integer objects and leave the state-keeping to the
> > iterator which can usually provide a C based storage for it with
> > much better performance.
> 
> This statement, i believe, is orthogonal to both proposals.
> 
> > Note that with this kind of interface, there is no need to
> > add "Mapping Iterators" or "Sequence Iterators" as special
> > cases, since these are easily implemented using the above
> > iterators.
> 
> I think this really just comes down to one key difference
> between our points of view here.  Correct me if you disagree:
> 
>     You seem to be suggesting that we should only consider a
>     protocol for sequences, whereas PEP 234 talks about both
>     sequences and mappings.

No. I'm suggesting to add a low-level "give me the next item
in the bag" and move the "how to get the next item" logic
into an iterator object.
 
This will still allow you to iterate over sequences and 
mappings, so I don't understand why you keep argueing for
adding new syntax and slots to be able to iterate over 
dictionaries.

> I argue that consideration for mappings is worthwhile because:
> 
>     1. Dictionaries are a built-in type with syntactic and
>        core implementation support.
> 
>     2. Iteration over dictionaries is very common and should
>        be spelled in an easily understood fashion.
> 
>     3. Both sequence and mapping protocols are formalized in
>        the core (with PySequenceMethods and PyMappingMethods).
> 
>     4. Both sequence and mapping protocols are documented and
>        used in Python (__getitem__, keys, values, etc.).
> 
>     5. There are many, many sequence-like and mapping-like
>        objects out there, implemented both in Python and in C,
>        which adhere to these protocols.
> 
> (There is also the not-insignificant side benefit of finally
> having a decent way to get the indices while you're iterating
> over a sequence, which i've wanted fairly often.)

Agreed. 

I'd suggest to implement generic iterators which implements your
suggestions and put them into the builins or a special iterator
module...

from iterators import xitems, xkeys, xvalues

for key, value in xitems(dict):
for key in xkeys(dict):
for value in xvalues(dict):

Other objects can then still have their own iterators by exposing 
special methods which construct special iterators. The for-loop 
will continue to work as always and happily accept __getitem__ 
compatible or tp_nextitem compatible objects as right-hand argument.

-- 
Marc-Andre Lemburg
______________________________________________________________________
Company:                                        http://www.egenix.com/
Consulting:                                    http://www.lemburg.com/
Python Pages:                           http://www.lemburg.com/python/