[Python-Dev] Single- vs. Multi-pass iterability

Alex Martelli aleax@aleax.it
Tue, 9 Jul 2002 12:03:40 +0200


On Tuesday 09 July 2002 11:37 am, David Abrahams wrote:
> From: "Alex Martelli" <aleax@aleax.it>
>
> > On Tuesday 09 July 2002 10:57 am, Moore, Paul wrote:
> > > IIRC from earlier discussions on the list, iterators "by design" do not
> > > expose this information. In C++ terms, all Python iterators are forward
> > > iterators
> >
> > I think they're _input_ iterators -- you can only "get" items through the
> > iterator, not "set" them (as you can with forward, but not input,
> > iterators in C++).
>
> C++ also has forward constant iterators which are not writable. Take the
> const_iterator type of your favorite singly-linked-list implementation for
> example.

Right, but then you can at least get the current item as many times as
you want before advancing -- input iterators and Python's iterators have
in common that get-and-advance is inseparable.


> I don't know if we need them, but I'm certainly finding that not having
> some more information is difficult for me. If I need to make multiple
> passes over the information in a generalized iterable object, the only
> solution AFAICT is to unconditionally copy all the information into a list
> first.

Yes, I can see that making such a copy willy-nilly could be a pity from
the performance viewpoint when, theoretically, one could otherwise
guarantee that the information is inalterable.  But is it all that frequent
that one can make such guarantees, e.g. that the underlying list or
dictionary (if any) cannot possibly be altered (e.g. by other threads)
during multiple iterations over it?  I.e. it might not be enough to know
that you can iterate again if needed -- you might also need some
guarantee that further iterations yield identical information, and that,
in turn, might prove more problematic in many cases (although maybe
not in yours -- I don't know enough details to tell!-).

So maybe using a "snapshot" strategy for the general case, and then
maybe specialcasing and optimizing a very few performance hotspots
where information CAN be guaranteed to be unchangeable and
multiply iterable (if you can locate any such hotspots) isn't quite as
bad as all that.  Just musing...


Alex