[Python-Dev] RE: cloning iterators again

Raymond Hettinger python at rcn.com
Mon Oct 27 00:12:33 EST 2003


> > I've re-read some of the old email on the subject but didn't see
what
> > this buys us that we don't already get with the current tee().
> 
> Performance-wise I don't know; we'd have to profile it I guess. :-(

My question was more directed toward non-performance issues.  Do we
really have *any* need for more than two iterators running concurrently?
After all, it's already difficult to come-up with good use cases for two
that are not dominated by list() or window().

 
> With the current tee(), I was thinking that if the two iterators stay
> close, you end up moving the in basket to the out basket rather
> frequently, and the overhead of that might beat the simplicity of the
> linked lists. 

With current tee(), runtime is dominated by calls to Append and Pop
(reverse is super-fast and moves each element only once).  Those are
calls are more expensive than a link jump; however append() and pop()
are optimized to avoid calls to the memory manager while every link
would need steps for alloc/initialization/reference/dealloc.  Cache
effects are also important because the current tee() uses much less
memory and the two memory blocks are contiguous.



> Also, *if* you need a lot of clones, using multiple
> tee() calls ends up creating several queues, again causing more
> overhead.  (These queues end up together containing all the items from
> the oldest to the newest iterator.)

*If* we want to support multiple clones, there is an alternate
implementation of the current tee that only costs one extra word per
iteration.  That would be in there already.  I really *wanted* a
multi-way tee but couldn't find a single use case that warranted it.



> I also note that the current tee() doesn't let you use __copy__ easily
> (it would be quite messy I think).

To __copy__ is to tee.  Both make two iterators from one.
They are different names for the same thing.
Right now, they don't seem comparable because the current tee is only a
two way split and you think of copy as being a multi-way split for no
extra cost.



> Maybe Andrew has some use cases?

I hope so.  I can't think of anything that isn't dominated by list(),
window(), or the current tee().

And, if needed, the current tee() can easily be made multi-way.  It
doubles the unit memory cost from one word to two but that's nothing
compared to the link method (two words for PyHead, another 3 (Linux) or
4 (Windows) words for GC, and another 3 for the data fields).


> The question is, how often does one need this?  Have you seen real use
> cases for tee() that aren't better served with list()?

I'm sure they exist, but they are very few.  I was hoping that a simple,
fast, memory efficient two-way tee() would have satisfied all the
requests, but this thing appears to be taking on a life of its own with
folks thinking they need multiple concurrent iterators created by a
magic method (what for?).



Raymond




More information about the Python-Dev mailing list