[Python-ideas] Caching iterators

Chris Angelico rosuav at gmail.com
Wed Feb 26 01:25:41 CET 2014


On Wed, Feb 26, 2014 at 11:01 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> Tempting as this is, the caller needs to be aware that such a caching system:
>
> (1) uses potentially unbounded amounts of memory;

Easy fix: limit the size of the queue. Just as with pipes between
processes, the producer blocks trying to push more data into the
queue until the consumer has taken some out. Of course, then you have
to figure out what the right queue size is. In many cases the safest
and simplest might well be zero, aka current behaviour.
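Something like this is what I have in mind. A rough sketch only, not a
real proposal: cached_iter, the maxsize default, and the thread-based
producer are all made up for illustration.

import queue
import threading

_DONE = object()  # sentinel: marks the end of the producer's output

def cached_iter(iterable, maxsize=10):
    # Pre-compute up to 'maxsize' values in a background thread.
    # Note: queue.Queue treats maxsize=0 as "unbounded", so the
    # zero-read-ahead case ("current behaviour") can't be spelled
    # this way; use maxsize >= 1 here.
    q = queue.Queue(maxsize)

    def producer():
        for item in iterable:
            q.put(item)  # blocks while the queue is full
        q.put(_DONE)

    # daemon=True so an abandoned producer doesn't keep the
    # interpreter alive if the consumer stops early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = q.get()
        if item is _DONE:
            return
        yield item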

> (2) is potentially harmful if calculating the values has side-effects;
>
> (3) can lead to "lost" data if the caller accesses the underlying
> iterator without going through the cache; and

Deal with these two by making it something you have to explicitly
request. In that way, it's no different from itertools.tee() - once
you tee an iterator, you do not touch the underlying one at all.
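For example (this is just standard itertools.tee, nothing hypothetical):

import itertools

nums = iter(range(5))
a, b = itertools.tee(nums)
# From here on, use only a and b; advancing nums directly would
# make the tee'd iterators silently miss those values.
print(next(a))   # 0
print(next(a))   # 1
print(list(b))   # [0, 1, 2, 3, 4]  (b has its own buffered view)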

> (4) it is wasteful if the consumer stops early and never uses all the
> values. (CPU cycles are cheap, but they aren't free.)

Also partly solved by the queue size limit (the producer can't run
ahead of the consumer forever).
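Concretely, with the hypothetical cached_iter sketch above, stopping
early wastes only a bounded amount of work:

def expensive_source():
    n = 0
    while True:
        n += 1
        yield n * n  # stand-in for an expensive computation

for n, value in enumerate(cached_iter(expensive_source(), maxsize=10)):
    if n >= 100:
        break  # at most ~11 extra values were computed and discarded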

That said, I don't actually know of anywhere I'd want this facility
where I wouldn't already be working with, say, a socket connection,
or a queue, or something else that buffers.

ChrisA

