[Python-ideas] Caching iterators

Terry Reedy tjreedy at udel.edu
Wed Feb 26 04:05:23 CET 2014


On 2/25/2014 7:25 PM, Chris Angelico wrote:
> On Wed, Feb 26, 2014 at 11:01 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> While tempting, the caller needs to be aware that such a caching system:
>>
>> (1) uses potentially unbounded amounts of memory;
>
> Easy fix: Limit the size of the queue.

multiprocessing.Queue is a process-shared near-clone of queue.Queue and 
has one optional arg -- maxsize. It is implemented with a pipe, locks, 
and semaphores.
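
Something like the following shows the pattern (the names and numbers 
are purely illustrative, not from any proposal): a producer process 
feeding a bounded multiprocessing.Queue, with None as an end-of-data 
sentinel.

from multiprocessing import Process, Queue

def producer(q):
    # put() blocks once 4 items are waiting, so the producer
    # never runs far ahead of the consumer.
    for i in range(20):
        q.put(i * i)
    q.put(None)  # sentinel: no more values

if __name__ == '__main__':
    q = Queue(maxsize=4)                 # bounded, process-shared queue
    p = Process(target=producer, args=(q,))
    p.start()
    for value in iter(q.get, None):      # consume until the sentinel
        print(value)
    p.join()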

> Just like with pipes between
> processes, the producer will block trying to push more data into the
> queue until the consumer's taken some out.

This is the default behavior of Queue.put.
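
For example, with the plain in-process queue.Queue:

import queue

q = queue.Queue(maxsize=2)
q.put(1)
q.put(2)
# A third q.put(3) would now block until a consumer calls q.get();
# q.put_nowait(3) or q.put(3, block=False) would raise queue.Full instead.
print(q.get())   # 1 -- frees a slot, so the next put() can proceed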

> Of course, then you have to
> figure out what's the right queue size.

42 ;-)

> In many cases the safest and
> simplest might well be zero, aka current behaviour.
>
>> (2) is potentially harmful if calculating the values has side-effects;
>>
>> (3) it can lead to "lost" data if the caller accesses the underlying
>> iterator without going through the cache; and

If the iterator is in another process, connected only by a pipe, it 
cannot be accessed except through the Queue.

> Deal with these two by making it something you have to explicitly
> request. In that way, it's no different from itertools.tee() - once
> you tee an iterator, you do not touch the underlying one at all.
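
A quick illustration of that, purely for comparison: after tee(), both 
returned iterators replay the source from tee's internal cache, and 
advancing the source directly is exactly the "lost data" problem from 
point (3).

import itertools

source = iter(range(5))
a, b = itertools.tee(source)   # from here on, touch only a and b

# Advancing `source` directly now would consume items that a and b
# never see -- the "lost data" problem from point (3) above.
print(list(a))   # [0, 1, 2, 3, 4]
print(list(b))   # [0, 1, 2, 3, 4] -- replayed from tee's internal cache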
>
>> (4) it is wasteful if the consumer stops early and never uses all the
>> values. (CPU cycles are cheap, but they aren't free.)
>
> Also partly solved by the queue size limit (don't let it run free forever).
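
Putting those points together, one possible sketch of such a caching 
iterator (the name cached() is illustrative, not an actual API) is a 
background thread feeding a bounded queue.Queue; maxsize is what limits 
both the memory use and the wasted work:

import queue
import threading

_SENTINEL = object()

def cached(iterable, maxsize=10):
    """Prefetch values from iterable in a background thread."""
    q = queue.Queue(maxsize=maxsize)

    def fill():
        for item in iterable:
            q.put(item)          # blocks while maxsize items are unconsumed
        q.put(_SENTINEL)

    threading.Thread(target=fill, daemon=True).start()
    return iter(q.get, _SENTINEL)

# The producer stays at most 3 items ahead of the consumer.
for n in cached((i * i for i in range(10)), maxsize=3):
    print(n)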

-- 
Terry Jan Reedy


