[Python-ideas] Caching iterators
Terry Reedy
tjreedy at udel.edu
Wed Feb 26 04:05:23 CET 2014
On 2/25/2014 7:25 PM, Chris Angelico wrote:
> On Wed, Feb 26, 2014 at 11:01 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> While tempting, the caller needs to be aware that such a caching system:
>>
>> (1) uses potentially unbounded amounts of memory;
>
> Easy fix: Limit the size of the queue.
multiprocessing.Queue is a process-shared near-clone of queue.Queue and
takes one optional argument, maxsize. It is implemented with a pipe,
locks, and semaphores.
> Just like with pipes between
> processes, the producer will block trying to push more data into the
> queue until the consumer's taken some out.
This is the default behavior of Queue.put on a bounded queue: with
block=True (the default), put waits until a slot is free.
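
For example, with queue.Queue (multiprocessing.Queue behaves the same
way):

    import queue

    q = queue.Queue(maxsize=2)
    q.put(1)
    q.put(2)
    # The queue is now full; a plain q.put(3) would block until a
    # consumer called q.get().  With block=False it raises instead:
    try:
        q.put(3, block=False)
    except queue.Full:
        print("full -- a blocking put would have waited here")
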
> Of course, then you have to
> figure out what's the right queue size.
42 ;-)
> In many cases the safest and
> simplest might well be zero, aka current behaviour.
>
>> (2) is potentially harmful if calculating the values has side-effects;
>>
>> (3) it can lead to "lost" data if the caller accesses the underlying
>> iterator without going through the cache; and
If the iterator is in another process, connected only by a pipe, it
cannot be accessed except through the Queue.
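
A minimal sketch of that arrangement (the None sentinel marking the end
of iteration is my own convention, not anything built in):

    import multiprocessing as mp

    def produce(q):
        # The iterator lives entirely in this worker process; the
        # parent can only see its values through the queue.
        for value in range(10):      # stand-in for any iterator
            q.put(value)             # blocks when the queue is full
        q.put(None)                  # sentinel: iteration finished

    if __name__ == '__main__':
        q = mp.Queue(maxsize=4)      # the bounded cache
        p = mp.Process(target=produce, args=(q,))
        p.start()
        for value in iter(q.get, None):
            print(value)
        p.join()
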
> Deal with these two by making it something you have to explicitly
> request. In that way, it's no different from itertools.tee() - once
> you tee an iterator, you do not touch the underlying one at all.
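
The hazard tee guards against is easy to demonstrate:

    import itertools

    it = iter(range(5))
    a, b = itertools.tee(it)
    print(next(a))       # 0 -- pulled from `it` and cached for b
    next(it)             # 1 is consumed behind tee's back...
    print(list(b))       # [0, 2, 3, 4] -- 1 is silently lost
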
>
>> (4) it is wasteful if the consumer stops early and never uses all the
>> values. (CPU cycles are cheap, but they aren't free.)
>
> Also partly solved by the queue size limit (don't let it run free forever).
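
A rough sketch of how the size limit bounds the waste; the helper name
and the timing here are illustrative only, not a concrete proposal:

    import queue, threading, time

    def cached(iterable, maxsize):
        # Run the iterator in a background thread, keeping at most
        # maxsize precomputed values waiting for the consumer.
        q = queue.Queue(maxsize)
        end = object()
        def worker():
            for item in iterable:
                q.put(item)          # blocks once the buffer is full
            q.put(end)
        threading.Thread(target=worker, daemon=True).start()
        return iter(q.get, end)

    computed = []
    def gen():
        for i in range(1000):
            computed.append(i)       # record the work actually done
            yield i

    it = cached(gen(), maxsize=5)
    next(it); next(it)               # consumer stops after two values
    time.sleep(0.1)                  # let the worker refill the buffer
    print(len(computed))             # about 8 (2 + 5 + 1), not 1000
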
--
Terry Jan Reedy