[Python-ideas] Membership of infinite iterators

Wed Oct 18 10:29:47 EDT 2017

On 18 October 2017 at 22:53, Serhiy Storchaka <storchaka at gmail.com> wrote:

> 18.10.17 13:22, Nick Coghlan пише:
>
>> 2.. These particular cases can be addressed locally using existing
>> protocols, so the chances of negative side effects are low
>>
>
> Only the particular case `count() in count()` can be addressed without
> breaking the following examples:
>

You're right, the potential impact on objects with weird __eq__
implementations would mean that even the `__contains__` approach would
require deprecation warnings for the APIs that allow short-circuiting. So
we can discard it in favour of exploring the "Can we make a beneficial
improvement via __length_hint__?" question.

> 3. The total amount of code involved is likely to be small (a dozen or so
>> lines of C, a similar number of lines of Python in the tests) in
>> well-isolated protocol functions, so the chances of introducing future
>> maintainability problems are low
>>
>
> It depends on what you want to achieve. Just prevent an infinity loop in
> `count() in count()`, or optimize `int in count()`, or optimize more
> special cases.
>

My interest lies specifically in reducing the number of innocent looking
ways we offer to provoke uninterruptible infinite loops or unbounded memory
consumption.

> 4. We have a potential contributor who is presumably offering to do the
>> work (if that's not the case, then the question is moot anyway until a
>> sufficiently interested volunteer turns up)
>>
>
> Maintaining is more than writing an initial code.
>

Aye, that's why the preceding point was to ask how large a change we'd be
offering to maintain indefinitely, and how well isolated that change would
be.

> If we were to do that, then we *could* make the solution to the reported
>> problem more general by having all builtin and standard library operations
>> that expect to be working with finite iterators (the containment testing
>> fallback, min, max, sum, any, all, functools.reduce, etc) check for a
>> length hint, even if they aren't actually pre-allocating any memory.
>>
>
> This will add a significant overhead for relatively short (hundreds of
> items) sequences. I already did benchmarking for similar cases in the past.

I did wonder about that, so I guess the baseline zero-risk enhancement idea
would be to only prevent the infinite loop in cases that already request a
length hint as a memory pre-allocation check. That would reduce the
likelihood of the most painful case (grinding the machine to a halt),
without worrying about the less painful cases (which will break the current
process, but the rest of the machine will be fine).

Given that, adding TypeError raising __length_hint__ implementations to
itertools.count(), itertools.cycle(), and itertools.repeat() would make
sense as an independent RFE, without worrying about any APIs that don't
already check for a length hint.

A more intrusive option would then be to look at breaking the other tight
iteration loops into two phases, such that checking for potentially
infinite iterators could be delayed until after the first thousand
iterations or so. That option is potentially worth exploring anyway, since
breaking up the current single level loops as nested loops would be a
pre-requisite for allowing these APIs to check for signals while they're
running while keeping the per-iteration overhead low (only one
pre-requisite of many though, and probably one of the easier ones).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncoghlan at gmail.com   |   Brisbane, Australia
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-ideas/attachments/20171019/3468e425/attachment.html>