[Python-ideas] Deterministic iterator cleanup

Nathaniel Smith njs at pobox.com
Wed Oct 19 18:01:29 EDT 2016


On Wed, Oct 19, 2016 at 11:13 AM, Chris Angelico <rosuav at gmail.com> wrote:
> On Thu, Oct 20, 2016 at 3:38 AM, Random832 <random832 at fastmail.com> wrote:
>> On Wed, Oct 19, 2016, at 11:51, Yury Selivanov wrote:
>>> I'm -1 on the idea.  Here's why:
>>>
>>>
>>> 1. Python is a very dynamic language with GC and that is one of its
>>> fundamental properties.  This proposal might make GC of iterators more
>>> deterministic, but that is only one case.
>>
>> There is a huge difference between wanting deterministic GC and wanting
>> cleanup code to be called deterministically. We're not talking about
>> memory usage here.
>
> Currently, iterators get passed around casually - you can build on
> them, derive from them, etc, etc, etc. If you change the 'for' loop to
> explicitly close an iterator, will you also change 'yield from'?

Oh good point -- 'yield from' definitely needs a mention. Fortunately,
I think it's pretty easy: the only way the child generator in a 'yield
from' can be aborted early is if the parent generator is aborted
early, so the semantics you'd want are that the child generator is
closed iff the parent generator is closed. 'yield from' already
implements those semantics :-). So the only remaining issue is
what to do if the child iterator completes normally, and in this case
I guess 'yield from' probably should call '__iterclose__' at that
point, like the equivalent for loop would.
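
To make that concrete, here's a rough sketch of those semantics
(hypothetical code -- __iterclose__ doesn't exist yet, this is just
the shape I have in mind):

def yield_from_with_close(child):
    try:
        result = yield from child
    except GeneratorExit:
        # Parent closed early: 'yield from' already propagates this
        # into the child, closing it, so nothing extra is needed.
        raise
    # Child exhausted normally: the proposal would close it here,
    # like the equivalent for loop would.
    close = getattr(type(child), '__iterclose__', None)
    if close is not None:
        close(child)
    return result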

> What
> about other forms of iteration? Will the iterator be closed when it
> runs out normally?

The iterator is closed if someone explicitly closes it, either by
calling the method by hand, or by passing it to a construct that calls
that method -- a 'for' loop without preserve(...), etc. Obviously any
given iterator's __next__ method could decide to do whatever it wants
when it's exhausted normally, including executing its 'close' logic,
but there's no magic that causes __iterclose__ to be called here.
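
For concreteness, under the proposal a plain 'for' loop would behave
roughly like this sketch (again, __iterclose__ is just the proposed
name, nothing that exists today):

iterable = range(3)   # stand-in for whatever you're looping over
_it = iter(iterable)
try:
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        ...  # loop body goes here
finally:
    # A preserve(...) wrapper would turn this close into a no-op.
    close = getattr(type(_it), '__iterclose__', None)
    if close is not None:
        close(_it)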

The distinction between exhausted and exhausted+closed is useful:
consider some sort of file-wrapping iterator that implements
__iterclose__ by closing the file. Then this loop exhausts the
iterator and closes the file:

for line in file_wrapping_iter:
    ...

and this also exhausts the iterator, but since __iterclose__ is not
called, it doesn't close the file, allowing it to be re-used:

for line in preserve(file_wrapping_iter):
    ...
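
In case it's helpful, here's roughly what such a file-wrapping
iterator and the preserve() wrapper might look like (a sketch --
neither __iterclose__ nor preserve exists today, and the real
proposal might spell the details differently):

class FileWrappingIter:
    def __init__(self, path):
        self._file = open(path)

    def __iter__(self):
        return self

    def __next__(self):
        line = self._file.readline()
        if not line:
            # Exhausted, but the file stays open until __iterclose__.
            raise StopIteration
        return line

    def __iterclose__(self):
        self._file.close()

class preserve:
    def __init__(self, it):
        self._it = it

    def __iter__(self):
        return self

    def __next__(self):
        return next(self._it)

    def __iterclose__(self):
        # Swallow the close so the underlying iterator stays usable.
        pass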

OTOH there is one important limitation: if you implement your
iterator as a generator, then generators don't provide any way to
distinguish between exhausted and exhausted+closed (this is just how
generators already
work, nothing to do with this proposal). Once a generator has been
exhausted, its close() method becomes a no-op.
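
You can see this with any generator today:

def gen():
    try:
        yield 1
    finally:
        print("cleanup")

g = gen()
list(g)    # drives the generator to exhaustion; prints "cleanup"
g.close()  # already exhausted, so this is a silent no-op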

> This proposal is to iterators what 'with' is to open files and other
> resources. I can build on top of an open file fairly easily:
>
> @contextlib.contextmanager
> def file_with_header(fn):
>     with open(fn, "w") as f:
>         f.write("Header Row")
>         yield f
>
> def main():
>     with file_with_header("asdf") as f:
>         """do stuff"""
>
> I create a context manager based on another context manager, and I
> have a guarantee that the end of the main() 'with' block is going to
> properly close the file. Now, what happens if I do something similar
> with an iterator?
>
> def every_second(it):
>     try:
>         next(it)
>     except StopIteration:
>         return
>     for value in it:
>         yield value
>         try:
>             next(it)
>         except StopIteration:
>             break

BTW, it's probably easier to read this way :-):

def every_second(it):
    for i, value in enumerate(it):
        if i % 2 == 1:
            yield value
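
Both versions behave the same; for example:

list(every_second(iter(range(6))))   # -> [1, 3, 5]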

> This will work, because it's built on a 'for' loop. What if it's built
> on a 'while' loop instead?
>
> def every_second_broken(it):
>     try:
>         while True:
>             next(it)
>             yield next(it)
>     except StopIteration:
>         pass
>
> Now it *won't* correctly call the end-of-iteration function, because
> there's no 'for' loop. This is going to either (a) require that EVERY
> consumer of an iterator follow this new protocol, or (b) introduce a
> ton of edge cases.

Right. If the proposal is accepted then a lot (I suspect the vast
majority) of iterator consumers will automatically DTRT because
they're already using 'for' loops or whatever; for those that don't,
they'll do whatever they're written to do, and that might or might not
match what users have come to expect. Hence the transition period,
ResourceWarnings and DeprecationWarnings, etc. I think the benefits
are worth it, but there certainly is a transition cost.
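
For the record, a manual consumer like the while-loop version above
could still opt in to the new protocol by hand -- here's a sketch
(again using the proposed __iterclose__ name):

def every_second_fixed(it):
    try:
        while True:
            next(it)
            yield next(it)
    except StopIteration:
        pass
    finally:
        close = getattr(type(it), '__iterclose__', None)
        if close is not None:
            close(it)

The finally clause runs both on normal exhaustion and when this
generator itself is closed early, so the underlying iterator gets
cleaned up either way.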

-n

-- 
Nathaniel J. Smith -- https://vorpus.org

