[Python-ideas] Deterministic iterator cleanup

Paul Moore p.f.moore at gmail.com
Fri Oct 21 09:35:19 EDT 2016


On 21 October 2016 at 12:23, Steven D'Aprano <steve at pearwood.info> wrote:
> On Fri, Oct 21, 2016 at 11:03:51AM +0100, Paul Moore wrote:
>
>> At the moment, the take home message for such users feels like it's
>> "you might need to scatter preserve() around your code, to avoid the
>> behaviour change described above, which you glazed over because it
>> talked about all that coroutiney stuff you don't understand" :-)
>
> I now believe that's not necessarily the case. I think that the message
> should be:
>
> - If your iterator class has a __del__ or close method, then you need
>   to read up on __(a)iterclose__.
>
> - If you iterate over open files twice, then all you need to remember is
>   that the file will be closed when you exit the first loop. To avoid
>   that auto-closing behaviour, use itertools.preserve().
>
> - Iterating over lists, strings, tuples, dicts, etc. won't change, since
>   they don't have __del__ or close() methods.
>
>
> I think that covers all the cases the average Python code will care
> about.

OK, that's certainly a lot less scary.

Some thoughts remain, though:

1. You mention files. Presumably (otherwise what would be the point of
the change?) there will be other iterables that change similarly.
There's no easy way to know in advance.
2. Cleanup protocols for iterators are pretty messy now - __del__,
close, __iterclose__, __aiterclose__. What's the chance 3rd party
implementers get something wrong?
3. What about generators? If you write your own generator, you don't
control the cleanup code. The example:

    def mygen(name):
        with open(name) as f:
            for line in f:
                yield line

is a good example - don't users of this generator need to use
preserve() in order to be able to do partial iteration? And yet how
would the writer of the generator know to document this? And if it
isn't documented, how does the user of the generator know preserve is
needed?
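
To make point 3 concrete, here is a minimal sketch of the partial-iteration pattern in question. It runs on today's Python; the explicit g.close() call is only a stand-in (an assumption for illustration) for the implicit cleanup the proposal would perform when a for loop exits. The preserve() helper discussed in this thread is part of the proposal and does not exist in itertools, so it is not used here. The temporary file and the variable names are made up for the example.

```python
import os
import tempfile


def mygen(name):
    with open(name) as f:
        for line in f:
            yield line


# Create a small throwaway file to iterate over.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("header\nrow1\nrow2\n")
    path = tmp.name

# Current behaviour: breaking out of the loop leaves the generator
# suspended, so later iteration resumes where the first loop stopped.
g = mygen(path)
for line in g:
    first = line
    break
rest_today = list(g)

# Simulated proposed behaviour: the loop closes the iterator on exit.
# g.close() stands in for that implicit call (an assumption made here).
g = mygen(path)
for line in g:
    break
g.close()
rest_proposed = list(g)  # the closed generator yields nothing more

os.remove(path)
```

With the current semantics the second pass sees the remaining rows; with the simulated close it sees nothing, which is the case where a user of mygen() would need to know about preserve().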

My feeling is that this proposal is a relatively significant amount of
language churn, to solve a relatively niche problem, and furthermore
one that is actually only a problem to non-CPython implementations[1].
My instincts are that we need to back off on the level of such change.
We're not at the point where we need something like the language
change moratorium (PEP 3003), but I don't think it would do any harm to
give users a chance to catch their breath after the wave of recent big
changes (async, typing, path protocol, f-strings, funky unpacking,
Windows build and installer changes, ...).

To put this change in perspective - we've lived without it for many
years now, can we not wait a little while longer?

From another message:
> Bottom line is: at first I thought this was a scary change that would
> break too much code. But now I think it won't break much, and we can
> ease into it really slowly over two or three releases. So I think that
> the cost is probably low. I'm still not sure on how great the benefit
> will be, but I'm leaning towards a +1 on this.

And yet, it still seems to me that it's going to force me to change
(maybe not much, but some of) my existing code, for absolutely zero
direct benefit, as I don't personally use or support PyPy or any other
non-CPython implementations. Don't forget that PyPy still doesn't even
implement Python 3.5 - so no-one benefits from this change until PyPy
supports Python 3.8, or whatever version this becomes the default in.
It's very easy to misuse an argument like this to block *any* sort of
change, and that's not my intention here - but I am trying to
understand what the real-world issue is here, and how (and when!) this
proposal would allow people to write code to fix that problem. At the
moment, it feels like:

   * The problem is file handle leaks in code running under PyPy
   * The ability to fix this will come in around 4 years (random guess
     as to when PyPy implements Python 3.8, plus an assumption that the
     code needing to be fixed can immediately abandon support for all
     earlier versions of PyPy).

Any other cases seem to me to be theoretical at the moment. Am I being
unfair in this assessment? (It feels like I might be, but I can't be
sure how).

Paul

[1] As I understand it, CPython's refcounting GC makes this a
non-issue, correct?
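
A small sketch of the behaviour footnote [1] refers to, assuming it is run on CPython: when the last reference to a suspended generator goes away, reference counting finalizes it immediately and its finally block runs. On implementations without reference counting, such as PyPy, the same cleanup may be deferred until a later garbage collection. The names resource and events are made up for the example.

```python
events = []


def resource():
    try:
        yield 1
        yield 2
    finally:
        # Stands in for releasing a file handle or similar resource.
        events.append("cleanup")


g = resource()
next(g)  # suspend the generator inside the try block
del g    # on CPython the refcount drops to zero and cleanup runs now

# True on CPython; not guaranteed on other implementations.
cleaned_up_immediately = events == ["cleanup"]
```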

