[Python-ideas] for/except/else

Thu Mar 2 06:06:15 EST 2017

On 02.03.2017 06:46, Nick Coghlan wrote:
> On 1 March 2017 at 19:37, Wolfgang Maier
> <wolfgang.maier at biologie.uni-freiburg.de> wrote:
>
>     Now here's the proposal: allow an except (or except break) clause to
>     follow for/while loops that will be executed if the loop was
>     terminated by a break statement.
>
>     Now while it's possible that Nick had a good reason not to do so,
>
>
> I never really thought about it, as I only use the "else:" clause for
> search loops where there aren't any side effects in the "break" case
> (other than the search result being bound to the loop variable), so
> while I find "except break:" useful as an explanatory tool, I don't have
> any practical need for it.
>
> I think you've made as strong a case for the idea as could reasonably be
> made :)
>
> However, Steven raises a good point that this would complicate the
> handling of loops in the code generator a fair bit, as it would add up
> to two additional jump targets wherever the new clause was used.
>
> Currently, compiling loops only needs to track the start of the loop
> (for continue), and the first instruction after the loop (for break).
> With this change, they'd also need to track:
>
> - the start of the "except break" clause (for break when the clause is used)
> - the start of the "else" clause (for the non-break case when both
> trailing clauses are present)
>

I think you could get away with only one additional jump target, as I
showed in my previous reply to Steven. The heavier burden would be on
the parser, which would have to distinguish the existing loop form from
the two new variants (a loop with an except clause, and a loop with both
except and else clauses), but that's probably not really the point.
What weighs heavier, I think, is your design argument.
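To get a rough look at what the compiler tracks for a search loop today, CPython's dis module can list the instructions that some jump lands on. This is only a sketch: search() and jump_targets() are illustrative names, the opcodes, offsets and number of targets differ between CPython versions, and the count is at best a loose proxy for the cost discussed above.

```python
import dis


def search(iterable, condition):
    # The classic search loop from this thread, with a default in "else".
    for item in iterable:
        if condition(item):
            break
    else:
        item = None
    return item


def jump_targets(func):
    """Return (offset, opname) for every instruction that is a jump target."""
    return [
        (ins.offset, ins.opname)
        for ins in dis.get_instructions(func)
        if ins.is_jump_target
    ]


# Output is version dependent, so no expected values are given here.
for offset, opname in jump_targets(search):
    print(offset, opname)
```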

> The design level argument against adding the clause is that it breaks
> the "one obvious way" principle, as the preferred form for search loops
> looks like this:
>
>     for item in iterable:
>         if condition(item):
>             break
>     else:
>         # Else clause either raises an exception or sets a default value
>         item = get_default_value()
>
>     # If we get here, we know "item" is a valid reference
>     operation(item)
>
> And you can easily switch the `break` out for a suitable `return` if you
> move this into a helper function:
>
>     def find_item_of_interest(iterable):
>         for item in iterable:
>             if condition(item):
>                 return item
>         # The early return means we can skip using "else"
>         return get_default_value()
>
> Given that basic structure as a foundation, you only switch to the
> "nested side effect" form if you have to:
>
>     for item in iterable:
>         if condition(item):
>             operation(item)
>             break
>     else:
>         # Else clause neither raises an exception nor sets a default value
>         condition_was_never_true(iterable)
>
> This form is generally less amenable to being extracted into a reusable
> helper function, since it couples the search loop directly to the
> operation performed on the bound item, whereas decoupling them gives you
> a lot more flexibility in the eventual code structure.
>
> The proposal in this thread then has the significant downside of only
> covering the "nested side effect" case:
>
>     for item in iterable:
>         if condition(item):
>             break
>     except break:
>         operation(item)
>     else:
>         condition_was_never_true(iterable)
>
> While being even *less* amenable to being pushed down into a helper
> function (since converting the "break" to a "return" would bypass the
> "except break" clause).

I'm actually not quite buying this last argument. If you wanted to 
refactor this to "return" instead of "break", you could simply put the 
return into the except break block. In many real-world situations with 
multiple breaks from a loop this could actually make things easier 
instead of worse.
Personally, the "nested side effect" form makes me uncomfortable every
time I use it, because the code handling the break case and the code
handling the no-break case end up at different indentation levels and
not necessarily next to each other. However, I gather from the discussion
so far that not many people share my view on this point, so maybe I
should simply adjust my mindset.
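For comparison, in today's syntax a helper that returns in both outcomes can be sketched as follows. find_and_process() is a hypothetical name, and condition, operation and condition_was_never_true are stand-ins for the callables in Nick's examples. The code that an "except break:" clause ending in a return would hold simply follows the loop, since the else clause has already returned in the no-break case.

```python
def find_and_process(iterable, condition, operation, condition_was_never_true):
    """Hypothetical helper mirroring the "nested side effect" example.

    condition, operation and condition_was_never_true are stand-ins for
    the callables used in the examples quoted above.
    """
    for item in iterable:
        if condition(item):
            break
    else:
        # No break: the loop ran to completion, so return early.
        return condition_was_never_true(iterable)
    # Only reached after a break. This is the code an "except break:"
    # clause ending in a return would hold.
    return operation(item)
```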

All that said, this is a very nice abstract view on things! I really 
learned quite a bit from this, thank you :)

As always, though, reality can be expected to be quite a bit more
complicated than theory, so I decided to check the stdlib for real uses
of break. This is quite a tedious task, since break is used in many
different ways and I couldn't come up with a good automated way of
classifying them. So I went through the stdlib modules containing the
break keyword (in reverse alphabetical order) and sorted each use into
categories manually. I only got up to socket.py before losing my
enthusiasm, but here's what I found:

- overall I looked at 114 code blocks that contain one or more breaks

- 84 of these are trivial use cases that simply break out of a while 
True block or terminate a while/for loop prematurely (no use for any 
follow-up clause there)

- 8 more cause a side-effect before a single break, and it would be
pointless to move that into an except break clause

- 3 more cause different, non-redundant side-effects before different 
breaks from the same loop and, obviously, an except break clause would 
not help them either

=> So the vast majority of breaks need *neither* an except break *nor*
an else clause, but that's to be expected.
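Part of this survey could perhaps be automated with the standard ast module, though only the mechanical first step. The sketch below (loops_with_break() and owns_break() are illustrative names, not an existing tool) lists every loop that owns at least one break and whether it has an else clause; judging whether the breaks carry side effects, which is the interesting part of the classification, would still need a human.

```python
import ast

LOOP_TYPES = (ast.For, ast.AsyncFor, ast.While)
SCOPE_TYPES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Lambda)


def owns_break(loop):
    """True if a break statement in loop's body exits this loop itself."""
    stack = list(loop.body)  # breaks in loop.orelse belong to an outer loop
    while stack:
        node = stack.pop()
        if isinstance(node, ast.Break):
            return True
        if isinstance(node, LOOP_TYPES):
            # A nested loop's body breaks out of that nested loop, but a
            # break in its else clause still exits the enclosing one.
            stack.extend(node.orelse)
        elif not isinstance(node, SCOPE_TYPES):
            # Do not descend into nested functions or classes.
            stack.extend(ast.iter_child_nodes(node))
    return False


def loops_with_break(source):
    """Return sorted (line number, has_else) pairs for loops owning a break."""
    tree = ast.parse(source)
    return sorted(
        (node.lineno, bool(node.orelse))
        for node in ast.walk(tree)
        if isinstance(node, LOOP_TYPES) and owns_break(node)
    )
```

It could be pointed at a module's source, for example via inspect.getsource(socket), and the results then sorted by hand into the categories above.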

Of the remaining 19 non-trivial cases:

- 9 are variations of your classic search idiom above, i.e., there's
an else clause and nothing more is needed

- 6 are variations of your "nested side effect" form presented above,
where the benefit of except break is debatable (see above)

- 2 currently do not use an else clause, but have multiple breaks that
do partly redundant things, which could be combined in a single except
break clause

- 1 is an example of breaking out of two loops; from sre_parse._parse_sub:

[...]
     # check if all items share a common prefix
     while True:
         prefix = None
         for item in items:
             if not item:
                 break
             if prefix is None:
                 prefix = item[0]
             elif item[0] != prefix:
                 break
         else:
             # all subitems start with a common "prefix".
             # move it out of the branch
             for item in items:
                 del item[0]
             subpatternappend(prefix)
             continue # check next one
         break
[...]

This could have been written as:

[...]
     # check if all items share a common prefix
     while True:
         prefix = None
         for item in items:
             if not item:
                 break
             if prefix is None:
                 prefix = item[0]
             elif item[0] != prefix:
                 break
         except break:
             break

         # all subitems start with a common "prefix".
         # move it out of the branch
         for item in items:
             del item[0]
         subpatternappend(prefix)
[...]

- finally, 1 is a complicated break dance to achieve something that
clearly would have been easier with except break; from typing.py:

[...]
     def __subclasscheck__(self, cls):
         if cls is Any:
             return True
         if isinstance(cls, GenericMeta):
             # For a class C(Generic[T]) where T is co-variant,
             # C[X] is a subclass of C[Y] iff X is a subclass of Y.
             origin = self.__origin__
             if origin is not None and origin is cls.__origin__:
                 assert len(self.__args__) == len(origin.__parameters__)
                 assert len(cls.__args__) == len(origin.__parameters__)
                 for p_self, p_cls, p_origin in zip(self.__args__,
                                                    cls.__args__,
                                                    origin.__parameters__):
                     if isinstance(p_origin, TypeVar):
                         if p_origin.__covariant__:
                             # Covariant -- p_cls must be a subclass of p_self.
                             if not issubclass(p_cls, p_self):
                                 break
                         elif p_origin.__contravariant__:
                             # Contravariant.  I think it's the opposite. :-)
                             if not issubclass(p_self, p_cls):
                                 break
                         else:
                             # Invariant -- p_cls and p_self must equal.
                             if p_self != p_cls:
                                 break
                     else:
                         # If the origin's parameter is not a typevar,
                         # insist on invariance.
                         if p_self != p_cls:
                             break
                 else:
                     return True
                 # If we break out of the loop, the superclass gets a chance.
         if super().__subclasscheck__(cls):
             return True
         if self.__extra__ is None or isinstance(cls, GenericMeta):
             return False
         return issubclass(cls, self.__extra__)
[...]

which could be rewritten as:

[...]
     def __subclasscheck__(self, cls):
         if cls is Any:
             return True
         if isinstance(cls, GenericMeta):
             # For a class C(Generic[T]) where T is co-variant,
             # C[X] is a subclass of C[Y] iff X is a subclass of Y.
             origin = self.__origin__
             if origin is not None and origin is cls.__origin__:
                 assert len(self.__args__) == len(origin.__parameters__)
                 assert len(cls.__args__) == len(origin.__parameters__)
                 for p_self, p_cls, p_origin in zip(self.__args__,
                                                    cls.__args__,
                                                    origin.__parameters__):
                     if isinstance(p_origin, TypeVar):
                         if p_origin.__covariant__:
                             # Covariant -- p_cls must be a subclass of p_self.
                             if not issubclass(p_cls, p_self):
                                 break
                         elif p_origin.__contravariant__:
                             # Contravariant.  I think it's the opposite. :-)
                             if not issubclass(p_self, p_cls):
                                 break
                         else:
                             # Invariant -- p_cls and p_self must equal.
                             if p_self != p_cls:
                                 break
                     else:
                         # If the origin's parameter is not a typevar,
                         # insist on invariance.
                         if p_self != p_cls:
                             break
                 except break:
                     # If we break out of the loop, the superclass gets a chance.
                     if super().__subclasscheck__(cls):
                         return True
                     if self.__extra__ is None or isinstance(cls, GenericMeta):
                         return False
                     return issubclass(cls, self.__extra__)

                 return True
[...]
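To experiment with the variance check itself outside the metaclass, it can be pulled into a standalone function in which each break of the loop above becomes "return False". This is a hypothetical sketch (args_compatible() is not part of typing). It uses plain classes as arguments and leaves out the superclass fallback that __subclasscheck__ performs when the check fails; TypeVar's __covariant__ and __contravariant__ attributes are real.

```python
from typing import TypeVar


def args_compatible(self_args, cls_args, parameters):
    """Hypothetical standalone version of the variance loop above.

    Returns False at the first argument pair that violates the variance
    of its parameter, True if all pairs are compatible.
    """
    for p_self, p_cls, p_origin in zip(self_args, cls_args, parameters):
        if isinstance(p_origin, TypeVar):
            if p_origin.__covariant__:
                # Covariant: p_cls must be a subclass of p_self.
                ok = issubclass(p_cls, p_self)
            elif p_origin.__contravariant__:
                # Contravariant: the opposite direction.
                ok = issubclass(p_self, p_cls)
            else:
                # Invariant: p_cls and p_self must be equal.
                ok = p_self == p_cls
        else:
            # Parameter is not a TypeVar: insist on invariance.
            ok = p_self == p_cls
        if not ok:
            return False
    return True
```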

My summary: I do see use cases for the except break clause, but,
admittedly, they are relatively rare and may not be worth the hassle of
introducing new syntax.