[Python-ideas] Is this PEP-able? for X in ListY while conditionZ:

Tue Jul 2 01:34:32 CEST 2013

On 01/07/2013 23:44, Oscar Benjamin wrote:
> On 1 July 2013 21:29, David Mertz <mertz at gnosis.cx> wrote:
>> However, I see the point made by a number of people that the 'while' clause
>> has no straightforward translation into an unrolled loop, and is probably
>> ruled out on that basis.
>
> My thought (in keeping with the title of the thread) is that the comprehension
>
>      data = [x for y in stuff while z]
>
> would unroll as the loop
>
>      for y in stuff while z:
>          data.append(x)
>
> which would also be valid syntax and have the obvious meaning. This is
> similar to Nick's suggestion that 'break if' be usable in the body of
> the loop so that
>
>      data = [x for y in stuff; break if not z]
>
> would unroll as
>
>      for y in stuff:
>          break if not z
>          data.append(x)
>
> Having a while clause on for loops is not just good because it saves a
> couple of lines but because it clearly separates the flow control from
> the body of the loop (another reason I dislike 'break if'). In other
> words I find the flow of the loop
>
>      for p in primes() while p < 100:
>          print(p)
>
> easier to understand (immediately) than
>
>      for p in primes():
>          if p >= 100:
>              break
>          print(p)
>
> These are just trivially small examples. As the body of the loop grows
> in complexity the readability benefit of moving 'if not z: break' into
> the top line becomes more significant.
>
> You can get the same separation of concerns using takewhile at the
> expense of a different kind of readability
>
>      for p in takewhile(lambda p: p < 100, primes()):
>          print(p)
>
> However there is another problem with using takewhile in for loops
> which is that it discards an item from the iterable. Imagine parsing a
> file such as:
>
> csvfile = '''# data.csv
> # This file begins with an unspecified number of header lines.
> # Each header line begins with '#'.
> # I want to keep these lines but need to parse them separately.
> # The first non-comment line contains the column headers
> x y z
> 1 2 3
> 4 5 6
> 7 8 9'''.splitlines()
>
> You can do
>
>      csvfile = iter(csvfile)
>      headers = []
>      for line in csvfile:
>          if not line.startswith('#'):
>              break
>          headers.append(line[1:].strip())
>      fieldnames = line.split()
>      for line in csvfile:
>          yield {name: int(val) for name, val in zip(fieldnames, line.split())}
>
> However if you use takewhile like
>
>      for line in takewhile(lambda line: line.startswith('#'), csvfile):
>          headers.append(line[1:].strip())
>
> then after the loop 'line' holds the last comment line. The discarded
> column header line is gone and cannot be recovered; takewhile is
> normally only used when the entire remainder of the iterator is to be
> discarded.
>
> I would propose that
>
>      for line in csvfile while line.startswith('#'):
>          headers.append(line)
>
> would result in 'line' referencing the item that failed the while predicate.
>
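
For anyone who wants to check the takewhile behaviour described above,
here is a minimal sketch that runs in current Python. The sample lines
are a shortened version of the file in the example, and split_headers
is a hypothetical helper name used only for illustration:

```python
from itertools import takewhile


def split_headers(lines):
    """Return (headers, rest): the '#' lines, and whatever takewhile left behind."""
    it = iter(lines)
    headers = [
        line[1:].strip()
        for line in takewhile(lambda ln: ln.startswith('#'), it)
    ]
    # takewhile has already pulled the first non-'#' line from `it`
    # in order to test it, so that line is not in `rest`.
    return headers, list(it)


sample = ['# data.csv', '# comment', 'x y z', '1 2 3', '4 5 6']
headers, rest = split_headers(sample)
print(headers)  # ['data.csv', 'comment']
print(rest)     # ['1 2 3', '4 5 6'], the 'x y z' line was consumed
```

The column header line is in neither list, which is the loss being
described.
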
So:

     for item in generator while is_true(item):
         ...

is equivalent to:

     for item in generator:
         if not is_true(item):
             break
         ...
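
The second form already runs today, so the intended semantics can be
tried out. A rough sketch, where is_true and the input values are
placeholders I made up for the example:

```python
def is_true(item):
    # Hypothetical predicate, chosen only for this example.
    return item < 3


generator = iter([1, 2, 3, 4, 5])
taken = []
for item in generator:
    if not is_true(item):
        break
    taken.append(item)

failed = item                 # the loop variable keeps the failing item
remaining = list(generator)   # the iterator can still be used afterwards

print(taken)      # [1, 2]
print(failed)     # 3
print(remaining)  # [4, 5]
```

Note that 'item' ends up bound to the element that failed the test,
which matches what is proposed above for the 'while' form.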

By similar reasoning(?):

     for item in generator if is_true(item):
         ...

is equivalent to:

     for item in generator:
         if not is_true(item):
             continue
         ...
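
Again the second form is valid now, and the nearest existing spelling
of the first is a generator expression in the loop header. A small
sketch comparing the two, with is_true and source as made-up
placeholders:

```python
def is_true(item):
    # Hypothetical predicate: keep the odd numbers.
    return item % 2 == 1


source = range(6)

# Explicit loop with 'continue'.
with_continue = []
for item in source:
    if not is_true(item):
        continue
    with_continue.append(item)

# Same filtering, moved into the loop header with a generator expression.
with_genexp = []
for item in (i for i in source if is_true(i)):
    with_genexp.append(item)

print(with_continue)  # [1, 3, 5]
print(with_genexp)    # [1, 3, 5]
```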

If we have one, shouldn't we also have the other?

If only comprehensions have the 'if' form (IIRC, it has already been
rejected for multi-line 'for' loops), then shouldn't only
comprehensions have the 'while' form?