[Python-ideas] Is this PEP-able? for X in ListY while conditionZ:

Oscar Benjamin oscar.j.benjamin at gmail.com
Tue Jul 2 00:44:22 CEST 2013


On 1 July 2013 21:29, David Mertz <mertz at gnosis.cx> wrote:
> However, I see the point made by a number of people that the 'while' clause
> has no straightforward translation into an unrolled loop, and is probably
> ruled out on that basis.

My thought (in keeping with the title of the thread) is that the comprehension

    data = [x for y in stuff while z]

would unroll as the loop

    for y in stuff while z:
        data.append(x)

which would also be valid syntax and have the obvious meaning. This is
similar to Nick's suggestion that 'break if' be usable in the body of
the loop so that

    data = [x for y in stuff; break if not z]

would unroll as

    for y in stuff:
        break if not z
        data.append(x)

Having a while clause on for loops is not just good because it saves a
couple of lines but because it clearly separates the flow control from
the body of the loop (another reason I dislike 'break if'). In other
words I find the flow of the loop

    for p in primes() while p < 100:
        print(p)

easier to understand (immediately) than

    for p in primes():
        if p >= 100:
            break
        print(p)

These are just trivially small examples. As the body of the loop grows
in complexity the readability benefit of moving 'if not z: break' into
the top line becomes more significant.

You can get the same separation of concerns using takewhile at the
expense of a different kind of readability

    for p in takewhile(lambda p: p < 100, primes()):
        print(p)
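
For anyone who wants to run the takewhile version, here is a minimal
self-contained sketch. The primes() generator below is only an
assumption made for illustration (naive trial division); any infinite
generator of primes would do.

```python
from itertools import count, takewhile


def primes():
    """Hypothetical helper: yield primes forever using naive trial division."""
    found = []
    for n in count(2):
        # n is prime if no previously found prime divides it.
        if all(n % p for p in found):
            found.append(n)
            yield n


# takewhile stops pulling from primes() at the first p that is >= 100.
small = list(takewhile(lambda p: p < 100, primes()))
for p in small:
    print(p)
```

The flow control lives entirely in the loop header here, but at the
cost of a lambda that repeats the loop variable.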

However there is another problem with using takewhile in for loops:
it consumes the first item that fails the predicate and then discards
it. Imagine parsing a file such as:

csvfile = '''# data.csv
# This file begins with an unspecified number of header lines.
# Each header line begins with '#'.
# I want to keep these lines but need to parse them separately.
# The first non-comment line contains the column headers
x y z
1 2 3
4 5 6
7 8 9'''.splitlines()

You can do

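    def read_rows(csvfile):
        csvfile = iter(csvfile)
        headers = []
        # Collect the leading comment lines; stop at the first other line.
        for line in csvfile:
            if not line.startswith('#'):
                break
            headers.append(line[1:].strip())
        # 'line' now holds the column header line that ended the loop.
        fieldnames = line.split()
        for line in csvfile:
            yield {name: int(val) for name, val in zip(fieldnames, line.split())}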

However if you use takewhile like

    for line in takewhile(lambda line: line.startswith('#'), csvfile):
        headers.append(line[1:].strip())

then after the loop 'line' holds the last comment line. The discarded
column header line is gone and cannot be recovered; takewhile is
normally only used when the entire remainder of the iterator is to be
discarded.
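
The lost item can be seen directly in a small sketch. The shortened
list of lines below is made up for the demonstration; the names
'lines' and 'remaining' are not from the example above.

```python
from itertools import takewhile

# A shortened stand-in for the file above, wrapped in a one-shot iterator.
lines = iter([
    "# data.csv",
    "# Each header line begins with '#'.",
    "x y z",
    "1 2 3",
])

headers = []
for line in takewhile(lambda s: s.startswith('#'), lines):
    headers.append(line[1:].strip())

# takewhile had to pull "x y z" out of the iterator in order to test it,
# and then dropped it. Only the data row is left in 'lines'.
remaining = list(lines)
print(line)       # the last comment line, not the column headers
print(remaining)  # the column header line is missing
```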

I would propose that

    for line in csvfile while line.startswith('#'):
        headers.append(line)

would result in 'line' referencing the item that failed the while predicate.
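
Something close to this can be approximated in current Python with a
small wrapper that remembers the item that failed the predicate. This
is only a sketch under assumptions: the class name takewhile_keep and
its 'failed' attribute are hypothetical and do not exist in itertools.

```python
class takewhile_keep:
    """Hypothetical helper: like takewhile, but keeps the rejected item."""

    def __init__(self, predicate, iterable):
        self.predicate = predicate
        self.iterator = iter(iterable)
        self.failed = None  # set to the first item that fails the predicate

    def __iter__(self):
        for item in self.iterator:
            if not self.predicate(item):
                self.failed = item
                return
            yield item


csvfile = iter(["# data.csv", "# more header", "x y z", "1 2 3"])

headers = []
loop = takewhile_keep(lambda s: s.startswith('#'), csvfile)
for line in loop:
    headers.append(line)

# The column header line is still available instead of being discarded.
fieldnames = loop.failed.split()
rows = [dict(zip(fieldnames, map(int, row.split()))) for row in csvfile]
print(rows)
```

Unlike the proposed syntax, the failed item ends up on the wrapper
rather than in the loop variable, so this is an approximation only.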


Oscar

